Skip navigation.

Ambassador

The Royal C++ Embassy

Posts tagged with "garbage collection"

Concurrency, garbage collection and C++, a naive approach - 2

, ,

Lock-free, continued
I have realized that my previous post sounds a bit scientific, now I will try to sum up and bind result to C++.

We have got couple of works, scientific works, upon lock-free containers and data structures, some of them are already patented. These are, however, still too complicated and very limited.

Hardware support
Today, Intel i386 family processors facilitates some atomic operations in three ways. My understanding:
  • Prefixing instructions with LOCK may not assert a LOCK# signal on bus, but locks cache on the CPU, if requested area is already in the cache.
  • CPUs guarantee cache coherence through a special protocol (only on P6)
  • CPU asserts LOCK# signal on bus, modifies memory.

(By the way, I have studied electronics engineering, not computer science)

Bottom line is that we need hardware support to perform lock-free operations correct and quick, and atomic operations are still costly today.

C++ standard, of course, does not mention lock-free techniques. I happen to believe it will not mention it in upcoming standards, too. Because the subject is not in language's scope, in my opinion.

Back to threads
Thread synchronization, as very briefly discussed here, is part of our lives if we have to modify shared data in a multi-threaded environment.

Most Win32 programmers already know there are couple of C++ classes on the web, wrapping handles and providing C++-ish ways to deal with threads and synchronization objects. I am not developing multi-threaded applications on UNIX so I don't know existing libraries but as far as I have heard, there are really cool libraries for UNIX as well.

Software people at Intel was working on a library, I couldn't find the library's URL, but there was a discussion on comp.lang.c++.moderated.

Managed run-time environments, or virtual machines, like JVM (Java Virtual Machine) and .NET CLR have sort of "thread object", which boils down to a wrapper (component) over a native thread. I am not a Java expert, and I have heard that not all threads in JVM are native threads - I don't know how true it is, it sounds like fibers scheduled by VM. In fact, everything in Java has to be encapsulated by an object, anyway.

C++ and concurrency
For a couple of years, C++ people discussed language support for concurrency. Herb thinks "concurrency will be the next revolution in software engineering, much like object oriented programming did". C++, as a mainstream language, would like to, not just catching this train but, being a part of the locomotive that brings us to concurrency world. It's not just about wrapper thread classes this time, but proper compiler support, real thread objects, asynchronous functions, lock-free containers, Open MP support (well, not quite sure) and similar stuff are considered to be in upcoming C++ standards.

It's not impossible, but quite difficult to write correct (or reasonably correct) threading libraries today. What we will, hopefully, have are an International standard and compliant compilers that facilitate moving multi-threaded applications to mainstream. In last year's PDC, Herb has given a talk about some concurrency additions to C++.

Garbage Collection
Some mainstream languages and (virtual) platforms today utilize an old invention, called "garbage collection". C++, from standard's point of view, did not have it. There are just a couple of third party libraries working pretty good.

Garbage collection technology is fairly difficult to implement and some C++ people have some concerns; "does GC degrade performance of my C++ application?" "what if I want to do extreme programming and GC restricts my moves?" and a few more, probably arose because of garbage collection implementations they see in the market.

GC is, in its nature, essentially designed for performance concerns. But it introduces some essential problems as well. Today's mainstream managed run-time environments (.NET and JVM) sometimes suffer their compacting garbage collection designs. First problem is that they do not have a deterministic destruction but a non-deterministic finalization. Every application running on VM has at least 2 threads; one for process' main thread and one for finalizer.

How did C++ come today without garbage collection? We've had idioms and patterns, such as smart pointers (e.g. std::auto_ptr, boost::weak_ptr, etc) and reference counting (e.g. COM's IUnknown). These tools and techniques facilitate managing memory.

Compacting GC may improve performance, in particular in long-running applications, because it causes less or even fixes memory fragmentation problems.

First of all, one should not take Java and .NET as "the only GC implementations" around, but should look a bit wider vision. A GC implementation may not collect all objects, but may take it one step further and distinguish "GC types" and "non-GC types" (e.g. "this type of objects are managed by GC"). Non-deterministic finalization is not necessary. As Herb pointed out and implemented in C++/CLI as well is "delete first, collect later".

SomeGCType^ h = new SomeGCType;
// ...
delete h;  // call destructor, reclaim memory later.


In other words: "call destructor deterministically and I don't care when you collect the memory". That is quite suitable for all C++ programmers.

One common misconception is that "GC manages all resources". RAII pattern is widely used in C++ and every C++ programmer knows how and when resources they have acquired will be closed/freed. For a Java or a .NET programmer, well, they don't know/care what is (deterministic) resource management. Finalizers have one "feature"; a finalizer can run 0 or more times and they do weird things in finalizers. Nonetheless, a GC'ed C++, which has features implemented in C++/CLI today (I have mentioned above; distinct types, deterministic destruction), works very cool.

There are some discussions and concerns about garbage collection, but it seems like we will have garbage collection in upcoming C++ standard(s). ISO C++ will most probably have a GC close to C++/CLI design rationale.

Concurrency, garbage collection and C++, a naive approach - 1

, ,

C++, as a mainstream, modern and a strong language, has evolved and is still evolving to deal with today's programming challenges. We have started using garbage collection enabled virtual run-time environments for a more than a decade in mainstream production, although (compacting) garbage collection has been implemented in 1970s. Similarly, parallel computing and concurrency discoveries have been made decades ago. C++ has been invented in early 80s and standardized in 1998. C++ has refused to have garbage collection, probably because people were treating C++ "a better C" or "C with classes" for a long time. C++ has refused threads, locks and similar concepts as well.

I will refer primarily to Herb's thoughts, because I happen to believe he explains both of these concerns very clearly.

Threads
The "thread" word appear exactly once in ISO/IEC 14882:2003 (Programming Languages - C++), in exception handling section. There are some reasons behind it, and they are primarily:
  • Threads are not portable across platforms.
  • Thread is a very low level term (so is pointer, right?)
  • C++ has been used in single-threaded applications, primarily (arguably)
  • Due to portability problems, OS vendors and third-party libraries already provide a C API, and one can build a C++ abstraction layer upon them, if required.


Each system has its own threading mechanism. Some systems may not even honor threads, and go without them. Never mind required hardware support. C++ is a mainstream language on embedded platforms as well. Actually, as far as I know, even .NET people can't do without touching some C++ code on embedded platforms. I know compilers that do not comply with standard, support exceptions, templates and things like those that we use for first-class application development, but they have to go without those due to platform's restrictions and complexity introduced to implement those features in compiler. Since thread is conceptually very low level, it's really hard to standardize it. Because this may require threading and synchronization API to be standard or at least equivalent across operating systems. I think, these were primary reasons why threads have not been mentioned in the standard - well, because thread itself is really hard to standardize.

Thread Synchronization
Theory is simple: one CPU (single core) can execute exactly one instruction at any given point in time. n CPUs can execute n instructions at any given point in time (no pipelining). Our purpose is sharing processing power to execute series of instructions that belong different tasks. Because different tasks, usually, require different things. For example, when performing a slow IO operating, CPU can switch to another task until IO operation completes.

The problem starts when we have some data that we need to share among threads. When we want to change this shared data, we need to guard it from being accessed by other tasks running "concurrently". Therefore, we need to synchronize (some peope say "serialize") access to this shared resource with a set of syncronization objects provided by either operating system or third-party library vendor. These are, today, primarily lock based solutions. They are widely used, as lock based applications are simpler to write. Even with lock based designs, we run into problems.

Lock-based synchronization
Example (Herb's very good example):

If we had a piece of code, say A, and we can prove that this code has implemented locks 100% correctly. If we also had a piece of code, say B, and we can prove that it is also 100% correct. We cannot say the result will be 100% correct, if we use them together; they simply are not composable.


Because code A may take a lock, call something in code B, it takes a lock and calls something in A, which is guarded with a lock and there we have a deadlock.

Locks are simpler to understand, but still hard to understand. In lock based synchronization, we usually make system calls, because it's operating system kernel that deals with locks. "Hey, critical sections are user space objects", I hear. Yes, but critical section's scope is restricted to threads of the same process, and as far as I know, there is no such concept in POSIX standard, so no UNIX variant implements it.

The logic behind the locks are simple (pseudo code):
while (!AcquireLock())
      WaitForRelease();

// .. rest of the code


A piece of code tries to acquire a lock. If it can acquire the lock, it continues execution. Otherwise, it waits (infinitely or arbitrary time or never waits and returns - depends on design and OS support). Code repeats these steps to continue its execution.

Problems with above (or similar, lock based) code:
  • Too many system calls
  • No prioritization
  • Contention
  • Potential deadlock (i.e. task cannot proceed, gets stuck)


People believe system calls, especially on today's modern, fast and cheap hardware, does not cause any performance degredation. I am against this theory. There should always be a performance concern in software - we cannot rely on fast and cheap hardware - this can't go like that and all this breeds is bad design and implementation.

Too many system calls introduce a performance problem. In priority scheduling, when locks are used, prioritization loses its meaning. Some systems perform priority inversion, which simply increases priority of thread that owns a lock. But it's still not the proper solution. If threads try to acquire an already held lock, lock contention begins. The longer the lock is held, the more likely there is another task trying to acquire it.

There are some certain and simple rules in lock based approach and one of them is "never call unknown code while holding a lock", never call virtual functions and others' code. As I mentioned above, quoted from Herb's talk, we cannot prove that two correct implementations of lock-based synchronization are still correct after we use them together. This may lead a deadlock, a lock that will never open because two or more tasks are waiting each other to release a lock that other holds (similarly, one single thread can deadlock itself, too).

In fact, it has never been too simple to write a correct lock based code. Computer scientists then developed some "lock-free" algorithms that do not employ locks. This is what Herb says;

...you need scientists to write correct lock based programs. But you need rocket scientists to write lock-free based programs... everytime you write a piece of lock-free code, you actually prepare a (academic) paper, and expect corrections...



Lock-free Synchronization
There are a couple of approaches to this subject, but I will primarily focus on transactions and simple CAS (compare and swap, or, compare and exchange).

The purpose is solving shared data problem by introducing atomic operations. Atomicity, by putting simply, means that an operation either completes or ends as if it's never begun. In transactional concurrency, which's an idea heavily used in database systems, too; system defines an invariant for end state and if at the end of an operation this invariant is not satisfied, transaction rolls-back (undoes its changes, if made any - never mind catastrophic scenerios) and restarts. Actually, I have got a set of pretty thick books discuss transactions and concurrency and it really is complicated, believe me. However, it's our job to challenge our very own complexities.

A lock free system does not become Alice's world, by just not to use locks, but tossing coins at the gates of "The Matrix" with Agent Smith - whether you should enter or not.

Lock based systems are like Newton mechanics - you can roughly estimate something. Lock free systems are like quantum physics - nothing is certain enough


as Herb says.

A lock-free system does not necessarily mean a wait-free system, although a wait-free system is lock-free by its definition. Transactional concurrency I will mention will try not to deal about catastrophic cases.

A transaction can be seen as an operation that either completes or returns to its initial state, as if it's never begun. There are several approaches to this issue, too. Basically, this is done by getting a snapshot (copy) of the data, modify its copy, check invariants and write it to its original location (commit). If another transaction tries to commit its own changes and sees its invariants, for example target object's value, are not satisfied, rolls back and restarts its operation.
Download Opera, the fastest and most secure browser
December 2009
S M T W T F S
November 2009January 2010
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31