Concurrency, garbage collection and C++, a naive approach - 1
Saturday, 12. August 2006, 00:17:31
I will refer primarily to Herb's thoughts, because I happen to believe he explains both of these concerns very clearly.
Threads
The "thread" word appear exactly once in ISO/IEC 14882:2003 (Programming Languages - C++), in exception handling section. There are some reasons behind it, and they are primarily:
- Threads are not portable across platforms.
- Thread is a very low level term (so is pointer, right?)
- C++ has been used in single-threaded applications, primarily (arguably)
- Due to portability problems, OS vendors and third-party libraries already provide a C API, and one can build a C++ abstraction layer upon them, if required.
Each system has its own threading mechanism. Some systems may not even honor threads, and go without them. Never mind required hardware support. C++ is a mainstream language on embedded platforms as well. Actually, as far as I know, even .NET people can't do without touching some C++ code on embedded platforms. I know compilers that do not comply with standard, support exceptions, templates and things like those that we use for first-class application development, but they have to go without those due to platform's restrictions and complexity introduced to implement those features in compiler. Since thread is conceptually very low level, it's really hard to standardize it. Because this may require threading and synchronization API to be standard or at least equivalent across operating systems. I think, these were primary reasons why threads have not been mentioned in the standard - well, because thread itself is really hard to standardize.
Thread Synchronization
Theory is simple: one CPU (single core) can execute exactly one instruction at any given point in time. n CPUs can execute n instructions at any given point in time (no pipelining). Our purpose is sharing processing power to execute series of instructions that belong different tasks. Because different tasks, usually, require different things. For example, when performing a slow IO operating, CPU can switch to another task until IO operation completes.
The problem starts when we have some data that we need to share among threads. When we want to change this shared data, we need to guard it from being accessed by other tasks running "concurrently". Therefore, we need to synchronize (some peope say "serialize") access to this shared resource with a set of syncronization objects provided by either operating system or third-party library vendor. These are, today, primarily lock based solutions. They are widely used, as lock based applications are simpler to write. Even with lock based designs, we run into problems.
Lock-based synchronization
Example (Herb's very good example):
If we had a piece of code, say A, and we can prove that this code has implemented locks 100% correctly. If we also had a piece of code, say B, and we can prove that it is also 100% correct. We cannot say the result will be 100% correct, if we use them together; they simply are not composable.
Because code A may take a lock, call something in code B, it takes a lock and calls something in A, which is guarded with a lock and there we have a deadlock.
Locks are simpler to understand, but still hard to understand. In lock based synchronization, we usually make system calls, because it's operating system kernel that deals with locks. "Hey, critical sections are user space objects", I hear. Yes, but critical section's scope is restricted to threads of the same process, and as far as I know, there is no such concept in POSIX standard, so no UNIX variant implements it.
The logic behind the locks are simple (pseudo code):
while (!AcquireLock())
WaitForRelease();
// .. rest of the code
A piece of code tries to acquire a lock. If it can acquire the lock, it continues execution. Otherwise, it waits (infinitely or arbitrary time or never waits and returns - depends on design and OS support). Code repeats these steps to continue its execution.
Problems with above (or similar, lock based) code:
- Too many system calls
- No prioritization
- Contention
- Potential deadlock (i.e. task cannot proceed, gets stuck)
People believe system calls, especially on today's modern, fast and cheap hardware, does not cause any performance degredation. I am against this theory. There should always be a performance concern in software - we cannot rely on fast and cheap hardware - this can't go like that and all this breeds is bad design and implementation.
Too many system calls introduce a performance problem. In priority scheduling, when locks are used, prioritization loses its meaning. Some systems perform priority inversion, which simply increases priority of thread that owns a lock. But it's still not the proper solution. If threads try to acquire an already held lock, lock contention begins. The longer the lock is held, the more likely there is another task trying to acquire it.
There are some certain and simple rules in lock based approach and one of them is "never call unknown code while holding a lock", never call virtual functions and others' code. As I mentioned above, quoted from Herb's talk, we cannot prove that two correct implementations of lock-based synchronization are still correct after we use them together. This may lead a deadlock, a lock that will never open because two or more tasks are waiting each other to release a lock that other holds (similarly, one single thread can deadlock itself, too).
In fact, it has never been too simple to write a correct lock based code. Computer scientists then developed some "lock-free" algorithms that do not employ locks. This is what Herb says;
...you need scientists to write correct lock based programs. But you need rocket scientists to write lock-free based programs... everytime you write a piece of lock-free code, you actually prepare a (academic) paper, and expect corrections...
Lock-free Synchronization
There are a couple of approaches to this subject, but I will primarily focus on transactions and simple CAS (compare and swap, or, compare and exchange).
The purpose is solving shared data problem by introducing atomic operations. Atomicity, by putting simply, means that an operation either completes or ends as if it's never begun. In transactional concurrency, which's an idea heavily used in database systems, too; system defines an invariant for end state and if at the end of an operation this invariant is not satisfied, transaction rolls-back (undoes its changes, if made any - never mind catastrophic scenerios) and restarts. Actually, I have got a set of pretty thick books discuss transactions and concurrency and it really is complicated, believe me. However, it's our job to challenge our very own complexities.
A lock free system does not become Alice's world, by just not to use locks, but tossing coins at the gates of "The Matrix" with Agent Smith - whether you should enter or not.
Lock based systems are like Newton mechanics - you can roughly estimate something. Lock free systems are like quantum physics - nothing is certain enough
as Herb says.
A lock-free system does not necessarily mean a wait-free system, although a wait-free system is lock-free by its definition. Transactional concurrency I will mention will try not to deal about catastrophic cases.
A transaction can be seen as an operation that either completes or returns to its initial state, as if it's never begun. There are several approaches to this issue, too. Basically, this is done by getting a snapshot (copy) of the data, modify its copy, check invariants and write it to its original location (commit). If another transaction tries to commit its own changes and sees its invariants, for example target object's value, are not satisfied, rolls back and restarts its operation.










