Absolute Basics of Concurrency Correctness
Monday, 30. November 2009, 00:05:16
So, a correct multi-threaded program implies a correct program in general. But in this post, I will only focus on the part that is specific to correct concurrency.
Getting concurrency right takes knowledge and analytical skills. I am an avid TDD practitioner, and I try to test-drive as much as I possibly can. However, in the world of concurrency we are faced with the problem that some bugs are simply impossible to write a test case for. Sometimes the timings required to surface a bug are just too tight and the chances too small, other times a bug is just does not exist on our specific combination of hardware and runtime.
Therefore, analyzing the code in terms of the concurrency guarantees provided by the underlying platform, is a key element of the path towards concurrency correctness. And it cannot be done without knowledge of the guarantees provided by the platform, whether it be the JVM, CLR or x86.
There are lots of failure modes specific to mutli-threaded programs. Many are quite esoteric, but three things stand out as the most basic problems: shared mutable state, publication and locking.
The first problem of concurrency is shared mutable state. There are cases where you can have deliberate under-synchronized shared mutable state, but those are as few as they are special. There are basically three ways to deal with shared mutable state:
- Make state immutbale
- Don't share state
- Guard the shared mutable state by the same lock (two different locks cannot guard the same piece of state, unless you always use exactly these two locks)
The second problem of concurrency is safe publication. How do you make data available to other threads? How do you perform a hand-over? You will find the solution to this problem in the memory model of your platform and (hopefully) in the API. Java, for instance, has many ways to publish state and the java.util.concurrent package contains tools specifically designed to handle inter-thread communication. Make sure to give this problem ample focus; the bugs produced by unsafe publication can be extremely subtle, and it is easy to gloss over this problem and focus on locking instead.
The third (and harder) problem of concurrency is locking. Mismanaged lock-ordering is the source of dead-locks. Some dead-locks are practically guaranteed to always occur, while others require highly improbable timing. This problem can actually be generalized to everything that prevents a thread from making progress. Blocking operations, waits on conditions, spin-locks - these all have the same potential to stall a thread. And in this highly networked world, two threads don't even have to be on the same machine to dead-lock each other.
You can get pretty far with carefully constructed automatic test cases - and you should, but analysis is by far the most important step towards correctness in a multi-threaded program. However, you need to design and write your code with that in mind, otherwise the complexity of the code can quickly render such an analysis impossible to perform in practice.


