Parallel vs. Concurrent
Thursday, 8. October 2009, 04:01:55
When speaking in general terms, the two words can be used interchangeably in many cases. The expression "running in parallel" is especially troublesome because it simply means that multiple things are executing at the same time. There's a reason for this use, but we can back up a little and start simple.
Parallel usually means several instances of the same thing executing at the same time (usually independently of each other).
Concurrency usually means several things executing at the same time and working together toward a common goal.
Where it gets blurry is that concurrency can use parallelism. For example, all the partitions at a certain level in QuickSort can be sorted in parallel. But the overall algorithm would be a concurrent one. So each partition sorting algorithm would run in parallel while the QuickSort algorithm itself is a concurrent one (because it uses the output of each partition sorting algorithm together toward a common goal). And note how the partition sorting instances are independent of each other. They each operate on different data. Yet, in a weird case of irony, we still call the overall algorithm Parallel QuickSort.
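Here's a minimal sketch of the QuickSort example in Python. The names quicksort and parallel_quicksort are made up for illustration, and only the top-level partitions are handed to workers to keep it short. Note that in CPython, threads won't actually speed up this kind of CPU-bound work; the point is the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items):
    """Split items into (less, equal, greater) around a middle pivot."""
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return less, equal, greater


def quicksort(items):
    """Plain sequential quicksort; this is what each worker runs."""
    if len(items) <= 1:
        return list(items)
    less, equal, greater = partition(items)
    return quicksort(less) + equal + quicksort(greater)


def parallel_quicksort(items):
    """Sort the two top-level partitions independently, then combine."""
    if len(items) <= 1:
        return list(items)
    less, equal, greater = partition(items)
    # The "parallel" part: identical workers, each on its own data.
    with ThreadPoolExecutor(max_workers=2) as pool:
        sorted_less, sorted_greater = pool.map(quicksort, [less, greater])
    # The "concurrent" part: the results are used together toward one goal.
    return sorted_less + equal + sorted_greater


result = parallel_quicksort([5, 3, 8, 1, 9, 2, 7])
```

The two workers never talk to each other, which is what makes them parallel in the sense used here. It's only parallel_quicksort, one level up, that knows they are working toward the same sorted list.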
As to concurrent programming, it doesn't need to use parallelism. P2P software is concurrent and distributed. Each client can be different and they would still work together.
Where it gets even MORE blurry is when talking about different levels with respect to your point of view and when parallel threads start to communicate with each other. When you don't know if the next lower level is the same or not, you usually just say "in parallel". For example, when multiple processors are executing, one would say they are running in parallel. Can you say they are executing concurrently? Well, we don't know if they're working toward a common goal.
At the next level, one would say that threads are executing in parallel if you're designing a scheduler for example, where you treat all threads the same. But the application developer would see those threads differently and say they are executing concurrently because they were designed to work together by exchanging information (if that was actually the design). At the instruction level, vector instructions blur the situation even more since they operate on independent data, but your program will likely use all the results together. At the same time, no one says they have a concurrent program (or a parallel one for that matter) just because they use vector instructions.
So if concurrency is when multiple things work together and parallelism is when those instances are identical and independent of each other, what do we call instances that are different from each other and are also independent? Depends on your POV. Do you care WHAT those threads are doing? If so, it's concurrent. If not, it's parallel. And what about threads that work together, but are identical in code? Usually concurrent, but parallel would be acceptable too.
I hope the reader is seeing where I'm going with this. We shouldn't be TOO fussy with these definitions, but I've seen too many flat out wrong uses of these terms. For example, determinism has nothing to do with the definition of parallel and concurrent. Is determinism important with respect to the order of execution (and how it affects your data)? Of course. But it's not what defines the two terms at hand. Another example of bad usage of these terms is saying that using threads is using concurrency (or parallelism depending on who you ask). Threads can execute in parallel or concurrently. Depends on what you're looking at. Depends on the context.
This isn't a problem restricted to these definitions alone. It's an indication of the larger problem of only being able to have one point of view. In fact, it's my belief that it's one and the same problem.
When dealing with things like encapsulation, I was stunned to find out that programmers only dealt with a single level. It boggled my mind. So all one had to do was declare something as implementation and all rules about encapsulation were out the window. My view was that one level's interface is another level's implementation. So you need to organize your code so that it's clear what uses whatever else and encapsulate appropriately at each level. Project V has no choice in the matter. You have to define an implementation by using interfaces (unless building native components). Built-in (and native) components are the only entities that will be pure implementation because you cannot go any lower in detail.
Unfortunately, this seems to be a difficult concept for a great many people to grasp. Some are even actively repulsed by it, as if there must be something inherently wrong with this setup. This repulsion is why I've long said that programmers don't deal in abstractions, but rather in the concrete. Every little detail must be implemented. What I'm trying to do is not have to do this anymore, so that we can have libraries of components available for different levels instead of just the standard libraries and what the language gives us. Libraries can only deal with the leaf part of the execution chain. Even if you believe you have multiple levels, a library function must handle the last levels, no matter how many. Sure, you can have callbacks or pass functions around, but that's just another version of the same thing.
I suck at analogies, so get ready. Think of a pyramid. What I want to do is be able to swap out any layer without having the rest of the pyramid be affected. So we can have libraries of components that only deal with that one layer. How does this work? By having the library implement the interfaces found at the next higher level. But the implementation (the algorithm itself) would only use interfaces found at lower levels. Hopefully, most of the components will use interfaces from the immediately next lower level. With current code, we can't do this because every function states in explicit terms what functions to invoke (and often what objects to use). This causes the execution path to be defined all the way down to the leaf node (because the invoked functions will also list their own invoked functions and so on). Even when using interfaces, the customary use is simply to be able to invoke the same function in different objects. They're not used the way I have in mind. I want to actually be able to replace the implementation altogether with a faster, more efficient or otherwise better version (especially at runtime while it's executing).
To be clearer:
1. Current techniques involve defining an interface so that the same function in different objects can do DIFFERENT things.
2. I want different implementations of the same interface to do exactly the SAME thing no matter what.
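A rough sketch of point 2, in Python. Everything here (Sorter, InsertionSorter, BuiltinSorter, Report) is a hypothetical name invented for the example, not anything from Project V. The higher level only knows the interface, and the implementation underneath gets replaced while the program is running.

```python
from typing import List, Protocol


class Sorter(Protocol):
    """The interface the higher level depends on (hypothetical)."""

    def sort(self, items: List[int]) -> List[int]: ...


class InsertionSorter:
    """A slow but simple implementation of Sorter."""

    def sort(self, items: List[int]) -> List[int]:
        result: List[int] = []
        for x in items:
            i = len(result)
            # Walk left until x is no smaller than its left neighbour.
            while i > 0 and result[i - 1] > x:
                i -= 1
            result.insert(i, x)
        return result


class BuiltinSorter:
    """A faster implementation that must give the SAME answer."""

    def sort(self, items: List[int]) -> List[int]:
        return sorted(items)


class Report:
    """Higher-level component: it names the interface, not the implementation."""

    def __init__(self, sorter: Sorter) -> None:
        self.sorter = sorter

    def top(self, scores: List[int], n: int) -> List[int]:
        # Take the n largest scores, highest first.
        return self.sorter.sort(scores)[-n:][::-1]


report = Report(InsertionSorter())
before = report.top([3, 9, 1, 7], 2)

report.sorter = BuiltinSorter()  # swap the lower layer at runtime
after = report.top([3, 9, 1, 7], 2)
```

If the two implementations really do the same thing, before and after are identical and Report never notices the swap. That is the opposite of the usual polymorphism case in point 1, where the whole idea is that the call does something different depending on the object.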
What does this have to do with parallelism and concurrency? Well, my main objective was to demonstrate how point of view is the main obstacle. We need more tools and power to be able to go up or down to different levels. Blocking access isn't working for me. I don't think it works for a lot of people. Another point is that being able to go to any level means that the execution of your software may also be done at any level. Just imagine. If there are certain high level components that become so prevalent as to be found in a significant portion of software, they could be included directly in hardware. And all software using this component would automatically start using this new implementation. But less powerful machines would still be able to execute the software, only at a lower level. It'd also enable the automatic use of completely different processors within the same machine (as long as the hardware allows this).
I'll stop here, but there is so much more involved in what makes computing possible, especially when talking about parallel and concurrent programming. What people often forget is that it's not the data processing that's an obstacle to parallel processing, it's connecting the data from one operation to the next. In fact, you only need one data processing instruction and one conditional instruction on any given processor to compute anything that is theoretically computable. The rest of the instructions are used for shuffling data around. Once you see it with this point of view, the terms parallel and concurrent become rather moot. Threads themselves become pointless.
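To make the minimal-instruction idea a bit more concrete, here's a small sketch of a SUBLEQ machine in Python. SUBLEQ is a well-known one-instruction design that fuses a subtraction with a conditional jump, so it's a close cousin of the claim above rather than the exact two-instruction setup. The run_subleq name, the memory layout and the halt-on-negative-address convention are assumptions made for this example.

```python
def run_subleq(memory, max_steps=1000):
    """Run a SUBLEQ program on a copy of memory and return the final memory.

    Each instruction is three cells (a, b, c):
        mem[b] -= mem[a]; if mem[b] <= 0, jump to c, else fall through.
    A negative program counter halts the machine (a convention assumed here).
    """
    mem = list(memory)
    pc = 0
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Addresses of the data cells, placed right after the nine instruction cells.
A, B, Z = 9, 10, 11

program = [
    A, Z, 3,    # Z = Z - A  (Z now holds -A); continue at address 3 either way
    Z, B, 6,    # B = B - Z  (B now holds A + B); continue at address 6 either way
    Z, Z, -1,   # Z = Z - Z  (clear the scratch cell), then halt
    7, 5, 0,    # data: A = 7, B = 5, Z = 0
]

result = run_subleq(program)
total = result[B]
```

Only the second instruction produces the answer. The first and third exist just to stage data in the scratch cell Z and clean it up afterwards, which is the data shuffling the paragraph above is talking about.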


