Project V: Test Application For Concurrency
Tuesday, September 11, 2007 4:22:30 AM
What really makes it a good test is that it is conducive to parallelism because you can split up the image in multiple parts, not to mention that the same image needs to be recoded multiple times. And there are multiple stages, so concurrency in the form of pipelining is an inherent part of the algorithm. First, there's the conversion to YUV color space. Second, there's the lifting scheme. Then there's quantization. And finally, there's the encoding. Lifting schemes are perfectly suited for network construction. You can literally lay out the network in an exact representation of the lifting diagram. Quantization is nothing more than filtering a stream. For those of you who like the map functionality, then you'll see the full power of that concept in pure data flow fashion as it was meant to be. I'll probably use a simple encoding scheme, but I'm not sure yet.
Here are the details of all four stages.
1. Conversion. This is a simple RGB to YUV color space conversion. Maybe I'll use floating point just to make things easier. I'm thinking I'll use 32x32 blocks. This is exactly one page of memory (4K) and fits in the cache and both U and V components can fit in another half page.
2. Then we apply the lifting scheme. This is done in two passes. Once for the vertical pass and once for the horizontal pass. I'm not sure if I'll use a transpose operation or find a way to be able to apply operations on a certain axis. Makes sense I think. For the first run, I think it'd be best to keep it simple and just transpose and reuse the same component. This operation can be done in parallel for each 32x32 block.
3. Quantization is an operation that is done in two sub steps. First, it has to find the min and max values of a set of data. Then it alters that data to fit within the specified number of bits. This is the quantization part. This last sub part can be done completely in parallel. So instead of storing number that range from 0-70 in 7 bits, you can decide to use 4 bits and scale all your values accordingly. The max value with 4 bits is 15. So you would scale 0-70 to be 0-15 instead. You need to store the scaling value as well. A third step that can optionally be done is setting all values that are below a cutoff to zero. Usually, the cutoff should not be larger than 2 or 3 and only on the higher frequency bands.
4. Encoding is storing all the values coming out of the quantization step in a certain order. Higher frequency bands should have less meaningful values and should be quantized excessively. Many values will go to zero without any noticeable visual effect. This is how you get compression. When you encode, you encode the lower frequencies first. You will likely always reach a point where the rest of the data is just zeros. So you need only insert and end of block marker instead of storing all the zeros. So first the quantization step reduces the number of bits and then the encoding process reduces the number of values to store.
I'll just used predefined techniques everywhere. I'm not looking for high compression or something terribly useful. Note what is happening here though. Put all together, this will be a conversion application. The entire application can be used as a single component that converts from an Image format to a wavelet compressed format. I have some decoding routines already written, so I'll use that to check if it worked correctly.
I really like this test. All the steps are basic math. The YUV conversion is simply three math equations. The lifting scheme is two really simple equations (high pass and low pass). Quantization is scaling. And for encoding, I think I'll use a predefined pattern and follow that pattern until I reach all zeroes. Basically, all it does is convert an array into a stream and chop off the end. This will require a filtering component to tell where exactly to chop it off.
If Project V can handle this (on multiple machines, some with dual cores), then I'll be fairly happy that the core of the system is good enough to start writing simple applications with it. I'm not going to implement the entire test network all at once. I'll start with one piece at a time. The cool thing is that all parts are invertible. So I may end up implementing decoding components as well just for the heck of it.
I don't think I'll use data redundancy for this test. I may eventually implement a simple caching system with feedback notifications. If machine X send data to machine Y, processes it and then sends it to machine Z, this last machine (Z) will notify the first machine (X) of what it received. This way, machine X can clear some of its cache. It's still not completely safe, but if any ONE machine goes down, the network can keep operating. Note that machine Y will keep a cache also until a further machine sends it feedback. So there are always two copies of the data around. This would also be useful for load monitoring so that heavy processing can be switched around. Lots of cool and easy stuff to do once the basic system is up and running.
If anyone has any further ideas on this little test project, please leave a comment. I'm looking for input. And if you can volunteer a machine, please contact me directly. I'm hoping to be testing within a few weeks. I want to get components done this week and then move on to implementing step 1 next week.


Unregistered user # Tuesday, September 11, 2007 5:38:33 AM