Intel 48 Core Processor (Update: Larrabee canceled)
Thursday, December 3, 2009 3:40:02 AM
I like what they did with the routing. I wonder how it works at the low level. They have 24 dual core IA32 processing units arranged in tiles. There is a 2D grid routing system. From what I gather, there is a buffer for incoming data (and outgoing?) and each L2 cache is independent. There is a central (up to) 64GB of RAM.
I like the buffering mechanism. I wonder how it works exactly. Is it some kind of DMA scheme? How do you program it? What is the bandwidth between each router? They say 256GB/s of bandwidth, but they don't qualify it. What does that mean exactly? Also, can a tile receive data from 4 different tiles at once at full bandwidth or not?
Personally, I wish the 2D grid had been replaced with a hierarchical one, where you would have 4 or 9 tiles connected together in a 2D grid, but where one of those tiles was also part of a higher-level 2D grid, and so on. The reason is that you could use a group of cores together to work on a common task and then send the output to another group for further processing. This would eliminate a lot of connections between nodes that I think wouldn't be used anyhow, other than to pass data along until it reaches the proper node. I believe the buffering will be the bottleneck anyhow, and making sure that any node can be reached in log(n) "jumps" would be beneficial. Still, I'm not exactly sure what all the features are and how they work. I wish I could get my hands on one.
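To make the "jumps" argument a bit more concrete, here is a rough Python sketch that compares a flat 2D mesh with the hierarchical layout described above. It is only an illustration under assumptions that are not in Intel's announcement: an 8x8 grid of 64 tiles (not the chip's 24), groups of 2x2 tiles, the top-left tile of each group acting as the gateway into the next level up, and every link counting as one jump. The function names are made up for this sketch.

```python
from collections import deque
from itertools import product


def flat_mesh_edges(side):
    """Links of a flat side x side mesh: each tile connects to its right and lower neighbour."""
    edges = set()
    for x, y in product(range(side), repeat=2):
        if x + 1 < side:
            edges.add(((x, y), (x + 1, y)))
        if y + 1 < side:
            edges.add(((x, y), (x, y + 1)))
    return edges


def hierarchical_edges(side, group=2):
    """Links of a hierarchical grid (assumed layout, side must be a power of `group`).

    At each level, nodes spaced `step` apart are wired into group x group meshes.
    The top-left node of every mesh is the gateway that joins the next level up.
    """
    edges = set()
    step = 1
    while step < side:
        block = step * group
        for bx, by in product(range(0, side, block), repeat=2):
            for i, j in product(range(group), repeat=2):
                x, y = bx + i * step, by + j * step
                if i + 1 < group:
                    edges.add(((x, y), (x + step, y)))
                if j + 1 < group:
                    edges.add(((x, y), (x, y + step)))
        step = block
    return edges


def adjacency(edges):
    """Turn an edge set into an undirected adjacency map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hops_from(adj, start):
    """Breadth-first search: number of jumps from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(edges):
    """Worst-case number of jumps between any two tiles."""
    adj = adjacency(edges)
    return max(max(hops_from(adj, s).values()) for s in adj)


if __name__ == "__main__":
    for name, edges in (("flat", flat_mesh_edges(8)),
                        ("hierarchical", hierarchical_edges(8))):
        print(name, "links:", len(edges), "worst-case jumps:", diameter(edges))
```

Under these assumptions the hierarchical layout needs 84 links against 112 for the flat mesh, and the worst-case route drops from 14 jumps to 10. The model ignores buffering and contention at the gateway tiles, so it says nothing about real throughput.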
Update Dec 5, 2009:
Via Slashdot, I got the article mentioning the cancellation of Larrabee.
It's kind of funny that, over and over, the stuff I post that ticks off a lot of people ends up being true. I foretold the problems and issues with this kind of technology despite Tim Sweeney's assertions. I was called crazy. Those who think I was wrong on CUDA should note that it was well in use a year ago. Here's a review from Tom's Hardware that has much more information than I could ever bring to the table. The point I made is still valid, as seen from this quote in Tom's Hardware's article.
CUDA isn’t a magic wand that can accelerate everything. Even within the specific field of transcoding, only certain types of operations, such as motion compensation and discrete cosine transform (DCT), lend themselves to rampant parallelization. Many functions don’t. Developers don’t simply say, “hey, let’s coda for CUDA,” have a good chuckle over their wit, and get a 20x performance boost two or three weeks later. The application must contain functions that can leverage parallelism in a way that jibes with CUDA’s architecture.
Exactly right! But tell that to Tim Sweeney. Oh, and Larrabee apparently hasn't been 100% cancelled. It simply won't be available to the public just yet. If it ever will be, I don't know. People are saying this is the first nail in the coffin.
Do I think CUDA should NEVER be used? THAT is crazy. Personally, I'd rather see something like the Intel 48 core processor mentioned in this article instead of something promoted as a GPGPU. I don't like Tim Sweeney's assertion that general-purpose processing is as fast, as scalable, or easier to work with than custom hardware for a specific task. So if you're going to go the general processing route, then go that way. Just be honest about the advantages and disadvantages. I will admit that having technology like CUDA on a video card is a plus because it makes this kind of parallel processing available sooner than the alternative.

