Chris At Home

A Jawa American Living in Mindanao

Microprocessors

My old life was at Intel. My job was both to analyze the performance of software on new microprocessors (using real hardware) and future microprocessors (using models) and to optimize code for those processors. My group worked on games (Unreal, Half-Life, etc.), image processing apps (Photoshop, for example), video (everything done by Sony, many apps done by other companies), and benchmarks (primarily SPEC, but also other benchmarks which showed AMD approaching our performance). We did some good work, obtaining huge performance gains in some cases and feeding processor fixes back to the architects for next versions. (For anyone who remembers, I was the guy who found the compiler changes needed to make the PIII beat the K7 on the SPEC benchmark when it was first released. I didn't sleep for 3 days, then the compiler guys didn't sleep for a day or two, then we had a compiler that eked out a small performance win and a huge propaganda win against the AMD chip.)

My first major processor was the Pentium III. We applied the SSE instructions to video apps and analyzed the performance of a huge array of apps. It was a beautiful processor, but we learned about the flaws in our modeling software quickly when the actual chip was released: some instructions were expected to be home runs based on modeling, but were worthless with real hardware.

When we began work on the Pentium IV, the software model looked like the fastest thing you can imagine. Unfortunately, it had a few serious weaknesses: it was built around executing code in the best case scenario. In other words, it was screaming fast if you didn't have to get data from memory or the hard drive, if your math code was very good, and if you had a good mix of logical instructions and data movement instructions. If you had bad code, you had a bad, bad, bad processor. Most code was bad, unfortunately, and the PIV picked up a reputation as a bad design. I would judge it a very sensitive design.

Since we were stuck with the PIV core for 5 or 6 years, the only solution was to make the clock run faster to decrease the time spent on delays inside the chip. This meant using a lot of power and more compromises inside the chip. Eventually, it became a losing game. The 3.8GHz Prescott, even if it had not sucked power, would have barely performed better than a 3.2GHz Pentium IV, and in many applications it would have performed worse.

AMD caught us with the K7 and K8, which I consider lackluster designs. Both are advanced clones of the Pentium III. There is little innovation (although AMD fanboys will tell you they are revolutionary), just the same basic core with extra stuff tacked on. On-chip memory controller? Yeah, Intel thought about it and I asked for it...our designers didn't like the idea because it ties the processor to a specific memory technology, instead of allowing the processor to work with anything supported by the motherboard design. The memory controller is the primary reason AMD is making inroads into servers and showed better performance on most apps versus the latest PIV chips...it is faster to do something yourself instead of explaining to someone else what to do and then waiting for them to do it, right? That is why moving the memory controller into the processor is faster. It took Herculean effort by the design teams and performance analysis teams to keep the PIV competitive with the K7 and K8.

Which brings us to the Intel Core processor, which will be the base design for the next few years. What is it? An advanced Pentium III. Penalties for bad code are low, access to data is fast, and it contains enough performance tweaks to beat the Pentium IV. It is designed to be changed to include multiple processors on a single chip, with 2-way and 4-way coming soon. Plus a lot of gee-whiz marketing features to make reviewers happy. Slashdot thinks it is probably the best Intel chip ever.

AMD is making noises about their next generation processor. It is designed to allow a bunch of processors to look like only one processor to Windows. This is important for one reason: processors are now so cheap that we can throw processors at problems. Every process running on your computer (there are dozens) will be able to have access to a processor without waiting. Will it make your applications run faster? No. Will it be impressive in benchmarks? Not so much. But it will let you get more throughput (more stuff will get done per second, but each single task will not be done faster, if you get what I mean).
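To put rough numbers on the throughput point, here is a toy sketch. It is not a model of any real chip: the function name `batch_seconds` and the figures are made up for illustration, and it assumes every task is independent, single-threaded, and takes the same time.

```python
import math


def batch_seconds(num_tasks: int, seconds_per_task: float, num_cores: int) -> float:
    """Wall-clock time to finish a batch of independent single-threaded tasks.

    Hypothetical helper for illustration only. Each core runs one task at a
    time, so the batch finishes in ceil(num_tasks / num_cores) rounds.
    """
    if num_cores < 1:
        raise ValueError("need at least one core")
    rounds = math.ceil(num_tasks / num_cores)
    return rounds * seconds_per_task


# One task: extra cores do nothing for it.
print(batch_seconds(1, 2.0, 1), batch_seconds(1, 2.0, 4))
# Eight tasks: four cores finish the whole batch in a quarter of the time.
print(batch_seconds(8, 2.0, 1), batch_seconds(8, 2.0, 4))
```

The single task takes just as long on four cores as on one. Only the batch gets done sooner, which is all that "more throughput" means here.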

In any case, microprocessors are entering the era of apathy. Further performance increases in a processor are possible, but they are power hungry by their nature: running redundant code, loading data that may not be needed, performing work which will be discarded. We have found and utilized all of the instruction-level parallelism that is available. Now, performance must come from higher-level parallelism, which means more processors.
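A toy way to see why instruction-level parallelism runs out: the sketch below schedules instructions on an imaginary machine that can start up to `width` instructions per cycle, each taking one cycle. Everything here is an assumption for illustration (the function `cycles_needed`, the unit latency, the dependency lists); real cores are far messier.

```python
def cycles_needed(deps, width):
    """Cycles to run a program on a hypothetical core of the given issue width.

    deps[i] lists the earlier instructions that instruction i must wait for.
    Assumes unit latency and that an instruction can start only in a cycle
    after all of its inputs have finished.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    finished_in = {}                 # instruction index -> cycle it ran in
    waiting = list(range(len(deps)))
    cycle = 0
    while waiting:
        cycle += 1
        issued = []
        for i in waiting:
            if len(issued) == width:
                break
            # Ready only if every input finished in an earlier cycle.
            if all(d in finished_in for d in deps[i]):
                issued.append(i)
        for i in issued:
            finished_in[i] = cycle
            waiting.remove(i)
    return cycle


chain = [[]] + [[i - 1] for i in range(1, 8)]   # each op needs the previous one
independent = [[] for _ in range(8)]            # no op needs any other

print(cycles_needed(chain, 1), cycles_needed(chain, 4))
print(cycles_needed(independent, 1), cycles_needed(independent, 4))
```

The dependent chain takes 8 cycles no matter how wide the machine is, while the independent work drops from 8 cycles to 2. Once the independent work inside a single program has been harvested, a wider core has nothing left to do, and the remaining parallelism has to come from running more programs on more processors.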

And, by the way, I need to start looking for a job. Who needs a guy who can make hardware and software scream? Is your application a dog? Hire me. Is your processor slow? I can help. Unfortunately, my job skills probably limit me to either AMD or Intel. Austin, San Jose, Portland, Phoenix, here I come...

P.S. -- 80% of what you read on the CPU review sites is wrong. Some of them know about processors, but they don't have a clue what the real reasons for performance issues are. In most cases, the issue is very subtle and complicated or it would have been fixed before the chip was released. Replay cyclone while waiting for the senior store to complete, which was delayed by a lack of available load/store buffers, anyone?
