The other day, I was cleaning out some old boxes that I had stored away. In one of them were some computer books, games and magazines. I found some old Amiga games like Secret of Monkey Island and Monkey Island 2 that I had bought at an auction (they were new at the time). They still had all the documentation in them. I also found an old Game Developer magazine from May 1998. Guess what the programming tutorial was?
Building an Inline Performance Monitoring System. IOW, how to manually profile your assembly code.
It starts,
Techniques used by today's microprocessors to achieve unprecedented performance have also resulted in unprecedented complexity in the way that assembly routines are optimized. Minimizing the number of instruction execution cycles no longer guarantees the fastest solution. As a developer, you have to consider additional factors, such as cache and translation lookaside buffer (TLB) states, as well as the state of pipelines and execution units. You also have to account for the architectural aspects of the processor, such as support for speculative instruction execution, reordering, and retirement. The alignment of code and data can also make a huge difference.
Aside from multicore, the above paragraph could be written today. It even goes on to say that most games are written in C/C++, but that sometimes you need to go lower and profile your code.
That was 13 years ago. Now consider 13 years before 1998. Could you write the same thing in 1985? Look here if you want to know what was going on in 1985 computer-wise. It was the year that Nintendo would take over the console market from the likes of Coleco, Intellivision and Atari after the crash of 1983. Commodore would come out with the Amiga and the C64 was king in the personal computer department until its demise around 1992.
It's 26 years later and you'd think that the days of assembly were over. Remember all the hype about RISC? Please don't misunderstand. RISC has had a profound impact on processor design. But I remember in the early 90's a total PR dominance of the RISC methodology. You were a heretic if you said anything even remotely against it. Everyone was predicting the demise of all current chips that used CISC. For a while, some of these things seemed to come true. The M68K series was eventually abandoned. Apple switched to the PowerPC. But switches to RISC processors that could be seen at the assembly level were few and far between when it came to personal computers.
What happened instead was that the Intel processors in the PC became even more popular with the famous "Intel Inside" campaign. The 486 was produced from 1989 to 2006. It's the processor that brought us games like Doom and the numerous derivatives. Not only that, but only about 6 instructions were added to the 486 from the 386. Adding CISC instructions was out of favour. Everyone thought that the existing instructions would become faster and that more complicated CISC instructions would become obsolete. Eventually, such a thing did happen with the Pentium, at least when it came to the underlying architecture (not at the assembly level). It wasn't until the advent of MMX that a substantial number of new instructions was introduced.
So we start with CISC instructions that take X cycles before the next instruction can execute. Then we have the transition to superscalar and pipelined architectures, where multiple instructions can be executed at once and each CISC instruction is split up into smaller operations that can be pipelined. During this transition, almost no new instructions were introduced on the Intel processors used in PCs. And like I said earlier, the M68K was being phased out and the PowerPC was introduced.
What happened next was a complete reversal. MMX, SSE and SSE2 are each new instruction sets, and MMX and SSE don't even work on the same registers. Yes, the instructions all do similar things, but on different sets of registers. MMX operated on 64 bits of the floating point registers, but did vector operations on integers (it operated on multiple integers at once). SSE did floating point vector operations on a new set of 128-bit registers. Intel decided that MMX wasn't the way to go. It would be better to use integer vector operations on the SSE registers, which are twice as wide. So that's where SSE2 comes into the picture. SSE2 is essentially MMX on the SSE registers (with other additional instructions added in). Today, there are new AVX, FMA3 and FMA4 instruction sets either in the planning stage or in development.
And these instructions are getting more complex with each new release. We can have instructions that can do CRC calculations. With x64, the general purpose registers are twice as wide and twice as numerous. SSE registers have also doubled in count. There is talk about expanding them to 256, 512 and even 1024 bits wide in the future.
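To make "operated on multiple integers at once" concrete, here is a minimal Python sketch that models a packed integer add in the style of MMX and SSE2. It only simulates the semantics and is not real SIMD code. The helper names `pack`, `unpack` and `packed_add` are made up for illustration, and the wraparound behaviour is assumed to mirror the PADDW instruction (16-bit lanes, no carry between lanes).

```python
def packed_add(a, b, lane_bits=16, reg_bits=64):
    """Add two register values lane by lane, wrapping each lane on overflow."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, reg_bits, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # The mask throws away the carry, so it never leaks into the next lane.
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result


def pack(lanes, lane_bits=16):
    """Pack small integers into one register value (lane 0 is the lowest)."""
    lane_mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & lane_mask) << (i * lane_bits)
    return value


def unpack(value, count, lane_bits=16):
    """Split a register value back into `count` lanes."""
    lane_mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & lane_mask for i in range(count)]
```

With the default 64-bit register (the MMX case), `unpack(packed_add(pack([1, 2, 3, 0xFFFF]), pack([10, 20, 30, 1])), 4)` gives `[11, 22, 33, 0]`: four additions in one operation, with the last lane wrapping around on its own. Passing `reg_bits=128` models the SSE2 case, where the same operation covers eight 16-bit lanes.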
There is a message to be learned here other than a history lesson. It's that everything old is new again, but with added layers. The first two blocks of code in this article I wrote in October 2008 demonstrate this quite clearly. They both do the same thing. But one is now. The other is from 26 years ago.
Another article I wrote was about video cards and the trend to go back to general computing. I got a lot of heat for that article. Ultimately, any idiot could see what was going to happen. I'm not saying it's not nice to have. But it's not anywhere close to the silver bullet they were promising.
Back to the Game Developer magazine I found, there is one other article in there that I really wanted to discuss. It's on page 27, titled simply 3D Graphics Hardware. This is a look at what the situation was in 1998. There's a pie chart at the bottom showing the market share of each company.
1. 3Dfx @ 33%
2. Rendition @ 27%
3. NEC (PCX2) @ 27%
4. 3Dlabs (Permedia 2) @ 13%
Funny thing is that nVidia isn't even on the map yet, but it had just released the RIVA 128, which combines 2D and 3D on the same card. The article mentions this as it has a blurb about each company releasing 3D hardware. They even say, because of what I just paraphrased, "In some ways, this makes nVidia as a more interesting company to watch than 3Dfx, since nVidia is going after both the retail upgrade market, exemplified by PCI-based systems, and the new AGP enabled machines of brand-name PC OEMs such as Gateway 2000." They go on to mention how 3Dfx will have a difficult time transitioning into the combined 2D and 3D market if they were to do so.
Even Game Developer knew at the time that having one card that does both is better than the 3Dfx cards that only did 3D. Today, it seems unimaginable that you needed two cards, one for 2D and one for 3D. But it was indeed so in 1998 and this is what made it possible for 3Dfx to "keep the company ahead of the pack". Everything I've mentioned here is in the article.
I won't go into why 3Dfx went away. But we do know that having one card instead of two is better if it accomplishes the same thing. With this, we can see how the future unfolds by looking at the past.
Stage 1
With processors, we had single instructions at a time.
With 3D, general purpose code was too slow.
Stage 2
Processors implement superscalar and pipelined architectures.
3D hardware uses a pipeline and multiple texture units.
Stage 3
Processors implement more vector instructions and multiple cores.
3D hardware uses a multitude of cores to implement general purpose programming.
I'm afraid to say that the verdict is not one of simplicity. RISC did not win out at the assembly level. If there ever was a trend to go that way, it's been hidden away in the hardware. Market insiders have said for a long time that hardware manufacturers are waiting for developers to tell them what they need out of the hardware. That's not exactly true. They're waiting for the market to break a certain way. And it's not breaking because it's a catch-22 this time around.
In the past, it was always about speed. Stages 1 and 2 are all about that. Stage 1 can go faster by having more cycles per second. Stage 2 can go faster through pipelining and superscalar architecture, where you can operate on more than one instruction at a time. Obviously, the next stage was multicore, but that was never new. It's always been around. What I'm saying is that the hardware companies are putting in what they already knew.
So what's the next stage? Stage 4?
Re-simplification. We can look back at stage 2 for some hints. One would be hard pressed to think that Stage 2 was about making it simple. And they'd be right. While the instruction set did not simplify, the idea of RISC was alive and well. So don't expect any actual simplification for the user this time around either. But expect to see different technologies coalesce into a more consistent overall design.
The first hint of this is software ray tracers. I fully expect that in the future, custom hardware will be handled by general purpose hardware. I also expect that code that is done in 3D or custom hardware today will eventually have to be redone in software. So dust off your software renderers that you wrote back in the 90's. You're gonna need them. No more dumping the tasks to hardware and waiting for the results.
Then you know what Stage 5 will be? The re-introduction of specific hardware that can implement common tasks. Pluggable devices for specific functionality will make a comeback. I expect this to be especially lucrative in the mobile market. Turn your mobile device into anything you want it to be.
I could be wrong of course and I expect that the hardware industry will play its waiting game a little longer. What does this mean for right now? More of the same for a while yet.
The real bottleneck is and always has been sequential code on multiple cores. Will developers change their ways? History answers that quite clearly. NO! Even when developers think they've changed, it's only because old has become new again. What I envision is that the only way for hardware to be able to properly split up parallel code is if the developer states up front what can be split up. Like I said, they won't do that. So I fully expect a trend toward modularization that looks similar to what we're doing now, with the exception that you need to specify inter-module dependencies. You know what that is, right? So do I. It's already begun. Those that see it will survive. Those that don't will disappear. Now you know why hardware companies are cautious, lest they become another 3Dfx. This time, it's not up to them. It's up to the developers. At the same time, they could miss it if they act too late.
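As a rough illustration of what "specify inter-module dependencies" could look like, here is a minimal Python sketch. The module names and the `parallel_waves` helper are hypothetical, invented for this example. The only real API used is `graphlib.TopologicalSorter` from the standard library (Python 3.9+). The idea is that once dependencies are stated up front, a scheduler can work out which modules may run side by side.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical modules. Each one maps to the set of modules it depends on.
DEPENDENCIES = {
    "input": set(),
    "physics": {"input"},
    "ai": {"input"},
    "audio": {"ai"},
    "render": {"physics", "ai"},
}


def parallel_waves(dependencies):
    """Group modules into waves; modules in the same wave do not depend on
    each other, so a scheduler could run them on separate cores."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock the next one
    return waves
```

For the table above, `parallel_waves(DEPENDENCIES)` returns `[["input"], ["ai", "physics"], ["audio", "render"]]`. Nothing here runs anything in parallel; it only shows that the declared dependencies are enough to decide what could.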
Stage 4 is nothing more than Stage 2 on top of Stage 3.
Both hardware and software are needed to make it work.
And it's unavoidable.