Software Development

Correcting The Future

Transition

The other day, I was cleaning out some old boxes that I had stored away. In one of them were some computer books, games and magazines. I found some old Amiga games, like The Secret of Monkey Island and Monkey Island 2, that I had bought at an auction (they were new at the time). They still had all their documentation. I also found an old Game Developer magazine from May 1998. Guess what the programming tutorial was? Building an Inline Performance Monitoring System. IOW, how to manually profile your assembly code.

It starts:


Techniques used by today's microprocessors to achieve unprecedented performance have also resulted in unprecedented complexity in the way that assembly routines are optimized. Minimizing the number of instruction execution cycles no longer guarantees the fastest solution. As a developer, you have to consider additional factors, such as cache and translation lookaside buffer (TLB) states, as well as the state of pipelines and execution units. You also have to account for the architectural aspects of the processor, such as support for speculative instruction execution, reordering, and retirement. The alignment of code and data can also make a huge difference.



Aside from the absence of multicore, the paragraph above could have been written today. The article even goes on to say that most games are written in C/C++, but that sometimes you need to go lower and profile your code.
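The magazine's tutorial instrumented assembly routines directly. The same basic idea, wrapping regions of code with timers and accumulating the results, can be sketched at a much higher level. This is a minimal Python sketch, not the article's method: `PerfMonitor` and `section` are hypothetical names, and it uses the wall-clock `time.perf_counter_ns` timer, so it only sees the combined effect of cache, TLB and pipeline state on elapsed time, not those states themselves.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PerfMonitor:
    """Accumulates elapsed wall-clock time and call counts per named region."""

    def __init__(self):
        self.total_ns = defaultdict(int)
        self.calls = defaultdict(int)

    @contextmanager
    def section(self, name):
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            # Record the time even if the measured code raises.
            self.total_ns[name] += time.perf_counter_ns() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, total_ns, calls, avg_ns) rows, most expensive first."""
        rows = sorted(self.total_ns.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, ns, self.calls[name], ns / self.calls[name]) for name, ns in rows]

monitor = PerfMonitor()
for _ in range(100):
    with monitor.section("sum_squares"):
        sum(i * i for i in range(1000))

for name, total, calls, avg in monitor.report():
    print(f"{name}: {calls} calls, {total} ns total, {avg:.0f} ns/call")
```

Wall-clock timings vary from run to run, so in practice you would repeat a measurement and compare averages rather than trust a single sample.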

That was 13 years ago. Now consider 13 years before 1998. Could you have written the same thing in 1985? Look here if you want to know what was going on in 1985, computer-wise. It was the year that Nintendo would take over the console market from the likes of Coleco, Intellivision and Atari after the crash of 1983. Commodore would come out with the Amiga, and the C64 was king in the personal computer department until its demise around 1992.

It's 26 years later, and you'd think that the days of assembly would be over. Remember all the hype about RISC? Please don't misunderstand. RISC has had a profound impact on processor design. But I remember that in the early 90's the RISC methodology had total PR dominance. You were a heretic if you said anything even remotely against it. Everyone was predicting the demise of all current chips that used CISC. For a while, some of these predictions seemed to come true. The M68K series was eventually abandoned. Apple switched to the PowerPC. But in personal computers, switches to RISC processors that were visible at the assembly level were few and far between.

What happened instead was that the Intel processors in the PC became even more popular with the famous "Intel Inside" campaign. The 486 was produced from 1989 to 2006. It's the processor that brought us games like Doom and its numerous derivatives. Not only that, but only about six instructions were added to the 486 over the 386. Adding CISC instructions was out of favour. Everyone thought that the existing instructions would become faster and that more complicated CISC instructions would become obsolete. Eventually, that did happen with the Pentium, at least when it came to the underlying architecture (not at the assembly level). It wasn't until the advent of MMX that a substantial number of new instructions were introduced.

So first we have CISC instructions that take X cycles before the next instruction can execute. Then we have the transition to superscalar and pipelined architectures, where multiple instructions can be executed at once and each CISC instruction is split into smaller operations that can be pipelined. During this transition, almost no new instructions were introduced on the Intel processors used in PCs. And like I said earlier, the M68K was being phased out and the PowerPC was introduced.

What happened next was a complete reversal. MMX, SSE and SSE2 each introduced a largely new set of instructions, and they don't all work on the same registers. Yes, many of the instructions do similar things, but on different sets of registers. MMX reused the low 64 bits of the floating point registers, but did vector operations on integers (operating on multiple integers at once). SSE did floating point vector operations on a new set of 128-bit registers. Intel then decided that MMX wasn't the way to go: it would be better to do integer vector operations on the SSE registers, which are twice as wide. That's where SSE2 comes into the picture. SSE2 is essentially MMX on the SSE registers (with other additional instructions added in). Today, there are new AVX, FMA3 and FMA4 instruction sets either in the planning stage or in development.
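To make "operating on multiple integers at once" concrete, here is a small Python simulation of a packed 16-bit add. It is modelled on the SSE2 `PADDW` instruction, which adds the eight 16-bit lanes of a 128-bit register with wraparound (the MMX version does four lanes in 64 bits). The helper names are hypothetical, and the real instruction handles all lanes in a single step rather than in a loop.

```python
LANE_BITS = 16
LANES = 128 // LANE_BITS          # eight 16-bit lanes in a 128-bit register
MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Pack eight 16-bit integers into one 128-bit value (lane 0 in the lowest bits)."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} values")
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & MASK) << (i * LANE_BITS)
    return reg

def unpack(reg):
    """Split a 128-bit value back into its eight 16-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & MASK for i in range(LANES)]

def packed_add_words(a, b):
    """Add lane by lane with wraparound; carries never spill into the next lane."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane = (((a >> shift) & MASK) + ((b >> shift) & MASK)) & MASK
        result |= lane << shift
    return result

a = pack([1, 2, 3, 4, 5, 6, 7, 0xFFFF])
b = pack([10, 20, 30, 40, 50, 60, 70, 1])
print(unpack(packed_add_words(a, b)))  # [11, 22, 33, 44, 55, 66, 77, 0]
```

Note the last lane: 0xFFFF + 1 wraps to 0 instead of carrying into a neighbouring lane, which is what keeps the eight additions independent.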

And these instructions are getting more complex with each new release. We even have instructions that do CRC calculations. With x64, the general purpose registers are twice as wide and twice as numerous. The SSE registers have also doubled in count. There is talk about expanding them to 256, 512 and even 1024 bits wide in the future.
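As an example of such an instruction, the `CRC32` instruction added with SSE4.2 computes CRC-32C (the Castagnoli polynomial). A bit-at-a-time software equivalent, sketched here in Python for illustration only, shows how much work that single instruction replaces. The function name and the chaining parameter are my own choices, not part of any particular library.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41 in bit-reversed form

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C. Pass a previous result as `crc` to continue a running checksum."""
    crc ^= 0xFFFFFFFF                  # initial value (or undo the previous final inversion)
    for byte in data:
        crc ^= byte
        for _ in range(8):             # one bit at a time, least significant bit first
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF            # final inversion

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

That is eight shift/xor steps per byte, versus one instruction per 1, 2, 4 or 8 bytes in hardware.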

There is a lesson here beyond the history. It's that everything old is new again, but with added layers. The first two blocks of code in an article I wrote in October 2008 demonstrate this quite clearly. They both do the same thing, but one is from now and the other is from 26 years ago.

Another article I wrote was about video cards and the trend toward going back to general purpose computing. I got a lot of heat for that article. Ultimately, any idiot could see what was going to happen. I'm not saying it's not nice to have. But it's not anywhere close to the silver bullet they were promising.

Back to the Game Developer magazine I found: there is one other article in there that I really wanted to discuss. It's on page 27, titled simply "3D Graphics Hardware". It's a look at what the situation was in 1998. There's a pie chart at the bottom showing the market share of each company.

1. 3Dfx @ 33%
2. Rendition @ 27%
3. NEC (PCX2) @ 27%
4. 3Dlabs (Permedia 2) @ 13%

The funny thing is that nVidia isn't even on the map yet, but it has just released the RIVA 128, which combines 2D and 3D on the same card. The article mentions this in its blurb about each company releasing 3D hardware. For that very reason, they even say, "In some ways, this makes nVidia as a more interesting company to watch than 3Dfx, since nVidia is going after both the retail upgrade market, exemplified by PCI-based systems, and the new AGP enabled machines of brand-name PC OEMs such as Gateway 2000." They go on to mention how 3Dfx would have a difficult time transitioning into the combined 2D and 3D market if it were to try.

Even Game Developer knew at the time that having one card that does both is better than the 3Dfx cards that only did 3D. Today, it seems unimaginable that you needed two cards, one for 2D and one for 3D. But it was indeed so in 1998, and this is what made it possible for 3Dfx to "keep the company ahead of the pack". Everything I've mentioned here is in the article.

I won't go into why 3Dfx went away. But we do know that having one card instead of two is better if it accomplishes the same thing. With this, we can see how the future unfolds by looking at the past.

Stage 1
With processors, we had single instructions at a time.
With 3D, general purpose code was too slow.

Stage 2
Processors implement superscalar and pipelined architectures.
3D hardware uses a pipeline and multiple texture units.

Stage 3
Processors implement more vector instructions and multiple cores.
3D hardware uses a multitude of cores to implement general purpose programming.

I'm afraid to say that the verdict is not one of simplicity. RISC did not win out at the assembly level. If there ever was a trend to go that way, it's been hidden away in the hardware. Market insiders have said for a long time that hardware manufacturers are waiting for developers to tell them what they need out of the hardware. That's not exactly true. They're waiting for the market to break a certain way. And it's not breaking because it's a catch-22 this time around.

In the past, it was always about speed. Stages 1 and 2 are all about that. Stage 1 can go faster by having more cycles per second. Stage 2 can go faster through pipelining and superscalar architecture, where you can operate on more than one instruction at a time. Obviously, the next stage was multicore, but that was never new. It's always been around. What I'm saying is that the hardware companies are putting in what they already knew.
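The Stage 2 speedup can be put in rough numbers. Assuming an idealized k-stage pipeline with no stalls or hazards (real pipelines have both), n instructions take n·k cycles without pipelining but k + n - 1 cycles with it, and a hypothetical pipeline that issues up to w independent instructions per cycle can approach k + ⌈n/w⌉ - 1. The Python helpers below simply evaluate those textbook formulas; they are not a model of any specific processor.

```python
def unpipelined_cycles(n, k):
    """Each of n instructions occupies all k stages before the next one starts."""
    return n * k

def pipelined_cycles(n, k):
    """Ideal k-stage pipeline: k cycles to fill, then one instruction completes per cycle."""
    return k + n - 1 if n > 0 else 0

def superscalar_cycles(n, k, width):
    """Ideal pipeline issuing up to `width` independent instructions per cycle."""
    groups = -(-n // width)  # ceiling division: number of issue groups
    return k + groups - 1 if n > 0 else 0

for n in (1, 10, 1000):
    print(n, unpipelined_cycles(n, 5), pipelined_cycles(n, 5), superscalar_cycles(n, 5, 2))
# 1 5 5 5
# 10 50 14 9
# 1000 5000 1004 504
```

For long instruction streams the pipeline approaches one instruction per cycle (and the two-wide version two per cycle), which is why clock speed stopped being the only lever.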

So what's the next stage? Stage 4?

Re-simplification. We can look back at Stage 2 for some hints. One would be hard pressed to argue that Stage 2 was about making things simple, and rightly so. While the instruction set did not simplify, the idea of RISC was alive and well underneath. So don't expect any actual simplification for the user this time around either. But expect to see different technologies coalesce into a more consistent overall design.

The first hint of this is software ray tracers. I fully expect that in the future, the work of custom hardware will be handled by general purpose hardware. I also expect that code that runs on 3D or other custom hardware today will eventually have to be redone in software. So dust off the software renderers you wrote back in the 90's. You're gonna need them. No more dumping tasks onto the hardware and waiting for the results.

And do you know what Stage 5 will be? The re-introduction of specific hardware that can implement common tasks. Pluggable devices for specific functionality will make a comeback. I expect this to be especially lucrative in the mobile market. Turn your mobile device into anything you want it to be.

I could be wrong, of course, and I expect that the hardware industry will play its waiting game a little longer. What does this mean for right now? More of the same for a while yet.

The real bottleneck is, and always has been, sequential code on multiple cores. Will developers change their ways? History answers that quite clearly. NO! Even when developers think they've changed, it's only because old has become new again. What I envision is that the only way for hardware to be able to properly split up parallel code is if the developer states up front what can be split up. Like I said, they won't do that. So I fully expect a trend toward modularization that looks similar to what we're doing now, with the exception that you need to specify inter-module dependencies. You know what that is, right? So do I. It's already begun. Those that see it will survive. Those that don't will disappear. Now you know why hardware companies are cautious, lest they become another 3Dfx. This time, it's not up to them. It's up to the developers. At the same time, they could miss it if they act too late.
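Here is one way to picture "specifying inter-module dependencies": if each module declares what it depends on, a runtime can work out for itself which modules are free to run at the same time. This is a minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The module names are hypothetical, and grouping into strict waves is a simplification, since a real scheduler could start a module as soon as its own dependencies finish.

```python
from graphlib import TopologicalSorter

def parallel_waves(dependencies):
    """Group modules into waves. Every module in a wave has all of its declared
    dependencies satisfied by earlier waves, so modules in one wave can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the declared dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        waves.append(ready)
        sorter.done(*ready)
    return waves

# Hypothetical modules, each mapped to the set of modules it depends on.
deps = {
    "render": {"physics", "animation"},
    "physics": {"input"},
    "animation": {"input"},
    "audio": {"input"},
    "input": set(),
}
print(parallel_waves(deps))
# [['input'], ['animation', 'audio', 'physics'], ['render']]
```

The developer never says "run physics and audio in parallel"; that falls out of the declared dependencies.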

Stage 4 is nothing more than Stage 2 on top of Stage 3.

Both hardware and software are needed to make it work.

And it's unavoidable.


Comments

Unregistered user Sunday, April 3, 2011 4:52:57 AM

Dan writes: "Pluggable devices for specific functionality will make a comeback. I expect this to be especially lucrative in the mobile market. Turn your mobile device into anything you want it to be." Spot on! We can already see this happening, in a way, with devices such as Always Innovating's Smart Book[1] and Mobile Internet Device: the MID plugs into a touchscreen, which turns the mobile device into a tablet and/or netbook. I've heard that HP (I think it was, might be wrong) is releasing something similar. I think, in time, mobile devices will be the main computing device, with pluggable extensions for everything that doesn't fit into a mobile device: need a larger screen? Plug it in! Need more storage space? Plug it in! Need more number crunching capabilities? Plug it in! Need XYZ? Plug it in! Side note: I worked on the RenrakuOS project (sadly abandoned, at least for now) for a little while, and one of the "ultimate goals" was to accommodate this kind of future by allowing programs to seamlessly "move" from device to device. [1] http://www.alwaysinnovating.com/products/smartbook.htm PS: Would love some more Project V blog posts. I understand that you may not have the time to do so, but... I bet all your readers are eagerly awaiting more Project V related news :)

Unregistered user Sunday, April 3, 2011 4:59:50 AM

Dan writes: Just had a thought on this: "So I fully expect a trend toward modularization that looks similar what we're doing now, with the exception that you need to specify inter-module dependencies. You know what that is, right? So do I." I've thought about this before, and the wording used here, "you need to specify inter-module dependencies", made me remember it. I think an interesting way of "selling" dataflow to the average "enterprise" programmer is to pretend they're using dependency injection when in fact they're writing dataflow code (either visually, in a graphical editor, or textually using XML *shudder* or whatever is popular with these types today).

Unregistered user Sunday, April 3, 2011 5:01:00 AM

Dan writes: Sorry about spamming your comments, but.. man my.opera.com comments suck. It killed the formatting of both my comments :(

Vorlath Tuesday, April 5, 2011 12:26:56 AM

Originally posted by anonymous:

PS: Would love some more Project V blog posts. I understand that you may not have the time to do so, but... I bet all your readers are eagerly awaiting more Project V related news :)



Yeah, I'll try and write something soon. I've been really busy with all sorts of things. I really want to release something more than I want to write about it, but my time is really limited lately. So I don't have much time left for Project V.

Originally posted by anonymous:

I think an interesting way of "selling" dataflow to the average "enterprise" programmer is to pretend their using dependency injection when in fact they're writing dataflow code



Yeah, this is what I'm thinking will happen anyways. A central point of this article is that I don't see any way forward but through dataflow. Of course, they'll never admit to it. Functional programming already took a step in that direction. I'm expecting imperative (and OO) languages to also go that way, but in a more insidious manner.

In a way, I'm tempted to put aside what I have and replace the runtime engine with something much simpler with pre-defined types and simple connections. It wouldn't have any fancy type tools and it'd be lacking how I wanted to handle implementations, but I could get something working much sooner. Later, I could write an import module for older networks.

Might be the way to go.

As for a topic on Project V, not sure what I could talk about that I haven't already discussed. The advanced topics would require a full implementation and that's still a ways off. I'm busy for at least the next two weeks. So I'll see if I can't get something released for May even if it's just a NAND gate and a connection. I'm getting really close to getting the connections working and the single processor runtime engine is mostly complete. MAYBE this time, I'll actually be able to keep one projection on target.

Vorlath Tuesday, April 5, 2011 2:04:26 AM

Actually, it's not the runtime engine or anything with dataflow that's holding me up. I still have some work to do with the interface.

Unregistered user Tuesday, April 5, 2011 9:53:05 AM

Anonymous writes: I'm wondering how you plan to handle clutter. In other visual programming environments, people often complain that the workspace of an advanced function gets cluttered with connections leading off everywhere. What are your plans to handle that? The only thing I can think of off the top of my head is some type of visual code folding, where you draw a box around code modules and it compresses them into a shell/icon that you can dive into. It might be a good way of making modules while you do code cleanup as well.

Unregistered user Tuesday, April 5, 2011 2:06:50 PM

Anonymous writes: http://rebelscience.blogspot.com/2009/08/why-i-hate-all-computer-programming.html "I just want to build it. I don’t want to have to use a complex language to describe my intentions to a compiler. Here is what I want to do: I want to look into my bag of components, pick out the ones that I need and snap them together, and that’s it! That’s all I want to do." I agree with him.

Unregistered user Tuesday, April 5, 2011 8:54:28 PM

Dan writes: re: comment #5, take a look at SynthMaker (http://synthmaker.co.uk/index.html), which is basically a dataflow language for programming musical synthesizers (though some users have done much more complex things with it!). I've had fun playing around with it; it's pretty easy to use. In it, components are basically modules built up from other components. A high level component could be an oscillator where you can select a waveform, frequency and amplitude. As an input, it may receive MIDI events, and the output could be an audio stream, which can then be filtered or whatever before being sent to the audio output component. By double clicking on the oscillator component, you can look inside and see that it is composed of another set of components, and you can double click them to see what's inside. At the lowest level, there is a set of primitives for addition, subtraction, feedback loops, type conversions, triggering events and so on. There is also a scripting component where you can write formulas graphically, which can often be more convenient for highly mathematical code (eg, implementing a complex audio filter). Finally, there is also an assembly component, where (should you want to) you can write a component in x86 assembly (including SSE; for example, for fun I wrote a component that mixes and amplifies two audio streams, making use of assembly http://oi52.tinypic.com/2nb6dc9.jpg ). Anyway, without having seen what Vorlath has planned, I envision Project V to be somewhat similar: components can be nested, so you can build complex components out of simpler components. At the very bottom, there would be primitive components, implemented in a more traditional language (Vorlath mentioned supporting existing languages for this), potentially implementing some performance-critical components in optimized assembly.
A lot of code, especially glue code, value routing code, event-based code (like dealing with user input) and such, works very well in the graphical dataflow model (as proven by SynthMaker, Max/MSP, the dataflow language artists use in Blender, etc.), but sometimes it is more convenient to write a mathematical formula in more traditional mathematical notation. The good news is, I envision languages like Project V to actually be BETTER at this than textual languages, because a "math" component could be built to display real mathematical formulas, instead of the rough approximations of them used in traditional textual languages.
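The nesting described above, where a component is itself a small network of components that you can dive into, can be sketched in a few lines of Python. This is an illustrative assumption, not how SynthMaker or Project V is actually implemented: `Add`, `Gain`, `MixAmp`, `DoubleMix` and `describe` are hypothetical, and the wiring inside each composite is hard-coded rather than drawn.

```python
class Component:
    """Base class: a component transforms its inputs into an output."""
    children = {}  # primitives have no internal network

    def process(self, *inputs):
        raise NotImplementedError

class Add(Component):
    def process(self, a, b):
        # Primitive: sample-by-sample sum of two streams.
        return [x + y for x, y in zip(a, b)]

class Gain(Component):
    def __init__(self, factor):
        self.factor = factor

    def process(self, a):
        # Primitive: scale every sample.
        return [x * self.factor for x in a]

class MixAmp(Component):
    """Composite: mixes two streams, then amplifies them (internally Add -> Gain)."""
    def __init__(self, gain):
        self.children = {"mix": Add(), "amp": Gain(gain)}

    def process(self, a, b):
        mixed = self.children["mix"].process(a, b)
        return self.children["amp"].process(mixed)

class DoubleMix(Component):
    """A composite built from composites: mixes a with b, then the result with c."""
    def __init__(self):
        self.children = {"first": MixAmp(1.0), "second": MixAmp(0.5)}

    def process(self, a, b, c):
        ab = self.children["first"].process(a, b)
        return self.children["second"].process(ab, c)

def describe(component, depth=0):
    """'Dive into' a component: list it and, indented, everything nested inside it."""
    lines = ["  " * depth + type(component).__name__]
    for child in component.children.values():
        lines.extend(describe(child, depth + 1))
    return lines

print(MixAmp(0.5).process([1.0, 2.0], [3.0, 4.0]))  # [2.0, 3.0]
print("\n".join(describe(DoubleMix())))
```

From the outside, `DoubleMix` looks like any other component with three inputs and one output, which is the same folding idea raised in the earlier comment about clutter.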

Unregistered user Wednesday, April 6, 2011 2:17:24 PM

Dan writes: Err, error corrections in the previous post: I was replying to post #6, not 5, and "There is also a scripting component where you can write formulas *graphically*, which can often be more convenient for highly mathematical code" should have read "textually", not "graphically". Also, as an example of SynthMaker's nested components, there is a panel at the top of the window that shows a thumbnail of the component you are currently editing and which nested component you are working in. Here's a screenshot of the thumbnail: http://synthmaker.co.uk/images/depth%20L.png and a screenshot of the full SynthMaker window: http://synthmaker.co.uk/images/s2.png

Unregistered user Thursday, May 5, 2011 1:04:48 PM

Anonymous writes: After reading about Project V and how you want to have a set of basic functions as nodes that can be linked to build even more complex nodes, running in a flow-based way, it sounds to me like you are describing something we already have that you can model your programming language on: hardware. From TVs, to toys, to supercomputers, they all use the same basic parts. Even a CPU can be broken down into the same parts, where only the structure differs. If a programming language is based on hardware, I also don't see why you couldn't use genetic algorithms to develop better programs; they already use them for finding better circuit layouts and such things. Also, unlike real hardware, you won't have the same constraints. I'm not saying to model a programming language after hardware 100%, but still, look at the modularity we have with today's hardware. Plug in a keyboard and it just works almost 100% of the time. It's the closest thing to what Project V sounds like to me. And like I said, it's not like you would have to model a programming language on hardware 100%.

Vorlath Thursday, May 5, 2011 9:39:07 PM

Yeah, dataflow certainly has similarities with hardware. The advantages of having it in software are numerous, so much so that it starts to be very different from hardware. For one thing, CPUs use the von Neumann architecture to reuse existing components (which also means they have to be used sequentially). With software dataflow, there's no need to have only a fixed number of any component. The computer can create as many as needed dynamically. Besides, only the data paths need to be handled per copy, because the code that actually performs an action can be reused for all copies.

What's more, the runtime for my dataflow engine will not only duplicate components, but also run them in parallel if need be and merge the results back in the correct order without you having to do anything. On other occasions, it will pipeline the data, again being able to use all cores on sequential transformations. This means you can very much think sequentially about what you want done to your data and when you're done, just feed all your data through. It'll automatically use all processing units.
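A rough approximation of the two behaviours described above, using Python's standard library: `run_duplicated` runs several copies of one stateless component and merges the results in input order (`Executor.map` returns results in the order of its inputs), and `pipeline` gives each stage its own thread, connected by FIFO queues, so different stages work on different items at once. This is a sketch under assumptions, not the Project V runtime; the stage functions are hypothetical, and in standard CPython builds the global interpreter lock means threads won't speed up CPU-bound Python code (a `ProcessPoolExecutor` would be the usual substitute).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def normalize(record):
    # Hypothetical stage 1: clean up a raw text record.
    return record.strip().lower()

def tag_length(word):
    # Hypothetical stage 2: attach the word's length.
    return (word, len(word))

def run_duplicated(stage, items, copies=4):
    """Run `copies` instances of one stateless stage over the items concurrently.
    Executor.map yields results in input order, so the merge keeps the ordering."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(stage, items))

_DONE = object()  # sentinel marking the end of the stream

def pipeline(items, *stages):
    """Give each stage its own thread, linked by FIFO queues, so stage 2 can work
    on item 1 while stage 1 works on item 2. FIFO queues preserve item order.
    Error handling is omitted: an exception in a stage would stall the pipeline."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while (item := inbox.get()) is not _DONE:
            outbox.put(fn(item))
        outbox.put(_DONE)  # pass the end-of-stream marker downstream

    threads = [
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results

data = ["  Alpha", "BETA ", " Gamma "]
print(run_duplicated(normalize, data))       # ['alpha', 'beta', 'gamma']
print(pipeline(data, normalize, tag_length))  # [('alpha', 5), ('beta', 4), ('gamma', 5)]
```

In both cases the caller describes the transformation sequentially and simply feeds the data through; how the work is spread out is the runtime's concern.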

Another advantage is how you can connect components together. With hardware, once a connection is made, that's it. But in a software network, you can dynamically create new networks or modify existing ones. I've even had an ultimatum given to me that I MUST allow the passing around of components to allow customizable operations. My issue with this was that certain connections can remain intact even while the component is being passed around. I didn't like this, as it can cause a mess and make things difficult to debug. But the power it would give is astounding. Networks travelling within networks while still being interconnected to yet more networks that can also be travelling... with different parts of the same network spread out over thousands or millions of machines...

There are also components that can accept or produce global data within a component. What this means is that you don't have actual connections between components. Instead, you can insert components that accept data of a certain type (or tag/value). IOW, you have a collection of producers and a collection of sinks. The connections will be automatic.

Why would you want this? Because when producers can produce various types of information that can be processed by various other components, wiring everything explicitly would make the network WAY too messy. So you set up sink and producer connections instead. You can also add a sorting component to redirect traffic on the fly. This is like simulating a delivery company. All data packages will have a tag that tells them where they should go. And you also have a series of possible destinations (component connections) with those same tags.
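The producer/sink idea can be sketched as a tiny tag-based router in Python. Everything here is hypothetical illustration rather than Project V's design: components register as sinks for a tag, producers just emit tagged packages, and the router makes the connections. The "sorting component" is modelled as a sink that re-emits a package under a tag read from the package itself.

```python
from collections import defaultdict

class TagRouter:
    """Delivers each tagged package to every sink registered for that tag.
    Producers and sinks never reference each other; the tag is the connection."""

    def __init__(self):
        self.sinks = defaultdict(list)
        self.undeliverable = []  # packages whose tag has no sink yet

    def register_sink(self, tag, sink):
        self.sinks[tag].append(sink)

    def emit(self, tag, payload):
        targets = self.sinks.get(tag)  # .get avoids creating empty entries
        if not targets:
            self.undeliverable.append((tag, payload))
            return
        for sink in targets:
            sink(payload)

router = TagRouter()
temps, logs = [], []
router.register_sink("temperature", temps.append)
router.register_sink("temperature", lambda t: logs.append(f"temp={t}"))
router.register_sink("log", logs.append)
# A "sorting" component: re-emits a raw package under the tag stored inside it.
router.register_sink("raw", lambda p: router.emit(p["kind"], p["value"]))

router.emit("temperature", 21.5)
router.emit("raw", {"kind": "log", "value": "started"})
router.emit("humidity", 40)
print(temps, logs, router.undeliverable)
# [21.5] ['temp=21.5', 'started'] [('humidity', 40)]
```

Adding a new consumer is just another `register_sink` call; no existing producer has to be rewired.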

A contemporary use of this would be a thread pool. Thread pools are pointless in dataflow, but if you wanted to build one to execute existing code, then you could model a series of components, one for each core, and when a core is busy, it would change its flag to 'busy'. When it's free, it would change it to 'execute'. With components that send requests to execute legacy code, those requests would have an 'execute' tag. All of a sudden, your requests will be automatically processed as soon as a processing core becomes free, since that would cause a match between the request's tag and the tag of the processing core's component, creating a temporary virtual connection. You can mix and match different kinds of tags for whatever use.
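The thread pool example can be simulated in the same spirit. This hypothetical Python sketch models only the matching logic: each core advertises an 'execute' or 'busy' tag, requests (all implicitly tagged 'execute') wait until a core with a matching tag appears, and each pairing stands in for the temporary virtual connection. Nothing is actually executed or threaded here.

```python
from collections import deque

class TagMatcher:
    """Pairs pending 'execute' requests with cores currently tagged 'execute'.
    Each pairing stands in for a temporary virtual connection lasting one job."""

    def __init__(self, core_count):
        self.core_tags = {core: "execute" for core in range(core_count)}
        self.pending = deque()   # requests waiting for a free core
        self.assignments = []    # (core, request) pairs, in the order they matched

    def submit(self, request):
        self.pending.append(request)
        self._match()

    def finish(self, core):
        self.core_tags[core] = "execute"  # the core advertises that it is free again
        self._match()

    def _match(self):
        for core in list(self.core_tags):
            if not self.pending:
                break
            if self.core_tags[core] == "execute":
                self.core_tags[core] = "busy"
                self.assignments.append((core, self.pending.popleft()))

matcher = TagMatcher(core_count=2)
for job in ("job-a", "job-b", "job-c"):
    matcher.submit(job)   # job-c has to wait: both cores are busy
matcher.finish(1)         # core 1 frees up and job-c is matched to it
print(matcher.assignments)
# [(0, 'job-a'), (1, 'job-b'), (1, 'job-c')]
```

No component ever calls "dispatch"; the match between a free core's tag and a waiting request's tag is what triggers the work.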

There will also be a scripting language for conventional code. I will also add a dataflow language into it specifically tailored for Project V.

I haven't even gotten into the design side of it yet, which is where I think the most promising productivity boost will show. Any component can have tools attached to it. Docs, examples, configuration sheets, editors, etc. can all be part of your component at design time. This means the IDE will be enhanced as time goes on and people write editors for their components. They can even create plugins for the IDE in the same way.

Anyway, I could go on and on, like about being able to use humans and other non-machine entities as part of the software logic.
