Back On
Saturday, 3. June 2006, 05:28:39
In the meantime, I see that there's still nothing new on the programming front. More features, more APIs, more of the same old, same old. It's obvious that not many people really understand this new style of software development. I'll make one final attempt to clear this up.
No matter what you deal with, there are always two opposite and equal views. You can accomplish the same tasks both ways. However, one way is usually FAR better than the other depending on what you are trying to do. In programming, we have done things from one view only since its inception. This concept of a "point of view" is also known as relativity. Now, you don't need to know physics to use this. All it means is that two different people can interpret an event in two different, but equal ways.
Suppose you're in outer space with a co-worker during a spacewalk, and you can't see anything except this other spacewalker. All of a sudden, he starts moving to your left. Did he move to your left? Or was it you that moved to your right? In physical space, we can tell because of something called acceleration and inertia. We can feel this force. If we felt it, then it was 'us' that moved. If not, then it's the other person that moved. But if we did not have this 'sense', we would not be able to tell who had moved without a frame of reference. And even so, who's to say it isn't the frame of reference that moved? The truth is that acceleration and inertia aside, movement is relative. Both interpretations are correct. Whether you moved to the right, or the other spacewalker moved to the left makes no difference.
In programming, we don't have acceleration or inertia. But we do have a frame of reference. This reference is called memory. We consider the address space as a static background reference that cannot change. Sure, the contents can change. But the address space never changes. Meaning that address 10 will always be address 10 to your application. All the fancy footwork behind the scenes by the OS with the MMU doesn’t matter as these effects are never seen directly as far as the application is concerned. What I mean is that if you use a certain memory location (say location 10), you expect that it won't suddenly change to another location (say location 1234). You expect it to stay static.
So the way we write programs is that we have a worker, called the CPU, which moves over this memory address space and does work wherever needed. The CPU physically requests data and manipulates it and physically stores it back. If the CPU needs the memory at address 10, it will open a latch with the address lines set to 10. Then it will work on the data, possibly in combination with other data, and write the results back out.
This is the standard sequential manner in which we all write software. It has been done since the beginning of computing. But it is not the only view. Not only that, but this view is only useful when there is only ONE worker. With more than ONE worker, you get into problems of accessing the same data at the same time or writing to the same location at the same time. Obviously, if two CPUs write to the same location at the same time, something is wrong somewhere. Or at the very least, some operations were done for nothing as their results have been overwritten.
So what is the other view? Before we get into that, perhaps a simpler example would be beneficial. Let's say you have 10 boxes on the floor along with a stuffed animal beside each of these boxes. Your job is to put each stuffed animal in its respective box. There are actually two ways to do this. The way we would do it in contemporary programming style is to give the worker a set of instructions to execute at each box. So at each box, he or she would pick up the stuffed animal, put it in the box and seal the box. Then he or she would move on to the next box.
The thing is that there is another way. Instead of instructing the worker how many times to do the same operation, you can give ONE set of instructions to repeat AS NECESSARY. So you'd have a conveyor belt with all the boxes and toys on it. As they appear before the worker, he or she would execute the preset instructions. These are simply: pick up toy, place in box and seal box. You no longer have to tell the worker how many items there are.
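The difference between the two workers can be sketched in a few lines of Python. This is only an illustration: the names `pack`, `counted_worker` and `station` are made up for the example, and the generator still contains a `for` under the hood because Python itself is sequential. The point is only that the item count has disappeared from the second version.

```python
from typing import Iterable, Iterator, List


def pack(toy: str) -> str:
    """The single preset instruction: place the toy in a box and seal it."""
    return f"sealed box containing {toy}"


def counted_worker(toys: List[str], count: int) -> List[str]:
    """Conventional style: the worker is told how many boxes to visit."""
    packed = []
    for i in range(count):
        packed.append(pack(toys[i]))
    return packed


def station(belt: Iterable[str]) -> Iterator[str]:
    """Conveyor-belt style: act on whatever arrives, for as long as it arrives."""
    for toy in belt:
        yield pack(toy)


toys = ["bear", "rabbit", "duck"]
# Both views produce the same sealed boxes; only the second never mentions a count.
print(counted_worker(toys, len(toys)) == list(station(toys)))
```

The station works unchanged on a list of three toys, an empty belt, or a stream whose length is not known ahead of time.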
These boxes are like our address space. In computers, we can't possibly imagine our data moving of their own accord, so we list a sequence of instructions to explicitly operate on each piece of data. The truth is that the alternate view where the data moves of its own accord into static instructions is a much better one.
You need to lay out a network between instructions, but once set up, you put data in motion and let the rest take care of itself. This means you no longer need loops to process X items. You simply insert X items at the entry point of your application. The instructions in the network will execute as necessary as the data comes in. You may use a feedback loop to GENERATE the items, but never to actually operate on them.
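As a rough sketch of what "lay out the network, then put data in motion" could look like, here is a tiny Python emulation. The `Node` class and its `connect` and `send` methods are hypothetical names invented for this example, not part of any existing platform, and data is moved with ordinary method calls only because that is what a sequential language offers.

```python
from typing import Callable, List


class Node:
    """A static instruction. Data arriving at it is transformed and passed on."""

    def __init__(self, op: Callable[[int], int]) -> None:
        self.op = op
        self.downstream: List["Node"] = []
        self.collected: List[int] = []

    def connect(self, other: "Node") -> "Node":
        """Link this node's output to another node and return that node."""
        self.downstream.append(other)
        return other

    def send(self, item: int) -> None:
        """Run the instruction on one arriving item and forward the result."""
        result = self.op(item)
        if not self.downstream:
            self.collected.append(result)  # end of the network: keep the result
        for node in self.downstream:
            node.send(result)


# Lay out the network once: add 1, then double, then subtract 3.
entry = Node(lambda x: x + 1)
sink = entry.connect(Node(lambda x: x * 2)).connect(Node(lambda x: x - 3))

# This loop only GENERATES the items. None of the nodes knows how many there are.
for item in (1, 2, 3):
    entry.send(item)

print(sink.collected)
```

Nothing inside the network changes if the generator produces three items or three million.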
Basically, you lay out what you want done to your data. Now all you do is plug in your data at the entry points in your network. Now see that each instruction (or node) can be executed independently by any CPU as needed. Wherever there is data coming in to an instruction, any free CPU can execute it. Note that you can process multiple items at once. Multiple instructions in the same chain can execute at once meaning that you don't have to completely process a data item before starting to process the next one. In conventional programming, your loop must execute the body once completely before the next item can be processed. In this alternate programming style, the second item can start being processed after the first statement is completed. Maybe even before.
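To make the overlap concrete, here is a small deterministic simulation of three chained instructions, where every occupied station fires once per tick. It is a sketch under my own assumptions: `run_pipeline` is a made-up helper, `None` marks an empty slot, and the simulator itself is an ordinary sequential loop standing in for independent CPUs.

```python
from typing import Callable, List, Optional, Tuple

Slots = List[Optional[int]]


def run_pipeline(
    stages: List[Callable[[int], int]], items: List[int]
) -> Tuple[List[int], List[Slots]]:
    """Advance every occupied stage once per tick; return (outputs, trace)."""
    slots: Slots = [None] * len(stages)  # slots[i] is the item waiting at stage i
    pending = list(items)
    outputs: List[int] = []
    trace: List[Slots] = []
    while pending or any(slot is not None for slot in slots):
        # A new item enters as soon as the first station is free.
        if pending and slots[0] is None:
            slots[0] = pending.pop(0)
        trace.append(list(slots))
        # Walking backwards keeps an item from crossing two stations in one tick.
        for i in reversed(range(len(stages))):
            current = slots[i]
            if current is None:
                continue
            result = stages[i](current)
            slots[i] = None
            if i + 1 < len(stages):
                slots[i + 1] = result
            else:
                outputs.append(result)
    return outputs, trace


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, trace = run_pipeline(stages, [1, 2, 3])
for tick, snapshot in enumerate(trace):
    print(tick, snapshot)
```

Each printed row shows which stations hold data at that tick. The third row has all three stations busy with three different items, and the whole run takes five ticks where a conventional loop would take nine steps (three statements for each of three items).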
Now, there are a few interesting things here. In a system where you have X workers (CPUs) with conveyor belts between them all (data streams), you don't actually need a physical worker at each station. You can have workers (CPUs) alternate between stations. Also, if there are no boxes (or data) on a conveyor belt, then you don't need to do anything. This is a very important statement. If 'nothing' comes in, nothing needs to be done. That means if you have a NULL reference come in, it doesn't actually exist, so nothing needs to be done and no error occurs.
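Here is a minimal sketch of workers alternating between stations, with one worker serving two stations. `Station`, `build` and `run` are names made up for this example, the round-based scheduling is just one simple policy, and "nothing comes in" is modelled as an empty inbox rather than a NULL value being passed around.

```python
from collections import deque


class Station:
    """A station on the belt: one instruction plus the data waiting for it."""

    def __init__(self, name, op):
        self.name = name
        self.op = op
        self.inbox = deque()
        self.next = None


def build():
    """Lay out a two-station network: add one, then double."""
    add_one = Station("add_one", lambda x: x + 1)
    double = Station("double", lambda x: x * 2)
    add_one.next = double
    return [add_one, double]


def run(stations, workers, items):
    """Let a fixed pool of workers serve whichever stations have data waiting."""
    stations[0].inbox.extend(items)
    results, log = [], []
    while True:
        ready = [s for s in stations if s.inbox]
        if not ready:
            break  # every belt is empty: nothing to do, and no error either
        for worker, station in zip(workers, ready):
            out = station.op(station.inbox.popleft())
            log.append((worker, station.name))
            if station.next is not None:
                station.next.inbox.append(out)
            else:
                results.append(out)
    return results, log


results, log = run(build(), ["cpu0"], [1, 2])
print(results)
print(log)
print(run(build(), ["cpu0"], []))
```

The last call feeds in nothing at all. No station fires, no check for a missing value is needed, and the result is simply empty.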
This is the platform I am trying to build. At least, those are the basics of it. There's much more to it than that. You can also group networks together into super-instructions (super-stations). In this way, you can actually build software at a much higher level. You would no longer need to go into minute details unless actually building specific new components. Also, because there are no explicit command loops that would hard code how many items to process, each component is truly reusable. If there is a loop, it would be located OUTSIDE the components that process this data.
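Grouping can be sketched in Python with plain stream functions. The helpers `component` and `group` are hypothetical names for this example only. Because no component hard codes an item count, a group of components behaves exactly like a single component and can itself be grouped again.

```python
from typing import Callable, Iterable, Iterator

Component = Callable[[Iterable[int]], Iterator[int]]


def component(op: Callable[[int], int]) -> Component:
    """Wrap a per-item operation as a stream component with no item count."""

    def run(stream: Iterable[int]) -> Iterator[int]:
        return (op(item) for item in stream)

    return run


def group(*parts: Component) -> Component:
    """Join components end to end into one super-component."""

    def run(stream: Iterable[int]) -> Iterator[int]:
        for part in parts:
            stream = part(stream)
        return iter(stream)

    return run


add_one = component(lambda x: x + 1)
double = component(lambda x: x * 2)

add_then_double = group(add_one, double)            # a super-instruction
pipeline = group(add_then_double, add_then_double)  # groups nest like parts

# The only loop over the data lives OUTSIDE the components, in the caller.
print(list(pipeline([1, 2, 3])))
```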
All this means no more manual writing of threads and no more locking. You get implicit parallelism, the ability to use as many processors as are available even if they are different makes, implicit portability without VMs, no GC, no NULL errors, reusable components, and applications as components, where you can join different applications into a super-application (and have each application of the super-app run on a different computer over the Internet). The list goes on and on.
So why haven't I created this platform already? Because ALL OSes and programming tools are in the old style. They are ALL sequential without exception (no pun intended). Having threads does not make a programming language concurrent or parallel. It's still sequential, just with many execution points. But make no mistake; it is NOT concurrent or parallel. Such things do not currently exist. The main obstacle to creating this new platform is actually the function call. Everything is done with a function call. Imagine that you want to stream two files at once. The messages the OS sends as data comes in are all delivered through the same function call. I would have to manually create a thread and use locking, or create some kind of dispatch system, but this would mean that all components would have static data shared between them... exactly what I'm trying to avoid.
Current OSes are wholly inadequate for this purpose. The current programming languages are even more archaic. I find myself wishing that I had these new tools whenever I write anything in archaic languages such as C++ or Java. These are truly 50-year-old tools that have been rehashed.
If you're not a big fan of what I'm proposing, I have a question for you. What's easier to do? Control a handful of processors or control Terabytes of data? In the contemporary way, we have to write software to explicitly handle terabytes of information and make sure there is no corruption while handling a specific number of threads possibly on different processors. My way, you pretend you only have ONE piece (or handful) of data and it'll work equally well with any amount. You only need to adjust how the processors are allocated in extreme conditions. The rest of the data takes care of itself. We should always tackle the easier problem. Handling the smaller set is the easier way to solve things. So the entire computing industry, especially the educational and research institutions, can stick it in their collective asses with their tools, papers, theories and useless notions. They're attacking the WRONG problem. The more complicated one that has no solutions.
I'm at the point where I have to write a simulator for my platform because there's no other way to work around current OS and programming language limitations. At least building applications will be MUCH faster than what we're doing now and much less bug-prone once I'm done. In this new 'view', you can see your data flow. You can SEE where it's supposed to go. When you write software using a programming language, you can't tell. I've looked at and maintained a great deal of other people's code. How many times have I had to backtrack up the call chain to see where a certain piece of data comes from? It's not readily apparent. You have to be a detective. Sure, that may be why we get paid. But it sure is a backwards way to do things.
One last thing I want to talk about is exceptions. Ever since I first saw them, they have left a bad taste in my mouth. I never knew for sure, but something always bothered me about them. I've already discussed in the past many things that I don't like about them. I understand hardware exceptions, but programming language exceptions are rather odd creatures. The problem with both error checking and exceptions is that there is no way to know what the errors are. Well, not from the function call or from anything obvious anyway. You'll get the compiler error, but many languages don't force you to handle these exceptions; they just ask you to insert a try/catch block. Although you can unwind a whole block, the problem is basically the same in that you don't know right off what exception or error code to check for.
In this new 'view' of programming, error outputs will be displayed right on the screen with the component itself. So if you want to handle an error, you can just link up the output to components that will handle the error. For example, a division component will have two inputs and two outputs. One of the outputs will be the result; the other output will be a copy of the two inputs along with the division by zero error code.
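A division component with two inputs and two outputs might be emulated like this in Python. Everything here is an assumption made for illustration: the `Divide` class, the `on_result` and `on_error` hooks, and the error code value `DIVISION_BY_ZERO = 1` are all invented for the sketch.

```python
from typing import Callable, List, Tuple

DIVISION_BY_ZERO = 1  # hypothetical error code, chosen only for this sketch

ErrorRecord = Tuple[float, float, int]


class Divide:
    """Two inputs, two outputs: one carries results, the other carries errors."""

    def __init__(self) -> None:
        self.result_links: List[Callable[[float], None]] = []
        self.error_links: List[Callable[[ErrorRecord], None]] = []

    def on_result(self, handler: Callable[[float], None]) -> None:
        """Link the result output to a downstream component."""
        self.result_links.append(handler)

    def on_error(self, handler: Callable[[ErrorRecord], None]) -> None:
        """Link the error output to a component that handles the error."""
        self.error_links.append(handler)

    def send(self, numerator: float, denominator: float) -> None:
        if denominator == 0:
            # The error output gets a copy of both inputs plus the error code.
            for handler in self.error_links:
                handler((numerator, denominator, DIVISION_BY_ZERO))
        else:
            for handler in self.result_links:
                handler(numerator / denominator)


results: List[float] = []
errors: List[ErrorRecord] = []

divide = Divide()
divide.on_result(results.append)
divide.on_error(errors.append)

for pair in [(4, 2), (1, 0), (5, 2)]:
    divide.send(*pair)

print(results)
print(errors)
```

The error path is visible in the component's interface, so whoever wires it up can see that it exists and decide where it goes. Nothing unwinds and nothing reaches "main".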
You may think that writing software for math equations in this fashion is rather strange and you would be right. But there's nothing stopping helper tools from being written to make this easier. These tools could allow you to write equations in the manner you are used to and would generate the interaction network for you. You could then click on the division operator and the division component would be highlighted. Then you could write in what you want done in case of a division by zero error. That is something that can't be done in any conventional language. Currently, it's all or nothing unless you want to separate the division statement by itself with its own try/catch, but then we're back to square one.
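A toy version of such a helper tool can be written with Python's standard `ast` module, which parses an expression into a tree that is easy to flatten into a list of components. The function `to_network`, the node naming scheme (`n0`, `n1`, ...) and the component names are all assumptions for this sketch. It handles only the four basic binary operators.

```python
import ast

# Component names for each supported operator (names chosen for this sketch).
OPS = {ast.Add: "add", ast.Sub: "subtract", ast.Mult: "multiply", ast.Div: "divide"}


def to_network(expression):
    """Turn an equation into a list of (node id, component, inputs) tuples."""
    nodes = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id  # a named entry point of the network
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = visit(node.left)
            right = visit(node.right)
            name = f"n{len(nodes)}"
            nodes.append((name, OPS[type(node.op)], (left, right)))
            return name
        raise ValueError("unsupported syntax in expression")

    output = visit(ast.parse(expression, mode="eval").body)
    return nodes, output


nodes, output = to_network("(a + b) / (c - d)")
for node in nodes:
    print(node)

# "Clicking on the division operator" amounts to finding its component.
divisions = [name for name, kind, _ in nodes if kind == "divide"]
print(divisions)
```

Once the division component is located, its error output could be wired to whatever handling is wanted for that one operation, without touching the rest of the equation.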
It's truly amazing that in most real-world systems, 90% of it is devoted to making sure the other 10% runs smoothly. In programming, it's the other way around. We have 10% (if that) of the code for error checking. I know that since I started checking every single line of code quite a few years ago, debugging has been a joy every time. We need ways to write how the software is supposed to work if everything were perfect. We also need sublayers that make sure all the data is processed in the correct manner and that everything goes well, without interfering with the display of the main program. These backup components should be more numerous than the proper components. This brings back the old problem of how to know when to stop correcting errors. If a main component fails and its backup component fails as well, how many backup components do we include? Is two enough? Three? Four? It's too bad that in conventional programming practices, this question never comes up. However, in conventional languages, an exception can bring down the entire application because exceptions chain back all the way to the "main" function. I know there are variations on the theme, but they are all sequentially based. An error in one part has the possibility to cause non-obvious errors in other parts of the system.
One day in the future, there will be a battle between the new way and the old way. Although the new way can run anything created in the old way, the opposite is not true. These old systems won't interact very well. Almost every programmer in the world is using the old way. I know of only a handful that are using the new way. And none to its full potential.
What will the programming language of the future look like? It'll look like beer in an empty bottle. Looked good and tasted good while it lasted, but now it's nothing more than a bad stench that leaves a bitter taste in the back of your mouth. There won't be any programming languages in the future other than helper tools. What we know today as applications will be used in the future as data manipulation helper tools. You'll be able to transfer and modify data in one tool, and then transfer it to others for further manipulation.
For example, a 3D game environment will allow the 3D software to be integrated directly into the game during its development. When completed, the level editors and other tools will be removed before shipping. You will be able to change things on the fly. Nothing will have to be shut down or restarted. Beta testers can play the game while a 3D artist updates or adds certain models and characters in realtime. The objects in the game are data and can be transferred and updated within the linked 3D tools. Likewise, the 'programmers' can update the physics and game engine in realtime.
I'm looking forward to pushing forward with this project. I'm back on the task of building something better for the future of software development. I sincerely hope the rest of the computing community comes to its senses soon enough.

