Project V: Minor Update
Monday, 5. June 2006, 00:25:44
I'll try to describe my original problem with data structures. I wanted to define types, not with a keyword, but with a set of properties. But these properties must also have certain types whether implied or not. So it's the chicken and the egg problem. Obviously, I needed basic building blocks. So I decided to create 4 predefined types. These are predefined collections of properties that describe these four types. I thought this would work. But of course it didn't.
Even if I predefine a group of properties, those properties will still have a type. And types have their own sets of properties that describe them. So again, the recursion would not go away. Not only that, but what if I wanted to expand the definition of these primitive types? After all, I wanted to design something that was extensible. If I redefine an integer to have a property of 'size' for example which holds the number of bytes in the integer, then I get into a recursive problem again. Where does the old definition end and where does the new one begin?
Before I get to my solution, here's a short description of how these types were supposed to work. At the most basic level, these data items could be used within the compiler directly. So you can have templates and macros that execute and produce other code and data before the final compilation process. The collection of properties would be available during the first pass. However, the actual data items whose type they define would not be available. Those would only be available at runtime. At runtime, both properties and the actual data would be made available.
My problem was that I was trying to define my metalanguage within itself. This just plain doesn't work. That's why I was saying earlier that my compiler was compiled within itself and that it was rather strange. Well, this is quite different from writing a C++ compiler where you already have other languages available such as C or even other C++ compilers. I can't use any von Neumann languages for this language other than to build a simulator to kick things off. After which time, it will be ditched.
So it is clear that I have two languages (the metalanguage and the actual primary language), however similarly they may work. So my solution is to have a third level (or rather a first level above the other two) of my four predefined types that cannot be broken down. They have no properties, but they can be assigned values. So we have bool, int, float and string. How these are handled is up to the compiler. You can group them together to produce something like a struct in C. However, these structs define primitive types for the second level. The only difference is that the data items in these structures are what together define the actual data for the second level.
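To make the first level concrete, here is a minimal sketch in Python. It is only an illustration: the names `Level2Type`, `fields` and `make` are made up for this example and are not part of the platform. The point it shows is that the recursion stops because a grouping may only contain the four property-less primitives.

```python
from dataclasses import dataclass

# Level 1: the four primitives from the post. They hold a value, nothing more.
LEVEL1_TYPES = (bool, int, float, str)


@dataclass(frozen=True)
class Level2Type:
    """A struct-like group of level-1 slots (hypothetical representation)."""
    name: str
    fields: tuple  # pairs of (slot name, level-1 type)

    def __post_init__(self):
        # A slot may only be one of the four primitives, so a type
        # description never needs another type description to explain it.
        for slot_name, slot_type in self.fields:
            if slot_type not in LEVEL1_TYPES:
                raise TypeError(f"{slot_name}: not a level-1 primitive")

    def make(self, **values):
        """Build one data item, checking every value against its slot."""
        item = {}
        for slot_name, slot_type in self.fields:
            value = values[slot_name]
            # Exact type match: in Python, bool would otherwise pass as int.
            if type(value) is not slot_type:
                raise TypeError(f"{slot_name}: expected {slot_type.__name__}")
            item[slot_name] = value
        return item


# The 'size' example from earlier: an integer plus its size in bytes.
sized_int = Level2Type("sized_int", (("value", int), ("size", int)))
item = sized_int.make(value=7, size=4)
```

Extending the integer with a 'size' slot no longer raises the question of where the old definition ends, because both slots bottom out in level one.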
At the second level, you can again group together first level data items. At this level, we can create template components and all sorts of compiler routines to be executed before the final stage. You can do everything on this level that you can in the third. The only difference between the two is that this level is executed at compile time while level three is executed at runtime. Maybe there is one extra difference. You only have access to data defined at the current or previous level. So any data defined in level three will not be available at compile time in this level unless hard coded or static.
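The access rule can be sketched in a few lines of Python. This assumes that "previous level" means any earlier level, which is my reading rather than something spelled out above, and `LevelScope` with its methods is a made-up name for illustration only.

```python
class LevelScope:
    """Tracks at which level each data item was defined (hypothetical)."""

    def __init__(self):
        self._items = {}  # name -> (level, value)

    def define(self, name, level, value):
        self._items[name] = (level, value)

    def lookup(self, name, from_level):
        level, value = self._items[name]
        # Data from a later level does not exist yet when an earlier
        # level executes, so the lookup must fail.
        if level > from_level:
            raise LookupError(f"{name} is defined at level {level}, "
                              f"not visible from level {from_level}")
        return value


scope = LevelScope()
scope.define("byte_width", 1, 8)         # a level-1 value
scope.define("user_input", 3, "hello")   # only exists at runtime
```

Here a level-2 (compile time) lookup of `byte_width` succeeds, while a level-2 lookup of `user_input` raises `LookupError`. A hard coded or static value would simply be defined at level 2 instead.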
The third and final level is where most of the software design will take place. This is where you'll be able to directly manipulate data items and create components to be used at runtime.
You can have more than three levels, but I don't see the need right now. Later on when more tools come out, I can see this happening. More than likely, they will be 2 1/2 level components though, containing both metadata and actual runtime components.
Now that this is solved, I expect that everything will start moving along rather quickly. Again, I must stress that building this in a von Neumann language is a pain. I still have the execution handler to write after this. I'm also thinking of ditching my intermediate language in favour of something a little more direct. I think a simple list of descriptions will suffice. Level 1 will be easy enough. It just defines primitive types. Level 2 combines these together. Level 3 builds the executable. Oddly enough, level 2 is the most difficult as it deals with templates and metaprogramming. Another thing that I'm keeping in mind for later is templates that get used at runtime. What I mean is that the template isn't compiled down right away, but rather compiled at runtime depending on the data that comes through it. You can have a predefined set, or you can have it configured to try and fetch the appropriate components from a database.
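A rough Python sketch of the runtime template idea follows. Everything in it is an assumption made for illustration: `RuntimeTemplate` is a made-up name, choosing a component by the Python type of the data is just one possible selection rule, and `fake_database` is a plain function standing in for a real component database.

```python
class RuntimeTemplate:
    """Picks a component only when data actually arrives (hypothetical)."""

    def __init__(self, predefined=None, fetch=None):
        self._components = dict(predefined or {})  # the predefined set
        self._fetch = fetch                        # optional lookup elsewhere

    def __call__(self, item):
        kind = type(item)
        component = self._components.get(kind)
        if component is None:
            if self._fetch is None:
                raise LookupError(f"no component for {kind.__name__}")
            # Not in the predefined set: fetch it once and keep it.
            component = self._fetch(kind)
            self._components[kind] = component
        return component(item)


fetched = []  # records which kinds had to be fetched


def fake_database(kind):
    # Stand-in for a component database; it only knows about strings.
    fetched.append(kind)
    return {str: str.upper}[kind]


template = RuntimeTemplate({int: lambda n: n * 2}, fetch=fake_database)
results = [template(21), template("hi"), template("yo")]
```

The integer is handled by the predefined set, while the string component is fetched the first time a string comes through and reused afterwards.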
Features
I find it odd that my biggest problem is data structures rather than the actual implementation. In von Neumann languages, it's mostly about features and how to implement them. In my platform, it's all about data. The features of the platform are up to you. Here's what I like about all this:
- No commands.
- No keywords.
- No function calls.
- No GC.
- No VM.
- No allocations.
- No deallocations.
- No need for explicit threads or locking.
- Portable without VM or such nonsense.
- No pass by value, pass by reference or by pointer.
- NULL exceptions are impossible.
- Implicitly parallel and concurrent.
- Can use machine specific features in a portable way without changing the main application.
Am I hyping it enough? I need JHype 10.5 or some similar product. Maybe Vaporware 0.0 hehe. Project Vaporware. HAHA Too bad I already released a screenshot, then I could mess with everyone.
In all seriousness, these are all possible. In fact, it has to have those features. It's not like I can make it any other way. Even if I wanted to add keywords for example, there is no actual language to type in, so it'd be difficult to say the least. Memory allocations and deallocations are defined by the input and output of components, so even there, adding manual allocations would be redundant.
Now I get to do the fun part. I get to write graphical components and line drawing algorithms and other graphical tools. After I'm done, I'm going to see if I can't get a professional graphic artist to clean up the interface. I hear they hate making buttons though. I'll have to make sure to include a lot of them.
About the actual executable, I'm having some problems with accessing the OS. Well, not exactly. What I'm having problems with is OS function calls. These don't work well in my environment. Not the function calls themselves, but the fact that many of them are not re-entrant and that many components would need to call the same function to get updates. For example, say you have multiple socket and file components. These are all updated with the same function call. There would need to be some kind of interaction between these components even though in an ideal world, they have nothing to do with each other. Basically, it throws a wrench into concurrency and any kind of implicit modularity. We truly must move forward with new and improved ways of creating software.
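The coupling is easy to see in a small Python sketch built on the standard `selectors` and `socket` modules. This is not how my platform does it; `Dispatcher`, `attach` and `poll_once` are names invented for the example. Two socket components that know nothing about each other still have to meet in the single `select` call.

```python
import selectors
import socket


class Dispatcher:
    """Owns the one shared polling call and fans results out (hypothetical)."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def attach(self, sock, on_data):
        # Each component registers its socket and a callback.
        self._selector.register(sock, selectors.EVENT_READ, on_data)

    def poll_once(self, timeout=1.0):
        # The single OS call that updates every attached component.
        for key, _events in self._selector.select(timeout):
            key.data(key.fileobj.recv(4096))

    def close(self):
        self._selector.close()


received = {}
dispatcher = Dispatcher()
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
dispatcher.attach(a_recv, lambda data: received.setdefault("a", data))
dispatcher.attach(b_recv, lambda data: received.setdefault("b", data))

a_send.sendall(b"ping")
b_send.sendall(b"pong")
for _ in range(5):            # poll until both components were updated
    if len(received) == 2:
        break
    dispatcher.poll_once()

dispatcher.close()
for s in (a_send, a_recv, b_send, b_recv):
    s.close()
```

The two callbacks never reference each other, yet neither can receive anything unless the shared dispatcher is polled on behalf of both.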
In fact, many things in your computer work in a streaming fashion. For example, when you read a file, the OS will set up a DMA channel to read certain sectors off the HD where your file is located. Data is streamed into actual RAM in parallel with the CPU and an interrupt will be generated after so many bytes have arrived. With a HD, most sectors are 512 bytes, but with a soundcard, the buffer size can be variable. So whether you're reading a file or recording (or playing) audio, this data should be streamed to your application automatically without having to request it. It's like the OS is doing everything to turn everything upside down. This is another obstacle to my platform. I'm trying to reverse what the OS is doing. Basically, I'm trying to undo the redundant operations that the OS executes. But I still have to go through the OS. I still have to use function calls when I should be using automatic streaming.
I forgot to hype one more thing. Since you have all the tools to define even your primitive types in this platform, this means that you can use it for any kind of hardware in existence. You can even use it for things that don't exist yet. So if you have different hardware, you can make them all work together from the same application. Or you can port your software. Or you can have a new independent software development environment just for this new hardware. It doesn't matter if you have an 8-bit machine or a gazillion-bit machine. It can adapt to anything by providing a few basic components from which all others are built. You can have more complex components that take advantage of specialised hardware too, such as vector operations.
It's still funny that creating this new platform is not my main problem, but rather working around existing technology such as the OS. Anyways, now the fun begins. Even the assembler instructions fit in as components directly into the platform. Want to use a different computer? No problem, swap out the old instruction set with the new one as well as the components for access to the system and you're done. Cross-compiling will be much more common and far easier. I predict it will become a necessity.
Multi Processing
I was reading another blog. Forget where. Some guy that used to work for Amazon I think. He had a list of 10 predictions in computing. He didn't put much emphasis on them. He just wanted to put out predictions for the sake of getting a handle on what's going on in the computing industry. At least, that's what I understood. In any case, one prediction was that threading and concurrent programming would fall out of favour. His contention was that humans think sequentially and most programmers just don't understand threading. He does have a point. I agree with it to some extent. No, I agree completely. While I agree with his reasoning, I don't agree with his conclusion. It's like saying most people don't understand how a car works, so driving will fall out of favour. Note that I'm not criticising the guy. His purpose for the predictions was well understood. What I want to follow up on is that I believe many others share this reasoning as well.
So will threading and concurrent programming fall out of favour? There are real opposing forces in action here. One is that programmers think sequentially. Another is that the free ride is over. Faster speeds are now achieved with multiple cores. Like it or not, multiprocessing is here to stay. Someone will have to write for these machines. We all will. I still remember a long time ago, a friend of a friend bought a dual CPU PC. It was back around 1995 I think. Maybe even earlier, I'm not too sure. In any case, it was the first time I'd heard that someone actually had one of these PCs. When asked if things really ran faster, he said that they did a little, but more than anything the OS had a CPU all of its own. Basically, most of it went to waste unless he was running multiple processes. But again, humans are sequential creatures. We only do one thing at a time. So multiple applications running at once usually don't get used at the exact same time.
I think this is also what set in the mentality that multiple CPUs don't equal linear speedups. My personal take is that this is also flawed reasoning. Having two motors isn't going to make your car faster if you can't actually put both in your car. Even if you could, it wouldn't make it go faster if they're the same make. They would both have the same top speed. However, you could put the other motor in another car and have both cars travel a total distance twice what you could normally (even if the maximum distance per car remains the same). So unless applications are built to use multiple CPUs, of course we won't get linear speedups.
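One standard way to put numbers on this is Amdahl's law, which I'm borrowing here as an illustration rather than as part of my argument. It gives the overall speedup when only a fraction of the work can be spread over several CPUs. The function and parameter names below are my own.

```python
def speedup(parallel_fraction, cpus):
    """Amdahl's law: overall speedup when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# An application not built for multiple CPUs gains nothing from a second one.
not_parallel = speedup(0.0, 2)

# An application whose work splits completely gets the linear speedup.
fully_parallel = speedup(1.0, 2)

# Anything in between lands somewhere between 1x and 2x.
half_parallel = speedup(0.5, 2)
```

With a parallel fraction of 0 the result is 1.0 no matter how many CPUs there are, and with a fraction of 1 it equals the CPU count. The second motor only helps if the application can actually use it.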
Another problem is that manually inserting commands to use multiple threads is WAY too low level. You may as well be using assembler. It's that low level. A thread is a hardware concept. It's a way to control the timer interrupt and other interrupts, even if in an indirect way. That's as low level as it gets. Even in assembly, interrupt programming isn't something that you do all the time, if at all. Yet in high level languages, let's explicitly control thread creation. Not only that, but let's explicitly control memory usage. Memory usage should be handled like the time quota for threads. Upon startup, you can use a lot hoping that you'll terminate quickly. But after that, your memory gets swapped to disk and any new requests will fail until you lower your usage. If there's enough free memory, then fine. But if there are other applications running, you should not have the right to steal memory in the same manner that you shouldn't be able to steal CPU time from other applications except under user intervention or special cases.
I don't know what it is. Maybe the computing industry got hit on the head and no one knows what's going on. The more I look at how things are done, the more they appear backwards. It's like I went back in time to when the pyramids were built and found out that they carried rocks along the Nile in boats. But they put multiple boats inside one another because multiple boats have to be faster than just one boat. If programmers of today were to build the pyramids, this is what they would come up with. Rocks are data to the pyramid builders. They need to be processed. In those days, they used whatever resource they had to get the rocks to the pyramid base. Whatever boat arrived, this is where the rocks would be loaded. The same should apply in computing. Whatever processor is free, this is where the data should be processed. We're too stuck in thinking that it's the CODE and functions that should be split up when it's in fact the DATA that should be split up. We're splitting up the workers and sending them to the rocks when we should do the opposite and send the rocks to the workers at the actual pyramid.
In computing today, we have a serious flaw in not only how we design software, but in specifying what our actual data is. Why is it that when it comes to processors, all of a sudden we think execution is a tangible thing that can be sent? All of a sudden, it is our application that becomes the data. This is a seriously flawed notion that needs to be dispelled. No matter if we have one processor or a million, what we process is still the data. Data that needs to be processed should be sent to whatever CPU is free that can do the work. We must stop thinking that it is the code that needs to be split up. If we used programming techniques in the real world, we'd all be dead. At the very least, we'd look like morons. And proof of this is shown by the quality of our software overall.
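Here is a minimal Python sketch of sending the rocks to whichever worker is free. It is only an illustration of the scheduling shape, not my platform: `process_all` and its parameters are invented names, and the explicit threads merely stand in for processors, which is exactly the plumbing the application should never have to write.

```python
import queue
import threading


def process_all(rocks, work, workers=4):
    """Hand each data item to whichever worker is free next (hypothetical)."""
    rocks = list(rocks)
    pending = queue.Queue()
    for index, rock in enumerate(rocks):
        pending.put((index, rock))
    results = [None] * len(rocks)

    def worker():
        # Every worker runs the same code. Only the data is split up.
        while True:
            try:
                index, rock = pending.get_nowait()
            except queue.Empty:
                return  # no rocks left
            results[index] = work(rock)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


squares = process_all(range(10), lambda n: n * n)
```

Nothing in `work` knows how many workers exist. Adding or removing workers changes how fast the pile shrinks, not how the application is written.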
We must never lose sight of the fact that the data is always what needs to be manipulated. Never the code. If a programming tool forces us to break this rule, then we must throw out that tool. I think I just threw out the entire computing world.


Anonymous # 14. June 2006, 14:42
> So it's the chicken and the egg problem.
> ..other than to build a simulator to kick
> things off. After which time,
> it will be ditched.
I agree..
1) Construct a fake chicken
2) Persuade the fake chicken to lay a real egg
3) Hatch a real chicken from the real egg
or alternatively..
co-evolve the chicken and the egg (hard!)