Too Much Power
Monday, 5. December 2005, 01:24:54
This is just a personal account of certain things about my new programming environment as I work on it. These aren't part of the usual flow of articles I produce.
So I decided to write down some of the basic aspects of my new programming environment. I decided to put in some operators and some other things. It ended up looking a bit like LISP. I only did that because humans will never actually see the code in this format. It just makes it easier for the compiler to process the input.
So the compiler will be written in itself. It's a really strange concept. At first, I'm going to use a custom parser and compiler. I'll keep it simple. Once I get something to compile, I'll add more of the basic concepts. Once I have all the operators that people are used to from common languages, I will start to write my own high level language on top of this. Once I get this done, I will port the compiler into this higher language. I will also port this higher language into itself. Then I'll be set to release it. Very strange sequence of software production. However, I've done this before with my parser generator where much of its code is produced by its own output.
Since I couldn't decide on an actual name, I decided that I should at least have a codename. I decided to call it Tenkuu for now which means ether or firmament in Japanese.
The one thing about Tenkuu is that it's just too damn powerful. There are side-effects of the way things are done that I did not expect. For example, code can be swapped around just as data can. But code has a special property in that it can be executed. Not only that, but code can execute and return more code. So there's a method of executing code and a method of storing code. There's another feature that popped up too. If you pass code back, is it locked to the class it came from (i.e. it simply executes as if it was still in its own class), or does it execute with access to the variables and classes at the current execution point? I think I will add both options. Haven't quite figured out how to state these options yet.
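To make the two options concrete, here is a rough sketch in Python rather than Tenkuu, whose syntax isn't settled. Every name in it (Origin, Caller, run_locked, run_here) is made up for illustration. The "locked" block is a closure tied to the object it came from, while the "free" block is plain source text evaluated against whoever runs it.

```python
class Origin:
    def __init__(self):
        self.x = "origin"

    def locked_block(self):
        # A closure: the returned code stays tied to this instance.
        return lambda: self.x

    def free_block(self):
        # Code handed back as source text; it has no home until someone runs it.
        return "x"


class Caller:
    def __init__(self):
        self.x = "caller"

    def run_locked(self, block):
        # The block still sees the variables of the class it came from.
        return block()

    def run_here(self, source):
        # The block is evaluated against this object's own variables.
        return eval(source, {}, vars(self))


origin = Origin()
caller = Caller()
locked_result = caller.run_locked(origin.locked_block())  # "origin"
local_result = caller.run_here(origin.free_block())       # "caller"
```

The same piece of code gives a different answer depending on which option is chosen, so the language needs a way to state the choice.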
I think I will use something similar to dereferencing pointers for code. If you dereference a block of code, you execute it. But if you use simple assignment, it copies the code unexecuted. In the higher language, it will be a different syntax of course because it would confuse the people used to C++ and Java.
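As a loose analogy only, Python already draws this line between storing code and running it: naming a function copies a reference to it, and adding parentheses executes it. The sketch below uses made-up names (make_greeter, greeter) and also shows code that returns more code when executed.

```python
def make_greeter():
    # Executing this block returns another block of code.
    def greeter():
        return "hello"
    return greeter


stored = make_greeter   # plain assignment: the code is stored, not executed
inner = stored()        # "dereference": the code runs and yields more code
message = inner()       # run the returned code to finally get data
```

Here `stored` and `inner` are both still code, and only `message` is a value.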
Anyhow, I had no idea how powerful this concept was. So many features. I'm adding some basic types for the higher level language, but the core of the environment has none of this. It's up to the higher language how to interpret all this core code. In the core, I have included some basic types (in code) and some operators that act mostly as separators. There is only one built-in type and that is a string. You can't do anything with it other than to store a string or sequence of characters into a variable. You cannot operate on it. To do that, you create a new class and start coding it up as usual. I just needed one basic atom to get things started. But it gets redefined immediately.
The above paragraph may seem confusing. The core of the code does have some minimal types. The most basic is the string, on which everything else is built. But these are not types in the higher languages. The types in the higher languages depend entirely on how this higher language interprets certain classes. So if it sees "int32", it could use it as a normal 32-bit integer. However, this integer has properties in that it's 32 bits wide, it's signed, etc. So in this way, it can also act as an object. It all depends on how it's used. So the type concept is a little blurred. But luckily, you can tell the compiler exactly what you want. You can tell it that you want a real 32-bit signed integer in RAM and it will do that. But this is done because YOU told it to do that. It's in your source code. Not the compiler. The compiler will have code that will do this for you to make it easier as you don't want to deal with such mundane tasks.
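One way to picture "int32" as both a type and an object is the Python sketch below. It is only an illustration under my own assumptions: the IntType class and its fields are hypothetical and say nothing about how the core actually stores this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """A type description that is itself an ordinary object with properties."""
    bits: int
    signed: bool

    @property
    def min_value(self):
        # Smallest value representable in two's complement (0 if unsigned).
        return -(1 << (self.bits - 1)) if self.signed else 0

    @property
    def max_value(self):
        # Largest representable value for this width and signedness.
        if self.signed:
            return (1 << (self.bits - 1)) - 1
        return (1 << self.bits) - 1


# Written in source code like any other object, not baked into the compiler.
int32 = IntType(bits=32, signed=True)
```

A higher level language could treat `int32` as a machine integer, while other code simply inspects `int32.bits` or `int32.signed` like any other object.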
So that's why on the one hand, there are types and on the other there are not. It's hard to explain until you actually see it. In any event, in the higher level language, you can code in the usual manner without any of this confusion. I, on the other hand, will create something much more powerful than simply a text oriented language.
I know there are other languages out there, but after looking at this for just a few minutes, I'm very optimistic about its outcome. I'm just thinking out loud here, but you can add regular expressions as in Perl inside the language itself. You can add things such as math symbols, derivatives, integrals, limits and all sorts of other equations. You can add new features directly into the "language" such as XML processing or whatever else you can think of. Oh, SQL and database handling. This can be put directly into the language as well.
Another weird thing is that classes and functions are the same thing. You treat them both the same way. Or at least you can. You can do cool things like set default values for parameters at runtime and pass this function around. You can also have sub-functions. These are functions within functions. These can also have classes and other objects within them.
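Python can approximate some of this, so here is a small sketch for comparison. The names (power, square, outer, Box, Adder) are invented for the example, and Python still keeps classes and functions as separate concepts, which the environment described here would not.

```python
from functools import partial


def power(base, exponent):
    return base ** exponent


# A default value chosen at runtime; the result can be passed around freely.
square = partial(power, exponent=2)


def outer(n):
    # A sub-function that itself contains a class.
    def inner(k):
        class Box:
            def __init__(self, value):
                self.value = value
        return Box(k * 2)
    return inner(n).value


class Adder:
    # A class whose instances are called exactly like functions.
    def __init__(self, amount):
        self.amount = amount

    def __call__(self, x):
        return x + self.amount


add_three = Adder(3)
```

Calling `square(7)` and `add_three(4)` looks identical from the outside, even though one started life as a function and the other as a class.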
There's just too much here. I did think about many of these, but seeing them all in front of your eyes is very cool. I have underestimated how long it would take to write this though. Luckily, I have my own parser generator available. So hopefully, it won't take that long. I do know that it will be in January at the earliest before I release anything.
For the higher level language, I am taking suggestions. This part of the language will work more like what you're used to. It'll have some very powerful features though. I know what I want and I'd also like to hear about features that you like in other languages. I'm definitely thinking about including power loops. These are very powerful. I'm going to include regular expressions for sure.
OK, so I'm off. Last bit is to solve the execution locality of "pointers" to code. Yes, there'll be pointers, but they will be bounded. You can set their range. For example, you can tell the compiler that you want a pointer that references an array but is only allowed to go from item 10 to item 19. Anything outside that range will generate some kind of error. Not too concerned about errors just now. Will concentrate on getting something to work first.
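A bounded pointer along these lines could behave roughly like the Python sketch below. The BoundedPointer class, its method names, and the choice of IndexError are all assumptions for illustration, since the error handling isn't decided yet.

```python
class BoundedPointer:
    """A position into a sequence that may only move within [low, high]."""

    def __init__(self, target, low, high):
        self.target = target
        self.low = low
        self.high = high
        self.position = low  # start at the first allowed item

    def _check(self, index):
        if not (self.low <= index <= self.high):
            raise IndexError(
                f"index {index} is outside the allowed range "
                f"{self.low}..{self.high}"
            )

    def move(self, offset):
        # Validate before moving so the pointer never holds a bad position.
        new_position = self.position + offset
        self._check(new_position)
        self.position = new_position

    def get(self):
        return self.target[self.position]


data = list(range(100))
pointer = BoundedPointer(data, 10, 19)
pointer.move(9)  # now at item 19, the last allowed item
```

A further `pointer.move(1)` would raise IndexError, even though item 20 exists in the underlying array.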
Edit: Just adding a note for myself. If statements aren't done as you're used to. An if statement is basically a function with three arguments. The first one is the condition. If it's true, the function returns the block of code in the second argument, otherwise it returns the block of code in the third argument. Oh yeah, I'm also going to expand the break statement. I hate that I can only break out of one level of a loop. I should be able to break out of two or three or all loops.
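Both ideas can be sketched in Python, with the caveat that the names (if_, Break, find_pair) are invented and the exception trick is just a common workaround, not the planned mechanism. The if function returns a block of code without running it, and the exception stands in for a break that leaves more than one loop.

```python
def if_(condition, then_block, else_block):
    # Returns one of the two blocks unexecuted; the caller decides when to run it.
    return then_block if condition else else_block


chosen = if_(3 > 2, lambda: "yes", lambda: "no")
result = chosen()  # executing the returned block


class Break(Exception):
    """Raised to leave several nested loops at once."""


def find_pair(rows, target):
    found = None
    try:
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                if value == target:
                    found = (r, c)
                    raise Break  # leaves both loops in one step
    except Break:
        pass
    return found
```

Python's own break only exits the innermost loop, which is exactly the limitation being complained about.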
For example, I should be able to say "I want to store an integer in this variable, and it will be between X and Y". In some cases it would come out as int32 signed or unsigned depending on whether X was less than 0 or not. Any attempt to assign an integer to the variable that is outside of its valid range would throw an exception. The variable should also "remember" its constraints as it is passed around.
Of course, I like your idea of being able to force it if you know better; but the compiler should ensure that all the properties you define "agree" with what you forced so you can't go ahead and say a variable is an unsigned integer and it may store a number between -3 and 7.
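A minimal Python sketch of this suggestion follows, assuming the representation is picked from 8, 16, 32 or 64 bits. The names (pick_representation, RangedInt, force_unsigned) and the use of ValueError are hypothetical choices made for the example.

```python
def pick_representation(low, high):
    """Return (bits, signed) for the smallest common width holding [low, high]."""
    signed = low < 0
    for bits in (8, 16, 32, 64):
        if signed:
            lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        else:
            lo, hi = 0, (1 << bits) - 1
        if lo <= low and high <= hi:
            return bits, signed
    raise ValueError("range does not fit in 64 bits")


class RangedInt:
    """An integer that carries its allowed range wherever it is passed."""

    def __init__(self, low, high, value, force_unsigned=False):
        if low > high:
            raise ValueError("low must not exceed high")
        # The forced property must agree with the declared range.
        if force_unsigned and low < 0:
            raise ValueError("an unsigned integer cannot store values below 0")
        self.low, self.high = low, high
        self.bits, self.signed = pick_representation(low, high)
        self.set(value)

    def set(self, value):
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        self.value = value


small = RangedInt(-3, 7, 0)      # fits in a signed 8-bit integer
wide = RangedInt(0, 70000, 5)    # needs an unsigned 32-bit integer
```

Asking for `RangedInt(-3, 7, 0, force_unsigned=True)` fails immediately, which is the kind of disagreement the compiler would be expected to catch.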
By dkubb, # 5. December 2005, 07:28:59
And you're right, this would make it much easier for the compiler to know what size data you need instead of using the default 16 or 32bit integer. It would also help for porting. Especially from 32bit to 64bit or even to 16bit or 8bit computers, it could more easily tell what ranges are necessary for its integers and avoid overflows or waste of memory.
By Vorlath, # 5. December 2005, 19:45:15
I'm pretty sure Java is compiled with Java, by the way.
Cheers - Rob
By blitter8, # 5. December 2005, 22:28:28
By Vorlath, # 6. December 2005, 00:38:09
For the higher level language, there will definitely be threading and multi-processor constructs and other utilities. But I'll leave that to the hardware experts to build much of those libraries.
I think right now, I'm interested in knowing what tools you use the most in your language. That's why I liked dkubb's suggestion so much. It'd be so useful for testing and everyday use.
The thing to remember is that in my language, anything can be added. Primitive types, primitive operators, new concepts, threading, keywords, anything at all is written as libraries and not directly part of the language. However, I want a basic set of tools to be available right from the start.
edit: deleted stuff about XML. Already in article.
By Vorlath, # 6. December 2005, 01:16:54
Ada is a powerful language, and is worth a look. Probably nothing like what Vorlath is planning, but very useful in the same domains C and C++ are used. Maybe some of Ada's type system ideas could be useful for this language, though?
Vorlath - It sounds like you've got an interesting set of features planned for your language. It might be wise to step back from the features a bit, though, and get a semi-formal description of the concepts and computing model written down. You did mention a lot of things about it here, but not in a way that made much sense to me.
A lot of the things you've mentioned also sound similar to concepts in the literature of programming language research, but the way you talk about them makes me wonder if you're familiar with the common terminology in the literature. Using those terms would probably help people who are interested to get a better understanding of what you're trying to do. If you're not familiar with the literature, a crash course would probably be very helpful as you work on designing and implementing your language.
Let me know if you'd like some relevant references.
By lpearson, # 6. December 2005, 06:45:05
I decided to make a blog entry for this reply as it was getting too big.
I'll quickly mention a few things though. There is no computing model. The whole reason for doing this was to remove what has been holding computing back. I've done away with it. In the higher level language (built on top of the core), you can of course create a computing model that you're familiar with and use that.
The fact that the core "language" doesn't actually have anything concrete is why it's difficult to come up with terminology. The only thing that exists is groupings. Not classes, not lists, not arrays... it simply groups things together.
I've looked around and have read a great many books on programming languages. I've been interested in programming languages for over 10 years, and I've yet to see anything resembling this. Paul Graham is the only one that skims over it when talking about the future. Yet, at the same time, I must admit that this is nothing new. I think you could achieve the same with XML if it was handled differently. XML can group things together, so that's all you need.
By Vorlath, # 6. December 2005, 18:59:55