More on Project V's Type System
Thursday, 10. July 2008, 05:11:32
Anyhow, one thing I learned quickly is that a type system on paper is quite different from one that has been implemented and put to use. In any case, now that I have type comparisons nearly complete, I thought I'd share some details. I'm also writing this now to make sure I didn't miss anything. It's better I do that now than realise later I've implemented something that can't be easily changed. Note that implementing the type system is by far the biggest obstacle out of all this. I could have done something simple, but apparently I'm a sucker for punishment.
Ok, onto the details.
How do I best explain this?
A type is made up of an ID along with runtime and design time attributes. It also has a human readable name, but the name is not used by the compiler. It's there for humans to better recognise what they are working with. It may also help when using other languages since you can replace the names.
There is no need to use design time attributes. You could define an ID for every single type and be done with it. But that'd be too easy. No, I had to implement something totally ridiculous.
If you look at an Integer for example, you see that it has three properties: Size, Signed and Endian. So an Integer type has those three properties that you can define as you wish. If you want a 32-bit unsigned Integer, you simply set Size=4 and Signed=false. Size is in bytes. I was thinking of using bits, but there are valid reasons for not using them. For example, bytes keep the storage size consistent with the string type and other data structures.
The collections of attributes are called sets. You can define as many sets as you'd like for any given type. It just so happens that two sets come predefined called Properties and Members. Properties is a design time set and Members is a runtime set. There are flags for each set. A design time set will change the meaning of the type. Runtime sets are used to define structures like in Java or C/C++.
There is also a flag to tell if the type is defining a value or not. An Integer type would obviously be defining a value.
To assign a value to a design time property, you derive the type and override the property in question. Project V is all visual, so there's no such thing as a textual representation, but I'll write down what such an example could look like if it existed. I'll leave out the Endian property for this example.
type Integer
{
:Properties(DESIGNTIME)
Bool Signed;
UInt32 Size;
IsValue = true;
};
modifier UInt32 : Integer
{
:Properties(Integer)
Signed = false;
Size = 4;
};
UInt32 myValue = 10;
// The above is exactly the SAME as the following
instance myValue : UInt32
{
Value=10;
}
IsValue and Value are special predefined attributes that do not reside in a set. IsValue is used to define if the type is a value or not. Value is used to hold the binary data for that type. Note that the above is exactly how UInt32 is defined in Project V. Also note how UInt32 is used inside the Integer type as a property type. This is all perfectly legal.
So yeah, there are three kinds of entities. Types, modifiers and instances. Instances are when you declare a value or a member. Types define what is contained in those instances. And modifiers are used to override attributes that you might want to reuse a lot. For example, you wouldn't want to keep specifying the attributes for a 32-bit integer, so you just define a modifier to hold those attributes and use that instead. Modifiers (this includes instances) CANNOT define attributes of any kind. They can only override them.
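The split between types, modifiers and instances can be sketched in ordinary code. The following is only an illustration in Python, not how Project V is implemented: the `Entry` class, its field names and the `lookup` method are all hypothetical, and it models single inheritance only. It shows the two rules stated above: only types may define attributes, and modifiers and instances may only override them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set

# Special attributes that live outside any set (per the post).
SPECIAL = {"IsValue", "Value"}


@dataclass
class Entry:
    """Hypothetical node in the type graph: a type, a modifier or an instance."""
    name: str
    kind: str  # "type", "modifier" or "instance"
    parent: Optional["Entry"] = None
    declared: Set[str] = field(default_factory=set)
    overrides: Dict[str, Any] = field(default_factory=dict)

    def declare(self, attr: str) -> None:
        # Only types may define attributes.
        if self.kind != "type":
            raise TypeError(f"{self.kind} {self.name} cannot define attributes")
        self.declared.add(attr)

    def _chain(self):
        # Walk from this entry up through its parents.
        node = self
        while node is not None:
            yield node
            node = node.parent

    def override(self, attr: str, value: Any) -> None:
        # An override is legal only if some ancestor type defined the attribute.
        known = attr in SPECIAL or any(attr in e.declared for e in self._chain())
        if not known:
            raise KeyError(f"{attr} is not defined anywhere above {self.name}")
        self.overrides[attr] = value

    def lookup(self, attr: str) -> Any:
        # The most derived override wins.
        for entry in self._chain():
            if attr in entry.overrides:
                return entry.overrides[attr]
        return None


integer = Entry("Integer", "type")
integer.declare("Signed")
integer.declare("Size")
integer.override("IsValue", True)

uint32 = Entry("UInt32", "modifier", parent=integer)
uint32.override("Signed", False)
uint32.override("Size", 4)

my_value = Entry("myValue", "instance", parent=uint32)
my_value.override("Value", 10)
```

Here `my_value.lookup("Size")` resolves through UInt32, while calling `uint32.declare(...)` raises an error because a modifier cannot define new attributes.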
So far, that should explain the majority of the type system.
Just a few clarifications. You may have seen that the Properties set in UInt32 had the Integer class in brackets. This allows you to specify what set and what base type you want to override. This is needed because you can use multiple inheritance. Only types may use multiple inheritance. Modifiers must use single inheritance. There is one exception. If you define an instance (which is itself a modifier), you can only use single inheritance until you reach a parent that is a type (and not a modifier). After that point as you go higher into the hierarchy, you can use multiple inheritance as you please. If a modifier has more than one parent before that first type is reached, then all secondary parents are ignored.
Yes, you can create crazy hierarchies with this. Any recursion is stopped just before it happens. So if any chain loops back on itself, the type system will pretend the type chain ends just before the loop happens.
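As a rough sketch of the two traversal rules above (secondary parents of modifiers are ignored, and a chain that loops back on itself is cut just before the loop), here is one possible reading in Python. The `Node` class and `effective_ancestors` function are made up for illustration, and this sketch lists a shared ancestor once per path that reaches it.

```python
class Node:
    """Hypothetical entry: is_type is False for modifiers and instances."""

    def __init__(self, name, is_type, parents=()):
        self.name = name
        self.is_type = is_type
        self.parents = list(parents)


def effective_ancestors(start):
    """Return ancestor names in visiting order, honouring both rules."""
    order = []

    def visit(node, chain):
        # Modifiers (and instances) only follow their first parent.
        parents = node.parents if node.is_type else node.parents[:1]
        for parent in parents:
            # A parent already on the current chain would close a loop,
            # so pretend the chain ends here.
            if any(parent is seen for seen in chain):
                continue
            order.append(parent.name)
            visit(parent, chain + [parent])

    visit(start, [start])
    return order


# A type with two parents, wrapped by a modifier that has a stray second parent.
x = Node("X", True)
y = Node("Y", True)
t = Node("T", True, [x, y])
other = Node("Other", True)
m = Node("M", False, [t, other])
inst = Node("inst", False, [m])

# Two types that name each other as parents.
a = Node("A", True)
b = Node("B", True, [a])
a.parents.append(b)
```

With these definitions, `effective_ancestors(inst)` never reaches `Other`, and `effective_ancestors(a)` stops after `B` instead of recursing forever.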
Another interesting fact is that an instance is a derived class. And a value like 4 is again a derived class of the instance in question. That means that 4 implicitly overrides the special Value property. So 4 can be considered a modifier as well.
To finish, I'll show an example of how to define a boolean value with this type system.
First, we know we need two values. One for TRUE and one for FALSE. This is basically an enumeration. And there is a type called Enum to define enumerations. The Enum type also has a set called Options where you put in all possible options. This is where we'd dump TRUE and FALSE. But what would the actual values be for TRUE and FALSE? We could define our own binary format, but we already have Integers that would do just as well. So we'll base our boolean type on the Integer type.
type Bool : UInt32, Enum
{
:Options(Enum)
Bool FALSE=0;
Bool TRUE=1;
}
Voila! That's a boolean type. The Options set limits the choices to TRUE and FALSE only. And here we see the use for multiple inheritance. Secondary types add meaning onto the primary type.
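To make the idea concrete, here is a small Python sketch of an enumeration layered on an integer representation, where the Options set limits which values are legal. The class name `EnumOverInteger`, its methods, and the little-endian default are all assumptions for illustration (the Endian property was left out of the example above).

```python
class EnumOverInteger:
    """Hypothetical stand-in for a type deriving from both an Integer and Enum."""

    def __init__(self, name, size_bytes, signed, options):
        self.name = name
        self.size_bytes = size_bytes   # from the integer side (UInt32: 4)
        self.signed = signed           # from the integer side (UInt32: False)
        self.options = dict(options)   # from the Enum side: name -> value

    def encode(self, option, byteorder="little"):
        # Only names listed in Options are accepted.
        if option not in self.options:
            raise ValueError(f"{option} is not an option of {self.name}")
        return self.options[option].to_bytes(
            self.size_bytes, byteorder, signed=self.signed
        )

    def decode(self, raw, byteorder="little"):
        value = int.from_bytes(raw, byteorder, signed=self.signed)
        for option, option_value in self.options.items():
            if option_value == value:
                return option
        raise ValueError(f"{value} is not a legal value of {self.name}")


BOOL = EnumOverInteger("Bool", 4, False, {"FALSE": 0, "TRUE": 1})
```

`BOOL.encode("TRUE")` yields four bytes, while decoding the integer 2 is rejected because it is not in the Options set.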
I get a kick out of seeing this type used as a property in the Integer type. Which came first? Doesn't matter. This is all legal. The boolean type is completely built within the type system exactly as it would be done by any user of Project V. It just so happens that I made the built-in Integer type use it for its Signed property.
It should be said that most users may never use most of these features. They're there if you need them. Most everything is visual which means that most, if not all, of these details will be handled for you.
Ah yeah, forgot to mention that the list of base classes is also a set. It's called the Base set. Again, nothing special to implement type hierarchies. I just used an existing feature.
And yes, this type system is up and working right now. It just has limited functionality as we haven't gotten to actually performing ACTIONS. Actions are done in components. I'm going to start getting those working and interacting in the GUI soon. That's where I'm at now.
In any case, components are just like everything mentioned above. They have their own base class called Component. Just like with Integer, there are a few basic built-in components. You can build your own built-in components just like you can build your own custom primitive types. Or you can just build on top of what's already there. Components have a few predefined sets (from the Component type). These are the Input, Output and Connections sets. The Members set is used for internal components.
Just like types and values, there are definitions and implementations. A component type is a definition while a component instance is an implementation. Components require a default implementation for each definition. Beyond that, you're free to customize as you please. You can override your entire application if you wish and specify exactly what components you want to use. Or you can use the ones defined by your machine. In essence, your application consists of default implementations that get overridden by anything your machine recognises. If the default implementation is something the machine doesn't recognise, it can use a variety of options. I haven't decided yet. Maybe it could run both the one in your application and the one defined on the system at different times and see which one is best during what conditions. That's for the future though. With the Internet, it'd be easy to gather profiling data on every component you could possibly think of. This way, your computer could continuously improve its performance on ALL its existing applications. It would also give an advantage to those who use standard components when possible instead of custom ones.
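The override rule for implementations can be pictured as a simple lookup. This Python sketch is an assumption-laden illustration: `select_implementation` and the two dictionaries are hypothetical names, and falling back to the application's default when the machine has nothing is just one of the options the post leaves undecided.

```python
def select_implementation(definition, app_defaults, machine_impls):
    """Pick an implementation for a component definition.

    app_defaults:  definition name -> default implementation shipped with the app
    machine_impls: definition name -> implementation the machine recognises
    """
    # Anything the machine recognises overrides the application's default.
    if definition in machine_impls:
        return machine_impls[definition]
    # Assumed fallback: use the default implementation from the application.
    return app_defaults[definition]


app_defaults = {"Addition": "app-addition", "Sort": "app-sort"}
machine_impls = {"Addition": "machine-addition"}
```

In this setup `Addition` resolves to the machine's version and `Sort` falls back to the one bundled with the application.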
Anyways, there are plenty of other details under the hood, but that should be good enough to understand the basics of how I decided to build this up. If there are any glaring decisions that I should rethink, let me know now.
Hopefully I've cleared up some unanswered questions. As for portability, that's a bigger issue because of the OS and all that. But overall, it shouldn't be too bad as there will be a specific set of components that need to be implemented (basic knowledge by the machine has to start somewhere). All of a sudden, that machine can run most of everything else. At least computationally, it'll be able to run 100% of it. I'll have it working on Windows and Linux for sure. I'll see about other platforms later on.
Also note that you don't even need to use Project V's type system at all, though you could interact with it. You can set up your own type system and set of components just as long as there's a way to translate and connect the data. You can do this within Project V or externally via your own code. So with a library, you can use everything at Project V's disposal from your favourite language if that's what you want to do. There will be a C library to start off. You'll be able to define built-in types and components in C if you'd like. This is how I defined Project V's built-in functionality. So you could swap out what I wrote and completely replace everything with your own system. Or better yet, you can use both systems together. And like I said earlier, you can build applications the standard way with your favourite language and use the functionality you want from Project V by using this library.
Later, I'll build the compiler for it where your application will be a standalone binary. I have to make sure to point this out because some people still think this will be a VM of sorts. No such luck.
I still hope to eventually move onto more complicated topics once people have seen this in action. Design time configuration panels for your components will be a cool feature. Tools within the GUI will be nice. Components that can update themselves are also one of my favourites. This is the feature that allows a component to add more input connections when all are taken up, such as would be the case for Addition components. Of course, the internal network for that component would also need to be updated, which is done by updating the Members and Connections sets. And adding another input is done by adding another entry in the Input set. There's also a way to interact with the GUI, type system and compiler. So eventually, you'll be able to configure whatever you want. I still don't understand why people don't see the magnitude of this, but maybe it's more impressive because I'm the one coding it up.
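A self-updating Addition component could behave roughly like the Python sketch below. It is only a guess at the behaviour described: the class `AdditionComponent`, the `inputs` list standing in for the input set, and the use of plain numbers as connected sources are all hypothetical, and the internal Members and Connections updates are not modelled.

```python
class AdditionComponent:
    """Hypothetical component that always keeps one free input available."""

    def __init__(self):
        # None marks an input with nothing connected to it yet.
        self.inputs = [None, None]

    def connect(self, source):
        # Attach the source to the first free input.
        for index, current in enumerate(self.inputs):
            if current is None:
                self.inputs[index] = source
                break
        # If every input is now taken, add another entry to the input set.
        if all(current is not None for current in self.inputs):
            self.inputs.append(None)

    def total(self):
        # Sum whatever is connected; free inputs contribute nothing.
        return sum(value for value in self.inputs if value is not None)


adder = AdditionComponent()
for number in (1, 2, 3):
    adder.connect(number)
```

After three connections the component has grown from two inputs to four, with the last one still free.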
That's it for now. I didn't see any obvious errors in my system, so I accomplished my goal. To keep you busy, you may wonder how recursive definitions of components are done... or if you can pass components around as data... all while having connections attached... or how to duplicate components at runtime in order to achieve more parallelism... or even how packets of information are transferred, etc.


Amazing!
By anonymous user, # 10. July 2008, 06:53:39
Also, how does your system deal with the following:
modifier E1 : Integer
{
: Properties(Integer)
Signed = true;
Size = 0;
};
modifier E2 : Integer
{
: Properties(Integer)
Size = 3;
}
modifier E2 : Integer
{
: Properties (Integer)
Size = 4294967295;
};
I'm also not sure of the significance of IsValue. What happens when it's set to false?
By spc476, # 11. July 2008, 08:32:37
E2 would change its size to 4 or whatever the system likes to use. I didn't talk about another feature which is the Minimum and Maximum properties in the Enum type. It sets what range of values you can use. So a size of 3 would limit values to whatever values 3 bytes allow.
Also, there is a system in place to substitute whatever types you are using with native ones as long as the functionality of components remains the same. I suppose I'll have to add a property to tell if integers can be replaced by a wider size or not (modular arithmetic, yes or no) (floating point would need this too). Saturation will also be in play.
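For reference, the range implied by a Size in bytes, and the difference between saturation and modular (wrapping) arithmetic, can be worked out as follows. This is a plain Python sketch with hypothetical function names; rejecting a Size of 0 is an assumption of the sketch, not something stated here.

```python
def integer_range(size_bytes, signed):
    """Smallest and largest value that fit in size_bytes bytes."""
    if size_bytes <= 0:
        # Assumption for this sketch only: a zero-byte integer is rejected.
        raise ValueError("size must be at least one byte in this sketch")
    bits = 8 * size_bytes
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturate(value, size_bytes, signed):
    """Clamp out-of-range values to the nearest limit."""
    low, high = integer_range(size_bytes, signed)
    return max(low, min(high, value))


def wrap(value, size_bytes, signed):
    """Modular arithmetic: keep only the low bits (two's complement if signed)."""
    bits = 8 * size_bytes
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value
```

A 3-byte unsigned integer therefore covers 0 to 16777215. Saturating 20000000 gives 16777215, while wrapping 16777221 gives 5.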
The last E2 would not be allowed until such time as we have that kind of memory available. You can define the type, but you would not be able to assign a value to it. The GUI and compiler will give an error.
Setting IsValue to false means that you are defining a structure. That's where the Members set and other runtime sets would come into play. If IsValue is true and you have those sets as well, then you will have both a value and a structure. I don't recommend this, but I've seen it used with business records.
IsValue only needs to be set in the original base type to specify that it is defining a value. After that, only set IsValue to true if you are redefining the value itself. This allows you to override the value and even undefine it if you want. If IsValue is false, it ignores the Value completely for that entry.
For binary data such as binary files, you'd use serialization features to handle that. This deals specifically with converting binary data to Project V data types and back. This is how data transfers between distributed components work. If you look at JPM's book or website, it's what he calls IP (Information Packets). You don't need to use control packets if the format is known in advance. You can also create a component that dynamically converts the data. Heck, you can create a grammar for binary data just as you would textual data. Add in components for action routines and you're set to go.
Oh, about using types before they are defined, it does not work like in textual programming languages. You create everything as you go. And you can override safety precautions (the Options set always overrides those precautions, otherwise it'd be too annoying to work with). So you would build all your types with nothing in them except their IDs and names. After that, you can link them together any which way you please. There's no need to build an entire type all at once.
For Integer, I created the "shell" of that type (internally called an entry). I created one for each of Bool and UInt32. Gave them their IDs and names. And then I created the properties for Integer (Size and Signed). With UInt32, I gave it a parent type of Integer and then overrode its properties by adding a value of 4 and false respectively. With Bool, I set its base type to UInt32 and created two instances with the values 0 and 1. I added a secondary type of Enum and dumped the two instances in the Options set. I could give the two instances values because they had not yet been given a parent of Enum. Even so, the GUI is able to circumvent this restriction if needed. Anyhow, at this point, I had all my types built including the Integer type where the values are handled internally.
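The shells-first approach is what makes the mutual references possible. A minimal Python sketch of the same sequence follows; the `Shell` class, its fields and the numeric IDs are invented for illustration, and the two Bool instances are reduced to plain name and value pairs.

```python
class Shell:
    """Hypothetical entry holding only an ID and a name until it is linked."""

    def __init__(self, type_id, name):
        self.type_id = type_id
        self.name = name
        self.bases = []            # base types, in order
        self.property_types = {}   # property name -> type of that property
        self.overrides = {}        # property name -> overridden value
        self.options = {}          # entries placed in the Options set


# Step 1: create empty shells (the IDs are made up).
integer = Shell(1, "Integer")
uint32 = Shell(2, "UInt32")
boolean = Shell(3, "Bool")
enum = Shell(4, "Enum")

# Step 2: link them in any order, since every shell already exists.
integer.property_types["Size"] = uint32
integer.property_types["Signed"] = boolean

uint32.bases.append(integer)
uint32.overrides.update(Size=4, Signed=False)

boolean.bases.append(uint32)                  # primary base type
boolean.options.update(FALSE=0, TRUE=1)       # values given before Enum is added
boolean.bases.append(enum)                    # secondary type
```

Because linking happens after creation, Integer can refer to UInt32 and Bool even though both ultimately derive from Integer.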
BTW, one thing that could cause problems is the fact that Integer type uses UInt32 which is itself derived from Integer. This could be a problem, not because it is used before it is defined, but because if you check the type before reading the Size property, you have infinite recursion. The compiler section that handles the Integer type does not look at the type of the Size property. It knows how it is built, and will already know that it is 4 bytes long and is unsigned, so the value is read directly.
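The bootstrap trick amounts to hard-wiring the layout of the Size property rather than asking the type system about it. A hedged Python illustration, where `SIZE_LAYOUT`, `read_size_property` and the little-endian default are assumptions made for the example:

```python
# Hard-wired knowledge: the Size property is 4 bytes long and unsigned.
# Looking this up through the type system would mean reading the Size of
# the type of Size, and so on without end.
SIZE_LAYOUT = (4, False)


def read_size_property(raw, byteorder="little"):
    """Read a Size value directly from its stored bytes."""
    width, signed = SIZE_LAYOUT
    return int.from_bytes(raw[:width], byteorder, signed=signed)
```

For example, `read_size_property(b"\x04\x00\x00\x00")` returns 4 without ever consulting the UInt32 type.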
Everything just described is available to anyone using Project V. Types only need to be valid once you use them for allocating the values they define. Before then, you can do what you want. You can also leave values undefined for a while until you finish defining its type.
By Vorlath, # 12. July 2008, 15:53:20
It is good to see design posts again!
You have rarefied the thoughts on type systems, and have come up with, what I consider, the only consistent and complete typesystem possible. In my limited experience, I have seen no one else design such a clean system; although I find it hard to imagine that it has not been done before.
Do not let the Well-Foundedness crowd get you down; you will probably get a lot of disdain for circular dependencies. As you have already figured out, circular dependencies can be perfectly consistent and provide a clean typesystem; there is no problem with the compiler just “knowing” about these circular definitions and implementing them appropriately.
One final thought: I believe you will find that hierarchical chains that loop back on themselves CAN be allowed. The compiler can traverse the loops to find a fix-point, and the fix-point will always result in a set of concrete types.
By anonymous user, # 14. July 2008, 00:39:22
Thanks for the vote of confidence. Truth be told, it was the only setup that could work. I've tried all sorts of other variations without much luck. As I said in the article... it looked good on paper, but once you try to implement it, it's a whole new ballgame. I really like the way the current system turned out though.
If I got down easy, I'd have given up a long time ago. I still remember when I thought saying my system was not well founded was some kind of negative remark. Ah, those were the days.
I'm not that big a fan of circular dependencies. I think they can get real ugly if you don't watch what you are doing, but it's either that or create a built-in, non-describable primitive type to bootstrap the system (such as what we find in C++, Pascal, Java etc.). I find that much uglier. In fact, this was one of the very early designs from LONG ago that could not work for what I wanted to accomplish. It was the failure of that system (and how to fix it) that led to the discovery of the main concepts behind Project V.
I leave the original hierarchy intact, loops and all. I was talking about the compiler and GUI only. They create a copy of the type tree, break all loops just before looping back to the fix point and work with that. The loops don't matter. It's the members and properties and the overrides of those that matter.
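Working on a loop-free copy while leaving the original graph untouched might look like this in Python. The dictionary representation (name to list of parent names) and the function `copy_without_loops` are hypothetical simplifications.

```python
def copy_without_loops(graph, root, chain=()):
    """Build a tree-shaped copy of the hierarchy above root.

    graph maps each type name to the list of its parent names.
    Any parent already on the current chain is skipped, which cuts
    each loop just before it would close.
    """
    chain = chain + (root,)
    return {
        "name": root,
        "parents": [
            copy_without_loops(graph, parent, chain)
            for parent in graph.get(root, [])
            if parent not in chain
        ],
    }


# A hierarchy that loops back on itself: A -> B -> C -> A.
graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
tree = copy_without_loops(graph, "A")
```

The resulting tree ends at C, and `graph` itself still contains the full loop.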
The real reason that you can have loops is because I believe you should be able to organise your data any which way you please. You don't need to use types JUST for data. You can use them for all sorts of things. Specifically, graph theory. You can't have graph theory within a type system that doesn't allow loops.
I don't know why, but I really like that the entire type system for any and all applications is one huge graph. You just keep adding to it and different parts of the compiler will extract what they need from it. That's how the compiler and the GUI work. Each creates a duplicate of the section of the network it wants and works on that (then usually ditches the duplicate until it detects a change in the matrix! HAHA Just had to say it).
By Vorlath, # 14. July 2008, 04:37:27
By Vorlath, # 15. July 2008, 08:07:23