Fifty Questions for a Prospective Language Designer
Monday, 10. November 2008, 02:31:09
The questions themselves tend to go back to contemporary features. So if you are creating something new, these questions don't apply. Yet, I thought it would be interesting to see how ridiculous this can get.
What need are you trying to fill? Don't fall into the trap of “a scripting language”, because they always turn into general-purpose languages.
Implicit multicore and concurrent software development.
Also having your computer learn new things the more it runs different software.
Right now, your computer is perpetually dumb. After quitting a program, it forgets
everything it knew how to do and even when running, it doesn't share its knowledge
with other software.
What's the metaphor?
Data networks.
Even though you might not be trying to build a “pure” language, it's worth having a model for the core language, such as “imperative, block-structured” (C), “object oriented” (Smalltalk), “generic object orientation” (Lisp), “functional” (ML), “lazy” (Haskell), “logic” (Prolog), “production system” (OPS5), etc. These different core models influence the “natural” styles of program development in different languages even if the set of available facilities is similar. They also help define which late-arriving features will “fit” and which will be warts.
Dataflow Programming with a twist. Dataflow usually requires a contemporary language to implement the lowest level components. That is not necessary or even recommended in Project V though it certainly is possible.
How many programming paradigms does your language support?
Dataflow is more fundamental than any programming paradigm, so you can use any style of programming you wish on top of it, or even for low level implementations. But overall, dataflow is what you will be using as it is more flexible and powerful than anything else out there. Just the feature of pluggable components makes all other paradigms weak by comparison.
How tightly are they integrated?
Unlike most programming paradigms, dataflow programming is a fundamental approach to computing (the other being imperative). This means you can build whatever programming paradigm you like on top of it and integrate it directly into the development environment. Programming paradigms can be created by the developer. You don't need a new programming language to use one, and "tight" integration loses its meaning here. Note that other programming paradigms would restrict the power available to the developer and are not recommended.
Which other paradigms can you integrate with the built-in facilities?
All of them.
How natural is the syntax of user-defined extensions?
There is no syntax unless you use a scripting component. And if you do, then it's your choice what scripting language you use (depending on what languages are available as components at any given time).
Instead, there are design time components that let you interact with every part of the IDE and compiler. Events, display, compilation nodes, etc. are all at your disposal.
Many problems are much better suited to some non-standard programming model than to the usual object-oriented/functional approaches. For example, constraint languages allow a very concise description (and efficient solution) of many optimization problems. Dylan supports functional and object-oriented programming in a tightly integrated manner, but it offers no support for non-deterministic programming, constraint-solving, etc., and not much support to add them to the language. If you have first-class continuations in the language you can add one additional programming model that requires non-standard control flow, but, in general, different extensions based on call/cc don't work together.
If you're looking for implicit concurrency and distributed computing, then Project V is what you want. But it can handle anything else you can think of. If you're more comfortable in those other models, there's nothing stopping you from building them.
Is high performance an issue?
Yeah, but I'll optimize it later. This issue is much more complex than register allocation and execution paths.
This says something about whether you want to implement an interpreted, a VM-based, or a natively compiled language.
First version will have a runtime engine. Some will want to call this a VM, but it has no concept of memory management and no concept of opcodes. It simply coordinates activity between components and assigns tasks to certain cores or remote machines.
Second version will allow native compilation. Both versions can work with each other and in fact, most programs will be a hybrid with dynamic compilation.
Is high programmer productivity an issue?
That's the main reason I'm creating Project V. All other languages are too slow for me. I can't input what I want done fast enough. The programming languages get in the way.
How important is this with respect to performance?
There isn't really a trade-off. We need development environments that allow for distributed computing. So it doesn't matter how slow any of this is right now performance-wise because it'll only get faster.
This decision can affect how you store values and do function calling.
Uhhh... Project V doesn't use functions, so it won't affect something that doesn't exist.
How portable across platforms do you want the language to be?
There is no way to quickly explain how ridiculous and shortsighted the word "portable" is once you see how this is gonna work. Just as a hint, I'll tell you that you'll be able to use specialized functionality that's only available on that specific machine when it runs on it, no matter the machine.
This will relate to whether you want to compile to a VM or to machine code, and to how well you support native libraries.
VMs can't do portability like I want, so they're out. Machine code and native libraries will be used, but it's an initial implementation decision.
It will also affect library design for such things as graphics and GUI tools.
Well, yeah. But only in the sense that it'll actually be logical event-wise. You won't need continuations or keeping track of states between events anymore. The reason is that you can design a sort of state diagram where the events choose the next state to go to and use that as is.
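The state diagram idea can be shown with a small transition table. This is only a sketch under assumptions: the state names, the event names and the `next_state` helper are made up for illustration and are not taken from Project V.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "mouse_down"): "pressed",
    ("pressed", "mouse_move"): "dragging",
    ("pressed", "mouse_up"): "idle",
    ("dragging", "mouse_move"): "dragging",
    ("dragging", "mouse_up"): "idle",
}


def next_state(state, event):
    """Return the state selected by the event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="idle"):
    """Feed a sequence of events through the diagram and record each state visited."""
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited


trace = run(["mouse_down", "mouse_move", "mouse_move", "mouse_up"])
print(trace)
```

The event handlers never have to remember anything between events; the only thing carried forward is the current state.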
Do you want easily distributed executable code,
Absolutely. Need it if I'm gonna have concurrent software.
i.e., do you want to allow code to be easily transmitted across networks and run elsewhere, as Java does?
I hope the author of the question isn't talking about RMI because that is a colossal failure and I've seen nothing in Java that makes this easy.
Picture this. Project V can use different machines, pass code around and execute it natively on each of those machines. Java is unable to do this and the fact that it uses a VM actually makes this impossible because the set of opcodes is fixed. Something developers need to understand is that VMs are unnecessary and actually make certain things impossible.
Do you want to provide built-in support for remote execution, like RPC/CORBA/RMI?
These are all based on a return-to-sender protocol and are inherently flawed. They usually require the functionality to already be available on the remote machine. They also require you to KNOW there is a remote machine. The point of Project V is to be able to use a cluster of machines and let the software figure out how to communicate with each of its nodes. You can configure parts of it if you wish, but it's high time we forget about using these kinds of archaic hardcoded protocols.
If you are writing for a VM, this can simplify some of these issues considerably.
This question shows a lack of understanding of how computing is done. VMs are incapable of doing many of the things that Project V takes for granted. I've already mentioned a few of these things.
1. no opcodes
2. portability through specialization
3. automatic distributed computing
4. ability to learn
And the list of things VMs cannot do is MUCH longer than this.
What about debuggability? If you plan to compile it, you need to think about how to store debugging information.
It's called dataflow. So I just need to show where the data is. And because it's dataflow, the system already has to store information about how the data is stored. So displaying that information should be a snap.
How do you want to bootstrap it? This, too, says something about what kind of back-end you might build. Perhaps you build a tiny VM in C, then compile to C. This way, you avoid fun but time-consuming work on code generation for modern super-scalar hardware, register allocation, etc.
At first, there'll be a runtime engine. Sorta like a VM, except no GC, no memory management, no resource allocation, no JIT compilation, no bytecodes. Essentially, all it takes care of is that the network is created and that each component that has all its inputs available is executed. That's it.
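The firing rule above (run a component once all of its inputs are available) fits in a few lines. What follows is a rough sketch, not Project V's engine: `Component`, `run_network` and the way connections are stored are all hypothetical names chosen for the example.

```python
from collections import deque


class Component:
    """Hypothetical component: fires once every named input has a value."""

    def __init__(self, name, input_names, action):
        self.name = name
        self.input_names = list(input_names)
        self.action = action      # callable taking the inputs as keyword arguments
        self.received = {}        # input name -> value received so far
        self.targets = []         # (component, input name) pairs fed by our output

    def connect(self, target, input_name):
        self.targets.append((target, input_name))

    def ready(self):
        return all(name in self.received for name in self.input_names)


def run_network(initial_values):
    """Deliver values and fire each component as soon as all its inputs are present.

    initial_values is a list of (component, input name, value) triples.
    Returns the outputs of components that have no downstream connection.
    """
    pending = deque(initial_values)
    results = {}
    while pending:
        component, input_name, value = pending.popleft()
        component.received[input_name] = value
        if component.ready():
            output = component.action(**component.received)
            component.received = {}
            if not component.targets:
                results[component.name] = output
            for target, target_input in component.targets:
                pending.append((target, target_input, output))
    return results


# Wire (a + b) * y as a tiny network.
add = Component("add", ["a", "b"], lambda a, b: a + b)
mul = Component("mul", ["x", "y"], lambda x, y: x * y)
add.connect(mul, "x")

outputs = run_network([(add, "a", 2), (mul, "y", 10), (add, "b", 3)])
print(outputs)
```

Note that the values arrive in an arbitrary order. Nothing fires until its inputs are complete, so the order of arrival does not change the result.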
Do you want to be able to catch type errors early or late?
Usually early, but it is not required. If there is enough information, then the IDE will trigger an error. If there isn't enough information at design time, then the developer will need to take care of it at runtime.
That says something about your type system (whether you require that all types be statically declared at compile-time, or allow them to be dynamic, or have a hybrid scheme like Dylan does). In addition to the obvious effect on performance, this decision will affect your memory model in that completely static systems do not require tags or boxing.
Well, for dataflow, it only depends if you're using static or dynamic connections between components. Static connections can be merged into one uber component while dynamic connections cannot.
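Merging a statically connected chain is, in the simplest case, plain composition. The sketch below assumes each component is a one-input, one-output callable; `fuse` is a hypothetical helper, not something Project V is known to provide.

```python
from functools import reduce


def fuse(*components):
    """Merge a chain of statically connected components into one callable."""
    def fused(value):
        return reduce(lambda current, component: component(current), components, value)
    return fused


increment = lambda x: x + 1
double = lambda x: x * 2

uber = fuse(increment, double)   # increment feeds double over a static connection
print(uber(3))
```

A connection that is only chosen at runtime cannot be folded in this way, because the next component is not known when the chain is built.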
Will variables be associated with explicit type declarations?
Yes.
If yes, will these type declarations be required or optional? - If optional, will the language use inferencing to supply unspecified types, or simply use an all-purpose type (like Object or 'any')?
Type declarations are required, but you can use all-purpose types as well. And when connecting components, types can be inferred if one end of the connection has already been declared.
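Inferring a type across a connection could look roughly like this. It is a sketch under assumptions: `Port` and `connect` are hypothetical, Python classes stand in for Project V types, and matching is exact.

```python
class Port:
    """Hypothetical port on a component; declared_type is None until known."""

    def __init__(self, name, declared_type=None):
        self.name = name
        self.declared_type = declared_type


def connect(output_port, input_port):
    """Connect two ports, inferring a missing type from the declared end."""
    out_t, in_t = output_port.declared_type, input_port.declared_type
    if out_t is None and in_t is None:
        raise TypeError("cannot infer: neither end of the connection is declared")
    if out_t is None:
        output_port.declared_type = in_t
    elif in_t is None:
        input_port.declared_type = out_t
    elif out_t is not in_t:
        raise TypeError(
            f"{output_port.name} ({out_t.__name__}) does not match "
            f"{input_port.name} ({in_t.__name__})"
        )
    return output_port.declared_type


source = Port("sensor.out", float)
sink = Port("display.in")          # undeclared: picks up float from the source
inferred = connect(source, sink)
print(sink.declared_type.__name__)
```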
Will the language have any run-time type discrimination/checking at all, or will types be completely statically determined?
Again, if there is enough information, the type checking is done at design time, otherwise it will be done at runtime.
Some languages considered statically typed still do some run-time checking, such as Java.
Not really my problem.
Will any type checking happen at compile-time?
Yes.
Some languages with explicit type declarations don't always check types at compile-time, such as old Visual Basic.
Alert Microsoft immediately.
If you allow type declarations, you will want to think about whether you want parameterized types.
Actually, parameterized types only show a lack of power by your programming language. I allow the use of properties which can change the meaning of your type. This is much more powerful than anything a parameterized type can do.
If you go whole hog with, say, F-bounded polymorphism, you can get performance and type safety and ease of use, but it's hard to get this exactly right.
This sounds like a good thing. But it's completely unnecessary in data flow. You can do all that if you want, but you'd be wasting your time. A class is what decides what functionality is available. But in data flow, only components can perform actions and they can only ever perform ONE SINGLE action. So polymorphism between ONE SINGLE action is not very useful.
I could see a component that performs a different action depending on different inputs, but that'd just be a two input component performing ONE SINGLE action, even though it appears it can do different things. IOW, the action is not dependent on the type, but on the input. It'll be a rare case where you can't design a component that acts on inputs.
But if you really, really, really want to do it, you can. You can use a component instead of a type. That component will do all the type checking you want it to because you designed and created it. So you can check if a specific interface has been implemented for any data or component. Usually, the type system only works on exact matches, aside from categories.
What about namespaces? Do you want to have a simple scheme as in Java, where classes, namespaces, and files are roughly equivalent? Lisp-style packages? Dylan-style modules and libraries? Within a single first-class namespace, how many second-class namespaces are there? Java has 7 or 8: class names, function names, local variable names, slot names, etc. Common Lisp has at least 3 (function, variable, and class names). Dylan and Scheme have one, which greatly simplifies things at a small loss of generality which can usually be worked around with name conventions.
I use 512-bit GUIDs which can be expanded to bigger widths later on. The name is just for show other than for input names. Input names can't have collisions. If there is a collision, the input will be duplicated.
What about encapsulation? Do you want to do information-hiding on a per-class basis as in C++ and Java, or on a “module” basis as in Dylan?
Encapsulation is ridiculous considering the programmer always works within the encapsulation anyways. And if he doesn't, then it's not even an issue.
BTW, encapsulation doesn't make sense in a data flow environment where data is usually bare.
Is your language a functional language (that is, without side-effects)?
This question is seriously flawed. If your language has no side-effects, it does not mean it is functional.
I allow both pure and non-pure components. You mark a component as pure or non-pure in its properties and the compiler will treat it as such. BTW, many of the pure components in Project V can only exist as non-pure in functional languages.
If so, is it an almost-functional language or a true pure functional language? Or is there a functional core with some sort of machinery for isolating side-effects, like monads do in Haskell?
Monads? You know what monads are? It's dataflow that's been reduced to only accepting ONE input at a time for the entire construct and where you can only have linear linkage. So get this... they took a perfectly viable dataflow system and reduced its functionality so much that it reproduces imperative programming. I've honestly never seen something so mindblowingly convoluted.
No, Project V is not functional. Functional is a cross between imperative and dataflow. As such, it is crippled as a paradigm. Project V is 100% dataflow.
What kind of evaluation semantics does the language have? Eager as in most languages, or lazy as in Haskell?
How do you determine what "when it is needed" means? Only if you have imperative constructs (such as assignment) does this even make sense. You may consider Project V to use eager evaluation. As soon as all inputs are available, the component will process the data and output the results.
Also, if a certain data path is no longer possible, those connections will self-destruct.
Is your language purely lexical or do you offer dynamic variables (or, more generally, access to the dynamic environment) as well? Dynamic binding allows you to introduce local state for the duration of a computation without side effects and without adding additional parameters.
There are no lexical properties at all. You can use dynamic binding if you want, but it's much more powerful than simple virtual functions. Your machine learns. So over time, it will know which implementations are best and will eventually override your explicit requests to use a particular implementation. However, using dynamic binding in the usual sense is almost useless in a dataflow environment because each component has ONE and only ONE task. If you want different tasks, you should use a selector input instead. However, dynamic binding is often used where inputs can take on multiple types and the connected output (source) type is unknown at design time.
If you want local state, you can use loopback constructs.
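A loopback can be pictured as a component whose output is wired back into one of its own inputs. In this sketch, `run_with_loopback` is a hypothetical driver and the component is an ordinary two-input callable.

```python
def run_with_loopback(component, initial_state, inputs):
    """Feed each output back in as the state input for the next value."""
    state = initial_state
    outputs = []
    for value in inputs:
        state = component(state, value)   # the output loops back as the next state
        outputs.append(state)
    return outputs


running_total = run_with_loopback(lambda state, value: state + value, 0, [1, 2, 3, 4])
print(running_total)
```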
Are there different semantics for “pointer-ish” and “non-pointer-ish” values, like in C? Or is everything a first-class object reference, like in Lisp? Having multiple ways of referencing values can make the user model much more complicated. On the other hand, making everything be object references can require boxing and/or tagging schemes that make your compiler and FFI more complex.
Everything is unique. The internals may use interning for your data, but the developer need not bother with this detail.
The developer can create whatever they wish, so if they want pointer-like elements, they are free to do so.
How do you want to pass arguments to functions? By name as in Algol? By value or by reference as in C? By object reference like Lisp does? Is there more than one convention in the language?
There are no functions. These issues don't exist in Project V.
Do you want first-class functions? What about lexical closures? First-class continuations? The answer to those questions will tell you things about heap- and stack-allocation, and will also tell you how important it might be to do a continuation-based compiler. It also tells you how hard your compiler has to work to avoid consing environments unnecessarily. Lots of sophisticated language designers go with simple closures and avoid full continuations, because full-scale environment capture is hard to do well.
You don't have any of these issues using Project V.
Does your language have an unwind-protect like facility? When you design a new language it is tempting to include call/cc because it allows you to define many common (and uncommon) control structures. On the other hand you want to have a facility that allows you to reliably relinquish resources after you are done. If you simply try to combine call/cc and unwind-protect, you immediately get the “impenetrable shield vs. unstoppable force” problem in your language. Possible solutions include: no call/cc, weakened unwind-protect, different semantics for call/cc.
See what kind of mess you get when you try to be too clever with execution? The power of functional comes not from the function, but from the data dependencies. Project V leaves out the function and so it doesn't have these stupid things like unwind-protect.
How do you handle conditions/errors? Return codes or signalling? Do you have an unwinding-only model like C++/Java or do you allow restarts like Dylan/CL? If you do the latter do you separate conditions and restarts like Common Lisp or do you unify them like Dylan? These questions are important, because every programming language has to deal with error conditions, and in many cases the unwinding model is used simply because the language designer is not aware of any other possibilities.
Components have error outputs. You can do with them what you please. The IDE provides a different layer for handling errors so that it doesn't clutter the original network. You can also insert cache components that keep a copy of the original data until the processed data reaches the corresponding sentinel component that will let the cache component know when to delete its info. This lets you rewind or retry an operation. Data dependencies are taken care of automatically, so the choice of rewind or retry will be made for you. Of course, you can handle the error any which way you please.
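The cache and sentinel arrangement can be sketched as follows. The names `Cache` and `process_with_retry` are hypothetical, a Python exception stands in for a component's error output, and the retry limit is an assumption added for the example.

```python
class Cache:
    """Hypothetical cache component: holds originals until a sentinel releases them."""

    def __init__(self):
        self.held = {}

    def store(self, key, value):
        self.held[key] = value
        return value

    def release(self, key):
        # Called by the sentinel once the processed data has arrived downstream.
        self.held.pop(key, None)


def process_with_retry(cache, key, value, component, max_attempts=3):
    """Run a component; on error, retry from the cached original.

    Returns (result, errors): the value on the normal output and a list of
    everything that was emitted on the error output along the way.
    """
    cache.store(key, value)
    errors = []
    for _ in range(max_attempts):
        try:
            result = component(cache.held[key])
        except Exception as exc:      # stands in for the component's error output
            errors.append(exc)
            continue
        cache.release(key)            # sentinel reached: the original can go
        return result, errors
    return None, errors


attempts = {"count": 0}


def flaky_double(x):
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return x * 2


cache = Cache()
result, errors = process_with_retry(cache, "job-1", 21, flaky_double)
print(result, len(errors), cache.held)
```

If every attempt fails, the original stays in the cache, so a different error handler can still get at it.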
Do you want the language to be “object-oriented” at all, given a broad definition of OO that includes the spectrum from single inheritance single receiver languages as in Java to multiple inheritance multiple receiver languages as in CLOS? Do you want to provide genericity through some sort of template scheme?
Project V works differently for types. A derived type is an entirely new type. But if there exists a conversion component, or a combination of them that will do the conversion, then you can use a type in multiple places.
What to keep in mind is that a type only lasts between two components. That's the entire lifetime of any given type. If it lasts for longer, it's only because subsequent components have decided to reuse that type. Data does not have a type per se. Rather, it's the components that decide on a protocol for data transmission. So there's no use for derived types in Project V that can't be accomplished via conversion components inserted in between two other components. Object oriented techniques are used for global entities where all types are global. Project V is a local or at least a neighborhood system. This allows you to connect them in sequence without any serious coupling between types two components (or networks) removed. Again, Project V does not suffer from many problems found in legacy programming languages.
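Finding "a combination of conversion components" amounts to a path search over the available converters. The sketch below is one way to do that with a breadth-first search; the `CONVERTERS` registry, the type names and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical registry of conversion components: (source type, target type) -> function.
CONVERTERS = {
    ("text", "m"): int,
    ("m", "cm"): lambda metres: metres * 100,
    ("cm", "mm"): lambda centimetres: centimetres * 10,
}


def find_conversion(source, target):
    """Breadth-first search for the shortest chain of converters from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for (src, dst), fn in CONVERTERS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [fn]))
    return None


def convert(value, source, target):
    """Apply the chain of converters, or fail if the two types cannot be bridged."""
    chain = find_conversion(source, target)
    if chain is None:
        raise TypeError(f"no conversion from {source} to {target}")
    for fn in chain:
        value = fn(value)
    return value


print(convert("3", "text", "mm"))
```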
Here is how Jonathan Rees has characterized the very fuzzy term “OO”:
1. Encapsulation – the ability to hide the implementation of a type
Again, Project V works differently. It works on a nested framework where one level's interface is another's implementation. So you can't just think of having one component being encapsulated and it ends there. No, you can zoom out or zoom in. That's why Project V is based on containment rather than encapsulation. Also, data is not encapsulated, and it never should have been in legacy languages either.
2. Protection – the inability of the client of a type to detect its implementation, guaranteeing that any changes to an implementation that preserve the behavior of the interface will not break any clients. This also gives some measure of “security”, because things like passwords can't leak out.
Actually, Project V depends on being able to detect its implementation. Project V wouldn't work without it. In fact, I'm baffled at how you could change implementations at runtime without detecting it. Something in your software had to request the change and as such must know about it in some fashion.
3. Ad hoc polymorphism – functions and data structures with parameters that can take on values of many different types.
No need for functions. But data structures have way more flexibility with their design and types than anything you'll find in any programming language. It's too complex to explain here. But yes, a component can act differently based on the inputs, but that's not polymorphism when you're dealing with data flow. It's just normal network building.
4. Parametric polymorphism – functions and data structures that parameterize over arbitrary values (such as “a list of anything”). ML and Lisp both have this. Java doesn't quite because of its non-Object primitive types.
No functions, but lists can be configured any way you wish. Also, it doesn't use parametric anything. Parametric entities show a flaw in the programming language. There is no need for it.
5. Everything is an object – all values are objects. True in Dylan, but not in Java because of its primitive types.
You can't have everything is X. IMPOSSIBLE! Imagine everything is water. EVERYTHING! We'd be water. The Earth is water. Outer space is water. EVERYTHING IS WATER! Well, everything would be the same and you wouldn't be able to tell them apart. So there has to be something that is different. Just take OOP. You have objects and methods. Objects contain methods and methods contain objects in their parameters. That's how you chain things. You simply cannot chain things without having TWO different things.
So Project V has components and connections. Internally, it uses entries and sets. There must always be a duality. Fighting this is a lost cause. And I've noticed you can use one kind of duality to produce another. That's how entries and sets are used to create both components and connections.
6. “All you can do is send a message” (AYCDISAM) = Actors model – there is no direct manipulation of objects, only communication with (or invocation of) them. The presence of fields in Java violates this.
No, execution violates this in most programming languages. A message should be data only. Invocation requires execution. Project V uses data messages without execution (that means no function calls).
7. Specification inheritance = subtyping – there are distinct types known to the language with the property that a value of one type is as good as a value of another for the purposes of type correctness. An example is Java interface inheritance.
This is another flaw in most programming languages. What is actually defined here is a primitive type. Most languages have no constructs available to build your own primitive type. And then you get nonsense like this where you end up using subclassing as a way to build a set from which types can be chosen.
In Project V, if you want multiple types to be allowed, you put them all into a set. Then you use that set as part of an enumeration. An instance of the enumeration can take ONE value out of the possible types provided in the set. This is the very definition of a primitive type. So an interface implementation in Java would insert itself into the set that belongs to the specified interface. The reason you do this is so that the runtime engine is aware of all implementations. In Java, once the software quits, other programs have no idea about the implementations located in the software that just quit. Project V doesn't work that way. It remembers. So there has to be a place where that list is stored. And it is stored globally on your machine so that ALL software may use it. IOW, your machine learns every time it runs something new. Project V treats implementations exactly the same as it would a primitive type. There are ZERO differences.
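A registry that outlives the program that filled it can be sketched with a file on disk. Everything here is an assumption made for illustration: the `ImplementationRegistry` class, the JSON file format and the temporary path are not Project V's actual storage.

```python
import json
import os
import tempfile


class ImplementationRegistry:
    """Hypothetical machine-wide registry: interface name -> set of implementation names."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def register(self, interface, implementation):
        """Insert an implementation into the set that belongs to the interface."""
        data = self._load()
        members = set(data.get(interface, []))
        members.add(implementation)
        data[interface] = sorted(members)
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)

    def implementations(self, interface):
        return set(self._load().get(interface, []))

    def is_one_of(self, interface, candidate):
        """An instance may take exactly one value out of the interface's set."""
        return candidate in self.implementations(interface)


path = os.path.join(tempfile.mkdtemp(), "registry.json")
ImplementationRegistry(path).register("Sorter", "quicksort")   # first program
ImplementationRegistry(path).register("Sorter", "mergesort")   # a later program
known = sorted(ImplementationRegistry(path).implementations("Sorter"))
print(known)
```

Each `ImplementationRegistry(path)` stands in for a separate program run. The set survives because it lives outside any one process.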
8. Implementation inheritance/reuse – having written one pile of code, a similar pile (such as a superset) can be generated in a controlled manner, that is the code doesn't have to be copied and edited. A limited and peculiar kind of abstraction. (E.g. Java class inheritance.)
Most entities use a combination of mixins and inheritance. Each entity is independent and can override its base types if it wants to.
9. Sum-of-product-of-function pattern – objects are, in effect, restricted to be functions that take as first argument a distinguished method key argument that is drawn from a finite set of simple names.
Could not find a proper definition of this pattern. But Project V doesn't use functions, so I doubt this would apply.
Some people say Lisp is OO, meaning {3,4,5,7}. Some people say Java is OO, meaning {1,2,3,7,8,9}. E is supposed to be more OO than Java because it has {1,2,3,4,5,7,9} and almost has 6; 8 (subclassing) is seen as antagonistic to E's goals and not necessary for OO. The conventional Simula 67-like pattern of class and instance will get you {1,3,7,9}, which many people take as a definition for OO.
I use NONE of these things. So that would be the empty set {}.
Most of those things are issues created by the language designer or because they are using imperative constructs (yes, even in functional languages) of which they cannot get rid.
If the language is object-oriented, do you want it to be class-based or prototype-based?
Dataflow!
If you've got an object system, do you want it to have first-class objects that exist in the run-time? Should the object system extend to include all the way to the primitive types, or do you want to special-case those like Java does?
Primitive types can be implemented just like everything else if you knew the secret recipe. There is ZERO need for primitive types to be different. Just add an IS-ONE-OF relationship in your language and you're all set. This is what creates instances BTW. And the instance needs to be picked out of some set. So your language needs to support sets as well. If you go down this track, you'll notice that your entire language was badly designed. So either you throw out sets, or you do a complete redesign. Most language designers have chosen to give up on sets. I'm not talking about sets from the programmer's point of view. I'm talking about the building blocks of the type system. Sets within the meta section of the language.
Do you want a Smalltalk/Java-style single receiver object orientation, or a CLOS-style multi-method generic function dispatch? If the former, do you need some sort of static overloading like C++ has? If the latter and performance is important, do you need some sort of Dylan-style “sealing” so that you can do some compile-time optimizations? Do you want single inheritance, single inheritance with interfaces, multiple inheritance, or a hybrid single inheritance with mixins? If you've got a more static type system, you'll need to deal with casts. Do you additionally want auto-conversion?
I use multiple dispatch for implementation selection. And I use multiple inheritance with secondary base classes used as mixins. But each entity is independent and unique from its parent.
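Selecting an implementation by multiple dispatch means looking at the types of all inputs, not only the first. A minimal sketch, assuming exact type matches and a hypothetical `implementation` decorator that fills a lookup table:

```python
IMPLEMENTATIONS = {}   # hypothetical table: tuple of input types -> implementation


def implementation(*types):
    """Register the decorated function for one exact combination of input types."""
    def register(fn):
        IMPLEMENTATIONS[types] = fn
        return fn
    return register


@implementation(int, int)
def combine_numbers(a, b):
    return a + b


@implementation(str, int)
def repeat_text(text, count):
    return text * count


@implementation(str, str)
def join_text(a, b):
    return a + " " + b


def combine(*inputs):
    """Pick the implementation from the types of all inputs, not just the first."""
    key = tuple(type(value) for value in inputs)
    fn = IMPLEMENTATIONS.get(key)
    if fn is None:
        names = ", ".join(t.__name__ for t in key)
        raise TypeError(f"no implementation for ({names})")
    return fn(*inputs)


print(combine(2, 3), combine("ab", 3), combine("data", "flow"))
```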
If you've got an object system, how much of a meta-object system do you want to expose? Do you want it to be purely reflective, or more than that? In Dylan, we separated 'make' from 'initialize', which was a good idea, but do you also want to separate out 'allocate', so that you have control over where an object is created, e.g., in a “persistent memory” pool that might be back-ended by a database?
Everything is exposed. Allocation is internally handled though you can change it if you really want to. For example, if you want to create a piece of data that is directly linked to some hardware, that's doable.
Do you need hairy CLOS-style method combination, or is a simpler style like we did in Dylan enough? Do you care about what Gregor Kiczales calls “aspects”, which might change your decision?
You use the exact same thing you use normally. There is no difference. The compiler has its own interfaces you can implement and so does the IDE. You can use events and manipulate whatever you want within the current network.
A more general question that relates to the object system, the meta-object system, and a different dimension of the bootstrapping question is: do you want to implement a language which provides a bunch of predefined and fixed constructs (such as an object system) or do you want to provide a layered language that implements such constructs in terms of lower-level features in the language? The former is probably easier, but the latter can allow very flexible customization, which tends to be traded off against standardization. Note that even a language with a powerful built-in meta-object system won't necessarily allow you to replace that object system with something else, for example, unless the language supports that sort of thing.
It's a layered system with predefined types and components for the low level stuff.
How do you want to do memory management, manual or automatic (GC)?
Neither. It's implicit. It's like asking if you empty your beer after you've drunk it. Once a component outputs a value, the compiler knows the type and can allocate it by injecting code to do this. After a component has processed any given input, it is no longer needed. Since streams are used, blocks of memory are allocated. These blocks are reused as necessary. This has the advantage that memory usage can be tweaked with great accuracy.
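Block reuse for streams can be illustrated with a small pool. This is a sketch only: `BlockPool`, `stream_through` and the fixed block size are assumptions, and a real implementation would work on raw memory rather than Python `bytearray` objects.

```python
class BlockPool:
    """Hypothetical pool of fixed-size blocks that are reused instead of freed."""

    def __init__(self, block_size, max_blocks):
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.free = []
        self.allocated = 0            # total number of blocks ever created

    def acquire(self):
        if self.free:
            return self.free.pop()    # reuse a block a component is done with
        if self.allocated >= self.max_blocks:
            raise MemoryError("pool exhausted")
        self.allocated += 1
        return bytearray(self.block_size)

    def release(self, block):
        self.free.append(block)


def stream_through(pool, chunks, component):
    """Copy each chunk into a pooled block, process it, then hand the block back."""
    results = []
    for chunk in chunks:
        if len(chunk) > pool.block_size:
            raise ValueError("chunk does not fit in one block")
        block = pool.acquire()
        block[:len(chunk)] = chunk
        results.append(component(bytes(block[:len(chunk)])))
        pool.release(block)           # input consumed: the block is free again
    return results


pool = BlockPool(block_size=8, max_blocks=2)
results = stream_through(pool, [b"ab", b"cde", b"f"], bytes.upper)
print(results, pool.allocated)
```

The `max_blocks` and `block_size` parameters are the kind of knobs that would let memory usage be tuned precisely.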
Do you want to support threading?
Heck no. That's the whole point. The developer doesn't deal with any of this. It's all automatically taken care of for you.
Do you want to roll your own threads or use OS threads?
NO NO NO NO and a million times NO!
Do you want to support massive concurrency like Erlang does?
Is this a trick question? I support massive concurrency, but if it ever degrades to what Erlang does, shoot me now.
The answers to those questions will tell you about aspects of the run-time, memory allocation/GC, and performance. Oh yeah, it also tells you if you can actually take advantage of the multiple processors sitting in most of the machines we all have. Do you want Java-style synchronization where it is built in to objects, or should that be handled orthogonally?
How about implicitly? You do nothing and it works!
If you have threads and continuations, how do they relate to each other?
If a thread runs, but you never know it runs, does it really exist?
How well do you want to be able to integrate with native libraries?
Really well. Must support a link to existing libraries.
This decision affects your memory model, how you plan to represent run-time type info, how function call/return works, how signalling works, etc. By “memory model”, I also mean to include what sorts of objects are boxed or tagged. (Opinion: the Harlqn/FunO Dylan compiler got it wrong – I think we should have boxed everything, and then concentrated our efforts on box/unbox optimizations. This would have hugely simplified FFI issues.) Good integration with native code probably means that you will end up using a conservative collector, and that will affect the semantics of “finalization” (if you have it).
No need to complicate matters. Throw all that stuff out.
Do you want to be able to return multiple values?
Not exactly. I want to output many asynchronous values that can be independently routed to different subsequent components.
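A rough Python sketch of the difference between returning multiple values and routing separate outputs. Everything here is hypothetical: the DivMod component, its port names and the connect/receive methods are made up for illustration, and plain synchronous callbacks stand in for what would really be asynchronous streams.

```python
class DivMod:
    """Hypothetical component with two outputs that are routed independently."""

    def __init__(self):
        # Each output port keeps its own list of connected receivers.
        self.outputs = {"quotient": [], "remainder": []}

    def connect(self, port, receiver):
        self.outputs[port].append(receiver)

    def receive(self, pair):
        a, b = pair
        # Each result goes out on its own port; neither waits for the other's consumer.
        for receiver in self.outputs["quotient"]:
            receiver(a // b)
        for receiver in self.outputs["remainder"]:
            receiver(a % b)


quotients, remainders = [], []
divmod_component = DivMod()
divmod_component.connect("quotient", quotients.append)
divmod_component.connect("remainder", remainders.append)

divmod_component.receive((7, 2))
divmod_component.receive((9, 4))
```

The point of the sketch is that the two lists could just as well be two unrelated downstream components; the producer does not bundle its results into one return value.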
How about &rest arguments? These affect function call/return, tail-call elimination, and stack vs. heap allocation optimizations.
I have dynamic input creation when all inputs are used up. For example, an addition component will add an extra input once the first two are connected. You can also send multiple inputs via a single input stream. This requires custom handling, but isn't overly tedious. Yeah, that's right. You can have different data types in a single stream. You can create horizontal data types where you can specify how many of each type you are sending through, and even have hierarchies. The number of items can be static or dynamic.
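The addition example could be sketched like this in Python. This is an assumption about the behaviour, not the real implementation: AddComponent, connect and fire are invented names, and each source is modelled as a simple callable that yields the current value.

```python
class AddComponent:
    """Hypothetical adder that grows a spare input whenever all inputs are connected."""

    def __init__(self):
        self.inputs = [None, None]   # None marks an unconnected input

    def connect(self, port, source):
        self.inputs[port] = source
        # Once every input is in use, expose one more free input.
        if all(s is not None for s in self.inputs):
            self.inputs.append(None)

    def fire(self):
        # Sum whatever is connected; the spare input is ignored.
        return sum(source() for source in self.inputs if source is not None)


adder = AddComponent()
adder.connect(0, lambda: 1)
adder.connect(1, lambda: 2)
ports_after_two = len(adder.inputs)    # a third input has appeared
adder.connect(2, lambda: 3)
ports_after_three = len(adder.inputs)  # and now a fourth
total = adder.fire()
```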
What's your order of evaluation in expressions? This affects what sort of optimizations can be safely done.
There's no syntax, so this doesn't apply. You can of course have components where you can enter scripts and those would have their own order of evaluations.
What compilation model do you want? Lots of include files like C[++]? Lots of “packages” like Java? Whole-worlds like Lisp? Separate libraries like Dylan? This affects a lot of things, not least of which is the ability to deliver small applications. It also informs the design of your core run-time.
I'm going to use a database, though that's not strictly necessary. Everything is managed for you. You don't deal with any "source" files at all.
Is the core run-time tiny like Scheme's? Small like Dylan's? Huge like Common Lisp's? If you like the Common Lisp model, it's worth looking at EuLisp to see how to re-package it in a more layered way.
The core is tiny (except for Unicode string handling). It just has a data dependency analysis routine and a round robin scheduler. That's all I'm implementing at first.
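For readers wondering how little that is, here is one possible shape for such a core, sketched in Python. It is a guess at the general idea only: the three-component network, the run_round_robin function and the use of a topological sort for the dependency analysis are all assumptions made for this example (graphlib needs Python 3.9 or later).

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical network: each component maps to the components it depends on.
depends_on = {
    "source": set(),
    "double": {"source"},
    "sink": {"double"},
}

# Data dependency analysis: producers come before their consumers.
order = list(TopologicalSorter(depends_on).static_order())


def run_round_robin(order, depends_on, steps, inboxes):
    """Give each component one turn per pass until nobody has pending input."""
    consumers = {
        name: [c for c, deps in depends_on.items() if name in deps]
        for name in order
    }
    ready = deque(order)
    idle_turns = 0
    while idle_turns < len(ready):
        name = ready.popleft()
        ready.append(name)              # back of the queue: round robin
        if not inboxes[name]:
            idle_turns += 1
            continue
        idle_turns = 0
        value = steps[name](inboxes[name].popleft())
        for consumer in consumers[name]:
            inboxes[consumer].append(value)


results = []
steps = {
    "source": lambda x: x,
    "double": lambda x: x * 2,
    "sink": results.append,
}
inboxes = {name: deque() for name in order}
inboxes["source"].extend([1, 2, 3])
run_round_robin(order, depends_on, steps, inboxes)
```

Each component processes one item per turn, so the three stages interleave instead of each one draining its whole input first.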
Even in a small run-time, you need to get the basic types right. Are your numeric types “closed” (that is, do they include reals – rationals and irrationals – and complex numbers)? Are your string and character types rich enough to model Unicode?
You can define your own built-in types. I do provide a "starter kit". It includes Unicode strings, floating point and integers along with components for common low level operations. The other types can be built from what's available.
Think hard about collections. How do the following relate to each other: sets, tables, vectors, arrays, lists, sequences, ranges? In Dylan, we decided too late that having the tail of a list be a “cons” was maybe not such a great idea; what about that? How do your collections interact with your threading model?
There will be specific data types for these collections. Currently, the internals use a generic collection that supports functionality for sets, vectors, arrays and lists. The exposed data types will have corresponding components that use the fastest implementations.
One thing to remember is that dataflow doesn't require most functionality required by other languages. In dataflow, you process items as they come in. You don't need to have the entire list available. This is where the real power of map/reduce comes in. A power that is simply not possible in functional languages even though they boast about a limited version of map/reduce.
Think hard about iteration, especially over collections. If all collections obey a uniform iteration protocol, it means that you can do things like 'for e in c …'. Note that if iterators are done in a first-class way, this has performance implications that your compiler needs to worry about.
Iteration is unnecessary in dataflow because it is implicit. Whatever data arrives at a component will get processed. This lets you connect pieces of your software together in a way that is simply impossible in imperative, OO and functional programming. In those languages, you must break the loop (iterative or recursive) open and then piece in what you want to add. In dataflow, you simply connect the new pieces to the front. The stream takes care of the iteration implicitly, and this is where much of the complexity found in other languages disappears.
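A small Python approximation of "connect the new pieces to the front". The Component class and its connect/receive methods are invented for this sketch, and the for loops that feed items in merely stand in for a stream that the runtime would deliver on its own.

```python
class Component:
    """Hypothetical push-based component: processes each item as it arrives."""

    def __init__(self, fn):
        self.fn = fn             # maps one input item to zero or more outputs
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other

    def receive(self, item):
        for out in self.fn(item):
            for component in self.downstream:
                component.receive(out)


collected = []
square = Component(lambda x: [x * x])
sink = Component(lambda x: collected.append(x) or [])  # stores the item, emits nothing
square.connect(sink)

for item in (1, 2, 3, 4):
    square.receive(item)
first_run = list(collected)

# Add a filter in front. Neither square nor sink is touched or reopened.
only_even = Component(lambda x: [x] if x % 2 == 0 else [])
only_even.connect(square)

collected.clear()
for item in (1, 2, 3, 4):
    only_even.receive(item)
second_run = list(collected)
```

No component contains a loop over the whole data set; each one only knows what to do with a single arriving item, which is why a new stage can be wired in without editing the existing ones.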
Do you want some sort of security model built into the language? What sort of model do you want to use? A simple “checker” like the Java VM uses, or a more sophisticated capability-based model.
There's a security model, but I'm leaving that for later. It has to do with picking what implementations are allowed to execute.
What syntax do you want?
None.
Parentheses unaccountably give lots of people hives, but S-expressions make a lot of things much simpler. Infix syntax is quite nice when it's done well, but you've got to get the “kernel” of that exactly right if you want your infix macro system ever to be usable. If you decide on S-expressions, should they be represented as lists and conses, or do you want a first-class object for that?
None.
Do you want to allow syntactic extensions (macros)? Lisp-style macros? Dylan-style pattern-matching non-procedural hygienic macros? Scheme-style 'syntax-case' pattern-matching procedural hygienic macros? This says a lot about the syntax of your language, and it also says a lot about the model you choose for compile-time evaluation environments.
None. But it does have design-time components that let you change how the IDE and compiler work.
----
I'm thinking that if you can answer these questions, then you shouldn't implement your new language because it's based on existing materials. However, if the questions don't even make sense, then you're on to something new. That's what happened here and I think the questions themselves show that over 80% of the complexity and issues found in contemporary languages need not exist.
And for those that want an update, I'm currently implementing the tools palette in Project V so that you can choose what component or type you want to insert on the current network layer. Things are moving ahead quite nicely and it's actually a lot of fun. There are WAY more data structures internally than I ever thought possible. I'll have the tools palette done this week and component connections done next week. After that, I need to get the runtime engine working along with actually being able to load and save your workspace. So still a lot of work to do.



vladas # 10. November 2008, 23:13
Hehe! This is the all-time most notable remark I've seen since the 1960s. With this you've brought all those "paradigmers" to their knees!
Anonymous # 11. November 2008, 02:13
Anyone here heard of Aardappel (http://strlen.com/aardappel/index.html)? I banged my head against his thesis and got some understanding from Chapter 6; I don't consider it a truly graphical language though. It's based on Linda for concurrency. His thesis has a few quick notes on dataflow languages in Chapter 2.
Vorlath # 11. November 2008, 03:40