Code Rewrite - Yes
Tuesday, 25. September 2007, 01:44:15
In my last blog entry, I mentioned that code the second or third time around is much better every time. That seems to be a big no-no around the programming community. Joel says no. Rami says no. David says no. Chad says no. Jamie says no. Ronkes says no. I did find one guy named Kevin that says mostly no, but does list when a rewrite might be plausible. Edit: I should probably link the original post that everyone is talking about in case you haven't seen it.
So should I backpedal? Should I recant my statement about rewrites? I mean, I've seen my fair share of bad rewrites. No, I think rewrites are fucking fantastic. If your software is in its first incarnation, I say a rewrite is a good thing. But it's a good thing only as it relates to your code base. You'll have to decide if it's actually worth it. Is the time and money spent justified? That's another discussion. I want to talk about the code itself.
I've seen bad rewrites, but have NEVER, not once, been involved in a bad one. Frankly, it's because I rewrite things when the original is so bad that it can no longer be improved upon. Sounds like a contradiction, but any programmer has seen code so bad that any modification will result in a reduction in functionality no matter what you do.
The problems that arise in a rewrite are many. First, funding is a problem. I don't want to talk about this too much, but when you do a rewrite, managers and higher-ups feel like money is going down the drain. The pressure on programmers makes them uncomfortable and this is the #1 reason why software fails. Programmers don't feel compelled to test, do trial runs, write unit tests or do any of the things that are worthwhile because they don't want the finger pointed at them for wasting more time and money. If you have this kind of scenario, get out now. JUST ONE NEGATIVE THOUGHT WILL FAIL THE PROJECT! ONE! I'm not joking. The absolute worst thing a project can have is one single negative thought. After that, it snowballs and game over.
You can and will have obstacles. But everyone should always think that it's possible to succeed. The moment you start to doubt this, the project is over. Obstacles need not be negative. A setback is just that, a setback. You can keep going. A negative thought is one where you move backwards. Where you don't think you can advance, ever. Those are the ones that kill projects.
Every single rewrite that succeeds is WAY superior to the original. Not by a little, but by incredible amounts. That sounds like an obvious statement. Of course, if a rewrite fails, it'll suck. And a successful one will rock. Well, here's the secret to a successful rewrite. You need the original programmers doing the rewrite. If they're gone, kiss the rewrite goodbye. It's as simple as that. If you use new programmers, they will hit the same problems as before, only this time, they won't be tested and won't work.
There's only one kind of situation where I've seen a proper rewrite happen successfully with new programmers. It's where all inputs, transformations and outputs are documented and all special cases are likewise written down in documents. With this amount of documentation, you need to sort it. So thus begins the organisation and the new programmers are forced to think about how this all fits together and also to think about a framework. Another thing. These new programmers CANNOT have access to the old code. They cannot see it, look at it, poke it, or even touch the back of a screen where it is displayed, even if the person is blindfolded. The reason is what I just said. You don't want the new software to be a recreation. Recreations mean a reproduction of not only its features, but of its problems. So you'll just end up with a duplicate of the original, just without all the bug fixes.
Another way to make a rewrite successful is to do it incrementally. In my experience, these have the best chance of success and are the easiest to implement. What you do is start with the original software, but basically butcher it. You rip out everything you don't need. Only leave the core and rewrite the protocol that all the modules in your system will use. Write conversion routines that can translate back and forth between the two protocols. This means you can no longer call a module directly. You'll be amazed at the modularity you'll get just by doing this. It'll force you to rethink your implementation. Then start replacing the modules that you need for the new core to work properly. Then start rewriting the core systems. Some modules may be merged, moved or given a different set of functionality, just as long as the overall operation remains the same. I once took a Java server and made all modules callable by sockets. I rewrote the modules in C++ and eventually left certain parts in their own processes. I could then run different parts on different machines, even though this wasn't a requirement. What it did was give me more flexibility later on.
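The protocol-plus-conversion idea can be sketched roughly like this. It is only an illustration in Python, not the Java/C++ server mentioned above. The two message formats and the names `Core`, `legacy_to_new`, `new_to_legacy`, `old_upper` and `new_upper` are all made up for the example.

```python
from typing import Callable, Dict

# Hypothetical message formats, invented for illustration:
#   old protocol: "name|arg1,arg2"  (a flat string)
#   new protocol: {"module": name, "args": [arg1, arg2]}

def legacy_to_new(raw: str) -> dict:
    """Convert an old-protocol string into a new-protocol message."""
    name, _, arg_text = raw.partition("|")
    args = arg_text.split(",") if arg_text else []
    return {"module": name, "args": args}

def new_to_legacy(msg: dict) -> str:
    """Convert a new-protocol message back into the old string form."""
    return msg["module"] + "|" + ",".join(msg["args"])

class Core:
    """Routes messages to modules; nothing calls a module directly."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[[list], str]] = {}

    def register(self, name: str, handler: Callable[[list], str]) -> None:
        # Registering under an existing name replaces the old module.
        self._modules[name] = handler

    def send(self, msg: dict) -> str:
        return self._modules[msg["module"]](msg["args"])

    def send_legacy(self, raw: str) -> str:
        # Old callers keep working through the conversion routine.
        return self.send(legacy_to_new(raw))

def old_upper(args: list) -> str:
    # Stand-in for a module carried over from the original software.
    return " ".join(a.upper() for a in args)

def new_upper(args: list) -> str:
    # Stand-in for the rewritten module; same overall operation.
    return " ".join(args).upper()

core = Core()
core.register("upper", old_upper)
before = core.send_legacy("upper|a,b")
core.register("upper", new_upper)   # swap the module; callers are unchanged
after = core.send_legacy("upper|a,b")
```

Because callers only ever hand a message to the core, swapping `old_upper` for `new_upper` does not require touching them. The same indirection is what would let a module later move behind a socket or into its own process.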
The biggest problem I've seen with software is that it's monolithic. There are no languages out there that are pluggable. Functions are the antithesis of modularity. Ironically, they're being incorrectly promoted as the best tool for this. Unless you're writing a library, no such luck. Most software starts out small. You don't need plugins and a framework. You just need a few features. But at this point, all functions call all others. There's no way to insert more code without modifying the existing source. That's where rewrites should concentrate their efforts. If you rewrite just for the sake of a rewrite, don't bother.
And here's the weird part. If you want to do a rewrite, you should want to rewrite the core systems. In short, you want a new framework. You can't write a new framework in languages that force you to use theirs. This includes ALL languages that have an interpreter or VM. C, C++, Pascal, etc. are still the best tools to do a rewrite. It simply makes no sense to rewrite something on top of the same framework. What are you getting? Nothing.
Think of successful rewrites. One example is Linux. Linus used Minix until he had enough pieces in place that he no longer needed it. uTorrent is also a good example of a successful rewrite. Again, there's a framework on which to build. It may not be obvious, but there are plenty of existing P2P files being transferred right now.
Rewrites are absolutely fantastic. But you have to know what you're doing. If you get someone that has never done one and knows nothing about the original software, then good luck. You'll need it. Rewrites are in the realm of optimizations. They make your software work better. Most programmers have been told not to even look in that direction. It's not that rewrites are bad. It's that it's not a practice that is taught in schools, universities or colleges. And there's the bigger point. The main point. The crucial point why people would rather avoid doing a rewrite. It's that normally success is just getting something to work. With a rewrite, the bar is set SO much higher. So high in fact, that you have a concrete measure to compare with. All of a sudden, there's accountability. That's the real reason people say you shouldn't do one.
Rewrites are actually very easy. Everything is already there. You have MORE data and test cases at your disposal than you could dream of. The reality is that it exposes just how good a programmer you really are and if you can do what you say you can. And about surprises and undocumented features or side-effects. I don't want to hear it. That's your job as a programmer to assess these things. That's how every other field works. There are risks. It's your job to ascertain them. If you decide to start one, you'd best make sure you have everything you need. Just like any job, there are some that are too risky. But saying that you should avoid all rewrites is a cop-out.
After the initial rewrite, you should actually continue rewrites. While you can think up many reasons to do a rewrite and probably more for not doing one, it comes down to this. You rewrite code because you're stuck modifying existing source code. What you want is a system where you NEVER touch the original code except for bug fixes. You don't touch it for adding functionality. You don't touch it to add a module. You don't touch it to refactor. You don't touch it for ANY reason except to fix what should have been working in the first place. If your code doesn't do that, then this is a good reason for a rewrite. It should be one of only two reasons for a rewrite, the other being that the original didn't work.
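One common way to get close to "never touch the original code to add functionality" is a registry that extensions add themselves to. The sketch below is only my illustration of that idea in Python, not necessarily the kind of system meant here. The names `command`, `run`, `reverse` and `shout` are hypothetical.

```python
from typing import Callable, Dict

# Core code: written once, then touched only for bug fixes.
_COMMANDS: Dict[str, Callable[[str], str]] = {}

def command(name: str):
    """Decorator that registers a function under a command name."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        _COMMANDS[name] = func
        return func
    return wrap

def run(name: str, text: str) -> str:
    """Dispatch to whichever command is registered under `name`."""
    if name not in _COMMANDS:
        raise KeyError(f"unknown command: {name}")
    return _COMMANDS[name](text)

# Extension code: could live in separate files and be added later.
@command("reverse")
def reverse(text: str) -> str:
    return text[::-1]

@command("shout")
def shout(text: str) -> str:
    return text.upper() + "!"
```

Adding a third command means writing one more decorated function. Neither `run` nor the registry changes, so the core is only reopened when it has a bug.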
So this should make it clear what you do with a second rewrite. You don't do a complete rewrite. You basically rip out all the parts that have been added since the last rewrite and make them work in a more consistent way with what's already there, rewriting much of that too if need be. Normally, a second rewrite would only be 20% of the total source, tops. A third rewrite is 5% or less. In fact, everything I do is made of rewrites. Every iteration gets better. It is my contention that rewrites are done too late. People wait until they must change everything.
I say yes to rewrites. I rewrite until I no longer need to touch the source code. If I'm touching the source code, then I've done something wrong and it should be rewritten. And if I'm the only one who thinks this way, then so be it. I don't miss having big balls of mud.
On the other hand, I do a complete rewrite if something (a module or an algorithm) gets too complicated while I'm programming it. I don't hesitate in this case. I just take a different view of the problem and rewrite it from scratch. Right now I'm doing the same with my Memel. I made a first-draft system just as a proof of concept, I understood the concepts and problems, and now I'm able to do it in a better way.
About the time pressure - you are right. I think that nowadays only a small number of programmers (and customers) remember that programming is an art. It's impossible to do good artwork under any kind of pressure. That was what my phrase "you want good or cheap?" was about. It's about time. Cheap in this case means 'quick'.
By vladas, # 25. September 2007, 05:56:51
You also bring up a valid topic. When is refactoring not a rewrite, and vice versa? Where's the boundary? What are the differences? When I code, I don't think with these terms in my head, so I'm curious where people draw the line.
By Vorlath, # 25. September 2007, 07:18:28
Refactoring is supposed to be incremental, so if you scrap the old code and start out fresh, you are not refactoring.
Nice post on an interesting subject!
Hans-Eric Grönlund
http://www.hans-eric.com/
By anonymous user, # 25. September 2007, 10:44:32
By vladas, # 25. September 2007, 11:29:02
Yes, you have a point. I usually define refactoring as being the technique described in Martin Fowler's book. I might very well be wrong (it was a long time ago that I read it). Maybe it's the result that defines refactoring, not the technique.
Wikipedia defines refactoring as "modifying without changing its external behavior". This suggests that you should be able to rewrite from scratch and still call it refactoring, as long as the external behavior is not changed.
Interesting. What then should I call the technique?
By anonymous user, # 25. September 2007, 13:23:45
Yes! The best that can happen to code is that it is rewritten. Each time around the experience from the previous round helps build a better, more orthogonal, clean & lean, flexible and efficient system.
By anonymous user, # 25. September 2007, 14:24:35
And I'd take Joel's opinion on this with a grain of salt. While he does preach that rewriting is bad, or at least very risky, they essentially did a complete rewrite of FogBugz to move to a platform-independent code generator for FB5 (?). He may call it something else, but changing your application to use a generator and spit out both ASP and PHP depending on the target platform is such a significant change, that I think you can only call it a rewrite.
By anonymous user, # 25. September 2007, 14:55:13
I had a little problem with your post because first you wrote
"I want to talk about the code itself"
then you go on and talk about managers raising fingers...
I think that was not the best way to write the article, because
as I am not at all subject to economic pressure (I code for
fun), I was much more interested in conclusions than in that economic
finger pointing :(
About rewrites, I think it's better to always keep in mind what kind
of problems arose the first time. Maybe with growing experience
of the coder(s) in question, you will be less likely to need a
real rewrite (or will just rewrite some components, which are modular
and easy to change) ...
By anonymous user, # 25. September 2007, 16:53:53
Experience isn't enough to avoid a possible need for a rewrite. That's what my previous blog entry was about. It talks about a well known paper called Big Ball of Mud and it explains this specific issue (why software most often tends to end up as a big ball of mud) far better than I.
By Vorlath, # 25. September 2007, 19:14:43
"You can't write a new framework in languages that force you to use theirs. This includes ALL languages that have an interpreter or VM."
Strongly disagree. By the same argument, why don't you want to rewrite the compiler itself? Interpreters/VMs are the basic underlying machine outside the application domain. Usually they are out of the scope of the problem to solve, except sometimes for speed concerns.
By anonymous user, # 25. September 2007, 20:00:46
I did a complete rewrite of a C++ project in Lisp.
Not only is it much shorter now, but nearly every time I add a new feature, it just works. In the C++ code, there were some very strange bugs, which were never caught.
I'm really surprised by the speed (no difference from the C++ version), and catching bugs is now much easier and faster, not to mention that they are very rare now.
But all this isn't real news nowadays (at least for open minds), is it?
By anonymous user, # 25. September 2007, 20:14:46
Anon: You went from using your own framework to using someone else's (Lisp). That's uh... boring.
By Vorlath, # 25. September 2007, 21:50:11
I work in a corporate I.T. department (publishing company) and we don't get to rewrite anything unless it has a business reason to do so. Even if it's the biggest piece of crap ever delivered by one of our consultants or outsourced to someone and brought in house, we don't get to rewrite, or even refactor, ANYTHING unless the business side has a budget for it and only then does the rewrite get approved (and subsequently outsourced to one of our top manager's consulting buddies' firms, but that's a rant for a whole other post).
By anonymous user, # 25. September 2007, 21:50:35
Actually, Linus used Minix until he accidentally f!@#ed up his Minix partition and was too lazy to fix it.
By anonymous user, # 25. September 2007, 23:12:16
I find your ideas about working with a system that uses an interpreter or VM a bit strange. Do you write software that runs on Intel processors? Many Intel processors implement their instruction set using microcode, so the machine code of every program that executes on the computer is itself running on an interpreter.
Is this cheating? Do you have to build your own hardware to successfully rewrite your software?
If your software happens to be a web application, you are working within the frameworks of HTML, HTTP, CSS, etc. To really rewrite, do you have to throw away those frameworks and replace them with your own (or perhaps just go from a web client to a fat client, then for the next one maybe X or console would be good enough)?
What about languages like Haskell that provide both an interpreter and a compiler? Does the existence of an interpreter automatically disqualify it? If I write a C++ interpreter, will you stop using C++?
By anonymous user, # 26. September 2007, 02:43:32
Now, the decision is about how much of a clean break you want to make. If you're doing a rewrite, the cleanest break you can have is by using a system level language.
You can also do a rewrite without writing a new framework. You can extend existing ones and use VMs and whatever else. I just think this is a waste of effort for a rewrite if you're just gonna let developers of other languages and platforms make design decisions for your software anyway, decisions that may not be the best fit for your application.
With the web, you can't throw away the framework. The framework is the web. I doubt you can rewrite the Internet. So this is one case (as it is with a lot of hardware) where you want to stay with existing frameworks and be affected by the web's philosophy. But keep in mind that web applications are ONE type of application. There is currently no alternative, but that won't always be so. So when you do a rewrite for this future environment, you will indeed want to ditch HTML, HTTP, CSS, etc. And again, you still won't create a new framework because likely someone else will have designed one. This is called porting. You ditch the old and use whatever is available on the new environment.
What I said is the best way to build a NEW framework is to be in as much control as possible. But not everyone needs a new framework. Many people are ok with a 90% new or 50% new or 20% new framework. Some don't need any new framework at all. Existing products are good enough. So you can build on what's already there. This is why it doesn't matter to many people if the code is native or interpreted. It's not something that will affect the new version of their software.
Now take my project called Project V. This can work with everything from native code to VMs to interpreters. I'm not going to bind my product to existing proprietary products because it makes no sense that Project V can produce native code, but my product isn't itself native code. As for libraries, I can create bindings for anything that currently exists. But my framework is completely new and independent of any outside influence. Not many have my requirements. But if you do, it's good to know about these things. It's also not a bad idea to keep in mind that most of the software we write is conditioned by others who came before us even when we think we're doing a complete rewrite. And now you know what's involved if you ever wanted to not be influenced at all.
By Vorlath, # 26. September 2007, 03:46:09
Every advance in software, I think, happens because of someone's laziness
By vladas, # 26. September 2007, 05:25:15
@ Vorlath:
Lisp is not a 'framework', it's more or less at the same basic level as assembler (you don't need C to implement Lisp, and a couple implementations don't use C).
So, Lisp and assembler are both bare-bone (in a sense), and they have more in common than one may think.
By anonymous user, # 26. September 2007, 07:12:53
By vladas, # 26. September 2007, 09:40:02
What about the side effects on CAR and '.' list notation in LISP? And the very internal structure of lists?
Are these bare-bone too?
By vladas, # 26. September 2007, 13:51:29
What's the opcodes for cdr?
When I program in a system level language, I know pretty much what the opcodes are going to be. Only exceptions and RTTI I can't figure out, but those are higher level anyways. With C, Pascal, heck even Modula-2, I can tell what's going on.
But again, you're missing the point. It's not about being bare bones. It's about escaping design decisions made by others. If you use Lisp, you still have Lisp at runtime. So you must formulate your code according to Lisp and not according to your specific needs. Hey, if they're close, that's great. But that's not the point. It's the fact that no language can be a perfect fit other than a carefully crafted framework built exactly for your needs. System level languages are the only languages that let you do this. Lisp cannot do this.
By Vorlath, # 26. September 2007, 21:01:13
There's a slight difference between rewriting and refactoring: rewrite is refactoring with baggage, while refactoring is a tiny rewrite of the smallest replaceable piece of code.
Most rewrites happen for superficial reasons, like wanting to play with the newest toy (Perl to PHP, PHP to Rails, Rails to Django, etc.), or a messy codebase, inflexible API, etc. But it's dangerous to assume that having a complete version of your product would clarify your vision and simplify a rewrite.
When you rewrite an application, you're starting from zero. Not from 20-30%, but from zero. You know what your interfaces will look like, you know what your application *should* do, but you don't yet know *how* it will do it.
Refactoring, on the other hand, is tiny rewrites applied on tiny portions of code. The point of refactoring is to replace code with better-looking code while maintaining the same functionality. When you apply this principle on small sets (as opposed to complete applications), you needn't worry about features and vision much, because what you're doing is mostly harmless, and humans are perfectly capable of handling small sets of information.
At the end, I could be nitpicking. I feel rewriting is risky. It could be pulled off, but it's not the wisest decision to make.
By anonymous user, # 27. September 2007, 15:42:44
"What's the opcodes for cdr?"
In case you're interested, CDR was originally implemented for the IBM 704 as the following assembler macro:
LXD JLOC,4
CLA 0,4
PDX 0,4
PXD 0,4
I'm unfamiliar with 704 assembler, so I can't comment much on the code, but I'd imagine it pulls bits 4 to 18 from a memory location and puts them in a register. Modern architectures don't use 15 bit addresses and 36 bit words to store their data, so if I were implementing Lisp on, say, a 68k, I'd use two 32 bit words to represent a cons cell. In which case, I believe CDR could be defined in just one opcode:
MOVE.L (A0),D0
Lisp was originally designed to run on a machine that used vacuum tubes, so it should come as no surprise that its built-in instruction set can be implemented at a low level. I'd imagine C is rather more complex to implement, and further from the system layer than a minimal Lisp implementation.
By anonymous user, # 27. September 2007, 21:49:02
I'll just recap for those who want to know. That I'm not just saying these things.
First, the code you're showing is for the interpreter. So you're making my point that you're using someone else's framework.
Second, the 68K opcode just moves a longword (32bit) into D0. But you make no mention of how you are organising your data structure. Is the pointer to the item first or second? In any case, the value loaded up would be a pointer and you're loading it into a data register. Not sure what you're doing there. But whatever. It's just more nonsense that we're all too familiar seeing with you.
Third, you say that Lisp was designed to run on a machine that used vacuum tubes. Awesome! Again, this is someone else's framework. Where's the user's code? With C, the user's code is exactly what the user specified, not what C wants. In fact, nothing of C remains. Even the calling mechanism is not C's, but rather the machine's ABI. So again, with C, the only thing left is your code. Not so with Lisp. With Lisp, it's always Lisp with your additions. I repeat, it's always the same code at runtime. Someone else's code.
By Vorlath, # 28. September 2007, 06:53:19
By vladas, # 28. September 2007, 07:17:17
Vorlath, you argue that C is a "system-level" language, and that Lisp is not, but surely that depends on the system the language is being implemented on.
The IBM 704 used memory made up of 36 bit words. Instructions were held in memory using a format of a 3-bit prefix, a 15-bit decrement, a 3-bit tag, and a 15-bit address. CDR was an assembler macro standing for "Contents of the Decrement part of Register", whilst CAR stood for "Contents of the Address part of Register". In other words, a Lisp cons cell could be stored directly in a single word using the machine's instruction format.
In my 68k example, I assumed that cons cells were stored as pairs of 32 bit words, so CDR would be:
MOVE.L (A0),D0
And CAR:
MOVE.L 4(A0),D0
A Lisp application would probably contain a lot of arbitrary JMPs from one cons cell to the next (though a good compiler could abstract a lot of them out), but apart from that I don't believe it would be inherently different to a compiled C application.
And whilst C may not seem like it makes many assumptions, that's because we're all used to a standard processor architecture. We expect there to be a few registers, one or more stacks, and a linearly addressable memory arranged in groups of bits divisible by 8. This doesn't have to be the case. Instead of linear blocks of memory, you could use chains of linked lists, with each block of memory divided into two address pairs. I believe this style of computer architecture is known as a pointer machine, and trying to get C working with such a system architecture would likely be an uphill battle. You'd have to implement what would amount to a VM in order to get pointer arithmetic to work.
Given this, why assume that Lisp is any different from compiled C? Both can be implemented in VMs. Both can be executed at the hardware level. At least some processor architectures are incompatible with C, and would require an abstraction layer for C to work. Possibly the same is also true of Lisp.
Why the bias against VMs? How is a system architecture implemented in software any different from one implemented in hardware?
By anonymous user, # 28. September 2007, 11:21:40
My bias against VM's is this.
Functionality of hardware is X.
Functionality of software is Y.
Y < X ALWAYS!
Clear?
Also, if someone else wrote the core, then you must conform to other people's design decisions. OTOH, if you control the core, then you make the decisions and this is the only way you can keep providing functionality to the higher level parts if you need to. With VM's, you're dependent on Sun or whoever to keep it up to date. And if one day they decide they're not going to let you use feature C anymore, too bad. It's your project, you should have final say.
If you use a VM, then your high level part is basically a VM on top of a VM. How ridiculous is that? And if you're not building another higher level on top, then you're doing it wrong.
I still find it funny that Sun, MS, Google, Adobe and a host of big companies all use exactly the techniques mentioned here, yet there's resistance to this idea. There's a real reason they do this. They are in control. They're not dependent on anyone else for the core functionality. And the stupid thing is that there's nothing a VM or interpreter can do that can't be done at the system level with equal ease. There's no reason for VM's at all. That, more than anything is what confuses the hell out of me as to why people use them.
By Vorlath, # 28. September 2007, 13:03:20
You are right that C can have, and actually has, some other, interpreted implementations. But whatever the implementation, I can think in C as though it were assembler. Can you think in LISP in a similar way?
If I write a++; in C, I expect similar compiled machine code. But I can't tell what the code will be for CDR in any of the LISP implementations. I can't even be aware of the list memory allocation details in the various LISP VMs (I can count 5-7 different ones).
As a final paradox, I could say that LISP is more robust and more restrictive at the same time. I'm not against it, by the way.
By vladas, # 28. September 2007, 16:02:40
The performance of a VM is usually less than the underlying hardware, but the functionality does not have to be. What if I ran my OS under an x86 VM like Bochs on an x86 machine? Then my VM would have exactly the same capabilities as my hardware - just an order of magnitude slower.
I'm also curious as to where different hardware architectures fit into this, Vorlath. A company called Symbolics sold some Lisp Machines in the 1980s, which implemented a minimal lisp core at the hardware layer. Using your "Software < Hardware" equation, it would seem that you'd advocate using Lisp, rather than C, on such a machine. Is this correct?
Regarding moving pointers into data registers, that's essentially what CDR does. With a name like "Contents of the Decrement part of Register", it's not likely to do anything complex. If the "decrement" is held in the first word at a memory location, and the "address" is held in the second, then (CDR x) and (CAR x) are just about equivalent to *x and *(x + 1) in C.
Where the output goes depends on what's calling the function, which is why I stuck the result of the MOVE in a data register. Perhaps the next instruction could push it onto the stack or somesuch, but that's an implementation detail.
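As a rough illustration of the layout described above (the "decrement" in the first word, the "address" in the second), here is a minimal Python sketch. The flat `memory` list, the `NIL` sentinel and every function name are assumptions made for the example, not anything taken from a real Lisp implementation.

```python
# Hypothetical flat "memory": a list of words, addressed by index.
memory = []

NIL = -1  # assumed sentinel for the empty list


def cons(car_value, cdr_value):
    """Allocate two adjacent words and return the address of the first."""
    addr = len(memory)
    memory.append(cdr_value)  # word 0: the "decrement" (CDR)
    memory.append(car_value)  # word 1: the "address" (CAR)
    return addr


def cdr(x):
    return memory[x]          # a single load, like *x in C


def car(x):
    return memory[x + 1]      # a single load, like *(x + 1) in C


def to_python_list(x):
    """Walk a chain of cons cells by following the CDR words."""
    out = []
    while x != NIL:
        out.append(car(x))
        x = cdr(x)
    return out


# Build the list (1 2 3); the innermost cell is allocated first.
lst = cons(1, cons(2, cons(3, NIL)))
```

In this sketch both accessors are one indexed read each, which is the point being argued: on such a layout CAR and CDR need nothing more than a load.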
By anonymous user, # 28. September 2007, 18:19:45
2. I'd need to see the specs of this Lisp machine and I care to look at that as much as I care to shave my next door neighbour's cat.
3. Ever heard of A1 (not the sauce)? MOVEA.L (A0), A1 ; Address registers are made to hold, wait for it... addresses. Yay!
By Vorlath, # 28. September 2007, 22:19:12
1. A different kind of VM? Care to be explicit as to where the line is drawn? Clearly you view, say, the JVM and Bochs as being two different "kinds" of VM, but what, precisely, is the defining difference? Is it just whether the VM's architecture has previously been created in hardware?
2. Then why have such strong opinions on matters in which you are not in possession of all the facts?
3. A cons cell doesn't have to store an address in the decrement register, which is why I chose to use D0 over A1.
By anonymous user, # 29. September 2007, 01:52:16
2. You're talking about one specific case called a Lisp machine. Also, a Lisp machine is tailored toward one specific language. I've actually said very little about this Lisp machine.
3. Your statement has more holes, but I'm ending it because it's about one specific framework. I've already said it (the fact that it's from the interpreter) actually proves my point anyways.
Weave Jester, go read a book or something. Every single time, you spread incorrect notions and can't stay on topic. Besides, what I'm saying isn't outrageous and I think it may have been blown way out of proportion. There's nothing in a VM that can't be provided in a system level language. But there are things in a system level language that can't be provided in a VM. Specifically, the ability to control the core of your software and decide what functionality will be available to the higher level parts of your system. This is braindead obvious. There's nothing new here and all large companies do it this way for a reason. Even your precious VM's do it this way. So before you say anything else, explain to me why the very first JVM wasn't built using the SmallTalk one for example?
By Vorlath, # 29. September 2007, 11:44:57
It may be "braindead obvious", but it's also incorrect. A VM may degrade performance, but it doesn't necessarily reduce functionality. The reason the JVM wasn't built on top of another VM was purely a performance consideration. It's not like one couldn't create a working Java bytecode VM in, say, Python, but if one did, it would be depressingly slow.
It occurs to me that perhaps you're talking about functionality from a programmer's perspective, rather than the functionality of the application itself. It's obvious that you could write a Java program that exactly mimics the outputs of a C program, but in the Java environment, the programmer loses the ability to manipulate pointers and so forth, and this may be seen as a restriction of functionality.
However, C does not represent the pinnacle of language functionality. I can do things in a Lisp VM that would be impossible or impractical in raw C. For instance, many Lisp programs are self-modifying, which would not really be workable in C. Different C compilers produce different outputs, and ANSI C can theoretically be compiled on many different architectures, so you couldn't just change the assembly of your application at run-time in the same way as you can manipulate S-expressions in Lisp. In order to get self-modifying, cross-platform C code, you'd probably have to implement a VM of your own.
But you'll never fully understand the benefits or disadvantages of something without first trying it, so I'm uncertain why you so strongly cling to notions that are not backed by practical experience. I like VMs because they give me more flexibility, and more functionality. I like introspection, syntax macros and higher level functions, and in order to use them in C, I'd effectively have to create a VM of my own from scratch. The effort involved in getting the functionality of C to the same level as, say, PLT Scheme, seems rather prohibitive, and my home-made VM probably wouldn't be as good as the VMs that scores of programmers have worked on for years.
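To make the "manipulate S-expressions at run-time" idea concrete, here is a small Python sketch in which a program is a nested list that can be both evaluated and rewritten. The `OPS` table, `evaluate` and `replace_op` are hypothetical names invented for this example; a real Lisp does this natively rather than through a hand-written evaluator.

```python
import operator

# Assumed primitive table for this sketch.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested-list expression such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr                      # a number evaluates to itself
    op, *args = expr
    values = [evaluate(a) for a in args]
    result = values[0]
    for v in values[1:]:
        result = OPS[op](result, v)
    return result


def replace_op(expr, old, new):
    """Return a copy of expr with every operator `old` swapped for `new`."""
    if not isinstance(expr, list):
        return expr
    op, *args = expr
    return [new if op == old else op] + [replace_op(a, old, new) for a in args]


program = ["+", 1, ["*", 2, 3]]
rewritten = replace_op(program, "*", "+")   # the code is edited as plain data
```

Because the program is ordinary data, rewriting it needs no knowledge of the machine code a compiler would emit, which is the portability argument made above.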
I mean, if I was really concerned about starting from scratch, I'd begin by etching my own microchips - x86 architecture is really limited in many respects.
By anonymous user, # 29. September 2007, 13:35:18
That's my favorite line. You really do produce some gems sometimes.
"I like VMs because they give me more flexibility, and more functionality. I like introspection, syntax macros and higher level functions, and in order to use them I C, I'd effectively have to create a VM of my own from scratch."
Exactly! This is what I'm talking about. This way, you control what features are allowed at the higher level for extensions, plugins and custom high level code. You can bind certain objects directly into your low level framework and all sorts of things like that.
"The effort involved in getting the functionality of C to the same level as, say, PLT Scheme, seems rather prohibitive, and my home-made VM probably wouldn't be as good as the VMs that scores of programmers have worked on for years."
Huh? Why wouldn't you be able to use existing ones as your high level language? There are plenty of scriptable C and C++ type languages. I'm sure there are Lisp style OSS interpreters that would allow you to bind your own low level stuff into them. Also, there are plenty of toolkits that allow you to build your own languages with your own action routines.
And about examples that use what I mention, here are a few:
JVM
.NET
SmallTalk (this is actually a prime example)
Lisp interpreter (most of them)
Quake (all of them)
Most 3D games (that allows mods).
Firefox
Opera
IE (I think)
Photoshop
On and on...
By Vorlath, # 29. September 2007, 14:54:14
I think we're talking past each other, and we may even be advocating the same thing for once, though I confess I'm still unsure what you mean. All the major VM-based languages I can think of have foreign function interfaces, allowing one to bind native C functions to functions the VM can understand. If I wanted to access feature X of the underlying system, I could define a FFI for it... though, most useful functionality is likely already implemented in that fashion. In truth, I've only ever needed to do it for performance reasons.
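For readers who have not used a foreign function interface, here is a minimal sketch using Python's standard `ctypes` module to call the C library's `strlen`. It assumes a POSIX-like system where the C library can be located this way (it is not expected to work unchanged on Windows), and `native_length` is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. If find_library returns None,
# CDLL(None) falls back to the symbols of the running process on POSIX
# systems; that fallback is an assumption about the platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the native signature: size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def native_length(text):
    """Return the byte length of text as computed by the C strlen."""
    return strlen(text.encode("utf-8"))
```

The binding is a handful of declarations; the rest of the program stays in the higher level language and only crosses into native code at this one call.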
I don't advocate using VMs without connecting them to the underlying hardware - that would just be silly, and more than a little pointless. But VMs are useful, so long as they expose all the core APIs that would be needed. The majority of code can be written in the VM, with a minimal compatibility layer connecting it to the underlying system.
I'm starting to think that we just have a different idea of what makes up a "framework"...
By anonymous user, # 29. September 2007, 17:19:45
I'll leave you with one last analogy.
If I want an integer, I can use an integer.
If I want a string, I can use a string.
If I want a boolean, I can use a boolean.
If I want a compound structure, I can use a compound structure.
If I want algorithm X to use an integer, I can use algorithm X with an integer.
If I want algorithm X to use a string, I can use algorithm X with a string.
If I want algorithm X to use a boolean, I can use algorithm X with a boolean.
If I want algorithm X to use a compound structure, I can use algorithm X with a compound structure.
This is your argument. The algorithm is an analogy for your VM. The tools inside are the basic types. So no matter the algorithm, you can choose the one that most fits your needs. Whether that be Lisp, Java or whatever. But all tools are inside because no matter which one you pick, there they are. So it should just be a matter of preference, right?
If you can just use whatever you need, can you explain to me why anyone would ever create a pattern? Like vector<T>? Or set<T1,T2>?
My point is that VM's are just different implementations. Going to system level is a generic framework that allows you to use any implementation you wish for the high level stuff. Even if you only use one particular high level implementation, the choice to use many different kinds of high level implementations at any point in the future is no problem, including custom ones (ie. your own interpreter or scripting language). This allows you to mold everything towards your solution instead of molding your solution to the existing implementations.
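One way to picture a core that decides what the high level part may use is a host that exposes an explicit set of bindings to a tiny script language. The sketch below is only an illustration under that reading: the `Core` class, its `expose` and `run` methods and the one-command-per-line script format are all invented here, and Python stands in for both the system level core and the scripting layer.

```python
class Core:
    """Hypothetical host core that decides what scripts may call."""

    def __init__(self):
        self._bindings = {}

    def expose(self, name, func):
        """Make a core function available to scripts under `name`."""
        self._bindings[name] = func

    def run(self, script):
        """Run lines like 'add 2 3' and return the list of results."""
        results = []
        for line in script.splitlines():
            if not line.strip():
                continue                 # skip blank lines
            name, *args = line.split()
            if name not in self._bindings:
                raise NameError(f"'{name}' is not exposed by the core")
            results.append(self._bindings[name](*(int(a) for a in args)))
        return results


core = Core()
core.expose("add", lambda a, b: a + b)
core.expose("square", lambda a: a * a)
```

Anything the core has not exposed is simply unavailable to the script, so the owner of the core keeps the final say over the feature set.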
If you want to understand my point, you need to understand how patterns are built and why they are built. While most programmers understand how to use patterns, the same cannot be said for building them. From past articles posted online about the topic (not from me), 99% of programmers do not understand this concept. And most of them never will. I'm afraid that you, Weave Jester, are in this majority.
By Vorlath, # 29. September 2007, 17:49:46
It took me a while to penetrate your analogy, Vorlath, but after reading through your comment a few times, I think I understand what you're getting at.
In a nutshell, you seem to be arguing that we should tend toward generic tools, such as vector<T> and set<T1, T2>, and that implementing a framework in a language like C allows you to use it with a large number of higher level languages.
I guess that's a valid point, but I'd argue that's limiting yourself to the lowest common denominator.
For instance, if I created a framework in Scheme, I'd likely make use of macros quite a bit. This would produce a superior framework, but one that would not be directly compatible with a language that lacks the same capabilities. By specializing my framework for a particular language, I don't have to lower myself to a common level of compatibility.
By anonymous user, # 29. September 2007, 21:52:56
If you can understand why you would want to create patterns in a programming language, then you'd see why you would want to create a pattern for your framework one level deeper. However, keep in mind that I'm promoting using your own high level language along with the system level core. And not one already in existence because those VM's and interpreters retain control and don't allow this kind of flexibility. But there are plenty of tools that allow you to do exactly this.
BTW, your example is incorrect in its assumptions, but its wording actually supports my argument, but again I don't think you understand what I'm getting at. What I'm talking about isn't a huge revelation or anything. So don't expect any kind of "aha" moment. More like a "DUH" moment.
There's a bit more to this, but I'll try and keep it to this one point for now.
By Vorlath, # 29. September 2007, 22:43:14
But what about language constructs that exist in one language, but not another? If I were to design a certain framework in C, Lisp and Haskell, I'd come up with three different implementations with very little overlap between them. Certain languages rely heavily on constructs that have no direct analogy in others.
For instance, let's say I wanted to create a general framework for solving logic puzzles. If I were using Scheme, I'd probably implement McCarthy's amb operator via a macro and call/cc. If I were using Haskell, I'd implement amb using a monad comprehension. If I were using C, well, I've seen a fairly portable implementation of amb that screws around directly with the instruction stack to get the desired effect.
Anyway, my general point is that implementing the same functionality requires incompatible approaches. Scheme shows off its macros and continuations; Haskell its advanced type system; C its ability to screw around directly with the instruction stack. You couldn't use the C approach in Scheme, or the Haskell approach in C. You could try using a FFI to the C implementation, but I wouldn't like to think of the effects unsafe stack alterations would have on the respective VMs and GCs, and it wouldn't slot into Haskell's type system very well, either.
So in this case, it's better to write the framework (well, library really) from scratch in each language. And this is just a handful of functions we're talking about. Scale it up to a proper, complex framework and you're likely to have even more problems with incompatible programming paradigms.
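For comparison, a fourth language gives yet another shape. Python has neither call/cc nor monad comprehensions, so the sketch below approximates amb-style nondeterministic choice with recursive backtracking over generators. `amb_solve` and its arguments are hypothetical names for this example and do not reproduce any of the implementations mentioned above.

```python
def amb_solve(choices, constraint):
    """Yield every assignment (one value per variable) satisfying constraint.

    `choices` maps variable names to iterables of candidate values.
    """
    names = list(choices)

    def search(i, assignment):
        if i == len(names):
            if constraint(**assignment):
                yield dict(assignment)
            return
        for value in choices[names[i]]:      # try each alternative in turn
            assignment[names[i]] = value
            yield from search(i + 1, assignment)
        assignment.pop(names[i], None)       # backtrack

    yield from search(0, {})


# Puzzle: two digits whose sum is 10 and whose product is 21.
solutions = list(amb_solve(
    {"x": range(1, 10), "y": range(1, 10)},
    lambda x, y: x + y == 10 and x * y == 21,
))
```

The generator approach works here only because Python can suspend and resume the search lazily; it would not translate directly to C or slot into Haskell's type system, which fits the point about incompatible approaches.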
By anonymous user, # 29. September 2007, 23:53:12
Sorry, but I can't explain patterns to you. You cannot get closer to the destination than where you are at now and I cannot do anything to bring you over. That's up to you. You're fixated on this world view you've set up for yourself and are afraid to let it go. Maybe the fact that I'm talking about lower level code is what's throwing you off from thinking in higher level terms, I don't know.
By Vorlath, # 30. September 2007, 00:19:43
I'd ask how your definition of "pattern" differs from the norm, but you've already said you can't explain it. I think I need a Vorlath to English dictionary.
It sounds as if you're talking about abstractness, but perhaps that's just my views tainting my interpretation of your various explanations. I view abstractness and generality as the key to better programming, and when I see the word "patterns", I think of subtle threads of commonality that run through programming solutions. The most general threads, the ones that run through the most solutions, tend to be the hardest to see and conceptualize, but also the most useful.
You probably don't share the same views or definitions, as you seem to steer away from any abstraction I care to mention, and seem to have very little interest in them. You seem to favour languages with relatively limited syntax and type systems, and are uninterested in languages that offer more exotic language constructs. But at the same time, what you talk about seems remarkably similar to advocating greater abstraction and generality. I confess this apparent inconsistency is rather baffling!
By anonymous user, # 30. September 2007, 01:26:25
I have little interest in abstractions because I see through them. You seem to soak it in as if it were the gospel truth. Your exotic language constructs are illusions. What you get from them may well be nice, but I steer away from them, not because I don't understand what they are capable of, but because I know their weaknesses. If I use them (and I do sometimes), I will paint myself into a corner in the long run. But I also know where they draw their power.
Functional programming is probably the worst at this. It draws 100% of its power from data flow, yet restricts itself by including imperative concepts (not monads BTW) and then their advocates call it beautiful and say it's the blub of anyone who dares to challenge these notions. Even as it has a quite obvious dichotomy with its "monadic" approaches, both are paradoxically called functional. Only someone pure of faith can ignore the sheer number of inconsistencies.
So it is not I that is shying away from languages with more expressive power. In fact, I'm the one trying to create a more powerful platform. I don't advocate abstractions, but I do support the idea of retaining access to all levels of computing.
Here's an interesting quote from Alan Kay:
Here, he's talking about porting and bragging about how easy it is. If you want more bragging, look up anything about Alan Kay and porting SmallTalk. He repeats this all the time. But why would he convert to C? My goodness, a system level language? And easy porting? Now, I don't care much about porting though it is a nice advantage. But any system level functionality can be added rather easily now. No way it can take longer than porting. Also note how this is different than simple external calls. This extra functionality would come from within. Seamless and specifically tailored to the task at hand. I think it's braindead obvious why it's done this way, but it seems to be a forgotten art in many circles. And forget that it's a VM. It could be any kind of application.
By Vorlath, # 30. September 2007, 03:48:07
I don't think we understand the same thing when we talk about "abstractions". An abstraction, in my view, is simply a way of factoring out common behaviour into a more general function. For instance, if a neophyte wanted to find the sum of five numbers, they might hard code it:
x = 1 + 2 + 3 + 4 + 5
A more experienced programmer would write a "sum" function:
sum [] = 0
sum (x:xs) = x + sum xs
A still more experienced programmer would factor out the common behaviour into a fold:
fold _ i [] = i
fold f i (x:xs) = f x (fold f i xs)
sum = fold (+) 0
So when I talk about the need for abstraction, I am talking about the desire for a programmer to reduce the amount of repetition in their code. I can't conceive of any programmer who would argue that repeating themselves is good practise, so your objections about abstractions seem more like a misunderstanding to me.
On the subject of monads, you should be aware that they are not a concept limited to functional languages. They can be implemented in any language, but tend to be unwieldy without a sophisticated type system and higher level functions. As such, most languages forgo this layer of abstraction.
However, a lot of things you likely use everyday have their roots in monads. Streams in C++ can be thought of as a specialised monad, as can list comprehensions and generator expressions in Python, or LINQ in C# 3.0.
A monad is essentially a way of applying general functionality to an object within a specialised container. It's such a general idea that it applies to a wide range of programming concepts. That's the reason languages like Haskell use monads so much - not because they have some fetish for category theory, but because once you introduce the concept of monads into a language, they tend to apply to a lot of distinct pieces of functionality.
By anonymous user, # 30. September 2007, 12:55:24
You can do that without abstractions. It seems you believe that wrappers are enough to provide an abstraction. Unfortunately, you do not want to admit that sometimes, programmers have to break into those abstractions and fix them to provide better functionality, or sometimes create something entirely new. For this to happen, one must understand what is going on even within these wrappers. This is very true of encapsulation. While it does provide a simpler way for its use, it's still the programmer that must maintain this "abstraction".
So no, I don't promote abstractions for the simple reason that it's a lie, but I do promote tools.
Monads are data flow concepts and have NOTHING to do with functions. In fact, this is the limiting factor in getting their full power. See, there are things I understand that you plainly don't want to. It's not that you can't, but you eat up this notion of abstraction wholesale. Do you not see that there's a difference between the normal way functional programming works and the way monads work? If there were no differences, then there'd be no reason for monads since they'd be the same. So explain to me exactly what causes this difference. (Please don't. It's rhetorical because I already know you can't, but you think you can). You seem to want to claim that there is no difference. We all know there is one and this makes it non-functional. Denial of this fact only proves that you wish to retain your imaginary world view.
You go on to say that monads can be applied in a wide variety of languages, yet know nothing of why this is. You believe it to be a "specialised container". Again with the wrapper concept. The "if I can wrap it, I can call it what I want" scenario. The reason monads are powerful is because it's data flow. Once you remove the imperative stranglehold from functional programming (bet you still don't believe that imperative is critical to the definition of functional programming)... anyways, if you remove imperative, you end up with data flow. By some weird twist, monads can actually restrict themselves so much that they reproduce the effects of imperative programming. Only in functional languages would you find such bastardisations of powerful concepts.
You say C++'s streams can be thought of as a sort of monad. Again, these are very limited, but streams are exactly in the domain of data flow. I'm pointing this out because it comes from you and you can't deny that streams are a data flow concept. If you do deny this, then know you're living in your make believe world for one more day by your choice, and your choice alone. There is no outside reason other than you why you do not see reality.
BTW, don't you ever get frustrated that everything I've told you adds up? That it all fits neatly together? That your explanations are always filled with holes, yet you continuously try to fill them up? I'm just trying to figure out what would cause someone to keep up the fight for their virtual (aka fake) world for so long. It's admirable, but one that is untenable in the long run (or now for that matter). What is it that is so important to you to retain this imaginary world view?
By Vorlath, # 30. September 2007, 15:59:56
Keep in mind that when I talk about abstractions, I'm talking about factoring out repetition in code, not necessarily anything to do with wrappers. All I'm interested in is not writing redundant code.
Regarding monads having nothing to do with functions; you're incorrect. I'll provide a brief example in Python to show you:
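def unit(x):
    # wrap a plain value in a one-element list
    return [x]
def bind(f, xs):
    # f must return a list; the results are flattened into one list
    return [y for x in xs for y in f(x)]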
Those two functions turn the standard list class in Python into a monad. That's it. No fancy functional mojo. No IO black magic. That pair of functions you see above are literally _all_ that's needed to turn the list class into a monad.
So no, your arguments don't add up. They're just nonsensical, or based around fundamental misunderstandings. Claiming that monads have nothing to do with functions is completely wrong, because all monads are implemented using a pair of functions. You might as well claim that objects have nothing to do with classes.
Regarding monads being data flow concepts; some monads certainly are, but some are not. Monads are just a general interface used to apply functions to data inside a container. Sometimes that container represents a flow of data, but other times it's just something mundane like a list, or a nullable type.
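To illustrate the "nullable type" case, here is a minimal Python sketch of the same pair of functions for an optional value. The tagged-tuple representation, the `NOTHING` sentinel and the `safe_div` and `safe_sqrt` helpers are assumptions made for the example; they are not part of any standard library.

```python
NOTHING = None  # assumed sentinel for "no value"


def unit(x):
    """Wrap a plain value; the tag keeps a wrapped None distinct from NOTHING."""
    return ("just", x)


def bind(f, m):
    """Apply f to the wrapped value, or pass NOTHING straight through."""
    return NOTHING if m is NOTHING else f(m[1])


def safe_div(n, d):
    return NOTHING if d == 0 else unit(n / d)


def safe_sqrt(x):
    return NOTHING if x < 0 else unit(x ** 0.5)


# Chain two steps that may each fail, without writing any explicit checks.
ok = bind(safe_sqrt, safe_div(16, 4))
failed = bind(safe_sqrt, safe_div(1, 0))
```

Nothing in this container represents a flow of data over time; `bind` only decides whether the next function gets called at all.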
By anonymous user, # 30. September 2007, 17:00:17
I believe Haskell even lets you omit ">>" and ">>=" where the "=" is used when you actually want to use data from the previous operation. In fact, you normally have to define a function to enable this transition of state from one function to the next, but notice that it's not actually there when you use it. No function?!!! WOW! This is what enables lazy lists, map, reduce, monads, arrows and all these other things. These are all dataflow concepts whether you choose to admit this to yourself or not. In Project V, I'm using imperative code to enable dataflow programming because the hardware is based on opcodes. There are no functions (I use a loop) though you can use anything you wish for the implementation.
Again, I've explained why things are the way they are, yet you refuse to see the truth. As long as you persist in this personal world view, you are basically putting me up on a pedestal as one who must teach you things before we can even discuss the topic at hand. While I like to introduce topics, it's not my objective in any way to teach people. I do like to help if people are genuinely interested, but you seem more interested in trying to convince me otherwise, or are trying to retain what little shred of your world view is still intact. Please understand that I know more than you on this topic. If you are trying to convince me otherwise, please don't. Not even other functional programmers agree with you.
By Vorlath, # 30. September 2007, 18:48:45
Stop digging. You're clearly unfamiliar with the concept of monads, and it is unwise to make sweeping, confident claims about concepts you do not understand. I realise that it's not in your nature to ever admit ignorance about anything, but perhaps you could instead change the subject, or end the discussion, rather than continuing to embarrass yourself with false claims.
A monad is a specific pair of functions associated with a type. Asserting that they have nothing to do with functions is wrong. Continuing to assert this when it's been pointed out to you to be wrong is foolish.
However, I will admit that the IO monad is somewhat the exception. The IO monad is a contrivance to shield the functional environment from the imperative one that exists outside. It's black magic, a necessary evil, and not a particularly good example of a monad. Ignore the IO monad; all the other monads are far more interesting, and few have anything at all to do with data flow.
That said, I retain little hope that you'll look into anything beyond the horizons you've devised for yourself, monads or otherwise. It may be that your own ideas about Project V are revolutionary; but your lack of interest and knowledge about existing programming concepts does not bode well for this. Truly revolutionary ideas are rarely thought up in a vacuum, and by dismissing any idea that isn't your own, or convincing yourself you know all about it already when you actually do not, it's unlikely you'll advance particularly far.
That said, I may be wrong. Project V seems to be continuing, and dataflow programming does not seem an inherently bad idea. I'll check in occasionally and resist the urge to make comments... but I'm not going to hold my breath.
By anonymous user, # 30. September 2007, 20:53:48
From my Algorithm of creative thinking.
By vladas, # 30. September 2007, 21:45:46
I'll grant you that many people do not accept what I say. This is one of the reasons why this blog exists, so that I can publish any discoveries I have found. And I dislike all programming languages equally. I have nothing to defend, so this puts you on unfamiliar ground, yet you proceed with arguments that assume I do have something to defend. Every time you come here, it is because you wish to defend your world view. It's never about discussing the matter at hand. I can say this because you use lawyer-like techniques of using items that seem related, but aren't. I present arguments and instead of disproving what I say, you ignore them or just say otherwise without explaining why. I always explain why I think something is true. Even if I did believe that you knew a little of what you're talking about, you never follow up directly on the topic and instead bring up something else that seems to fit your world view better. When I poke that full of holes, you start over with something else. It's a never ending cycle. Forget any other argument we may have, this alone is reason enough to not believe anything you say.
I could be 100% full of shit, but you don't give yourself the chance to prove me wrong.
By Vorlath, # 30. September 2007, 21:55:28