Who Don't Love Java? ME! But Not Because It's Java.
Saturday, 6. October 2007, 21:51:33
In the linked article, he goes on to talk about a lot of different companies' preferred programming languages. There are a lot of different ones, but Java is missing. I don't care about talking about Java anymore, but this article actually misrepresents it, and that affects what he says about other languages. One thing I do want to point out is that C++ is used to implement Map and Reduce at Google. Anyone who's been reading this blog will know this to be a no-brainer. Well, at least they know *I* think it is. And to be fair and come back to Java, Google does use it, though the author says this is because they've grown so big that they can.
The others all use some of the P languages like Python, PHP and Perl. There's even some SmallTalk, and MySpace is an MS shop, so it uses some .NET, and I agree with the author that they must be suckers for punishment.
Now I can get to the contradiction. He lumps all curly braced languages together. As if they are the assembly of the programming world. His words, not mine. In his last article, the first part to the one linked, he says this:
It’s the fact that the Curly Brace Languages have become the assembly languages of our day, and there is nothing more obtuse than a big assembly language program.
I think he's trying to say that the languages are messy or something. Surely, he can't be speaking about power.
You can talk directly to the hardware.
Yup. He puts Java directly in the same group as C, C++. If you stay within Java, I'm afraid you can't talk directly to the hardware unless the VM lets you. And it's never DIRECT now is it? There are the issues of bytecode, JIT, and all sorts of layers until it actually gets to the hardware. Can it be fast? Maybe. But you still don't have access to hardware in the way you would in C, C++ and especially not assembly. Take a look at my Stupid Sort Test if you want an example. I can control the use of the cache with assembly. With C and C++, I can't use non-temporal stores, but I can at least control what amount of memory I'm using so as to better use the cache. In Java, I'm afraid to say there's less control. It just doesn't compare. And I really don't understand the author's attempt at this comparison.
I've programmed extensively in assembly (68K, x86, 6510, and more), C, C++ and Java (and more), and I can say with total authority that Java is not a language that lets you hit the hardware. I don't know anyone in the Java camp that would want to claim this. It's supposed to protect you from the hardware. While you can use JNI, this isn't Java anymore. You don't write the implementation of the other side of JNI with Java. It'll defeat the purpose. Let's see the quote above with Java put directly in it. Maybe that'll produce a more immediate response.
Java can talk directly to the hardware.
Really? I hope I don't have to convince anyone how absurd this is. Yes, sometimes the VM lets you do certain things. But again, it's not really your choice. If there's an API, good. If not, you don't have the same power as in C to create your own tools that hit the hardware. So I'm going to assume everyone agrees that Java is not meant to be a language that can hit the hardware. That's like an insult these days, isn't it?
Because I get better performance.
Unfortunately, this myth still persists about Java. Here, it's said about all system level languages and that much is true. But about Java, it's more like a myth vs. myth showdown. Both sides claim different things about Java's performance. It's already well known what I think about Java, so we'll leave that alone for now.
The decision to use curly braced languages comes down to two choices, as remarked by the author. If you need something proprietary that can't be done in any other language, you use it. Otherwise, stay far away. That basically sums up a lot of the discussions in my last couple of articles. The fact that most of the companies mentioned use the web makes it apparent that there's not much new to do. It's been done. So use those tools. I'll make a huge leap of faith here, but the new crop of languages seems to have taken the web into consideration, no? (DUH).
From now on, take Java completely out of the originally linked article. There's no reason to put Java in the same category as C and C++. Also, I can say that Java did take the web into consideration. Just look at when it came out. Applets anyone? Not saying it was good or bad. I'm saying the web was a consideration.
Then he says two things that are brilliant, but misses the point entirely.
Another thing: notice the companies profiled above that have created their own Component Architecture Frameworks or Core Technology.
Polyglot Programming is becoming increasingly well accepted in an era when the Curly Braced Languages (Java, C++, et al) bring little to the Web 2.0, SaaS, and Multicored Crisis parties.
And read the very last line of his article. I've been saying the same thing all along, though I believe that the core of your architecture should start off with a system level language. And I don't put Java in that category as I've said. That last quote would be ok if it were not for the inclusion of the word "Java". And again, the reason these companies can start off in other languages is because these companies are geared toward the web. The web isn't exactly new anymore.
Take it like this. Imagine the early days of shipbuilding. If you wanted to travel, it made sense to know how to build ships. When you're out at sea and need repairs, you need to know how to fix your boat, even if the only land you can find has nobody around. After a while, if you only want to go to the same place and there are commercial travel agencies that will book you a ticket there, there's no need anymore. But if you want to build your own shipping lanes, or you want to use boats in ways they haven't been used in the past, then it's best to retain the knowledge of how to build boats and build them yourself (or have that knowledge directly at your disposal).
Now apply the same logic to the web. The web is not new anymore. There are existing tools there. If you're into programming stuff for the web, well, there's nothing new there. Really, I've never seen anything new on the web. Anytime I've seen new stuff is where the web wasn't the main consideration. Fitting round pegs in square holes sorta deal. In these cases, a system level core is best because you can shape it to do what you actually want instead of shaping your solution to the existing tools.
Polyglot programming is using multiple languages. I couldn't ascertain if this included system level languages, but I'll assume it does. I've been promoting this view for a while now. At the very least, have an internal API that exposes functionality and resources that you can control instead of bypassing the core system and accessing it directly. And it seems that even if you don't use a system level language, you can apply the same principles when you consider that all these web companies create their own component frameworks.
Now I can look at the article in question and rectify a few points. First, we must remove Java from the list of system level languages because that is really what the author meant, system level languages. There's just no way we can include Java as a hardware language. Secondly, if we read the rest of the article, it could do without mentioning Java at all. It could simply be said that system level languages aren't the primary tools for the web, yet Google uses them. This can be explained away by saying that Google needs to do something that other languages can't and that this is a very specialised need. The author's quotes above would support this 100%. BORING! His entire article would be useless.
That leaves one question. Where does Java fit in? Java is definitely something that is used for the web. It should be classified together with these other languages. However, we would have to come up with different reasons as to why Java isn't used by the big web companies. It very well could be its curly braced syntax. We must just be careful to not confuse this with the power and functionality of other curly braced languages where the syntax is not involved. And I think this is where the author went wrong. A lot of what he says is very accurate, only that there's usually a critical point that's wrong and throws his whole argument out the window.
So one reason that Java may not be used is its syntax. Many have said it is extremely verbose. I think it goes deeper than this. I think the VM gets in the way. No, not in my usual complaints. Look at how other languages work. When I run a Perl program or a PHP program on Linux, I just type in the command "./myprogram.pl" and it runs (as long as I set the execute bit). With Java, this is rarely how it's set up. Web stuff usually involves Tomcat. And anyone who's tried to set that beast up has God-like patience (if you omit the flood - though that's the usual result with Tomcat). Maybe it's gotten better over the years. I don't know. But like with everything else with Java, there's a long list of disasters that precedes it. Following Java is like following a series of disasters of biblical proportions. I was only active in the Java community for several years (back when), so I may be out of touch, but just off the top of my head, we have applets, GUIs, EJB, Tomcat (web), difficulty in producing tools that run from the command line, performance, bulky VM and run once/test everywhere.
Yes, many of the problems of Java's yesteryears are gone today. Some actually exist in these other P languages. Yet I think something has to be said for a clean break. There is also the functional approach to certain things. Clever tricks are bound to attract programmers. I personally hate those tricks, but to each his own. Whatever the attraction, it can be said that it's not Java. I also think that Java is collapsing under its own weight. It is bulky. When I code in it, I feel a weight on my shoulders. Like I need to move the Earth before something gets done. Let me explain. It has to do with the list of disasters. Java used to be about having a huge library. Everything was there. Once you learned something, that was that. Over time, the Java library became no better than the library of any other language. This knowledge became disposable. People don't like being told "this is how you do it", only to be told later "that's not how we do it anymore". It's happened with DOS, it's happened with C and C++. It's happening now with Java and it will happen again. The difference is that C and C++ have never made the claim of saying "this is how it is". In fact, Stroustrup has been adamant about keeping it open, even against much criticism. C++ has survived. So has C. COBOL actually seems to be resurging lately, but that's another story. Some people DO like COBOL BTW. They may not admit it, but there are a lot of them out there.
Another point is about the author's assertion that system level languages are useless for concurrency and the multicore crisis. From someone who says the multicore crisis has already passed, this is a curious statement. And besides, these other languages are just as useless for this purpose. Like it or not, all processors built today are fundamentally based on a program counter. This means they're imperative. There are only two fundamental approaches to computing. One is imperative, the other is data flow (like our brain and circulation system). Everything else is built on top of one of these two approaches. Data is inherently parallel. Luckily, we can simulate data flow with imperative. So like it or not, system level languages will be used with multicore programming. The dangerous assumption given by the author is that system level languages are somehow incapable of resolving the multicore crisis. I say there's no choice but for it to be system level code. It has to be something the machine can execute. Everything else is just software. Sometimes I think that there's a genuine attempt by some conspiracy group to make sure that people think system level code is inferior to anything built on top of it such as these P languages.
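To make the "simulate data flow with imperative" claim a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions only: the `Node` class, the `run` loop and the `demo` graph are hypothetical names made up for this example, not part of the linked article or of any existing tool. Each node fires once all of its inputs have arrived, and a plain sequential loop moves the values along the edges.

```python
import operator
from collections import deque


class Node:
    """A data flow node that fires once every input port holds a value."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.inputs = [None] * n_inputs
        self.filled = 0      # how many input ports have received a value
        self.targets = []    # (downstream node, port) pairs

    def connect(self, target, port):
        self.targets.append((target, port))


def run(tokens):
    """Deliver (node, port, value) tokens until none are left.

    Returns a dict mapping each fired node's name to its output.
    """
    pending = deque(tokens)
    results = {}
    while pending:  # the imperative loop that does the simulating
        node, port, value = pending.popleft()
        node.inputs[port] = value
        node.filled += 1
        if node.filled == len(node.inputs):
            out = node.func(*node.inputs)
            results[node.name] = out
            for target, target_port in node.targets:
                pending.append((target, target_port, out))
    return results


def demo():
    """Build and run the graph for (1 + 2) * (7 - 3)."""
    add = Node("add", operator.add, 2)
    sub = Node("sub", operator.sub, 2)
    mul = Node("mul", operator.mul, 2)
    add.connect(mul, 0)
    sub.connect(mul, 1)
    return run([(add, 0, 1), (add, 1, 2), (sub, 0, 7), (sub, 1, 3)])
```

Here `demo()["mul"]` evaluates to 12. Nothing orders `add` and `sub` relative to each other except the arrival of their data, so in principle they could fire on different cores; the loop in this sketch just happens to serialise them.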
Java did set these ideas in motion. Ironically, Sun had quite the opposite objective. It wanted to use Java in a sort of viral marketing scheme. It wanted to give Java away for free so that anyone could use it, but where everyone would see the Sun brand. Basically, it wanted to get name recognition and to sell more of its failing server business. It was quite a tour de force. MS had a lock-in on most of the computer market share. This is why it had, and still has, no intention of producing something that works on multiple platforms. And sorry kids, the web ain't that holy grail. MS knows it too.
Sun was smart about it though I think it lucked out in the long run. Today, Sun has name recognition and its business is doing better. But spreading Java didn't amount to selling Sun servers in the way it had hoped. So you see that Sun cared nothing about interoperability or portability or any of that. In fact, Java had been thrown in the garbage bin already until it was revived from the ashes. Sun is now getting a lot of business because it seems people trust the name. Not because of anything Java related though that did serve to get its name out there.
Think about it. Do companies want to create technology where it doesn't matter if you use their product or not? That's crazy. More crazy than anything I could ever think up. I'd laugh if I heard someone applying to VC's and saying that they want to build a product, give it away for free, and this will make it so users don't have to use the products they are actually selling for profit. This is what people think Java did. You can use Java, but you don't need to use their servers. They wanted you to use them though.
You can say what you want about MS, but they do know what's going on. They could have built a VM that works anywhere. Instead, they came up with .NET. Why? Aren't VM's supposed to work everywhere? Here is a case of a VM that only works on Windows. Strange? No. Now you know why. It's the normal way of doing things. It makes no business sense to create something where people don't need to use your other products. MS was all too happy to produce yet more junk to sell to its users. Since there was already a perception about VM's, MS was all too happy to oblige. And it has a bonus that .NET can run on all its versions of Windows. Thus enabling the internal interoperability, but making it harder for external products.
The real reason Java isn't used is because it's irrelevant. It always has been. There's nothing it provides that can't be done in other languages and often in a much better way. It also never had a primary foothold anywhere. It eventually found its way on the server side. But on a server, you can use anything as long as it works. The language doesn't matter. So if something like Perl or php comes along that is very well suited for web applications, why would you use something that is more difficult to set up? Something that doesn't work well with others?
Like I've said in the past, my biggest fear is that someone like Intel, IBM or Motorola will come along and provide a software platform for concurrency and then modify its hardware to fit this way of programming. It'll create yet another lock-in scenario. Not specifically on the software side, but on the hardware. If we can achieve true interoperability, which is required by concurrency, the hardware doesn't matter. You can use whatever you like. I'm telling you now that is more than possible. I'm working on it now. But imagine what a lock-in by Intel would mean. Then imagine what kind of world it would be if it didn't matter to the programmer what hardware was there and we could produce native code. This is what's best for developers. But it won't happen. Hardware companies are doing a LOT of positioning lately.
There are two ways this will end up. I already know how it'll end up and it won't be in the programmer's favour. But let's assume it does. What would happen is someone comes along with a concurrent platform that works well in practice. Hardware companies will adopt it (for making faster hardware) and make the protocol open. Everyone can use it and you can mix and match processors within your computer. So you could have all sorts of combinations of processors from all sorts of different companies and can produce custom configurations. And your code can adapt to any of these processors based on functionality of course.
Ok, time to get out of make believe world. That's never gonna happen. First, there'll be a proprietary protocol between computing nodes. Second, hardware sockets are already proprietary. Third, there's still a belief that system level code can't be made portable and ONE set of opcodes or instructions is best. Fourth, Intel and other chip manufacturers will continue their buyouts of other chip companies. AMD buying ATI is a continuation of what I'm talking about here. AMD will be able to produce its own chips and will be able to provide GPU's so that you need not look elsewhere. It's like Java in many ways. As long as you keep the users happy, there's no reason to look elsewhere. Intel already has a deal with NVidia. When the multicore crisis is resolved, and it will be, expect a lot of shifts in power. There will be a HUGE drive to merge all different kinds of possible processing under one roof. And they'll do this to keep their lock-in. So that you don't need to go anywhere else. That's fine for MOST uses. But it's EXTREMELY frustrating for anyone doing something new. I want a machine where I can plug in all sorts of processors from different companies. Will it happen? Sure. Will it be popular? No. Programmers and users alike will be pushed toward one way of doing things. Someone will win out. And this may be the biggest win (or loss) that computing has ever seen. Think of it this way. Processors should be pluggable like USB. Yet we have a different view of this. SCSI was built this way where the computer was just another node, but that notion is all but dead. Remember, it's all about control.
Look at the history of computing for really open protocols that have been accepted and what changes they brought about. The Internet is one. The web is another. Yes, there's a colossal difference there. Computer hardware has protocols, but they change a lot. As they should. But it allows different makers of devices. In the programming world, C libraries and types are well known. Some may argue if they are the best or not, but it did change the world.
If you remember ONE thing from all this, I want it to be this. The language isn't what's important. It's the communication protocol. It's not the functionality, but the way the functionality communicates. Think of the telephone and how many devices were used on it. Everyone has a different phone with different capabilities. Today, there's even ADSL on it. Soon, video phones. In the past, there were modems. Yeah, the really slow ones. Faxes still use phone lines. Voice mail. Think of road and railway networks. John A. Macdonald, the first Canadian Prime Minister, didn't want to build a transcontinental railway for shits and giggles. He wanted other territories to join. Railways have often been called the lifeblood of society. Programming has it all backwards. It's not what we do, but how we communicate. And right now, we don't.
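As a rough illustration of "protocol over language", here is a small Python sketch of a wire format that any program in any language could speak. Everything in it is an assumption chosen for the example: the framing (a 4-byte big-endian length followed by UTF-8 JSON), the function names `encode` and `decode_all`, and the message fields are hypothetical and do not describe any protocol mentioned in this post.

```python
import json
import struct


def encode(message):
    """Frame one message as: 4-byte big-endian length, then UTF-8 JSON."""
    payload = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_all(buffer):
    """Decode every complete frame in buffer.

    Returns (messages, leftover), where leftover holds the bytes of a
    trailing frame that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        start = offset + 4
        end = start + length
        if end > len(buffer):
            break  # incomplete frame, wait for more bytes
        messages.append(json.loads(buffer[start:end].decode("utf-8")))
        offset = end
    return messages, buffer[offset:]
```

The point of the sketch is that nothing in the framing depends on Python. A C program, a Perl script or a Java service that can read four bytes and parse JSON could sit on either end, which is the telephone-line property described above.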
Edit: I read some of the comments left on the article in question and somebody mentioned that he should have said system level languages instead of curly brace languages. The author of the article responded by saying that you can build device drivers with Java. One may want to read this[pdf] as one example. There are some good points. Right on the first page, it says that the way main() is called is not appropriate for device drivers. This is another example of what I'm talking about when I say VM's assume control. Sun changed this for device drivers. The JVM has also been drastically reduced in size. Anyways, it's like I said... if Sun provides this functionality, then fine. But it's not on the same level as system level languages. Sun even says "If some feature from the C libraries is needed, it may have to be re-implemented in the kernel." This is in relation to running the JVM, not the code that runs on top of the JVM. So that should put a lid on that issue. Saying that Java can hit the hardware is disingenuous. This functionality has to be put into the JVM in the first place. By system level code.



spc476 # 7. October 2007, 00:41
I hate to inform you of this, but this already happened at least once.
Research in the late 70s and early 80s in compiler technology revealed that a typical compiler only used a subset of the available instruction set (and it didn't matter which CPU, the results were similar across all the architectures studied). Fancy instructions like CRC in the VAX (yup, the VAX has an opcode devoted to calculating a CRC---it also has instructions for inserting and removing items from a linked list) or fancy addressing modes like the memory indirect postindexed (from the MC68030 and looked like ([16,A0],D7,32)) weren't used---it was too hard to recognize when they could be useful. So research instead turned to making CPUs with simplified instruction sets (modifying the hardware to fit a way of programming) and then speeding them up, which was easy since the hardware to interpret instructions was so simple, the complexity could be shoved elsewhere (large register banks, multiple execution pipelines, internal cache memory, etc.).
You then said, "[a]nd your code can adapt to any of these processors based on functionality of course." I really hate such hand-waving statements. How will my code adapt? Do I have to write code to adapt itself? Will it be a recompile? If so, you mean I have to ensure a development system exists on all my servers? Perhaps I'm not imaginative or creative enough to envision how this is possible.
Vorlath # 7. October 2007, 04:24
About code adapting, how is it hand-waving? Most of this blog is devoted to this issue. Any regular reader will know I've tackled the issue more than once (like hundreds of times). Sure, I've mostly failed at getting my point across, but I'm most certainly not hand-waving. And many people have contacted me and confirmed exactly what I was saying. If you want info, it's there. The question of how is rather simple. You have some code that can receive instructions on what to do. So yes, each core must have some kind of bootstrapping. But all machines are like this anyways. This is why I said there will be a lock-in once this protocol is set up. Operations are not fixed either. There are basic ones that should be available on all platforms, but are not limited to those. In fact, the multitude of opcodes is what enables the code to adapt. With a locked and fixed instruction set, you lose this ability. And with multicore, you can treat each part of your code the same way as if it were multicore. You mix and match what you need. That's why I'm working on a data flow development environment that will enable all of this so the developer doesn't have to. And the best part is that anyone can create their own way of doing things and it'll still work. Even the hardware and core systems can be changed just as long as there's at least ONE protocol it understands. That should make it clearer where my last paragraph in this blog entry comes from.
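One possible reading of "basic operations everywhere, richer ones where available" can be sketched in a few lines of Python. This is only a guess at the general shape, not Project V: the `Processor` class, the `dispatch` function, the `FALLBACKS` table and the choice of a fused multiply-add as the "richer" operation are all hypothetical names and assumptions introduced for the example.

```python
def add(a, b):
    return a + b


def mul(a, b):
    return a * b


def fma(a, b, c):
    """A richer operation that only some processors offer natively."""
    return a * b + c


class Processor:
    """A processor advertises the operations it can execute."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = dict(ops)  # operation name -> callable

    def supports(self, op):
        return op in self.ops

    def execute(self, op, *args):
        return self.ops[op](*args)


def fma_from_basics(proc, a, b, c):
    """Express fma using only the basic add and mul operations."""
    return proc.execute("add", proc.execute("mul", a, b), c)


# How to rebuild a richer operation from basic ones when no processor has it.
FALLBACKS = {"fma": fma_from_basics}


def dispatch(processors, op, *args):
    """Prefer native support; otherwise expand into basic operations.

    Assumes every processor supports the basic operations.
    Returns (processor name, "native" or "expanded", result).
    """
    if not processors:
        raise LookupError("no processors available")
    for proc in processors:
        if proc.supports(op):
            return proc.name, "native", proc.execute(op, *args)
    if op in FALLBACKS:
        proc = processors[0]
        return proc.name, "expanded", FALLBACKS[op](proc, *args)
    raise LookupError("no processor or fallback for " + op)
```

With a plain processor offering only `add` and `mul` and a second one that also offers `fma`, the same request runs natively where the richer opcode exists and is expanded into basic operations where it does not. The calling code never changes, which is roughly what "adapt based on functionality" seems to mean here.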
Anonymous # 7. October 2007, 12:26
Can you give a Reader's Digest version of your article. Just a summary for those who don't want to read your entire sermon in tiny fonts.
Anonymous # 7. October 2007, 14:54
Small tip: it's "Smalltalk", not "SmallTalk".
Vorlath # 7. October 2007, 15:47
Basically, the article I reference puts Java as a hardware language. I say this is ridiculous and go on to explain why.
So why isn't Java used by the big web companies if the hardware language excuse doesn't work? (That Java isn't used is a claim made by the linked article BTW) I say Java isn't used because it's a lock-in technology and I go on to explain why this is so. It also leaves behind a trail of failure. There are other details too. At the end, I say that the communication protocol is what's important, not the language or the functionality. I cite a lot of examples and show how this all ties together and I also explain in more detail some stuff I talked about in previous articles that is relevant here.
Good 'nuff?
Anonymous #2: Tomato TomaTo.
spc476 # 8. October 2007, 06:33
And that's different from a VM how?
I'm sorry, but I need to see specifics before I understand something. A few examples maybe. A better explanation perhaps. I get lost when you say stuff like "in fact, the multitude of opcodes is what enables the code to adapt."
Perhaps I just need to see the implementation before commenting further.
Vorlath # 8. October 2007, 20:21
If you want an explanation of why a fixed set of opcodes makes portability impossible, look at this article.
On Transition of Equivalent Source Code
Then you need to understand the two fundamental concepts that make computations possible. They are both equivalent for computations, but only one is parallel and concurrent.
Causality in Programming Language Design
"I'm sorry, but I need to see specifics before I understand something."
and
"Perhaps I just need to see the implementation before commenting further."
These quotes support my claim that programmers deal with the concrete and not abstractions. I'll show you an implementation soon enough to prove that what I say is possible. There are plenty of people who are interested in it for what it can do. So stay tuned. I'm making huge progress on Project V. I'm at the compilation stage right now.
May I ask you what your definition of a VM is? It seems to encompass everything under the Sun much like functional programmers' definition of a function.
The multitude of opcodes makes portability possible through the application of design patterns. Asking for an implementation defeats the purpose as it relates to understanding patterns. While I will show examples of implementations, realise that these will be ONE such implementation and not the only way to do things.
spc476 # 10. October 2007, 20:36
So I guess a VM could mean any simulated machine in software.
By that definition, most Forth implementations are VMs (as to why? Because such an implementation of Forth is not only easy to get running, but the resulting code is smaller than the equivalent assembly (whereas most compiled languages tend to produce larger code than hand-written assembly) and a complete system can easily fit in 1 or 2k of memory).
A real mindbender though is the PERQ, a workstation where you can literally rewrite the opcodes of the CPU. Since the opcodes themselves are not written in stone, so to speak, does that make the PERQ a VM? If I want to run Java, I can reprogram the CPU to directly execute the JVM. Same with UCSD Pascal (yet another VM from the early 80s). Heck, maybe even do a Lisp like system.
spc476 # 11. October 2007, 01:17
open-file-read
swap
open-file-write
swap
2dup
copy-by-fh
close
close
end
Assuming it takes four bytes to store each address, this works out to 36 bytes. The "interpreter" just goes through this list of addresses and calls each routine and in itself is pretty trivial to write.
If compiled to native code, it might look something like:
enter 8,0                    ; reserve 8 bytes for the two file handles
push offset name_dest_file
call open_write
mov [ebp+fh_write],eax       ; save the destination handle
push offset name_src_file
call open_read
mov [ebp+fh_read],eax        ; save the source handle
push [ebp+fh_write]
push [ebp+fh_read]
call copy_by_fh
push [ebp+fh_read]
call close
push [ebp+fh_write]
call close
leave
ret
Which works out to what? 50 odd bytes or so?
Would you even consider the Forth implementation a VM? In that case, the "opcodes" are just addresses of routines to call, with implicit data passing (and forgive me if my x86 code is a bit off, I don't have my reference materials at hand right now, but it should be close enough to see what I'm trying to do).
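The address-list interpreter described in this comment can be sketched in Python, with function references standing in for the routine addresses and a list standing in for the implicit data stack. This is a toy under stated assumptions: the words `lit`, `dup`, `swap`, `sub` and `mul`, the `run` function and the sample `PROGRAM` are hypothetical and do arithmetic rather than the file copy from the listing, so that the sketch stays self-contained.

```python
def lit(n):
    """Return a routine that pushes the constant n."""
    def push(stack):
        stack.append(n)
    return push


def dup(stack):
    stack.append(stack[-1])


def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]


def sub(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a - b)


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def run(program):
    """Walk the list of routine references and call each one in turn."""
    stack = []
    for routine in program:
        routine(stack)  # data passes implicitly through the shared stack
    return stack


# 3 dup mul leaves 9, then 4 swap sub computes 4 - 9.
PROGRAM = [lit(3), dup, mul, lit(4), swap, sub]
```

Running `run(PROGRAM)` leaves `[-5]` on the stack. The whole "interpreter" is the three-line loop in `run`, which is the triviality the comment points at. By the comment's accounting of four bytes per address, this six-entry program would occupy 24 bytes.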
Vorlath # 11. October 2007, 09:48
This is what I'll be doing for controller nodes. Leaf nodes will simply execute code as is (so no VM for sure for this part at least). I think this is more a hybrid between coordination tools and native code. But I don't see it as a VM.
However, my first version will be like an interpreter mixed in with the coordination tools. Later versions will allow dynamic compilation (not always to the same native code). And this is where the line seems to get blurred. If we go by your definition, not only bytecodes, but all source code that is processed in any way at runtime would be a VM. I don't think I can support that view. Is dynamic compilation of source code a VM?
This is why I don't like the term VM much. You can stretch it in places where I don't think it was meant to go.
Here's another twist. Say you have an infinite array of possible operations and you have dynamic compilation with that, is this a VM? There's no fixed set of opcodes. These opcodes would include all hardware machines, all VM opcodes (for those who like to hurt themselves) and most possible high level data transformations (like codecs, parsers and everything else). That's Project V. It can use all machines, VM or not. Well, eventually anyways.