
Notes to self

Whatever I feel like writing

Posts tagged with "Programming"

Textjure


What have I been up to, code wise, in my spare time? I felt like talking about this because I think it has gotten mildly interesting lately.

It hasn't been Fabric. I started Fabric, not because I thought it would be a fun thing to do, but because I absolutely needed a tool that did exactly what Fabric does. And ever since Fabric got to the point where it more or less completely scratches my deployment itches, my time-investment in it has pretty much retreated to maintenance mode.

No, my spare time is now primarily spent on the Clojure programming language, in some form or another.

When it comes to programming languages, I learn by doing and I think most people feel the same way; you can't learn how to write code if you don't write code. I learned Python by writing a blogging engine (how original) in it using Django, and then Fabric.

Now, I'm learning Clojure. I started out with the Euler problems but ran out of steam; the math problems just weren't my thing. Then I was mostly idling around the clojure.core source code and the Java code, looking for a cool idea to try out, but nothing really jumped out at me. I proposed a number of what I thought would be language improvements - with patches'n'all - but each and every one of them was turned down. Oh well.

Then, drifting aimlessly around Github, I stubbed my toe on a tiny little thing that Chris Houser had done. It was a text editor, but hardly even complete enough to be called a prototype.

I downloaded it and fired it up. I was mildly surprised that the thing actually started and showed a little window with two parts: an editor panel and a built-in Clojure REPL that actually worked.

The interface wasn't terribly exciting, though. The window was tiny and placed in the top-left corner, and the program could not be started without naming a file to open.

I dug into the code - what little there was of it - and fixed those things. I made the window chrome-less and full-screen, and pivoted the editor/REPL split to a left-right configuration to better utilize my wider-than-tall screen resolution. It actually worked and wasn't terribly difficult.

Then I found out that the thing actually had no functionality in the way of opening and saving files. Also, the key-binding looked messy. I turned to these issues next.

One thing leads to the next, and textjure is now slowly shaping up to be a genuinely useful text editor, though it is still woefully underfeatured.

The code is in my fork of Chris Houser's repository: http://github.com/karmazilla/textjure/tree/master

The feature that I'm currently working on is syntax coloring. After that comes stuff like indenting, incremental search and REPL history. Then I guess I'll properly announce my work on the Clojure Google Group.

I always knew that I would eventually find something fun to hack on with Clojure, but I would never have imagined it would be a text editor. :smile:

ANN: Fabric - Simple pythonic remote deployment tool.


Deployment woes.

I code in Java at work, and all of my projects can be divided into two camps:
  1. Those that build to .jar files and deploy to our maven2 repo.
  2. And those that build to .war files and are deployed to some application server.


Once configured, maven itself is pretty good at making case #1 run smoothly - "mvn deploy" is all it really takes. But do I have anything similar for case #2?

Well, not quite. I tried banging something together with Capistrano, but that didn't work. For one thing, Capistrano expects you to be deploying a Rails application, and these are typically all deployed in the same way, which is very different from deploying a .war file. Secondly, I had a build/compile step in my process that I needed to do locally (because compiling on the server is troublesome and a bad idea), which meant that I had to use the put function to upload my .war file. Put worked nicely for small text files (i.e. my project.properties file), but failed miserably when the payload was a nearly 20 MB .war file.
Googling and tweaking the flags on the File object didn't work. It simply refused to upload that damn .war file.

Itch-scratcher's mentality.

I suppose I could have written a shell script and leveraged the existing scp and ssh tool chain. And I did throw a pebble down this route to see how it felt, but eventually rejected the idea. The reason is that I'm deploying to multiple hosts simultaneously. I tried looking at clusterssh to see if it would help me in this regard, but it turned out to be odd, clunky and basically not designed for such a task. I also took a look at Dancer's Shell, dsh, which was a lot closer to home, but my username contains a backslash and dsh meticulously stripped it out prior to logging in, foiling any and all of my attempts at tricking it into taking it at face value.

What was I supposed to do? I had pretty much given up on automating the process and started to accept defeat, when I happened upon the paramiko module for Python. Paramiko is a pure Python implementation of the SSH 2 protocol, and lets you log into servers, execute commands and upload files. It gave me the idea to write my own pythonic version of a Capistrano-like tool, and then use that for deployment. It could be made open source, and it would be a perfect opportunity to try out Git on a real project.

Birth of Fabric.

So I swiftly went to work. In two days, I had written the first prototype. It supported the most basic operations such as running remote shell commands, sudo'ing and uploading files. Then I registered my project on nongnu.org (chosen because of their Git support). And ba-da-bim, Fabric became an open source project: https://savannah.nongnu.org/projects/fab/

Fabric looks, on the surface at least (or for those who've only spent a short while with either), a lot like Capistrano. You have a fabfile (as opposed to a capfile) in your project directory, and that file describes all of your deployment tasks (or commands, in Fabric speak).

Commands are really just regular Python functions that simply make calls to some other, more magical, functions called operations (and you're allowed to call the file fabfile.py if you like). Operations are magical because they just sort of exist; you don't import them from a module and you don't find them on an object - you just call them.
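One way such "magic" names can come about in Python is by executing the file with a namespace that already contains them. The following is only a sketch of that general technique, not Fabric's actual implementation; `load_fabfile`, the fake `local` and `run` operations and the `calls` log are all made up for illustration.

```python
# Hypothetical mini-runner: NOT Fabric's real code, just the general idea.
calls = []  # records what the fake operations were asked to do


def local(cmd):
    """Pretend to run a command on the local machine."""
    calls.append(("local", cmd))


def run(cmd):
    """Pretend to run a command on the remote host."""
    calls.append(("run", cmd))


def load_fabfile(source):
    """Execute fabfile source with the operations pre-seeded as globals."""
    namespace = {"local": local, "run": run}
    exec(source, namespace)
    # Every callable the fabfile itself defined counts as a command.
    return {name: obj for name, obj in namespace.items()
            if callable(obj) and name not in ("local", "run")}


FABFILE = '''
def deploy():
    "Build and restart."
    local("mvn clean package")
    run("restart")
'''

commands = load_fabfile(FABFILE)
commands["deploy"]()
print(calls)
```

The fabfile source never imports `local` or `run`; they are simply present in the globals it is executed with.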

So, to get the feet wet, here's a hello-world'ish simple fabfile:
set(
    fab_user = 'joe.shmoe',
    fab_mode = 'rolling', # run stuff on one host at a time
    fab_hosts = ['node1.servers.com', 'node2.servers.com'],
)

def deploy():
    "Build and deploy a war file to our app. servers."
    local("mvn clean package")
    put("target/myapp.war", "myapp.war")
    run("mkdir /rollback/$(fab_timestamp)")
    run("cp /$(fab_host)/deploy/myapp.war /rollback/$(fab_timestamp)/myapp.war")
    sudo("cp myapp.war /$(fab_host)/deploy/myapp.war")
    sudo("$(fab_host) restart")


Then, all it takes to deploy is a "fab deploy" in the directory where the above file is found. It's still pretty basic, low-level and imperative, but it's a good start.

Fabric also has a built-in help system that is powered by your doc-strings; try typing "fab help:deploy" for instance. If you want to see what other commands are available to you, then simply type "fab list". You can also get a list of operations (the local(), put(), run(), sudo() kind of things) by typing "fab help:ops" and get more details about the individual operation with, for instance, "fab help:put".
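Doc-string powered help is easy to do in Python, because every function carries its doc-string around with it. Here is a rough sketch of the idea; the `COMMANDS` registry and the `help_for` and `list_commands` helpers are hypothetical names, and this is not how Fabric itself is written.

```python
import inspect


def deploy():
    "Build and deploy a war file to our app. servers."


def rollback():
    "Restore the previous war file."


# Hypothetical registry mapping command names to functions.
COMMANDS = {"deploy": deploy, "rollback": rollback}


def help_for(name):
    """Return the help text for one command, taken from its doc-string."""
    doc = inspect.getdoc(COMMANDS[name])
    return doc or "No help available."


def list_commands():
    """Return one 'name: first doc-string line' entry per command."""
    return [f"{name}: {help_for(name).splitlines()[0]}"
            for name in sorted(COMMANDS)]


print(help_for("deploy"))
print("\n".join(list_commands()))
```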

And that's basically it for today.

Perfect Type System


Whenever you venture into the realm of language design, you will eventually touch on type systems. Type systems are an inherent part of any language, and many languages are famous specifically for their type systems - Haskell and Ruby come to mind.

Boiled down, this means that if you design a language, you will also find yourself designing a type system.

This brings me to an old quote:

A design is perfect not when there is nothing more to add,
but when there is nothing more to take away.



How does this relate to type systems? A perfect type system by this definition would be more than minimal - almost non-existent.

Take assembler: Everything is bits and bytes, and bytes can be grouped in words whose size depends on the target processor architecture.

This might be minimal but it is hardly practical; the whole purpose of a programming language is to provide abstractions, and that can't really be said about assembler.

Assembler has primitives, numbers of varying bit-width, but in order to provide higher levels of abstraction, another meta-type is needed: aggregates - a collection or grouping of primitives. Without both aggregates and primitives, it'll be pretty hard to define a practical and proper high-level language.

Now, the challenge is to find the minimum set of members for these two meta-types. First up is primitives, and I propose that we can do with just a single member and some syntactic sugar.

Consider how the notion of a number can be the only primitive in a language; characters are really just a special case of numbers, and integers and reals can be unified into a single type. This could be simulated with coercion, as it is in many languages, or by not considering integers and reals to be two separate types at all.

With a sufficiently high-level language, we can also unify finite precision and infinite precision numbers into a single type. The Strongtalk implementation of the Smalltalk language is able to automatically detect overflow in finite precision numeric types, and convert them on the fly to their infinite counterparts.

If we're willing to defer performance considerations to an implementation detail, we could simplify things even further by making all numbers infinite precision decimals. Characters would be nothing more than a unicode ordinal and a bit of syntactic sugar.

This way, we'd end up with a single primitive type: the Number.
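Python is not the language imagined here, but it already leans this way and can illustrate the idea. In the sketch below, `fractions.Fraction` is used as a stand-in for an exact number type (it is a rational, not the infinite precision decimal described above), and `ord`/`chr` play the part of the syntactic sugar for characters.

```python
from fractions import Fraction

# Characters as numbers plus sugar: ord/chr convert between the two views.
code_point = ord("A")               # the Unicode ordinal of "A"
next_letter = chr(code_point + 1)   # the character after "A"

# Python ints grow past the machine word size instead of overflowing.
big = 2 ** 64 + 1

# Fraction keeps exact values where binary floats would round.
third = Fraction(1, 3)
exact_sum = third + third + third   # exactly 1, no rounding error
mixed = Fraction(1, 2) + 2          # integers and fractions mix freely

print(next_letter, big, exact_sum, mixed)
```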

So what of the aggregates? How many members of this meta-type are required for a proper type system? Once again, I propose that the answer is just one; the Function.

Let's contemplate that idea a little, and see how it will compare to the type system of an existing functional language, such as Haskell.

In Haskell, I would count functions, lists, tuples and `data` types to be among the aggregates. The challenge for my alleged functions-as-only-aggregate is now to figure out how to represent each of these meta-types as functions.

Functions themselves are a no-brainer in this regard, so let us jump straight to lists: A list is an ordered sequence of values, and these values are most often iterated over and re-arranged when worked with in programs. For the purpose of iteration, a special kind of function exists in the Python language, called a generator - a function that can return multiple times during a single invocation; a poor man's continuation, if you will.

So using continuations, it becomes quite easy to create ordered sequences of values - every time you need the next value in an iteration, you just 'continue' to the next return value of the continuation. Then add some syntactic sugar so the complexity of continuations doesn't bleed into code where it is not needed, and we have a very powerful list representation.
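In Python terms, the idea looks roughly like this. It is only a sketch using generators as the stand-in for continuations; `naturals` and `squares` are illustrative names, and `itertools.islice` plays the role of the sugar that takes the first few values.

```python
from itertools import islice


def naturals():
    """An endless 'list' of 0, 1, 2, ... produced one value per resume."""
    n = 0
    while True:
        yield n
        n += 1


def squares(seq):
    """Re-arrange a sequence lazily: map each element to its square."""
    for value in seq:
        yield value * value


# Only the values that are actually requested get computed.
first_five = list(islice(squares(naturals()), 5))
print(first_five)
```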

If we make our type system dynamic (non-static, to be precise), then tuples would be in the same exact ballpark as lists. I don't think we even need to distinguish between tuples and lists at all, if our type system is dynamic, and I'm not sure it's needed in a static type system either - Java, for instance, seems to be doing just fine without tuples.

Lists were arguably the most important meta-type to get straight, but object or struct-like types, types with named attributes or parts, are extremely useful as well - many languages allow for some interesting magic by facilitating introspection of the names and values of parts of these types.

Let's return to the generators in Python. These functions are able to stop in the middle of their execution to return values. Then they wait to resume execution when the next value is requested - their complete state is preserved while they wait, and they resume execution from the exact point they left off.

Now, Python is a procedural (and object-oriented) language and as such has the ability to assign values to variables. Now imagine a continuation that is waiting to be resumed, and that we were able to access the variables inside its scope: the named variables would then play the role of the named parts of a struct.
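As it happens, CPython lets you try this out today, because a suspended generator exposes its frame. The sketch below relies on the `gi_frame` attribute and its `f_locals` mapping, which are introspection features of CPython rather than something you would normally build a type system on; `point` is a made-up example.

```python
def point(x, y):
    """A generator used as a struct: its local variables are the named parts."""
    label = f"({x}, {y})"
    yield  # suspend here; the frame and its locals stay alive


p = point(3, 4)
next(p)  # run up to the yield so that all the locals exist

# The suspended frame maps variable names to their current values.
fields = dict(p.gi_frame.f_locals)
print(fields)
```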

This solution is arguably less elegant than the list solution, and also raises a number of questions; specifically, in a functional language, functions tend to not really have any state at all, much less a state where the values have names.

Functional languages like Lisp and Haskell are declarative. This makes stateful functions obtuse and very hard, if not impossible, to create. Haskell has monads for representing state, but they are not quite the same thing as a function, and as such would be a separate member of the aggregate meta-type, distinct from the functions.

We could instead allow for assigning aliases to the expressions that make up a function, and the values that these expressions generate would be accessible through these aliases when the continuation halts.

This still wouldn't be entirely elegant, and there's also the question of what happens when the continuation is continued - I don't see an obviously correct answer to this question, so a language would be forced to define a convention in this regard, and such a convention will certainly surprise and bite a few people, as it is destined to be misunderstood or assumed to have a different behavior.

Regardless, I think it is doable, albeit not pretty. So a language would probably have to take that fact into account, and either provide some heavy syntactic sugar to make this bearable to work with, or be designed in such a way that you will need these kinds of types less, or both.

At the bottom of it all, I think it is possible to design a programming language around a minimalist type system like this. It would probably be the kind of language that is easy to learn but difficult to master - a very interesting language, to be sure.

Would such a type system be perfect? I'm not sure; 'perfect' is tainted with subjectivism, but I do think that it would be rather elegant if we disregard the way that we implemented struct-like types - I suppose you can't make a language without compromise.

Principles of a Well Behaved Code Generator

Code generation, automatic or not, has emerged in numerous places as a productivity-enhancing technique or a powerful tool of abstraction, or, most often, both.

It comes in quite a number of incarnations, from the obvious to the hidden. Some frameworks, like Rails and CGLIB, even have code generation as a core feature.

With such broad applicability, and so many different kinds of code generators, they're bound to not all act the same. This difference in behavior is an invitation to mistakes, and is at odds with the usability principle of least surprise.

I have seen, worked with and written numerous code generators - I can hardly imagine a developer who has not. And looking back on this experience, I'll try to jot down a few simple principles that, when a code generator follows them, will make the experience of working with the generator much more pleasant.

However, with the diversity in application, there are bound to be exceptions to the rules. I'll do my best to also dig up these exceptions, and explain why they are a good idea in the given situation.
So, here follows the Principles of the Well Behaved Code Generator:

1: No destructive update.
A code generator should not make any destructive updates. Generated code, like scaffolds in Rails, is intended to relieve the user of the burden of writing boilerplate code and leave a fill-in-the-blanks template. Once this is done, the users will often go ahead and... fill in the blanks.

But development is iterative. The foundation from which the code was generated might change over time. Two good examples are generating O/R-mapping code from a database schema, and generating stub-classes from a WSDL in interface-first web service development: both the database schema and the WSDL might change over time as development moves through iterations - and when the foundation changes, a good way to keep the generated code up to date is to regenerate it.

But therein lies a catch: what becomes of the filled-in blanks when the code is regenerated? If the generator overwrites the developer's careful changes, then good work is lost into thin air. Source code management systems and various backup techniques notwithstanding, the generator cannot rely on those features being available.

Losing good work is something that we cannot afford; code generators are meant to enhance productivity, not destroy existing productions. There really is only one way around this: code generators must avoid destructive updates at all costs.

If a file we would like to write already exists, give the user a warning. A code generator is not in the business of data backup nor in the business of data shredding. If the file exists, skip it. Then notify the user and let him decide how to handle the situation. If he wants to back the file up, let him do it. If he wants to delete it, let him delete it. But don't go over his head and make arbitrary decisions.
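A minimal sketch of the skip-and-warn rule in Python could look like the following. The `write_generated` helper and the `Customer.java` file name are invented for the example; the only real machinery is the `"x"` mode of `open`, which refuses to open a file that already exists.

```python
import os
import tempfile


def write_generated(path, content):
    """Write content to path only if nothing is there yet.

    Returns True if the file was written, False if it was skipped.
    """
    try:
        # Mode "x" raises FileExistsError instead of overwriting.
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(content)
        return True
    except FileExistsError:
        print(f"warning: {path} already exists, skipping")
        return False


# Demo in a scratch directory: the second run must leave the file alone.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "Customer.java")
first = write_generated(demo_path, "// generated stub\n")
second = write_generated(demo_path, "// regenerated stub\n")
```

Checking for existence and creating the file happen in one step here, so there is no window where another process could sneak a file in between the check and the write.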

Note that this doesn't mean that you cannot touch his files. If you can make the update to his existing code without breaking his work, then by all means go right ahead! I know the visual GUI editor in JDeveloper has the ability to read back the code it generates after it has been altered by hand, and then make changes to the code without breaking the manual changes.

Exception: Not his files
The most obvious exception is when the user isn't the logical "owner" of the generated files. Examples include the Java code a JSP compiler would generate on an application server - the user is by no means supposed to fiddle with these files, so destructive updates to these files are OK as nothing is lost by it. Nothing is supposed to be lost by it anyway. A user making changes to those files had it coming.

Another example is the Java code generated on the fly by Jasper Reports - it is pretty obvious from its hideous style that you aren't supposed to touch it.

2: Exemplary correctness
As a code generator, there can be no doubt about it: your users are a bunch of unshaved, half-asleep coffee slurpers who are too lazy to write their own code, so we might as well set the standard and show'em How It's Done(TM).

It's easy to make good code bad, but hard to make bad code good. So since a code generator generally doesn't have a clue about the quality of the existing code, the generator should produce code of the highest quality possible.

This means that, when applicable, generated code should ...
  • Be properly indented.
  • Have meaningful variable, method and class names.
  • Be consistent with its naming convention, and in line with the naming convention of the language/technology/environment.
  • Have a good and consistent commenting policy.
  • Be readable.
  • Make correct use of exception handling.
  • Make sane use of logging features.
  • Be debuggable.
  • Be amenable to testing.

Not all of these make sense in every situation, as it depends heavily on the domain in which the code generator operates, but the fundamental point remains: code generators should produce high quality code.

Exception: It's only run once
I have on several occasions written programs that generated (for the most part) SQL insert scripts. These scripts were only intended to be executed once, and then thrown away.

I don't see much point in going out of your way to care for every little detail in the code when you know it's headed for the bin in a few minutes anyway.

This kind of code just has to be free of any bugs (that will manifest themselves the single time the code is run) and get the job done. Therefore, this is an exception to the principle of exemplary correctness.
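To give an impression of what such a throwaway generator might look like, here is a small Python sketch. The table, the column names and the helper functions are all invented; the one thing it does take care of is the kind of bug that would show up on that single run, namely a quote inside a string value.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (NULL, number or quoted string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the string literal stays well-formed.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(table, columns, rows):
    """Yield one INSERT statement per row."""
    column_list = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({column_list}) VALUES ({values});"


script = list(insert_statements("users", ["id", "name"],
                                [(1, "O'Brien"), (2, None)]))
print("\n".join(script))
```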

3: Blend in
If there's one thing generated code should not look like, then it's generated code!

This principle is much in the same vein as the previous principle of exemplary correctness, but I still think it bears mention of its own. Especially because I have a nice little story to go with it.

I once wrote a hideous hack of a code generator for a Java web framework called Rife. This framework came with its own pojo-based ORM framework, and the mapping was (is) written in pure Java in some special Metadata classes. What my nifty little tool did was to read the source files for a bunch of pojo-beans and then produce an equivalent number of stub-metadata classes with all the blanks filled in with some default values (based on heuristics). In order to make these generated classes blend in with the existing code base, my tool went to great lengths to copy the indentation policy of the pojo-files in the generated metadata-files.

If you used tabs, then it would use tabs, and if you used four spaces then it would use four spaces, and so forth. It didn't copy the placing of curly braces or any other fancy stuff like that, but it did go that extra mile to make the code look a little bit more as if I had written it myself.
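The original tool is long gone, so the following is not its code, but a guess at how such an indentation heuristic could be sketched in Python. The `detect_indent` name and the rule of taking the smallest leading run of spaces as one indent level are assumptions made for the example.

```python
from collections import Counter


def detect_indent(source):
    """Guess the indent unit of source: a tab, or the smallest run of leading spaces."""
    widths = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            widths["\t"] += 1
        elif line.startswith(" "):
            stripped = line.lstrip(" ")
            if stripped:  # ignore whitespace-only lines
                widths[len(line) - len(stripped)] += 1
    if not widths:
        return "    "  # nothing to learn from; fall back to four spaces
    space_lines = sum(n for key, n in widths.items() if key != "\t")
    if widths["\t"] >= space_lines:
        return "\t"
    # The smallest observed space width is taken as one indent level.
    smallest = min(key for key in widths if key != "\t")
    return " " * smallest


tabbed = "class A {\n\tint x;\n}\n"
spaced = "class A {\n    int x;\n    void f() {\n        return;\n    }\n}\n"
print(repr(detect_indent(tabbed)), repr(detect_indent(spaced)))
```

A heuristic like this is easily fooled, for instance by comment blocks whose continuation lines start with a single space, so a real tool would probably want a few more rules.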

More general examples are visual GUI editors and code templates in IDEs. A respectable IDE knows how you like to indent your code and where you like your curly braces, and you'd also think that code generated by an IDE is expected to stay around in the code base for a long time and be available for tinkering, so there's simply no excuse for an IDE to not try and replicate your coding style when it generates code.

Exception: Nobody cares
I think the exceptions for the first and the second principles also apply to this principle. If you know that nobody's ever going to deal with the code on a permanent basis, then there's no point in polish like this. Besides, it may not even be possible to deduce these policies or heuristics in the given situation.

4: Let the user decide
User: "Hey! I want a class that extends this other class and implements these interfaces."
Eclipse: "Do you want comments with that?"

Put briefly; if a user wants a generator to override these principles, let him do it. If he wants to overwrite existing files, give him a --force flag. If he wants to turn off comments in generated code, give him a checkbox to click.

You get the idea.

Users want different output in different situations, so the ability to tweak the generator's behavior could easily come in handy.
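For a command line generator, such overrides can be as simple as a couple of flags. Below is a sketch using Python's argparse; the `gen` program name, the `--no-comments` flag and the `open_mode` helper are hypothetical, and only the `--force` flag comes from the text above.

```python
import argparse


def build_parser():
    """Command line for a hypothetical generator called 'gen'."""
    parser = argparse.ArgumentParser(prog="gen")
    parser.add_argument("target", help="file to generate")
    parser.add_argument("--force", action="store_true",
                        help="overwrite the target if it already exists")
    parser.add_argument("--no-comments", dest="comments", action="store_false",
                        help="leave comments out of the generated code")
    return parser


def open_mode(options):
    """'w' overwrites, 'x' refuses to; only --force selects 'w'."""
    return "w" if options.force else "x"


careful = build_parser().parse_args(["Customer.java"])
forced = build_parser().parse_args(["--force", "--no-comments", "Customer.java"])
print(open_mode(careful), careful.comments)
print(open_mode(forced), forced.comments)
```

The defaults stay on the safe side: without any flags the generator refuses to overwrite and keeps its comments.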

Exception: YAGNI
You ain't gonna need it.

There's seldom a need to allow tweaking of every thinkable incarnation of these four principles. This is especially true for ad hoc generators that themselves are going to be thrown away shortly after their inception.

I suppose the general rule is that tweakability is good, but there's no need to overdo it, and that basically depends on the situation: who's the generator for? What domain is it operating in? What's its life expectancy?

So there we have it
Four principles that, when implemented in a code generator, should make working with it slightly (or considerably, depending on how it is without) more pleasant.

Official: Haskell is the Best Language


When having few lines of code is more important than runtime and memory performance, and when the rage is all about binary trees, nsieve-bits, chameneos, recursion and regex-dna, then Haskell is the language for you.

The proof is right here in the computer language shootout - completely objective, and undistorted linguistic statistics!

Rise of functional (article)


I came upon this article by Pat Eyler of Linux Journal:
http://www.linuxjournal.com/node/1000217/print

A nice little read that suggests that my prediction about Haskell getting a come-back ain't entirely off; rather, this guy is seeing pretty much the same patterns - he only did it before I did, plus he's putting his money on Erlang whereas I have a gut feeling about Haskell.

And while I'm at it, here's a bonus link to an article that goes beyond static/dynamic and strong/loose in discussing how to discuss types: http://cdsmith.twu.net/types.html

A programming language is (at some point in the future) born.


Like the evil genius that I am, *cough-cough*, I'm conducting secret experiments in programming language design in my spare time.

It's not like I know what I'm doing; by no means do I have any formal qualifications in language design - pretty much I'm just having fun.

Anyway, I've spent a lot of time pondering what the syntax should look like and what kind of language it should be, but since I've barely started on the parser (the lexer is quite far along, on the other hand), I think it'd be way premature to go into those details - let it just be said that brevity and conciseness are key.

Instead, let me tease you with some code snippets that show off some syntax and functionality that I'd like to make possible (did I mention that I've barely started on the parser?).

First, given a list of objects that each have a description method returning a string, how do we concatenate all the non-empty descriptions with newlines in between?
mylist[@x:x.description()][!=''].join('\n')
Of course there are languages that can do this in less code, but somehow I don't feel like taking on the beast that is APL :wink:

From left to right, we first have the "mylist" identifier that refers to our source list of objects.

Then comes a map expression that, for each element 'x' in 'mylist', injects the value of 'x.description()' into a new list. Map expressions are distinguished by the AT sign and should contain an expression that will evaluate to a function object - in this instance, a function literal.

Next up is the funky fella that is a filter expression. It is basically an operator followed by some expression, contained in square brackets. Each element in a source list is placed in front of the operator and the completed expression is evaluated, and if it amounts to boolean True, then the element in question is injected into a new list. Filter expressions are somewhat limited in that they are required to contain that leading operator - for those needing more power than that, the good ol' trusty filter() function exists.

Finally we have a list of interesting descriptions and we call the join() method on that list. This will cause the list to concatenate the string version of its elements into one big string, where each element is separated by the string that is passed to the join() method as an argument.

All in all, pretty simple.
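For comparison, the same three steps can be spelled out in Python, one line per step. This is only meant to make the walk-through concrete; the `Item` class is a stand-in for whatever objects happen to be in the list.

```python
class Item:
    """Stand-in for the objects in mylist."""

    def __init__(self, text):
        self.text = text

    def description(self):
        return self.text


mylist = [Item("first"), Item(""), Item("third")]

descriptions = [x.description() for x in mylist]   # mylist[@x:x.description()]
non_empty = [d for d in descriptions if d != ""]    # [!='']
result = "\n".join(non_empty)                       # .join('\n')
print(result)
```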

The second example I want to show you is the infinite fibonacci sequence, but before we jump to the beef, let's just recap what such a thing looks like in, say, Python:
def fib():
    x,y=0,1
    yield x; yield y
    while True: x,y = y,x+y; yield y
Four lines and a generator function later, the fib is ready for prime time.

Wait, four lines? Surely Haskell can do that better!
fib :: [Int]
fib = 0 : 1 : [ a + b | (a, b) <- zip fib (tail fib)]
Down to two lines, not bad! I can't even begin to imagine what this would look like in Java.

Alright, I've held it off long enough, let's see what it looks like in my mysterious language X (I have yet to come up with a good name for it - suggestions are welcome):
fib = [0,1,[-2:].sum..]
I think I'm just gonna let that code speak for itself... :wink: (because too many things are going on under the hood and, as I said, I don't want to go into details just yet)
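Since the details are left out, the following is nothing more than one possible reading of that expression, written in Python: start with the seed values 0 and 1, then keep appending the sum of the last two elements, forever. The `recurrence` helper and its parameters are invented for this sketch and say nothing about how language X would actually work.

```python
from itertools import islice


def recurrence(seed, window, combine):
    """Yield the seed values, then keep yielding combine(last `window` values)."""
    recent = list(seed)
    yield from recent
    while True:
        value = combine(recent[-window:])
        yield value
        # Keep only as many trailing values as the next step needs.
        recent = (recent + [value])[-window:]


# Guessing that [-2:].sum means "sum of the last two elements so far".
fib = recurrence([0, 1], 2, sum)
first_ten = list(islice(fib, 10))
print(first_ten)
```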

Prediction: Haskell gets a come-back.


I have a gut feeling that the functional language paradigm is making a return.

We went from procedural C to an object-oriented paradigm with C++. This paradigm was refined and improved in Java, and later, in C#.

Now we see the rise of the dynamic languages; Ruby and Python. They augment the object-oriented way by loosening up the type system and give us developers a glimpse and a sip of the taste of functional.

Function objects? Lambda expressions? List comprehensions? Closures? We kinda like these new old toys - even Java looks to reinvent itself with closures (at least some wish it that way, but I've not noticed any JSR yet).

The net result is that we tasted functional and we want more. For the time being, we'll be making do with Ruby, Python and, eventually, Java-with-closures. But these technologies will only take us so far, and won't last forever.

I believe that in two or three years' time, Haskell will enjoy much the same popularity that Ruby has now. On the five-year horizon, it might top out at one-third of the popularity that Java has now, and then start to stagnate and slowly decline again.

Since Haskell is purely functional, I don't think it'll ever reach the popularity of Java and C#. I don't think we'll be able to abandon the object-oriented paradigm the same way we abandoned procedural programming.
We want the power of both worlds, plus an excellent console/interpreted mode and dynamism in the type system.

I think Haskell will eventually fall short of all these requirements, but in its fall, it will provide the seed to the next generation language - a language that probably hasn't been invented yet (or is in the works on some garage geek's laptop :wink: ).
I think it is this next-gen language that will put Java and C# in their respective graves (which is to say, capture the hearts and minds of developers worldwide) - so both of these camps have nothing to worry about until then (that's right, Java people, Ruby is not going to kill you and neither is .NET :wink: ).

And while I'm at it, this next language may not be one but several languages. We've been headed for a diversity in programming languages; right tool for the right problem - and I don't see this development go away anytime soon, if ever.

So, why do I think that Haskell, of all choices, will be the functional language that gets a come-back? Well, first of all, Lisp is too crude and is pretty much the functional Fortran. Other languages are, well, unknown and don't have the same maturity and wealth in existing libraries to offer.
As far as I can see, the closest contender will be Erlang, but I know very little about this language so it's hard to say how it'll fare - it just seems to be the only other option on the horizon.
Haskell, on the other hand, is mature, well tested and has the minimum of libraries to be useful in modern projects. And finally, this book will be the spark that lights the flame.

So, clear as mud, this is my prediction for the near to midterm future of programming languages. Feel free to leave thoughts in the comments :smile:

Good Riddance, PermGen OutOfMemoryError !


Hi, you're probably here because you are hot-deploying some Java web applications and you have problems with some PermGen: OutOfMemoryErrors. Then you somehow found a link to an article that claimed to solve this very problem.

Well, that article was wrong. So I put up a big red note, but people still link to it.

So, in an attempt to set things straight again, I present a short list of facts that I have collected since I wrote the original article:

  • Bugs in the Sun garbage collector are one source of this problem.
  • There is (was? can't be arsed to check) more than one bug, hence the confusion.
  • Your app failing to undeploy completely will leak all classes in your app, every time you redeploy - this is another source of the problem.
  • None of these sources can be fixed with GC flags.
  • The JRockit JVM does NOT fix this problem - it simply lets the permgen grow unbounded, until something bad happens, like swap-thrashing or the OS killing it.

The only solution I have found that is feasible, reliable and portable is this: reboot your servers!

P.S. Don't bother giving me links to other articles on this issue. Even if you happen to find something I haven't already read, chances are that it is simply rehashing stuff I already know.

Thanks!


Below this line is the original article with all its faulty conclusions.
-----------------------------------------------------------------------------------------------------------------------
I did it!

Yes, lads & lasses! If you've been annoyed with having JBoss or Tomcat die with an OutOfMemoryError every fifth time you redeploy your beloved brainchild of a web application, then this is your lucky day! Because I found a fix! It's true! Yay! ....!!!!!11one (can you tell this has been a pain to me?)

Boring but Serious Theory and Hypothesis part:
The "PermGen" error happens when the Java virtual machine runs out of memory in the permanent generation. Recall that Java has a generational garbage collector, with four generations: eden, young, old and permanent.

In the eden generation, objects are very short lived and garbage collection is swift and often.

The young generation consists of objects that survived the eden generation (or were pushed down to young because the eden generation was full at the time of allocation). Garbage collection in the young generation is less frequent but still happens at quite regular intervals (provided that your application actually does something and allocates objects every now and then).

The old generation, well, you figured it. It contains objects that survived the young generation, or have been pushed down, and garbage collection is even less frequent but can still happen.

And finally, the permanent generation. This is for objects that the virtual machine has decided to endow with eternal life - which is precisely the core of the problem. Objects in the permanent generation are never garbage collected; that is, under normal circumstances when the jvm is started with normal command line parameters.
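If you want to see which memory pools your own JVM actually has, a minimal sketch along these lines may help. It uses the standard java.lang.management API; the class name MemoryPools and the helper describePools are made up for illustration. Pool names differ between JVM versions and collectors: older HotSpot JVMs report a pool with "Perm Gen" in its name, while Java 8 and later report "Metaspace" instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class MemoryPools {
    // One line per memory pool: name, type (heap or non-heap), used and max bytes.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            lines.add(String.format("%-30s %-16s used=%d max=%d",
                    pool.getName(), pool.getType(), usage.getUsed(), usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Running this inside the application server before and after a few redeploys should show whether the non-heap pool that holds classes keeps growing.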

So what happens when you redeploy your web application is that your WAR file is unpacked and its class files are loaded into the jvm. And here's the thing: they almost always end up in the permanent generation... Because, seriously, who wants to garbage collect their classes?!? Well, apparently application servers do, and here's how we make that happen:

PermGen, The Fix:
The standard garbage collector can't collect in the permanent generation, but the concurrent collector can. So the first thing we need to do is to make sure that the jvm uses the concurrent garbage collector. This is done by putting this:

> -XX:+UseConcMarkSweepGC

In java's command line arguments. But this is not enough. We must also specifically tell it to collect in the permanent generation, and this is done with this command line argument:

> -XX:+CMSPermGenSweepingEnabled

Good, now the concurrent collector will take the permanent generation under its wings. But wait! Classes are special, and the jvm is reluctant to let go of them, so we must also explicitly allow classes to be unloaded:

> -XX:+CMSClassUnloadingEnabled

Now we're certain that the permanent generation will be properly cleaned. But this raises another issue: what if the jvm unloads classes that might still be needed? I imagine it can be hard for a collector to tell whether or not a class might still be needed with the amount of reflection that goes on in an application server. Therefore, we might want to tweak the amount of memory allocated for the permanent generation, and this is done with this command line parameter:

> -XX:MaxPermSize=128m

Which will set the maximum size of our permanent generation to 128 megabytes - tweak it to fit your needs.

With these parameters properly applied to the jvm that runs your application server, your chances of running into a PermGen OutOfMemoryError will be considerably lessened.
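Whether the parameters really reached the jvm that runs the application server can be checked from inside it. The following is only a sketch, assuming the three -XX flags quoted above; the class name FlagCheck and the method missing are hypothetical. It compares the flag strings against the arguments the running JVM reports, so it only tells you the flags were passed, not that they have any effect.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class FlagCheck {
    // The flags quoted in the text above.
    static final List<String> WANTED = Arrays.asList(
            "-XX:+UseConcMarkSweepGC",
            "-XX:+CMSPermGenSweepingEnabled",
            "-XX:+CMSClassUnloadingEnabled");

    // Returns the wanted flags that are absent from the given JVM arguments.
    static List<String> missing(List<String> jvmArgs) {
        List<String> result = new ArrayList<>(WANTED);
        result.removeAll(jvmArgs);
        return result;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("missing: " + missing(jvmArgs));
    }
}
```

The comparison is a plain string match, so it runs on any JVM, including newer ones that no longer have a permanent generation.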

Take care!

Josh on Good API Design

This is an interesting talk by Josh Bloch on how to get API design right: http://www.infoq.com/presentations/effective-api-design

Some key points from the talk:
  • Have a good power to weight ratio. Get a lot of power without excessive bloat, but not too much.
  • Whenever in doubt about a feature, omit it. You can always add it later if people need it, but you can't take features out.
  • Fail fast. As soon as someone makes a mistake, blow up so they'll know what went wrong.
  • Avoid returning information in Strings. People will end up parsing the String, so whenever you add information to this String, or change the way it's structured, you will break these parsers.
  • Make the simple things easy, and the complex things possible.
  • Design for inheritance, or prevent it.
  • Be consistent in naming, metaphors and ordering of parameters.
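A few of these points are easier to see in code. The sketch below is not from the talk; the names ApiSketch and Progress are invented for illustration. It shows one possible reading of three of the points: failing fast on bad arguments, exposing data through accessors instead of a String that callers must parse, and preventing inheritance with a final class.

```java
// Hypothetical example class, not taken from the talk.
class ApiSketch {
    // Prevent inheritance: the class is final, so nobody can subclass it.
    static final class Progress {
        private final int done;
        private final int total;

        Progress(int done, int total) {
            // Fail fast: reject bad arguments where the mistake is made.
            if (total < 0) {
                throw new IllegalArgumentException("total must be >= 0: " + total);
            }
            if (done < 0 || done > total) {
                throw new IllegalArgumentException(
                        "done must be between 0 and " + total + ": " + done);
            }
            this.done = done;
            this.total = total;
        }

        // Structured accessors, so callers never need to parse toString().
        int done() { return done; }
        int total() { return total; }
        int remaining() { return total - done; }

        // For humans only; the format is free to change.
        @Override
        public String toString() {
            return done + " of " + total + " done";
        }
    }

    public static void main(String[] args) {
        Progress p = new Progress(3, 10);
        System.out.println(p + ", remaining: " + p.remaining());
    }
}
```

Because callers read done(), total() and remaining() directly, the wording of toString() can change later without breaking anyone.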