Tuesday, 21. August 2007, 18:27:13
Code generation, automatic or not, has emerged in numerous places as a productivity-enhancing technique, a powerful tool of abstraction, or, most often, both.
It comes in many incarnations, from the obvious to the hidden. Some frameworks, like Rails and CGLIB, even have code generation as a core feature.
With such broad applicability and so many different kinds of code generators, they are bound not to behave alike. These differences in behavior invite mistakes and are at odds with the usability principle of least surprise.
I have seen, worked with and written numerous code generators; I can hardly imagine a developer who has not. Looking back on that experience, I'll try to jot down a few simple principles that, when a code generator follows them, make working with it much more pleasant.
However, given how diverse the applications are, there are bound to be exceptions to the rules. I'll do my best to dig up these exceptions as well, and explain why they are a good idea in the given situation.
So, here are the Principles of the Well-Behaved Code Generator:
1: No destructive update
A code generator should not make any destructive updates. Generated code, like scaffolds in Rails, is intended to relieve the user of the burden of writing boilerplate code and to leave a fill-in-the-blanks template. Once this is done, users will often go ahead and... fill in the blanks.
But development is iterative. The foundation from which the code was generated might change over time. Two good examples are generating O/R-mapping code from a database schema, and generating stub classes from a WSDL in interface-first web service development: both the database schema and the WSDL may change as development moves through iterations, and when the foundation changes, a good way to keep the generated code up to date is to regenerate it.
But therein lies a catch: what becomes of the filled-in blanks when the code is regenerated? If the generator overwrites the developer's careful changes, good work vanishes into thin air. Source code management systems and various backup techniques notwithstanding, the generator cannot rely on those features being available.
Losing good work is something we cannot afford; code generators are meant to enhance productivity, not destroy existing work. There really is only one way around this: code generators must avoid destructive updates at all costs.
If a file we would like to write already exists, give the user a warning. A code generator is in the business of neither data backup nor data shredding. If the file exists, skip it, then notify the user and let him decide how to handle the situation. If he wants to back the file up, let him do it. If he wants to delete it, let him delete it. But don't go over his head and make arbitrary decisions.
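A minimal Java sketch of the skip-and-warn rule, assuming Java 11 or later and the standard `java.nio.file` API. The class name `SafeWriter` is hypothetical. Opening the file with `StandardOpenOption.CREATE_NEW` makes the existence check and the file creation one atomic step, so an existing file is never truncated, not even by a race with another process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: writes generated files but never overwrites existing ones. */
final class SafeWriter {
    private final List<Path> skipped = new ArrayList<>();

    /** Writes content to target only if it does not exist yet; returns false if skipped. */
    boolean write(Path target, String content) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try {
            // CREATE_NEW checks for existence and creates the file atomically.
            Files.write(target, content.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Leave the user's file alone, warn, and remember it for the final report.
            skipped.add(target);
            System.err.println("warning: " + target + " already exists, skipped");
            return false;
        }
    }

    /** Files that were left untouched; the user decides what to do with them. */
    List<Path> skipped() {
        return List.copyOf(skipped);
    }
}
```

A generator would call `write` for every file it produces and list `skipped()` at the end, leaving backup or deletion of those files to the user.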
Note that this doesn't mean you cannot touch his files. If you can update his existing code without breaking his work, then by all means go right ahead! I know that the visual GUI editor in JDeveloper can read back the code it generated after it has been altered by hand, and then make changes to that code without breaking the manual edits.
Exception: Not his files
The most obvious exception is when the user isn't the logical "owner" of the generated files. One example is the Java code a JSP compiler generates on an application server: the user is by no means supposed to fiddle with these files, so destructive updates to them are OK, as nothing is lost. Nothing is supposed to be lost, anyway; a user making changes to those files had it coming.
Another example is the Java code generated on the fly by JasperReports; it is pretty obvious from its hideous style that you aren't supposed to touch it.
2: Exemplary correctness
If you are a code generator, there can be no doubt about it: your users are a bunch of unshaved, half-asleep coffee slurpers who are too lazy to write their own code, so you might as well set the standard and show 'em How It's Done(TM).
It's easy to make good code bad, but hard to make bad code good. So since a code generator generally has no clue about the quality of the existing code, it should produce code of the highest possible quality.
This means that, when applicable, generated code should ...
- Be properly indented.
- Have meaningful variable, method and class names.
- Be consistent in its naming convention, and in line with the naming conventions of the language/technology/environment.
- Have a good and consistent commenting policy.
- Be readable.
- Make correct use of exception handling.
- Make sane use of logging features.
- Be debuggable.
- Be amenable to testing.
Not all of these make sense in every situation, as it depends heavily on the domain in which the code generator operates, but the fundamental point remains: code generators should produce high-quality code.
Exception: It's only run once
I have on several occasions written programs that generated (for the most part) SQL insert scripts. These scripts were intended to be executed only once and then thrown away.
I don't see much point in going out of your way to care for every little detail in the code when you know it's headed for the bin in a few minutes anyway.
This kind of code just has to be free of bugs (at least those that would manifest themselves the single time it runs) and get the job done. Therefore, this is an exception to the principle of exemplary correctness.
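Even a throwaway generator has to get the bug-prone parts right. Below is a minimal Java sketch, with hypothetical names (`InsertScript`, table and column names), of a one-off INSERT script generator: no formatting niceties and no comments in the output, but single quotes in values are doubled as standard SQL requires, since an unescaped `O'Brien` would break the script the one time it runs. For simplicity, every non-null value is written as a string literal.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical one-off generator: turns rows into SQL INSERT statements. */
final class InsertScript {

    /** Quotes a value as a SQL string literal; null becomes NULL. */
    static String literal(String value) {
        if (value == null) {
            return "NULL";
        }
        // Standard SQL escapes a single quote inside a literal by doubling it.
        return "'" + value.replace("'", "''") + "'";
    }

    /** One INSERT statement per row, in the given column order. */
    static String inserts(String table, List<String> columns, List<List<String>> rows) {
        String cols = String.join(", ", columns);
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            String values = row.stream()
                    .map(InsertScript::literal)
                    .collect(Collectors.joining(", "));
            sb.append("INSERT INTO ").append(table).append(" (").append(cols)
              .append(") VALUES (").append(values).append(");\n");
        }
        return sb.toString();
    }
}
```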
3: Blend in
If there's one thing generated code should not look like, it's generated code!
This principle is very much in the same vein as the previous principle of exemplary correctness, but I still think it deserves a mention of its own, especially because I have a nice little story to go with it.
I once wrote a hideous hack of a code generator for a Java web framework called Rife. This framework came with its own POJO-based ORM framework, and the mapping was (is) written in pure Java in special Metadata classes. What my nifty little tool did was read the source files of a bunch of POJO beans and produce an equal number of stub metadata classes with all the blanks filled in with default values (based on heuristics). To make these generated classes blend in with the existing code base, my tool went to great lengths to copy the indentation policy of the POJO files into the generated metadata files.
If you used tabs, it would use tabs; if you used four spaces, it would use four spaces, and so forth. It didn't copy the placement of curly braces or any other fancy stuff like that, but it did go that extra mile to make the code look a little more as if I had written it myself.
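The indentation-copying trick can be sketched roughly as follows, assuming Java 11 or later. This is a hypothetical heuristic, not code from the original tool: `guessIndent` looks at existing source lines, decides between tabs and spaces, and takes the smallest space indent as the unit; `applyIndent` then rewrites a template written with one tab per nesting level into that unit.

```java
import java.util.List;

/** Hypothetical heuristic: detect the indent unit of existing code and mimic it. */
final class IndentationGuesser {

    /** Returns "\t" if tab-indented lines outnumber space-indented ones, else the smallest space indent. */
    static String guessIndent(List<String> lines) {
        int tabLines = 0;
        int spaceLines = 0;
        int smallestSpaceIndent = Integer.MAX_VALUE;
        for (String line : lines) {
            String trimmed = line.strip();
            // Skip blank lines and block-comment continuations (" * ..."),
            // whose alignment says nothing about the indent unit.
            if (trimmed.isEmpty() || trimmed.startsWith("*")) {
                continue;
            }
            if (line.charAt(0) == '\t') {
                tabLines++;
            } else if (line.charAt(0) == ' ') {
                int n = 0;
                while (n < line.length() && line.charAt(n) == ' ') {
                    n++;
                }
                spaceLines++;
                smallestSpaceIndent = Math.min(smallestSpaceIndent, n);
            }
        }
        if (tabLines > spaceLines) {
            return "\t";
        }
        if (spaceLines == 0) {
            return "    "; // no evidence either way: fall back to four spaces
        }
        return " ".repeat(smallestSpaceIndent);
    }

    /** Re-indents a template that uses one tab per nesting level into the given unit. */
    static String applyIndent(String template, String unit) {
        StringBuilder out = new StringBuilder();
        for (String line : template.split("\n", -1)) {
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '\t') {
                depth++;
            }
            out.append(unit.repeat(depth)).append(line.substring(depth)).append('\n');
        }
        out.setLength(out.length() - 1); // drop the newline added after the last segment
        return out.toString();
    }
}
```

A real tool would feed `guessIndent` the lines of the user's files (for example via `Files.readAllLines`) and pass the result to `applyIndent` for every generated file; like the original, this ignores brace placement entirely.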
More general examples are visual GUI editors and code templates in IDEs. A respectable IDE knows how you like to indent your code and where you like your curly braces, and code generated by an IDE can be expected to stay in the code base for a long time and be open to tinkering, so there's simply no excuse for an IDE not to replicate your coding style when it generates code.
Exception: Nobody cares
I think the exceptions to the first and second principles also apply to this principle. If you know that nobody is ever going to deal with the code on a permanent basis, there's no point in polish like this. Besides, it may not even be possible to deduce these policies or heuristics in the given situation.
4: Let the user decide
User: "Hey! I want a class that extends this other class and implements these interfaces."
Eclipse: "Do you want comments with that?"
Put briefly: if a user wants a generator to override these principles, let him do it. If he wants to overwrite existing files, give him a --force flag. If he wants to turn off comments in generated code, give him a checkbox to click.
You get the idea.
Users want different output in different situations, so the ability to tweak the generator's behavior can easily come in handy.
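A sketch of such switches for a command-line generator, assuming Java 7 or later. The flag names `--force` and `--no-comments` and the classes `GeneratorOptions` and `StubGenerator` are illustrative, not taken from any real tool. Overwriting stays off unless asked for: without `--force`, files are opened with `CREATE_NEW`, which refuses to replace an existing file.

```java
import java.nio.file.OpenOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical user-facing switches for a generator; flag names are illustrative. */
final class GeneratorOptions {
    boolean force = false;          // --force: allow overwriting existing files
    boolean includeComments = true; // --no-comments: leave Javadoc out of generated code

    static GeneratorOptions parse(String[] args) {
        GeneratorOptions opts = new GeneratorOptions();
        for (String arg : args) {
            switch (arg) {
                case "--force":
                    opts.force = true;
                    break;
                case "--no-comments":
                    opts.includeComments = false;
                    break;
                default:
                    throw new IllegalArgumentException("unknown option: " + arg);
            }
        }
        return opts;
    }

    /** Without --force, CREATE_NEW makes a write fail if the file already exists. */
    OpenOption[] openOptions() {
        return force
                ? new OpenOption[] {StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE}
                : new OpenOption[] {StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE};
    }
}

/** Hypothetical stub generator that honors the comment switch. */
final class StubGenerator {
    static String classStub(String name, String superclass, GeneratorOptions opts) {
        StringBuilder sb = new StringBuilder();
        if (opts.includeComments) {
            sb.append("/**\n * TODO: describe ").append(name).append(".\n */\n");
        }
        sb.append("public class ").append(name)
          .append(" extends ").append(superclass).append(" {\n}\n");
        return sb.toString();
    }
}
```

The options can be passed straight to `Files.write(path, bytes, opts.openOptions())`; with the defaults, an existing file makes that call throw `FileAlreadyExistsException` instead of being overwritten.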
Exception: YAGNI
You ain't gonna need it.
There's seldom a need to allow tweaking of every conceivable incarnation of these four principles. This is especially true for ad hoc generators that are themselves going to be thrown away shortly after their inception.
I suppose the general rule is that tweakability is good, but there's no need to overdo it, and how much is enough basically depends on the situation: who is the generator for? What domain is it operating in? What's its life expectancy?
So there we have it
Four principles that, when implemented in a code generator, should make working with it slightly (or considerably, depending on how it fares without them) more pleasant.