Microformat Authoring Not Necessarily Easy
Tuesday, 26. February 2008, 11:58:22
A couple of weeks ago Danny blogged about an Amaya hack that made it easier to insert microformat class names into an HTML document. It's a neat little trick, but the title of the post ("Easy microformat authoring") only reinforces the received wisdom that microformats are easily implemented, especially relative to something like RDF. Predictably this issue raised its head at SemanticCamp in London and led to a brief intellectual scuffle that sadly fizzled out without any real conclusions being reached. I sensed that Premasagar "got" it - he seems like a pretty smart guy - but there seemed to be a lot of microformats enthusiasts suffering from a kind of weapon focus: someone lunges at you demanding data interoperability, but you don't properly take in their face or fully assess the situation because you're focusing on the microformat they're holding in their hand.
In my experience this view that microformats are easy is a myth. It may be trivial to construct snippets of HTML marked up with microformats, but what I found when implementing hReview in Revyu.com is that adding the appropriate classes to the kind of code that exists in the wild is anything but easy.
In most cases it was not adding the class names themselves that was the problem (although not even the hReview "spec" seems to know what the semantics of "url" actually are). The big issue was getting the structure right. Despite the claim that microformats are for "humans first, machines second", checking that I'd applied the right classes to the right elements within my HTML source required me to think like an HTML parser in order to check that elements were correctly nested and therefore reflected the meaning I intended.
After a couple of hours of peering at the hReview classes in my HTML I was fairly confident that I'd got the structure right, but wanted some validation. So I went in search of a microformats validator. This was quite funny. Apparently nothing of the sort exists, then or now. The best answer I got was to run my hReview through an XSL transformation and check that the RDF/XML that came out the other side looked OK. Excuse me while I choke on my coffee.
Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash.
Maybe none of this matters. Maybe the Web can handle microformat garbage just like it handles so much other rubbish. What really drives me mad are the claims that microformats are up to the same jobs as RDF, and so much easier to implement.
The "humans first, machines second" claim is perverse. What my little anecdote suggests is that, in spite of these claims, microformats are neither easy to use for humans, or particularly likely to yield much reliable data for machines.
In my experience this view that microformats are easy is a myth. It may be trivial to construct snippets of HTML marked up with microformats, but what I found when implementing hReview in Revyu.com is that adding the appropriate classes to the kind of code that exists in the wild is anything but easy.
In most cases it was not adding the class names themselves that was the problem (although not even the hReview "spec" seems to know what the semantics of "url" actually are). The big issue was getting the structure right. Despite the claim that microformats are for "humans first, machines second", checking that I'd applied the right classes to the right elements within my HTML source required me to think like an HTML parser in order to check that elements were correctly nested and therefore reflected the meaning I intended.
After a couple of hours of peering at the hReview classes in my HTML I was fairly confident that I'd got the structure right, but wanted some validation. So I went in search of a microformats validator. This was quite funny. Apparently nothing of the sort exists, then or now. The best answer I got was to run my hReview through an XSL transformation and check that the RDF/XML that came out the other side looked OK. Excuse me while I choke on my coffee.
Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash.
Maybe none of this matters. Maybe the Web can handle microformat garbage just like it handles so much other rubbish. What really drives me mad are the claims that microformats are up to the same jobs as RDF, and so much easier to implement.
The "humans first, machines second" claim is perverse. What my little anecdote suggests is that, in spite of these claims, microformats are neither easy to use for humans, or particularly likely to yield much reliable data for machines.
Amen!
BTW - This blog system doesn't support OpenID?
Kingsley
The one with the Blog Home Page URL: http://www.openlinksw.com/blog/~kidehen :-)
By kidehen, # 26. February 2008, 14:09:16
My own anecdotal evidence has been that adding microformats (specifically hCal and hCard) has been trivial. Implementing hReview may be more difficult, I can't really judge from the spec.
I do agree that no underlying data model makes it harder than it could be, and that the specs are poorly written/difficult to understand in places, but in my experience this view that microformats are easy is true.
As far as I can see, you implement uF, RDF or both depending on what your use case is.
By philwilson, # 26. February 2008, 16:52:10
I failed and I don't want to look back (unless I'll improve my regular expressions). The results I was getting were just so bad, no matter what logic you specify.
Format validator should be released along with every new format specification. Specification without a validator is good for .. well you know what I mean now.
By vizualbod, # 28. April 2008, 15:09:08
Thanks for your comment - interesting to hear about some more real-world experiences. If you get the chance I'd love to read about it in some more detail.
Cheers,
Tom.
By tomheath, # 12. May 2008, 11:12:30