Tom Heath's Displacement Activities

Microformat Authoring Not Necessarily Easy


A couple of weeks ago Danny blogged about an Amaya hack that made it easier to insert microformat class names into an HTML document. It's a neat little trick, but the title of the post ("Easy microformat authoring") only reinforces the received wisdom that microformats are easily implemented, especially relative to something like RDF. Predictably this issue raised its head at SemanticCamp in London and led to a brief intellectual scuffle that sadly fizzled out without any real conclusions being reached. I sensed that Premasagar "got" it - he seems like a pretty smart guy - but there seemed to be a lot of microformats enthusiasts suffering from a kind of weapon focus: someone lunges at you demanding data interoperability, but you don't properly take in their face or fully assess the situation because you're focusing on the microformat they're holding in their hand.

In my experience this view that microformats are easy is a myth. It may be trivial to construct snippets of HTML marked up with microformats, but what I found when implementing hReview in Revyu.com is that adding the appropriate classes to the kind of code that exists in the wild is anything but easy.

In most cases it was not adding the class names themselves that was the problem (although not even the hReview "spec" seems to know what the semantics of "url" actually are). The big issue was getting the structure right. Despite the claim that microformats are for "humans first, machines second", checking that I'd applied the right classes to the right elements within my HTML source required me to think like an HTML parser in order to check that elements were correctly nested and therefore reflected the meaning I intended.
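To make the nesting problem concrete, here is a minimal Python sketch of the kind of check I was doing by eye. It is not a validator and does not follow the hReview spec in any complete way: the property list is a hand-picked subset, the rule "every property must sit inside an element with class hreview" is a simplifying assumption, and the names `HReviewNestingChecker` and `find_orphaned_properties` are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: a hand-picked subset of hReview property class names.
HREVIEW_PROPERTIES = {"item", "rating", "summary", "description",
                      "reviewer", "dtreviewed"}

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HReviewNestingChecker(HTMLParser):
    """Records hReview property classes that appear outside any hreview root."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, is_hreview_root) for each open element
        self.orphans = []  # property names found with no hreview ancestor

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        inside_review = any(is_root for _, is_root in self.stack)
        if not inside_review:
            # Simplification: only ancestors count, not the element itself.
            self.orphans.extend(sorted(classes & HREVIEW_PROPERTIES))
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, "hreview" in classes))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Pop back to the most recent matching open tag, tolerating
        # children that were never explicitly closed.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_orphaned_properties(html):
    """Return property class names that are not nested inside an hreview."""
    checker = HReviewNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.orphans


well_nested = ('<div class="hreview"><span class="item">'
               '<span class="fn">Cafe</span></span>'
               '<span class="rating">4</span></div>')
# The rating has drifted outside the closing </div> of the review.
badly_nested = ('<div class="hreview"><span class="summary">Nice</span></div>'
                '<span class="rating">4</span>')

print(find_orphaned_properties(well_nested))
print(find_orphaned_properties(badly_nested))
```

Both snippets look equally plausible at a glance, which is rather the point: the second one only goes wrong because a single closing tag is in the wrong place.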

After a couple of hours of peering at the hReview classes in my HTML I was fairly confident that I'd got the structure right, but wanted some validation. So I went in search of a microformats validator. This was quite funny. Apparently nothing of the sort exists, then or now. The best answer I got was to run my hReview through an XSL transformation and check that the RDF/XML that came out the other side looked OK. Excuse me while I choke on my coffee.

Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash.

Maybe none of this matters. Maybe the Web can handle microformat garbage just like it handles so much other rubbish. What really drives me mad are the claims that microformats are up to the same jobs as RDF, and so much easier to implement.

The "humans first, machines second" claim is perverse. What my little anecdote suggests is that, in spite of these claims, microformats are neither easy for humans to use nor particularly likely to yield much reliable data for machines.


Comments

kidehen Tuesday, February 26, 2008 2:09:16 PM

Tom,

Amen!

BTW - This blog system doesn't support OpenID?


Kingsley
The one with the Blog Home Page URL: http://www.openlinksw.com/blog/~kidehen :-)

philwilson Tuesday, February 26, 2008 4:52:10 PM

Is there such a thing (for example) as an iCal validator? or a vCard validator? I haven't seen any, and this is a more fundamental issue for me, given that uF are supposed to be able to transform into them.

My own anecdotal evidence has been that adding microformats (specifically hCal and hCard) has been trivial. Implementing hReview may be more difficult, I can't really judge from the spec.

I do agree that no underlying data model makes it harder than it could be, and that the specs are poorly written/difficult to understand in places, but in my experience this view that microformats are easy is true.

As far as I can see, you implement uF, RDF or both depending on what your use case is.

-vizualbod Monday, April 28, 2008 3:09:08 PM

There is no such thing as a vCard validator as far as I know. I tried to make one using PHP SimpleXML, bundling it with my spider and a quick CodeIgniter app to publish the crawl results in a directory.

I failed and I don't want to look back (unless I improve my regular expressions). The results I was getting were just so bad, no matter what logic you specify.

A format validator should be released along with every new format specification. A specification without a validator is good for .. well you know what I mean now.

Tom Heath (tomheath) Monday, May 12, 2008 11:12:30 AM

Hi vizualbod,

Thanks for your comment - interesting to hear about some more real-world experiences. If you get the chance I'd love to read about it in some more detail.

Cheers,

Tom.

TimMinor Thursday, January 8, 2009 5:06:22 PM

I came across an unofficial hCard validator here:

http://hcard.geekhood.net/

You might find it useful,
Tim

alanfluff Monday, July 13, 2009 4:13:50 PM

With heavy heart, for I love the passion and much else about microformats, I must agree with much of what you write -- I don't know anything in detail about RDFa so can't comment there, but I did just try to create an hReview and although it's probably correct in its construction, it is ironic that something most in need of correct construction is least served, so far, by tools to allow us to check stuff.

Were I more clever and with free time I would love to have a go at making validators for microformats since, as noted, I think their aims are excellent. So far, all I've been able to find and use (and it's excellent for hcard) is the above mentioned
http://hcard.geekhood.net/

Here's hoping that a cleverer version of me with some time free can step-repeat this type of validator for the other microformats.

engmark Thursday, January 28, 2010 7:50:29 AM

Originally posted by philwilson:

Is there such a thing (for example) as an iCal validator? or a vCard validator?


There is now:
http://sourceforge.net/projects/vcard-module/

It's not complete by anyone's standards, but it checks things like character range, line splits, line breaks, groups, and escaped characters.
