Tom Heath's Displacement Activities

Posts tagged with "revyu"

Microformat Authoring Not Necessarily Easy

A couple of weeks ago Danny blogged about an Amaya hack that made it easier to insert microformat class names into an HTML document. It's a neat little trick, but the title of the post ("Easy microformat authoring") only reinforces the received wisdom that microformats are easily implemented, especially relative to something like RDF. Predictably this issue raised its head at SemanticCamp in London and led to a brief intellectual scuffle that sadly fizzled out without any real conclusions being reached. I sensed that Premasagar "got" it - he seems like a pretty smart guy - but there seemed to be a lot of microformats enthusiasts suffering from a kind of weapon focus: someone lunges at you demanding data interoperability, but you don't properly take in their face or fully assess the situation because you're focusing on the microformat they're holding in their hand.

In my experience, the view that microformats are easy is a myth. It may be trivial to construct snippets of HTML marked up with microformats, but what I found when implementing hReview in Revyu.com is that adding the appropriate classes to the kind of code that exists in the wild is anything but easy.

In most cases it was not adding the class names themselves that was the problem (although not even the hReview "spec" seems to know what the semantics of "url" actually are). The big issue was getting the structure right. Despite the claim that microformats are for "humans first, machines second", checking that I'd applied the right classes to the right elements within my HTML source required me to think like an HTML parser in order to check that elements were correctly nested and therefore reflected the meaning I intended.
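To make the point about thinking like a parser concrete, here is a minimal Python sketch (not the code used on Revyu) that walks the markup with the standard library's `html.parser` and records whether each hReview property class sits inside an element with class "hreview". The property list is a small illustrative subset, the sample markup is hypothetical, and the scanner assumes well-nested markup; one misplaced closing tag silently changes the result, which is exactly the kind of mistake that is hard to spot by eye.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Illustrative subset of hReview property class names (not the full list).
PROPS = {"item", "fn", "rating", "reviewer", "dtreviewed", "summary", "description"}


class HReviewScanner(HTMLParser):
    """Sort hReview property classes into those found inside an element with
    class="hreview" and those found outside any such element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: True if it is an hreview root
        self.inside = []   # properties with an hreview ancestor (or on the root itself)
        self.outside = []  # properties with no hreview ancestor

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        in_review = any(self.stack) or "hreview" in classes
        for name in classes:
            if name in PROPS:
                (self.inside if in_review else self.outside).append(name)
        if tag not in VOID:
            self.stack.append("hreview" in classes)

    def handle_endtag(self, tag):
        # Assumes well-nested markup: any end tag closes the most recent element.
        # Real-world HTML often breaks this assumption, which is rather the point.
        if tag not in VOID and self.stack:
            self.stack.pop()


# Hypothetical markup: the rating has drifted outside the hreview root.
sample = """
<div class="hreview">
  <span class="item"><span class="fn">Campbell Park</span></span>
  <p class="description">Better than its reputation.</p>
</div>
<span class="rating">4</span>
"""

scanner = HReviewScanner()
scanner.feed(sample)
print(scanner.inside)   # ['item', 'fn', 'description']
print(scanner.outside)  # ['rating']: misplaced, so not part of the review
```

Visually, the rating still sits right next to the review on the rendered page; only the nesting reveals that a parser will not attach it to the review.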

After a couple of hours of peering at the hReview classes in my HTML I was fairly confident that I'd got the structure right, but wanted some validation. So I went in search of a microformats validator. This was quite funny. Apparently nothing of the sort existed then, and nothing does now. The best answer I got was to run my hReview through an XSL transformation and check that the RDF/XML that came out the other side looked OK. Excuse me while I choke on my coffee.

Therein lies the issue with microformats. Without an underlying abstract data model, validation becomes a bit like standing back looking at a used car, kicking the tyres, concluding "yeah, looks alright", and then handing over the cash.
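What an abstract data model buys you is something to validate against. As a minimal sketch, assuming the review's properties have already been extracted into a dictionary, and assuming a made-up model (the required properties and the 1 to 5 rating scale below are illustrative choices, not taken from the hReview draft), validation becomes a mechanical check rather than tyre-kicking:

```python
# Hypothetical model of a review: these required properties and this rating
# scale are assumptions for illustration, not the hReview specification.
REQUIRED = ("item", "reviewer", "rating")
RATING_MIN, RATING_MAX = 1.0, 5.0


def validate_review(props: dict) -> list[str]:
    """Return a list of problems; an empty list means the review fits the model."""
    problems = [f"missing required property: {p}" for p in REQUIRED if p not in props]
    if "rating" in props:
        try:
            value = float(props["rating"])
        except ValueError:
            problems.append(f"rating is not a number: {props['rating']!r}")
        else:
            if not RATING_MIN <= value <= RATING_MAX:
                problems.append(f"rating {value} outside {RATING_MIN}-{RATING_MAX}")
    return problems


print(validate_review({"item": "Campbell Park", "rating": "4"}))
# ['missing required property: reviewer']
print(validate_review({"item": "Campbell Park", "reviewer": "Tom", "rating": "seven"}))
# ["rating is not a number: 'seven'"]
```

The checks themselves are trivial; the hard part, and the part microformats leave implicit, is agreeing on the model they are checked against.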

Maybe none of this matters. Maybe the Web can handle microformat garbage just like it handles so much other rubbish. What really drives me mad are the claims that microformats are up to the same jobs as RDF, and so much easier to implement.

The "humans first, machines second" claim is perverse. What my little anecdote suggests is that, in spite of these claims, microformats are neither easy to use for humans nor particularly likely to yield much reliable data for machines.

On the Web, but not *In* the Web

In my recent Talk with Talis podcast, Paul Miller and I got chatting about the conceptual difference between exposing data on the web using Web2.0-style APIs (such as Amazon's), and serving up Linked Data (also look here for TimBL's original Design Issues document, which spells out what must rapidly be becoming "the four commandments of Linked Data"). The discussion centres on the "On the Web, but not In the Web" distinction. Kingsley liked the discussion, and suggested it should be blogged for posterity, so here is a transcribed excerpt (starting at 28m41s through the podcast):

Paul Miller: You said that reviews you put into Revyu.com are available on the web as a normal review, and also available on the Semantic Web, to be embedded in other places. Now, how is that different to me doing a review on Amazon, and cutting and pasting it and sticking it into epinions, or my blog, or whatever?

Tom Heath: OK, so, if you do the review in Amazon it will be available on the Web in two ways. It'll be available on the HTML Web for people to browse with their browser, and the review would also be available through the Amazon Web Services API, which means that it is reusable to an extent: I can query the Amazon Web Services API and retrieve that information and do something with it. But this kind of highlights a really key distinction between Web2.0 APIs and the Semantic Web, or the Web of Data, or the Linked Data Web, or however you choose to name it, in that by default if you write a review in Revyu then it's there available, it has a URI, people can make other statements about it, they can reference it in other RDF statements on the Semantic Web, and they can also link to it from the HTML Web.

So, in contrast, if you write a review in Amazon, then the ability to link that review with other bits of information is very limited. You can't necessarily easily say that the review references a certain item or is provided by a certain person, in any way other than embedding this information in XML elements within the results from the Amazon Web Services API. So, this information is available on the Web, but it's not really in the Web, if that distinction makes sense.

It's a distinction that Tim Berners-Lee has, um, well I'm not sure if he's explicitly made the distinction but he always uses the phrase "in the Web" and I never really understood, I never really got why he was using this form of words until recently, when it dawned on me that something being on the Web doesn't really make it in the Web, and I think that's the key distinction between data from Amazon, the Amazon API, or any of the other Web2.0 kind of APIs, that it's there available on the Web but it's not really in the Web, because it's hard to link it together, which is something that RDF does very well, which XML doesn't really do.
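The distinction can be illustrated with a toy model of RDF in Python, using plain tuples rather than a real RDF library. The review, person and blog-post URIs and the `discusses` predicate are hypothetical; the `rev:` namespace is assumed to be that of the RDF Review Vocabulary, and the FOAF namespace is the standard one. Because the review has its own URI, a third party can make statements about it, and combining the two sources needs no knowledge of either publisher's API:

```python
# Triples are (subject, predicate, object) tuples. All example.org URIs are
# hypothetical; REV is assumed to be the RDF Review Vocabulary namespace.
REV = "http://purl.org/stuff/rev#"
FOAF = "http://xmlns.com/foaf/0.1/"

review = "http://example.org/reviews/42"
person = "http://example.org/people/tom"

# Published by the review site: the review is identified by its own URI.
site_graph = {
    (review, REV + "reviewer", person),
    (review, REV + "rating", "4"),
    (person, FOAF + "name", "Tom Heath"),
}

# Published by someone else entirely, who refers to the review simply by
# reusing its URI (the predicate here is made up for the example).
third_party_graph = {
    ("http://example.org/blog/post-7", "http://example.org/vocab#discusses", review),
}


def about(graph, uri):
    """Every triple in which the given URI appears as subject or object."""
    return sorted(t for t in graph if uri in (t[0], t[2]))


# With no blank nodes involved, merging the two graphs is plain set union.
merged = site_graph | third_party_graph
for triple in about(merged, review):
    print(triple)
```

A review that only exists as an element inside an API response has no such identifier, so there is nothing for the third party's statement to point at.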

Will Yahoo Pipes Morph into Drag and Drop Semantic Web Mashups?

Over some beers last night in the Cellar Bar, Tony Hirst was talking about Yahoo Pipes. Slightly embarrassingly, this was the first I'd heard of them (Tony is my human feedreader; thanks, Tony!), but I've just had a quick look, and they're pretty cool. Researchers have been going on for ages about novice programming, which has always seemed to me to be a thankless task, rapidly going nowhere. If it is ever truly possible, this is how it will happen.

The Pipes interface is stunning. I want to use it just because it looks nice. Once you get your head round actually using the interface (why can't I double-click to add modules, as well as being able to drag them?), creating new Pipes is dead easy. Just as a trial run I created a very basic All About Milton Keynes pipe that brings in items tagged milton-keynes on del.icio.us, recent changes from the Open Guide to Milton Keynes, and things mentioning the words "Milton Keynes" from Revyu. It's actually not very useful right now (the challenge is now in the novel combinations, not the technical glue), as it mixes different types of items (links to web resources, city guide entries, and reviews) without distinguishing one type from the other. Maybe that's down to me choosing to aggregate all items and sort them alphabetically, but the real story is that this is simply a limitation of RSS compared with RDF, which can draw on a range of vocabularies to say what each item actually is.
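A minimal Python sketch of the problem, using hypothetical items rather than the real feeds: once each item is flattened to an RSS-style title and link, an alphabetical merge cannot tell bookmarks, guide entries and reviews apart, whereas items that keep a type (standing in here for an `rdf:type` from a suitable vocabulary) can be regrouped after merging. This is a simplification; RSS 2.0 does have a `category` element that could be pressed into service, but its values are not tied to shared vocabularies in the way RDF types are.

```python
from collections import defaultdict

# Hypothetical items; "type" stands in for an rdf:type drawn from whichever
# vocabulary suits each source (bookmark, city guide entry, review).
sources = {
    "del.icio.us": [{"title": "Bletchley Park", "link": "http://example.org/bp", "type": "Bookmark"}],
    "Open Guide": [{"title": "Campbell Park", "link": "http://example.org/cp", "type": "GuideEntry"}],
    "Revyu": [{"title": "Xscape", "link": "http://example.org/xs", "type": "Review"}],
}


def as_rss(item):
    """RSS-style flattening: only the title and link survive."""
    return {"title": item["title"], "link": item["link"]}


# Aggregate everything and sort alphabetically: the types are gone.
flat = sorted((as_rss(i) for items in sources.values() for i in items),
              key=lambda i: i["title"])

# With typed items, the same merged data can be regrouped by what things are.
by_type = defaultdict(list)
for items in sources.values():
    for i in items:
        by_type[i["type"]].append(i["title"])

print([i["title"] for i in flat])  # ['Bletchley Park', 'Campbell Park', 'Xscape']
print(dict(by_type))
```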

Annoyingly, as I haven't finished the tag-based RSS feeds on Revyu (unlike the people-based ones, such as mine), I had to resort to using the generic latest reviews feed and then filtering it by keywords. Not right for a Semantic Web application. Really I just need to put aside an hour or so to finish the tag-based feeds, but I kind of object to having to do that when Revyu already has much richer information available in RDF/XML. So, there are two ways forward: do the feeds and deal with it, or wait until Pipes begins to speak RDF as well as RSS. This is probably a way off, but would be awesome. Being able to create Semantic Web mashups in the style of FOAFmap (currently offline) by dragging around much richer data than RSS can provide would do wonders for bridging the gap between the Web2.0 Wow! factor and the web-scale data integration capabilities of the Semantic Web.
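For comparison, the keyword-filtering workaround amounts to something like the following minimal Python sketch, written with the standard library's `xml.etree.ElementTree` as a rough stand-in for a Pipes filter module. The feed content and URLs are hypothetical, and the matching rule (a case-insensitive substring search over title and description) is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

# A hypothetical, trimmed-down RSS 2.0 "latest reviews" feed.
FEED = """<rss version="2.0"><channel>
  <title>Latest reviews</title>
  <item><title>Review of Campbell Park, Milton Keynes</title>
        <link>http://example.org/r/1</link>
        <description>Great views.</description></item>
  <item><title>Review of a Cardiff cafe</title>
        <link>http://example.org/r/2</link>
        <description>Good coffee.</description></item>
</channel></rss>"""


def filter_items(rss_text, keyword):
    """Return links of items whose title or description contains the keyword,
    ignoring case."""
    root = ET.fromstring(rss_text)
    hits = []
    for item in root.iter("item"):
        text = " ".join(item.findtext(tag, default="") for tag in ("title", "description"))
        if keyword.lower() in text.lower():
            hits.append(item.findtext("link"))
    return hits


print(filter_items(FEED, "Milton Keynes"))  # ['http://example.org/r/1']
print(filter_items(FEED, "MK"))             # []: same place, different words
```

The second call shows the fragility: a review that calls the town "MK" is silently dropped, whereas the RDF already states which thing each review is about.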