Semantic Web at Opera


Posts tagged with "SPARQL"

Complexities of tag to vocabulary mapping


"Things should be as simple as possible, but not simpler" -- Albert Einstein



I'm happy that quite a few people have mapped their tags, more than I expected! Now, many things are somewhat complex by their very nature, and sometimes you have to deal with complexities to make things that are useful. Even more often, the question is who or what should deal with the complexities. When I created the tag mapper, I realised that there would be conflicting goals, and now I'd like to discuss them with you:

For some terms, you will see that there is a dropdown under "Relation" that contains both "topic" and "depicts". The "depicts" is there when the term is a noun, and the idea is that when a tag is used for a picture, then you can say that the picture depicts a dog, for example. So, it is a way to very directly express that meaning.

Doing it this way also makes for really simple SPARQL queries. To get all pictures that depict a dog, the query would amount to (ignoring the namespaces):
    SELECT ?pics WHERE { ?pics foaf:depicts <http://www.w3.org/2006/03/wn/wn20/instances/wordsense-dog-noun-1> . }

That's about as simple as these queries get.
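For anyone who wants to try it, here is a small sketch of the same query with the namespace spelled out, plus the URL you would fetch to run it. The foaf prefix is the standard FOAF namespace. The endpoint address is a placeholder I made up for illustration, not the real address of the engine, and the code only builds the request without sending it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# WordNet term for "dog", as used in the query above.
DOG = "http://www.w3.org/2006/03/wn/wn20/instances/wordsense-dog-noun-1"

# Placeholder endpoint: an assumption for illustration, not the engine's real URL.
ENDPOINT = "http://example.org/sparql"


def depicts_query(term_uri: str) -> str:
    """Return a SPARQL query selecting every resource that depicts term_uri."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?pics WHERE {\n"
        f"  ?pics foaf:depicts <{term_uri}> .\n"
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(depicts_query(DOG))
    print(request_url(ENDPOINT, depicts_query(DOG)))
```

Swapping in another term URI is all it takes to ask for pictures of something else, which is the appeal of the direct "depicts" relation.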

Also, as I mentioned, the plan is to use tags for content labels, and since I'm Opera's representative on W3C's POWDER working group, one of my main concerns is how people can easily tag their content with content labels on sites like Opera Community. Again, the easiest would be if a content label could be directly associated with a picture. For example, we have to live with a certain amount of nudity in the pictures our users upload, since we don't want to exercise censorship. We just don't want to push it on random visitors, and we want to facilitate parental control. That's one of the things content labels will be used for: a standard way to say that a picture contains nudity.


This is made possible by a slightly more complex user interface. Arguably, it would be easier to just map your tags, without also having to decide whether each tag depicts something or points to a content label. If we didn't do this, it would be harder to formulate the SPARQL queries. More importantly, this would only be the beginning, since you're clearly using the same tag not just for pictures but also for blog posts, and you wouldn't say that a blog post depicts something. So, for this to really work, we would need different relation types depending on the type of resource, whether it is a picture or a blog post.

That's where it gets nasty.

So, I'm wondering if the course I've started out on is unworkable. Perhaps the relationship from a tag to a term like the WordNet terms should be unchangeable? That would make the user interface simpler, but the queries and other uses would be harder. So, it isn't just a question of what's simplest; rather, it is a question of who should deal with the complexities.
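To make the trade-off a bit more concrete, here is a rough sketch in plain Python, with triples written as tuples. Every predicate and type name in it (ex:tag, ex:means, ex:topic, ex:Picture, ex:BlogPost) is invented for illustration and is not the vocabulary the tag mapper actually produces.

```python
# Triples are (subject, predicate, object) tuples. Every name here is
# hypothetical and only illustrates the shape of the data.
DOG = "wn:wordsense-dog-noun-1"

# Option A: the user picks the relation, so a picture points at the term directly.
direct = {
    ("pic1", "foaf:depicts", DOG),
    ("post1", "ex:topic", DOG),
}

# Option B: the tag-to-term relation is fixed; resources only point at tags.
fixed = {
    ("pic1", "rdf:type", "ex:Picture"),
    ("post1", "rdf:type", "ex:BlogPost"),
    ("pic1", "ex:tag", "tag:dog"),
    ("post1", "ex:tag", "tag:dog"),
    ("tag:dog", "ex:means", DOG),
}


def pictures_direct(triples, term):
    """One triple pattern: ?pic foaf:depicts term."""
    return sorted(s for (s, p, o) in triples if p == "foaf:depicts" and o == term)


def pictures_fixed(triples, term):
    """Three joined patterns: the tag means term, the resource has that tag,
    and the resource is typed as a picture."""
    tags = {s for (s, p, o) in triples if p == "ex:means" and o == term}
    tagged = {s for (s, p, o) in triples if p == "ex:tag" and o in tags}
    pictures = {s for (s, p, o) in triples if p == "rdf:type" and o == "ex:Picture"}
    return sorted(tagged & pictures)


if __name__ == "__main__":
    print(pictures_direct(direct, DOG))
    print(pictures_fixed(fixed, DOG))
```

Both functions find the same picture, but the second needs a join through the tag and a check on the resource type. That extra work is the complexity that moves from the person mapping tags to the person writing the query.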

Now, it is important that as many people as possible participate in tag mapping. Not everyone needs to write applications or queries that use these data, so those wishing to do so are probably better suited to deal with the complexities than all those seeing the tag setup page. On the other hand, it is quite important to keep the POWDER specification simple too, and this use could add complexity to the specification.

So, your opinions matter here. Please bring them forward!

Then, you might ask, dear guinea pigs, why I made a complex user interface to begin with, only to ask whether I should make it simpler. Well, clearly, I couldn't start discussing this with you if I didn't show you a complex user interface that worked; it would be much harder to explain what I had in mind. So, I figured I might as well do a little research on you. The experience gained from this will be used to make the right decisions when designing important standards that will be with us for many years to come, and I feel it is important to get this experience now, before those standards are set.

The SPARQL Engine is back up


There have been many urgent matters to attend to while upgrading the Opera Community, and the SPARQL engine fell into neglect for a while, but that was only temporary! It is now back up, with 15 million triples. Be warned, however, that not all data may be there, even though it has grown substantially. Also, the data set is now too big to build the way it has been built in the past, so the build needs to be rewritten, which will take some time. Thus, it remains an experimental service for now.

Opera's SPARQL Engine on XTech


Leigh Dodds, one of the foremost people in the RDF and SPARQL field, gave a presentation titled "SPARQLing Services" at the XTech conference in Amsterdam. He has also published his slides. In his presentation, Leigh argued that SPARQL could very well power "Web 2.0" with data. In some of his examples, he used Opera's SPARQL engine.

More on the SPARQL query engine


Some time ago, we announced the SPARQL query engine. Since then, the amount of data that can be accessed through it has grown with the growth of the Opera Community. Also, more data has been added. As we were the first major site to publish such a query engine, we did it both so that the community could experiment with the data and so that we could gain experience on the server side.

The approach I chose was to rely heavily on already available libraries, mainly Redland, a great library written in C, but with bindings to many other languages. One of its features is that you can insert RDF statements into a model, which in turn can be stored in many different types of databases. Also, it can take SPARQL queries and return results. Thus, all that was needed to get this running was to take all the data out of the Opera Community databases, create RDF statements from the data, insert them into the Redland RDF model, and create a system between the web server and Redland's SPARQL query interface.
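The "create RDF statements from the data" step can be sketched in a few lines of plain Python. This is not the code running on the server and it does not use Redland's API: the rows, the column names, and the example.org base address are all made up, and the statements are simply written out as N-Triples lines, a format that an RDF library's parser could then load into a model.

```python
# Hypothetical rows, as they might come out of a community database.
ROWS = [
    {"user": "alice", "name": "Alice", "photo": "p1.jpg"},
    {"user": "bob", "name": "Bob", "photo": None},
]

BASE = "http://example.org/"  # placeholder, not the real Opera Community URL space
FOAF = "http://xmlns.com/foaf/0.1/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, quote and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_ntriples(row: dict) -> list:
    """Turn one database row into one or two N-Triples statements."""
    person = f"<{BASE}users/{row['user']}>"
    lines = [f"{person} <{FOAF}name> {literal(row['name'])} ."]
    if row["photo"]:
        # foaf:made links a person to something they made, here a photo.
        lines.append(f"{person} <{FOAF}made> <{BASE}photos/{row['photo']}> .")
    return lines


if __name__ == "__main__":
    for row in ROWS:
        for line in row_to_ntriples(row):
            print(line)
```

Each row becomes a handful of statements, so the number of statements grows quickly with the size of the community.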

Given that we are breaking new ground, it shouldn't come as a surprise that some problems have surfaced, and I've worked on them occasionally. For one thing, building the model this way takes about 5 hours and a lot of resources. Therefore, I have only been able to renew the model once a week, let alone in real time. Also, not all the data are available for query while rebuilding occurs.

I have now addressed the latter problem, but the 5-hour rebuild time persists. I have tried many approaches to resolve that. None has yet led me to a solution, but they have provided much more insight into the cause of the problem.

As many have pointed out, the main hurdle in using the SPARQL query engine has been to find out what kind of data is in there. It has been a trial and error thing up to now. With the recent issues ironed out, and with a new version of Redland, I expect to provide an approach to that problem soon.
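Until then, two generic SPARQL queries can take some of the guesswork out of exploring an unfamiliar store. The sketch below keeps them as strings and mimics what they return over a few sample triples in plain Python. The sample data is invented, and on a store with millions of triples such queries may well be slow, so treat this as a way to see the idea rather than a recommendation.

```python
# Generic exploration queries: they assume nothing about the vocabulary in use.
LIST_PREDICATES = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
LIST_CLASSES = "SELECT DISTINCT ?class WHERE { ?s a ?class }"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Invented sample triples standing in for the real store.
SAMPLE = [
    ("http://example.org/users/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/users/alice", FOAF + "name", "Alice"),
    ("http://example.org/photos/p1.jpg", RDF_TYPE, FOAF + "Image"),
    ("http://example.org/users/bob", FOAF + "name", "Bob"),
]


def distinct_predicates(triples):
    """What LIST_PREDICATES would return: every predicate in use, once."""
    return sorted({p for (_, p, _) in triples})


def distinct_classes(triples):
    """What LIST_CLASSES would return: every class some resource is typed as."""
    return sorted({o for (_, p, o) in triples if p == RDF_TYPE})


if __name__ == "__main__":
    print(distinct_predicates(SAMPLE))
    print(distinct_classes(SAMPLE))
```

Knowing which predicates and classes exist is usually enough to start writing more specific queries.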