More on the SPARQL query engine
By Kjetil Kjernsmo. Friday, 10. February 2006, 15:23:58
Some time ago, we announced the SPARQL query engine. Since then, the amount of data that can be accessed through it has grown along with the Opera Community, and more data has been added as well. As the first major site to publish such a query engine, we did it both so that the community could experiment with the data and so that we could gain experience on the server side.
The approach I chose was to rely heavily on already available libraries, mainly Redland, a great library written in C with bindings to many other languages. One of its features is that you can insert RDF statements into a model, which in turn can be stored in many different types of databases. It can also take SPARQL queries and return results. Thus, all that was needed to get this running was to take all the data out of the Opera Community databases, create RDF statements from the data and insert them into the Redland RDF model, and create a system between the web server and Redland's SPARQL query interface.
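To make the "create RDF statements from the data" step a bit more concrete, here is a minimal sketch in plain Python. It is not the code running on the server: the rows, the `http://example.org/users/` URI scheme and the choice of FOAF properties are all assumptions made up for illustration, and it writes N-Triples text rather than calling Redland directly.

```python
# A sketch only: ROWS, BASE and the FOAF mapping are hypothetical,
# standing in for whatever the real community database contains.
ROWS = [
    {"id": 1, "name": "Alice", "homepage": "http://example.org/alice/"},
    {"id": 2, "name": 'Bob "the builder"', "homepage": None},
]

BASE = "http://example.org/users/"  # assumed URI scheme for users
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row):
    """Turn one database row into a list of (subject, predicate, object) terms."""
    subject = "<%s%d>" % (BASE, row["id"])
    triples = [(subject, RDF_TYPE, "<%sPerson>" % FOAF)]
    if row.get("name"):
        literal = '"%s"' % escape_literal(row["name"])
        triples.append((subject, "<%sname>" % FOAF, literal))
    if row.get("homepage"):
        # Empty or missing columns simply produce no statement.
        triples.append((subject, "<%shomepage>" % FOAF, "<%s>" % row["homepage"]))
    return triples


def to_ntriples(rows):
    """Serialize all rows as N-Triples, one statement per line."""
    lines = []
    for row in rows:
        for s, p, o in row_to_triples(row):
            lines.append("%s %s %s ." % (s, p, o))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_ntriples(ROWS), end="")
```

A file produced this way could then be loaded into an RDF store with an N-Triples parser. Inserting statements straight into the model through a language binding, as described above, avoids the intermediate file but follows the same row-to-statement mapping.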
Given that we are breaking new ground, it should come as no surprise that some problems have surfaced, and I've worked on them occasionally. For one thing, building the model this way takes about 5 hours and a lot of resources. Therefore, I have only been able to renew the model once a week, let alone in real time. Also, not all the data has been available for query while a rebuild is in progress.
I have now addressed the latter problem, but the 5-hour rebuild time persists. I have tried many approaches to resolve that. None has yet led me to a solution, but they have provided much more insight into the cause of the problem.
As many have pointed out, the main hurdle in using the SPARQL query engine has been finding out what kind of data is in there; it has been a matter of trial and error up to now. With the recent issues ironed out, and with a new version of Redland, I expect to provide an approach to that problem soon.