Open Life

Opendocuments, Web Office, Office suites

Posts tagged with "XML"

Parsing XML with OOoBasic

OpenOffice.org ships with a full IDE and a language, OOoBasic, that may look like a toy language but is not one. This weekend I have been reviewing a lot of OOoBasic code and found that it is a powerful scripting language. One of the things that shows the power of this very high-level language is parsing an XML file. Since OOo's own formats are built on XML, it is an appealing idea that OOo could configure itself from XML.

First there is the powerful UNO framework, which lives in the inner pieces of OpenOffice.org. UNO is made up of services, interfaces and methods. The cool thing is the wide array of interfaces that OOoBasic can use and manipulate.

OOoBasic can parse XML in different ways, from SAX, which is a smaller and simpler stream parser, to DOM, which is a more in-depth approach based on the Document Object Model. Both live in the XML module of the API (com.sun.star.xml), alongside many other XML tools.

So here is the code that I was working with. First I needed an XML file; I used a simple employee document:
<Employees>
   <Employee id="101">
       <Name>
          <First>John</First>
          <Last>Smith</Last>
       </Name>
       <Phone type="Home">785-555-1234</Phone>
   </Employee>
</Employees>


Then here is the first stage of the code. We basically turn the file path into a URL and hand it to the routine that deals with external file access:
Sub Main
   cXmlFile = "/home/user/tmp/test.xml"
   
   cXmlUrl = ConvertToURL( cXmlFile )
   
   ReadXmlFromUrl( cXmlUrl )
End Sub


We first call the built-in ConvertToURL function, which turns the path to the file into a file URL, and then call ReadXmlFromUrl, which we will show next:

Sub ReadXmlFromUrl( cUrl )
   oSFA = createUnoService( "com.sun.star.ucb.SimpleFileAccess" )
   oInputStream = oSFA.openFileRead( cUrl )
   ReadXmlFromInputStream( oInputStream )
   oInputStream.closeInput()
End Sub


This routine uses createUnoService to instantiate the SimpleFileAccess service from the API. Then we call one of its methods, openFileRead, which opens the file and returns an input stream that we keep in a variable and pass to ReadXmlFromInputStream. Finally we close the file using closeInput.
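
Before parsing, it can help to confirm what URL ConvertToURL produced and that the file is really there. The following is only a sketch: CheckXmlFile is a hypothetical helper that is not part of the original code, and the path is the same test file used in Main.

```basic
Sub CheckXmlFile
   ' Hypothetical helper, not part of the original code.
   cXmlFile = "/home/user/tmp/test.xml"

   ' Built-in function: system path -> file URL.
   cXmlUrl = ConvertToURL( cXmlFile )
   Print cXmlUrl

   ' The same service used in ReadXmlFromUrl can also test for the file.
   oSFA = createUnoService( "com.sun.star.ucb.SimpleFileAccess" )
   If oSFA.exists( cXmlUrl ) Then
      Print "Size in bytes:", oSFA.getSize( cXmlUrl )
   Else
      ' ConvertFromURL goes the other way, back to a system path.
      Print "File not found:", ConvertFromURL( cXmlUrl )
   End If
End Sub
```

Running this once before Main makes it easier to tell a wrong path apart from a parsing problem.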

The next routine is ReadXmlFromInputStream; this is the one in charge of reading the XML.

Sub ReadXmlFromInputStream( oInputStream )
   oSaxParser = createUnoService( "com.sun.star.xml.sax.Parser" )
   oDocEventsHandler = CreateDocumentHandler()
   oSaxParser.setDocumentHandler( oDocEventsHandler )
   oInputSource = createUnoStruct( "com.sun.star.xml.sax.InputSource" )
   With oInputSource
      .aInputStream = oInputStream 
   End With
   oSaxParser.parseStream( oInputSource )
End Sub


This is the second routine, the one that reads the XML and runs the parser itself. First we instantiate the Parser service into a variable called oSaxParser. Then we create a document handler with CreateDocumentHandler and register it with the parser through setDocumentHandler. Finally we wrap the input stream in an InputSource struct and pass it to parseStream.
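
If the XML file is malformed, parseStream stops with an error. A minimal sketch of a variant that traps it, assuming Basic's usual On Error handling is enough here; ReadXmlSafely is a hypothetical name, and it relies on the CreateDocumentHandler function defined further down in this post.

```basic
Sub ReadXmlSafely( oInputStream )
   ' Hypothetical variant of ReadXmlFromInputStream with error handling.
   On Error GoTo ParseFailed

   oSaxParser = createUnoService( "com.sun.star.xml.sax.Parser" )
   oSaxParser.setDocumentHandler( CreateDocumentHandler() )

   oInputSource = createUnoStruct( "com.sun.star.xml.sax.InputSource" )
   oInputSource.aInputStream = oInputStream

   oSaxParser.parseStream( oInputSource )
   Exit Sub

ParseFailed:
   ' Error$ holds the message of the error raised while parsing.
   MsgBox "Could not parse the XML: " & Error$
End Sub
```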

Private goLocator As Object
Private glLocatorSet As Boolean


We declare goLocator as an Object that will hold the document locator, and glLocatorSet as a Boolean flag that we set to False when we create the document handler. We first need to create the listener for XDocumentHandler.

Function CreateDocumentHandler()
   oDocHandler = CreateUnoListener( "DocHandler_",_
                                    "com.sun.star.xml.sax.XDocumentHandler" )
   glLocatorSet = False
   CreateDocumentHandler() = oDocHandler
End Function


Finally we have a series of routines, the document handler callbacks, which print out the different elements of the XML. By default I comment out all of these handlers except characters, which is the one that reports the content. Unfortunately Print will report not just the visible content such as John but also all the invisible characters such as spaces, line ends and tabs.


Sub DocHandler_startDocument()
'   Print "Start document"
End Sub

Sub DocHandler_endDocument()
'   Print "End document"
End Sub

Sub DocHandler_startElement( cName As String, oAttributes As _
                             com.sun.star.xml.sax.XAttributeList )
'   Print cName
'   Print oAttributes.Length
End Sub

Sub DocHandler_endElement( cName As String )
'   Print "End element", cName
End Sub

Sub DocHandler_characters( cChars As String )
   Print "Content:", cChars
End Sub

Sub DocHandler_ignorableWhitespace( cWhitespace As String )
'Print cWhitespace
End Sub

Sub DocHandler_processingInstruction( cTarget As String, cData As String )
End Sub


Sub DocHandler_setDocumentLocator( oLocator As com.sun.star.xml.sax.XLocator )
   goLocator = oLocator
   glLocatorSet = True
End Sub
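
The handlers above leave two loose ends: the whitespace noise in characters, and the attributes and locator that are received but never used. Here is a sketch of three helpers that could be called from those handlers. All three names (IsOnlyWhitespace, ReportCharacters, ReportAttributes) are hypothetical and not part of the original code.

```basic
' Hypothetical helper: True when the text holds only spaces, tabs and line ends.
Function IsOnlyWhitespace( cText As String ) As Boolean
   IsOnlyWhitespace = True
   For i = 1 To Len( cText )
      Select Case Asc( Mid( cText, i, 1 ) )
         Case 9, 10, 13, 32
            ' Tab, line feed, carriage return, space: keep looking.
         Case Else
            IsOnlyWhitespace = False
            Exit Function
      End Select
   Next i
End Function

' Hypothetical helper: print visible content only, with its line if known.
Sub ReportCharacters( cChars As String )
   If IsOnlyWhitespace( cChars ) Then Exit Sub
   If glLocatorSet Then
      Print "Content:", cChars, "line", goLocator.getLineNumber()
   Else
      Print "Content:", cChars
   End If
End Sub

' Hypothetical helper: print every attribute of an element.
Sub ReportAttributes( cName As String, oAttributes As Object )
   For i = 0 To oAttributes.getLength() - 1
      Print cName, oAttributes.getNameByIndex( i ), oAttributes.getValueByIndex( i )
   Next i
End Sub
```

To try them, call ReportCharacters( cChars ) from DocHandler_characters in place of the Print line, and ReportAttributes( cName, oAttributes ) from DocHandler_startElement. With the employee file, the indentation between elements should then be skipped, and the id and type attributes should show up next to their element names.
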

I must admit that the process is still not very clear, but on review you can see that we want 3 things:
  1. Invoke the XML parsing service
  2. Send our call to a window within OOo
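
For comparison, the DOM route mentioned at the start loads the whole file into a tree instead of firing callbacks. This is a minimal sketch, assuming the same test file as above; ReadFirstNamesWithDom is a hypothetical routine name.

```basic
Sub ReadFirstNamesWithDom
   ' Hypothetical routine; the path is the same test file used in Main.
   cXmlUrl = ConvertToURL( "/home/user/tmp/test.xml" )

   ' DocumentBuilder parses the whole file into an in-memory tree.
   oBuilder = createUnoService( "com.sun.star.xml.dom.DocumentBuilder" )
   oDoc = oBuilder.parseURI( cXmlUrl )

   ' Collect every <First> element, wherever it sits in the tree.
   oNodes = oDoc.getElementsByTagName( "First" )
   For i = 0 To oNodes.getLength() - 1
      oNode = oNodes.item( i )
      ' The text of an element is held in its first child, a text node.
      oText = oNode.getFirstChild()
      If Not IsNull( oText ) Then
         Print "First name:", oText.getNodeValue()
      End If
   Next i
End Sub
```

The trade-off is the usual one: DOM needs no handler routines and makes it easy to look up elements by name, while SAX never holds the whole document in memory.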

XML is the future

Whenever I see Svante Schubert talk about XML filters within OpenOffice.org, the light bulbs go on. Basically, the theory that OOo might become an incredible WYSIWYG XML editor to develop, process and deploy schemas across the board is fascinating.

Schemas such as ebXML, UBL, LegalXML and many others will be able to have a front-view thanks to the development of these filters.

OOo might find a niche that includes the standardization of business processes across companies, and web services or servers that simplify those processes and assemble reports. In the end this would make the OpenDocument format not just a format restricted to the office suite but also a default format for accessing and viewing data across the board.

We still need a common-ground framework, and the XML schema itself needs to mature a bit more. I can see a business opportunity in developing ODF-based components, software and services that consume this data as value-added services.

In other news, it is actually interesting to see engineers from Sun and IBM working hand in hand to make OOo fly.

Bob and Svante