Thursday, 26. April 2007, 10:33:39
Yay, xslt, the holy grail of halfway between static and dynamic content distribution. Or so it seems.
I've been trying to wrap my head around xslt for a couple of months now. There are plenty of primers out there, so there's no need for me to rehash that. But there is some stuff I found missing, some stuff I found out the hard way. Today I'm sharing it with you, so you don't have to spend hours looking for it.
Xslt can be used to transform an existing data source into a different format. It can add, remove, sort, reshuffle, count and translate content, and more. In my case, I have sitemap data in an xml file that needs to be transformed into 6 different formats:
1) an html sitemap web page
2) a Google sitemap xml file
3) an html snippet for reuse in a php template
4) a javascript snippet for reuse in html pages
5) a JSON object for a portal
6) a list of keywords for navigating the portal
And this needs to be done in the same way for every sub site on my network (currently 7).
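To make the list above a bit more concrete, here's a minimal sketch of what the stylesheet for item 2, the Google sitemap, might look like. It assumes the source file has the urlset/url/loc structure used in the examples further down, with no namespace on the source elements, and it targets the sitemaps.org 0.9 protocol. Both are assumptions, so check them against your own data and against what Google currently expects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  <!-- Google sitemaps are plain xml in UTF-8; no doctype needed. -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Unprefixed names in match/select refer to elements in no namespace,
       so this matches the source urlset even though the default namespace
       declared above is the sitemap one. The literal elements below, on
       the other hand, do end up in the sitemap namespace. -->
  <xsl:template match="/urlset">
    <urlset>
      <xsl:for-each select="url">
        <url>
          <loc><xsl:value-of select="loc"/></loc>
        </url>
      </xsl:for-each>
    </urlset>
  </xsl:template>
</xsl:stylesheet>
```

Because the method is xml and no doctype is requested, the doctype quirk described further down doesn't come into play here, so plain literal result elements are fine.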
Real-time execution
Xslt can be invoked at run-time. Very much like an html page calls up a css file, an xml file can call up an xslt stylesheet. That would be lovely if my site changed so often that it needed real-time updates. It doesn't, so all real-time execution does is take up rendering time that is of no use to the visitor whatsoever. Wasted.
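The hook-up for the run-time route is a processing instruction at the top of the xml file, much like a link element pointing to a css file. A minimal sketch; the stylesheet name sitemap-html.xsl and the url contents are made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap-html.xsl"?>
<!-- The processing instruction above asks an xslt-capable client to
     fetch sitemap-html.xsl (hypothetical name) and apply it to this file
     every time the file is requested. -->
<urlset>
  <url>
    <loc>http://www.example.com/</loc>
    <title>Home</title>
  </url>
</urlset>
```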
Caching
So instead we cache (read: save to disk) the 6 files. A php script runs sitemap.xml through 6 different xslt stylesheets and writes 6 files in the process. I don't have to schedule this script: I know when I add another article to my site, so it's easy enough to run the php script then and recreate the files. Process once, read often.
Doctypes
Because I also want to turn my sitemap into an html page, and because I want my web pages to comply with the W3C recommendations, I needed a doctype. Luckily, xslt provides the xsl:output element.
It lets us declare which doctype we want in our output, so we don't have to add it as character data. Here's an example:
<xsl:output
method="html"
omit-xml-declaration="yes"
media-type="text/html"
encoding="windows-1252"
doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd"
indent="yes"
/>
Now what my specific xslt processor (the one built into php 4) does is baffling: it inserts the doctype before the very first generated element. Generated. Not before an element that is copied into the output, not before some text (character data) you wrote in between the xslt instructions, no.
So this:
<xsl:template match="/urlset">
<html><body><ul>
<xsl:for-each select="url"><li><xsl:value-of select="." /></li></xsl:for-each>
</ul></body></html>
</xsl:template>
results in this:
<html><body><ul>
<!DOCTYPE .....>
<li>bla bla bla</li>
</ul></body></html>
To prevent this from happening, don't insert text in place of html elements; use xsl:element as the first child of xsl:template instead, like this:
<xsl:template match="/urlset">
<xsl:element name="html"><xsl:element name="body"><xsl:element name="ul">
<xsl:for-each select="url"><li><xsl:value-of select="." /></li></xsl:for-each>
</xsl:element></xsl:element></xsl:element>
</xsl:template>
which then results in what we like to see:
<!DOCTYPE .....>
<html><body><ul>
<li>bla bla bla</li>
</ul></body></html>
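Putting the two pieces together, a complete stylesheet for the html sitemap might look like the sketch below. It assumes each url carries a title child, as in the next section's example. Note that xsl:output is a top-level element: it goes directly under xsl:stylesheet, not inside a template. I haven't verified this exact file against php 4's processor, so treat it as a starting point.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level output settings: html serialization plus the
       HTML 4.01 Transitional doctype. -->
  <xsl:output
    method="html"
    media-type="text/html"
    encoding="windows-1252"
    doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
    doctype-system="http://www.w3.org/TR/html4/loose.dtd"
    indent="yes"/>

  <!-- The outer elements are built with xsl:element, so the doctype
       lands in front of <html>. -->
  <xsl:template match="/urlset">
    <xsl:element name="html">
      <xsl:element name="body">
        <xsl:element name="ul">
          <!-- One list item per url, showing its (assumed) title child. -->
          <xsl:for-each select="url">
            <li><xsl:value-of select="title"/></li>
          </xsl:for-each>
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```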
Use real elements and real attributes
And while you're at it, stop pretending to create html and start creating real html. In my case, I want to transform a bunch of locations into a list of hyperlinks. I started out by doing something like this:
<xsl:for-each select="url">
<li><a href="<xsl:value-of select="loc" />"><xsl:value-of select="title" /></a></li>
</xsl:for-each>
Do not do this at home!
Why? Because with the correct xsl:output, the parser will escape our fake mark-up as html entities, resulting in
&lt;li&gt;&lt;a href=&quot;http://....&quot;&gt;link&lt;/a&gt;&lt;/li&gt;
Without using xsl:output, the output will be left to the whim of the parser, so it might do a translation, or it might keep what we wrote... but we won't be sure.
Secondly, fake mark-up stops the parser from working its magic on the web addresses: it can't automatically escape characters that don't belong in a web address as written (a bare & in a link, for example, should come out as &amp;). We really don't want to do that work by hand, so to have the parser do it for us, we have to use real elements and real attributes.
Like so:
<xsl:for-each select="url">
<xsl:element name="li">
<xsl:element name="a">
<xsl:attribute name="href"><xsl:value-of select="loc"/></xsl:attribute>
<xsl:attribute name="title"><xsl:value-of select="desc"/></xsl:attribute>
<xsl:attribute name="lang"><xsl:value-of select="lang"/></xsl:attribute>
<xsl:value-of select="title"/>
</xsl:element>
</xsl:element>
</xsl:for-each>
This will now create a nice li-element with an a-element with a nicely escaped href-attribute.
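For what it's worth, standard XSLT 1.0 also has a shorter way to write the same thing: attribute value templates. An expression in curly braces inside an attribute of a literal result element is evaluated, and its result becomes a real attribute, so the output method escapes it just like it does for xsl:attribute. It is essentially what the "do not do this at home" example was trying to do. A sketch, not tested against php 4's built-in processor:

```xml
<xsl:for-each select="url">
  <li>
    <!-- {loc}, {desc} and {lang} are attribute value templates: each
         expression is evaluated against the current url element and the
         result becomes a real attribute on the a element. -->
    <a href="{loc}" title="{desc}" lang="{lang}">
      <xsl:value-of select="title"/>
    </a>
  </li>
</xsl:for-each>
```

Since this sits inside the ul built with xsl:element, it doesn't touch the doctype issue above; the literal li and a elements are fine this deep in the tree.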
That's it for now; I'll add more when I come across other stuff.