
Claws, fangs, fur...

...the bear essentials

Posts tagged with "doctype"

HTML5 and other unknown elements for non-HTML5-browsers


Just to see whether I could make sense of the specification-in-progress, I decided to create HTML5 versions of several pages on my website, including its homepage. Conclusion: it's feasible, but I'm sure less experienced web authors could use a "Web Authors' Guide to HTML5" (yup, this counts as prior art :p ).

I then tested how web browsers from various manufacturers displayed that HTML5 page. The results:
  • Opera 9.27: pass
  • Safari 3.1: pass
  • Firefox 3 (latest nightly build): pass
  • Firefox 2: fail
  • Internet Explorer 7: fail


Haven't tried it in MSIE8 yet... I'm part of the feedback program but I didn't want to risk having that beta application wreak havoc on my system.

Now what's interesting about that list?

Wait, what? Opera 9.27? Does it know HTML5? It may have some basic understanding but I doubt Opera implemented the entire spec... if only because the spec is being developed as we speak. So how can it pass this test? And how is it that Firefox 2 and Microsoft's excuse for a browser fail?

My best guess: it's because of the way browsers handle unknown elements.

Huh?

Case in point: HTML5 has a SECTION element. With it, an author can divide just about any part of the body into, you guessed it, sections. How should such a section be rendered? Should it be a block? A table row? A floating square? The specification doesn't say... it leaves that directive to CSS.

So what should a browser do, when running into an element it doesn't recognise? It could simply ignore the element... but none of the current major browsers chose that option. Bravely, they try their best at displaying it anyway.

With one major difference:

where some browsers (Opera 9, Safari 3, Firefox 3) display the unknown element and apply CSS and javascript rules to that element as provided by the author, some others (Internet Explorer 7, Firefox 2) display the unknown element bare, meaning without CSS and / or without javascript.

Why? Is it a strategic management decision? Is it developer short-sightedness? Is it a technical difficulty? And if so, in what part of the browser? In the HTML-parser, or in the CSS-parser, or in the javascript parser? In all 3? Somewhere else?

It can't be a technical difficulty. After all, all of the tested browsers can handle XML and do render CSS on XML documents... and XML by definition is riddled with unknown elements.
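To make that concrete, here is a minimal sketch of such an XML document. The element names and the file name notes.css are made up for illustration; the assumption is that notes.css holds ordinary rules such as "entry { display: block; }". A browser has never heard of any of these elements and is still expected to style them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- notes.css is a hypothetical stylesheet, linked the standard way for XML documents -->
<?xml-stylesheet type="text/css" href="notes.css"?>
<!-- None of the element names below mean anything to a browser -->
<notebook>
  <entry>
    <heading>Claws</heading>
    <remark>Sharp.</remark>
  </entry>
  <entry>
    <heading>Fangs</heading>
    <remark>Sharper.</remark>
  </entry>
</notebook>
```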

It shouldn't be a CSS-parser problem, as the CSS that I wrote isn't all that difficult and does work when applied to known elements.

It shouldn't be an HTML-parser problem, as all of the tested browsers are adept at handling TAG SOUP, especially Internet Explorer (yay, history!).

So that leaves... that leaves a browser trying to render TAG SOUP using standards-compliant render-mode for a doctype that has no DTD where one is expected. It is one thing to have NO doctype, as is the case with many XML documents. It is quite another to have a document that claims to have a doctype and then breaks the rules by providing neither a public nor a system identifier... which is normal for HTML5, but not for HTML4 or the XHTMLs.
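For comparison, this is roughly what the two kinds of declaration look like. The listing shows two separate declarations side by side, not one document: the XHTML 1.0 Strict doctype carries both a public and a system identifier, while the HTML5 doctype carries neither.

```xml
<!-- XHTML 1.0 Strict: public identifier, then system identifier (the DTD's address) -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- HTML5: a doctype with no identifiers at all -->
<!DOCTYPE html>
```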

So unless a browser is instructed to handle such cases, it will behave in unexpected ways. If anything, the fact that both Internet Explorer 7 and Firefox 2 do display the unknown elements shows that the developers did expect some unexpected authoring and made sure their browsers wouldn't crash, which is quite commendable. That makes the fail largely a strategic management decision.

Conclusion: Microsoft's flagship browser and Firefox 2 don't necessarily fail at HTML5... they simply fail at handling unrecognised doctypes altogether.

XSLT in practice


Yay, xslt, the holy grail of halfway between static and dynamic content distribution. Or so it seems.

I've been trying to wrap my head around xslt for a couple months now. There's plenty of primers out there, no need for me to reprise that. But there is some stuff I found missing. Some stuff I found out the hard way. Today I share it with you, so you don't have to spend hours looking for it.

Xslt can be used to transform an existing data source into a different format. It can add pieces, remove, sort, reshuffle, count, translate and more. In my case, I have sitemap data in an xml file that needs to be transformed into 6 different formats:

1) an html sitemap web page
2) a Google sitemap xml file
3) an html snippet for reuse in a php template
4) a javascript snippet for reuse in html pages
5) a JSON object for a portal
6) a list of keywords for navigating the portal

And this needs to be done similarly for all (currently 7) sub sites on my network.
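The stylesheet fragments further down match on urlset and url and read loc, title, desc and lang, so the source file presumably looks something like the sketch below. Only those element names come from the fragments; the nesting, the values and the example.org address are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sitemap.xml: element names taken from the stylesheets in this post, values invented -->
<urlset>
  <url>
    <loc>http://www.example.org/articles/xslt-in-practice</loc>
    <title>XSLT in practice</title>
    <desc>Notes on generating several output formats from one sitemap</desc>
    <lang>en</lang>
  </url>
  <url>
    <loc>http://www.example.org/articles/unknown-elements</loc>
    <title>HTML5 and other unknown elements</title>
    <desc>How browsers treat elements they do not recognise</desc>
    <lang>en</lang>
  </url>
</urlset>
```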

Real-time execution

Xslt can be invoked at run-time. Very much like an html page calls up a css file, an xml file can call up an xslt. This would be lovely if my site changed so often that it needed a real-time update. It doesn't need this at all, hence all that real-time execution does is take up rendering time that is of no use to the visitor whatsoever. Wasted.
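For reference, the run-time hook is a processing instruction at the top of the xml file. A minimal sketch, assuming a stylesheet called sitemap-to-html.xsl (a made-up file name) sits next to the data; type="text/xsl" is the value browsers commonly accept:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap-to-html.xsl is a hypothetical file name; the browser fetches it and runs the transformation on every page view -->
<?xml-stylesheet type="text/xsl" href="sitemap-to-html.xsl"?>
<urlset>
  <url>
    <loc>http://www.example.org/</loc>
  </url>
</urlset>
```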

Caching

So instead we cache (read: save to disk) the 6 files. A php script runs the sitemap.xml through 6 different xslts and creates 6 files in the process. I don't have to schedule this script: since I know when I add another article to my site, it's easy enough to call up the php script and recreate the files. Process once, read often.

Doctypes

Because I want to turn my sitemap into an html page as well, and because I want my web pages to comply with W3C recommendations, I needed a doctype. Luckily, the xslt language has the xsl:output instruction.

This instruction allows us to set which doctype we wish to generate in our output. That helps because that way we don't need to add it as character data. Here's an example:
<xsl:output
method="html"
omit-xml-declaration="yes"
media-type="text/html"
encoding="windows-1252"
doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd"
indent="yes" 
/>


Now what my specific xslt parser (built into php 4) does is baffling: it adds the doctype before the very first generated element. Generated. Not some text element that is copied into the output, not some text (character data) you wrote in between the xslt instructions, no.

So this:
<xsl:template match="/urlset">
<html><body><ul>
<xsl:for-each select="url"><li><xsl:value-of select="." /></li></xsl:for-each>
</ul></body></html>
</xsl:template>


results in this:
<html><body><ul>
<!DOCTYPE .....>
<li>bla bla bla</li>
</ul></body></html>


To prevent this from happening, stop inserting the mark-up as text and use xsl:element as the first child of xsl:template, like this:
<xsl:template match="/urlset">
<xsl:element name="html"><xsl:element name="body"><xsl:element name="ul">
<xsl:for-each select="url"><li><xsl:value-of select="." /></li></xsl:for-each>
</xsl:element></xsl:element></xsl:element>
</xsl:template>


which then results in what we like to see:
<!DOCTYPE .....>
<html><body><ul>
<li>bla bla bla</li>
</ul></body></html>


Use real elements and real attributes

And while you're at it, stop pretending to create html and start creating real html. In my case, I want to transform a bunch of locations to a list of hyperlinks. I started out by doing something like this:
<xsl:for-each select="url">
<li><a href="<xsl:value-of select="loc" />"><xsl:value-of select="title" /></a></li>
</xsl:for-each>
Do not do this at home!

Why? Because by using the correct xsl:output, the parser will translate our fake mark-up into html entities, resulting in
&lt;li&gt;&lt;a href=&quot;http://....&quot;&gt;link&lt;/a&gt;&lt;/li&gt;

Without using xsl:output, the output will be left to the whim of the parser, so it might do a translation, or it might keep what we wrote... but we won't be sure.

Secondly, using fake mark-up prevents the parser from working its wonders on the web addresses: it can no longer automatically escape weird characters that shouldn't be in web addresses at all. We really don't want to do this work by hand, so in order to have the parser do this for us, we have to use real elements and real attributes.

Like so:
<xsl:for-each select="url">
 <xsl:element name="li">
  <xsl:element name="a">
   <xsl:attribute name="href"><xsl:value-of select="loc"/></xsl:attribute>
   <xsl:attribute name="title"><xsl:value-of select="desc"/></xsl:attribute>
   <xsl:attribute name="lang"><xsl:value-of select="lang"/></xsl:attribute>
   <xsl:value-of select="title"/>
  </xsl:element>
 </xsl:element>
</xsl:for-each>

This will now create a nice li-element with an a-element with a nicely escaped href-attribute.
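For what it's worth, standard XSLT 1.0 also offers a shorter spelling for the same thing: literal result elements with attribute value templates, where the curly braces are evaluated as XPath expressions. The sketch below is a complete but minimal stylesheet; the UTF-8 encoding is my assumption, and I haven't checked whether the php 4 parser's doctype quirk described above also shows up with this notation, so treat it as something to try rather than a guaranteed fix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- html output method, so the processor takes care of escaping attribute values -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/urlset">
    <ul>
      <xsl:for-each select="url">
        <li>
          <!-- {loc}, {desc} and {lang} are attribute value templates: real attributes, not pasted text -->
          <a href="{loc}" title="{desc}" lang="{lang}">
            <xsl:value-of select="title"/>
          </a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Both notations build real elements and real attributes in the result tree, so the escaping argument above applies to either one.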

That's it for now, I'll add more when I happen on other stuff.