miscoded

the web is a hack

How to cook tag soup with XSLT


Working for Opera Software's QA department gives you in-depth perspectives on the web's wild and varied coding practices. I still wasn't prepared for the curious solutions that power the menu on the new Israeli rail website.

The XSLT markup/programming language is widely used to transform one sort of DOM into another - for example turning the DOM of a generic XML file into valid XHTML. Much of the benefit is that you're working on DOM trees, which makes it hard or impossible to create syntactically invalid pages. Diving into the source code shows that the JavaScript coders working on the Rail site were asleep during the "what's the point of XSLT" lesson of their education. The coding is unbelievable. It's more like an XML parser/serializer stress test than a production site.

Now, I don't really know XSLT and trying to debug this confirms my impression that it must be one of the worst programming languages mankind has invented - but the point of this script is to generate HTML with XSLT *string concatenation*?!?? Look at this:
<xsl:value-of select="$attribute-name"/>="<xsl:call-template name="inner-attribute-text-value"><xsl:with-param name="attribute-value" select="$attribute-value"/></xsl:call-template>"
or
<xsl:template name="inner-text-tag-open"><xsl:text disable-output-escaping="yes">&lt;</xsl:text></xsl:template>
<xsl:template name="inner-text-element-close">
<xsl:param name="element-name"/><xsl:call-template name="inner-text-tag-open"/>/<xsl:value-of select="$element-name"/><xsl:call-template name="inner-text-tag-close"/></xsl:template>
<xsl:template name="inner-text-tag-close"><xsl:text disable-output-escaping="yes">></xsl:text></xsl:template>
Yes, all that to create a text node containing e.g.
</div>
in a DOM they will serialize only to parse it again by setting innerHTML on some poor element. Having in their wisdom chosen to generate markup inside text nodes with their XSLT, they run into the familiar problem: when is < going to start a tag and when is it going to live in a text node? Hence, < is sometimes escaped as an 'lt' entity to create proper text nodes with HTML source-as-text in them (see for example the instance of
&lt;
in the code above). Now, of course when they set innerHTML they do not want this entity to appear as a literal < so they do some pre-processing: all entities they want to change into proper < and > before setting innerHTML have a comment node next to them:
<!--nwlt-->&lt;TR class="nw-2r"&gt;<!--nwgt--><!--nwlt-->&lt;TD class="nw-2c"&gt;<!--nwgt-->
and their pre-processing is a simple string replace:
sHtml = sHtml.replace(/\<!--nwlt--\>&lt;/g,"<").replace(/&gt;\<!--nwgt--\>/g,">").replace(/\<[\/]?tbody\>/gi,"");
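To make the round trip concrete, here is a minimal sketch of what that replace chain does to a small piece of serialized output. The sample string and the function name are my own illustration, not code from the site, and the placement of the comment markers is assumed from the regular expressions above (with their redundant backslash escapes dropped).

```javascript
// Hypothetical sample of serialized XSLT output. The class names come from
// the post; the marker placement is an assumption based on the site's regexes.
const serialized =
  '<!--nwlt-->&lt;TR class="nw-2r"&gt;<!--nwgt-->' +
  '<!--nwlt-->&lt;TD class="nw-2c"&gt;<!--nwgt-->' +
  'a &lt; b' + // an entity that is meant to stay text
  '<!--nwlt-->&lt;/TD&gt;<!--nwgt-->' +
  '<!--nwlt-->&lt;/TR&gt;<!--nwgt-->';

// Hypothetical helper wrapping the same three replacements as the site's code.
function unescapeMarkedEntities(sHtml) {
  return sHtml
    .replace(/<!--nwlt-->&lt;/g, "<") // marked &lt; becomes a real tag opener
    .replace(/&gt;<!--nwgt-->/g, ">") // marked &gt; becomes a real tag closer
    .replace(/<\/?tbody>/gi, "");     // strip any TBODY start or end tags
}

const html = unescapeMarkedEntities(serialized);
console.log(html);
// → <TR class="nw-2r"><TD class="nw-2c">a &lt; b</TD></TR>
```

Note that the unmarked entity inside the cell text survives as an entity, which is presumably the whole reason the comment markers exist.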
(Why they hate the poor TBODY so much that they must strip it from the markup, even though the browser will re-generate it in the DOM as soon as innerHTML is parsed, I can't even begin to imagine.)

If you thought XML-based toolchains and processes were going to make the Web a saner place, think again. We have now seen that in the right hands, XSLT is just another recipe for tag soup.


Comments

Hallvord R. M. Steen (hallvors) Wednesday, September 17, 2008 2:54:32 PM

(The broken rendering of the menu in Opera 9.5x is due to their weirdness running into a known XSLT bug - it's fixed internally already.)

This story has also been sent to The Daily WTF. We'll see if it is found worthy ;)

Tenno Seremel (tenno-seremel) Wednesday, September 17, 2008 5:39:02 PM

>Now, I don't really know XSLT and trying to debug this confirms my impression that it must be one of the worst programming languages mankind has invented

Well, it's not *that* bad a language, it's just that you'll need to do some things outside of XSLT. I use it client-side on my static site, though it's kinda hard for now: IE doesn't understand real XHTML (using xsl:copy-of + XHTML in the proper namespace = garbage), Fx doesn't understand the disable-output-escaping attribute, so...

Martin Rauscher (Hades32) Wednesday, September 17, 2008 6:44:38 PM

Actually I find XSLT quite cool. I find it cool that XML can be Turing-complete :D

theoddbod Wednesday, September 17, 2008 9:27:18 PM

Looks like somebody at Israeli Rail got bored with doing things the same way - it's got "I wonder if I could do it like this" written all over it.

MyOpera team, please fix this! (fearphage) Thursday, September 18, 2008 12:20:03 AM

I agree that XSLT is great and powerful when wielded correctly and in the right hands. Here is a moderately useful sample of xslt.

João Eiras (xErath) Thursday, September 18, 2008 7:42:39 AM

Just because you find code soup and a bad programmer does not imply the technology is bad. Actually, XSLT is quite powerful and flexible. You just looked at the wrong code.

johnnysaucepn Thursday, September 18, 2008 8:41:14 AM

@xErath, he did say 'the right hands', by which I think he means 'the wrong hands'. The point being that you can really screw things up in any language, no matter how much structure they put in to guide you!

Christian Walde (XenoFur) Friday, September 19, 2008 12:55:20 PM

Heya, just thought I should let you know that posts like this break the RSS reader of Opera hilariously. :)

johnnysaucepn Friday, September 19, 2008 1:19:06 PM

Originally posted by XenoFur:

Heya, just thought i should let you know that posts like this break the rss reader of Opera hilariously.


Really? Looks fine to me?

Christian Walde (XenoFur) Friday, September 19, 2008 1:48:38 PM

This is what it looks like for me: http://i34.tinypic.com/ta4xtz.png

Tenno Seremel (tenno-seremel) Friday, September 19, 2008 1:50:52 PM

XenoFur, johnnysaucepn: hmm, are you both using the RSS feed or the Atom feed?

johnnysaucepn Friday, September 19, 2008 3:07:49 PM

Atom for me. I'm using the default Opera font settings on Win XP, if that makes any difference.

Tenno Seremel (tenno-seremel) Friday, September 19, 2008 3:14:23 PM

I suspect that the RSS feed needs additional character escaping then, since XenoFur uses RSS (or so it seems).

Nick Fitzsimons (nickfitz) Friday, September 19, 2008 5:34:08 PM

no

XSLT is a wonderful language once you understand the principles of declarative programming.

Not only do these bozos lack such understanding, they appear to have not the faintest grasp of any part of the language whatsoever.

I've worked with XSLT since it first appeared, I've been active on the XSLT mailing list, I've helped other people in client companies come to terms with it - but I can honestly say that I have never seen anything as appalling as this. Many thanks for sharing it - I now have the definitive example of how not to use the language :)

Christian Walde (XenoFur) Friday, September 19, 2008 6:37:17 PM

You're right, Haruka, I'm using the RSS feed, and Atom does work properly for me. Personally I have no idea which format is "better" though. ^^

Tenno Seremel (tenno-seremel) Friday, September 19, 2008 6:58:39 PM

As for me I somewhat prefer Atom feeds. There is some info in Wikipedia:

Content Model

RSS 2.0 may contain either plain text or escaped HTML as a payload, with no way to indicate which of the two is provided. Atom, on the other hand, provides a mechanism to explicitly and unambiguously label the type of content being provided by the entry, and allows for a broad variety of payload types including plain text, escaped HTML, XHTML, XML, Base64-encoded binary, and references to external content such as documents, video, audio streams, and so forth.

http://en.wikipedia.org/wiki/ATOM#Atom_compared_to_RSS_2.0

Well, since I sometimes write them by hand, I chose Atom.

Hallvord R. M. Steen (hallvors) Friday, September 19, 2008 8:02:38 PM

I should not dismiss a language I don't know well so easily. A minor apology to all commenters who took offense and pointed out that XSLT can be useful in the right hands (without irony in "right" this time).

Guess I'll have to bother somebody about this invalid RSS feed. Reading the post again I also realised that the point where it refers to the entity in the code above makes no sense since at some point during preview/re-editing that entity was itself turned into an angle bracket. Shall we suggest My Opera implements a live preview during blog editing, powered by XSLT string gymnastics?

Well, maybe not.. ;)

Robin_reala Monday, September 29, 2008 3:12:16 PM

Just make a stand and dump RSS. People shouldn't need to choose between feed formats, and I can't think of a reader that doesn't support Atom these days.

Omega Junior (OmegaJunior) Tuesday, November 18, 2008 11:50:56 PM

Speaking of XSLT... one of my customers recently claimed they didn't know XSLT and dismissed it because they thought it was too new a technology. Obviously they didn't know it's been around for 10 years. It's the same customer who also claimed Javascript to be buggy, and claimed not to know any HTML... they build web sites using MS Visual Studio.

Ah, ignorance precedes judgement in such interesting ways.
