Changes to the behavior of innerHTML in XML documents

In the run-up to the next stable desktop release, you'll notice that a lot of changes are being made to our browser's core engine. Although here on the Developer Relations blog we usually just cherry-pick and explain some of the shinier new additions that fall under the big "HTML5" umbrella, there are also quite often tiny improvements under the hood that remove bugs and browser incompatibilities that don't get much notice...until sites that somehow relied on our previous behaviour start to misbehave.

A recent example of this is the rather innocently titled CORE-4336: Setting innerHTML in XML, which recently shipped in one of our Opera Next snapshots.

Being a hip and happening developer, you may be thinking "XML? HTML5 is where it's at!"...so it may come as a shock to you that in the vast reaches of the web, there are still a sizeable number of sites that use XML/XHTML.

While the release of Opera 12.10 is still a bit away, one of Opera's products that already does include many of these core changes, including CORE-4336, is the newly released Opera Device SDK 3.4, which is used by many TV and set-top box device manufacturers to provide web browsing functionality. And it's on this platform that some of our customers have started to report issues with the latest SDK relating to this particular core change.

In previous versions of Opera's core, innerHTML was quite forgiving when injecting markup into XML/XHTML documents. As with regular HTML pages, when trying to add malformed content, Opera silently error-corrected the injected fragment according to its HTML parsing algorithm.

To see this in action, here's a simple test case that uses innerHTML to inject the classic <b><i>...</b></i> pair of misnested tags into an XHTML document. If you take a peek at the DOM after the page has loaded, you'll see that the current stable version of Opera has silently fixed the misnested tags, while the same test case fails in other browsers such as Chrome and Firefox.

Opera Dragonfly showing how the misnested markup has been silently fixed in the DOM

Following the fix to CORE-4336, Opera's core is now aligned with the stricter behavior of other browsers, which has been formally specified in WHATWG's DOM Parsing and Serialization:

In the case of an XML document, [innerHTML] will throw an INVALID_STATE_ERR if the Element cannot be serialized to XML, and a SYNTAX_ERR if the given string is not well-formed.
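As a rough illustration of what this means for page scripts, here is a minimal sketch that wraps the assignment in try/catch so a malformed fragment can be handled instead of stopping the script. The helper name trySetInnerHTML and the shape of its return value are made up for this example; only the innerHTML setter itself comes from the text above.

```javascript
// Hypothetical helper (not part of any browser API): try to assign markup
// via innerHTML and report failure instead of letting the exception escape.
function trySetInnerHTML(element, markup) {
  try {
    element.innerHTML = markup;
    return { ok: true, error: null };
  } catch (err) {
    // In an XML/XHTML document, a fragment that is not well-formed makes
    // the setter throw (the SYNTAX_ERR case in the quoted text).
    return { ok: false, error: err };
  }
}

// Possible use in a page served as XHTML ("target" is an assumed element id):
//   var result = trySetInnerHTML(document.getElementById("target"),
//                                "<b><i>text</b></i>");
//   if (!result.ok) { /* fall back, log, or repair the fragment */ }
```

In an HTML document the same call would be expected to succeed, since the HTML parser error-corrects the fragment rather than throwing.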

In the long run, this fix will ensure greater cross-browser compatibility...but obviously, if your XHTML sites start to misbehave and throw errors as a result of this change, the best advice we can give is to sanitise injected markup fragments so that they're well-formed XHTML (no misnesting, correct use of quotes around attributes, etc). If this is not possible, a short-term (though admittedly quite inelegant) fix would be to change your site from XHTML to HTML.
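To make the "no misnesting" advice a bit more concrete, here is a rough sketch of a pre-check that could be run on a fragment before injecting it. The function findNestingError is purely illustrative and only looks at how element tags are nested; it is not an XML parser and ignores comments, CDATA sections, entities and attribute syntax. In a browser, parsing the fragment with DOMParser using the application/xhtml+xml type is likely to be a more reliable test.

```javascript
// Hypothetical pre-check: returns null when element tags are properly nested,
// or a short description of the first nesting problem found.
// Assumption: the fragment contains no comments or CDATA sections.
function findNestingError(fragment) {
  // Matches opening, closing and self-closing tags, skipping over quoted
  // attribute values so a ">" inside quotes does not end the tag early.
  const tagPattern = /<(\/?)([A-Za-z][\w:.-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  const openTags = [];
  let match;
  while ((match = tagPattern.exec(fragment)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2];
    const rest = match[3];
    if (isClosing) {
      const expected = openTags.pop();
      if (expected !== name) {
        return "found </" + name + "> but expected " +
          (expected ? "</" + expected + ">" : "no closing tag");
      }
    } else if (!rest.trimEnd().endsWith("/")) {
      // Not self-closing, so it must be closed later.
      openTags.push(name);
    }
  }
  return openTags.length > 0
    ? "unclosed <" + openTags[openTags.length - 1] + ">"
    : null;
}
```

A fragment that passes this check can still be rejected by the browser for other reasons (unquoted attributes, undefined entities and so on), so it is best treated as an early warning rather than a guarantee.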


Comments

lucideer Thursday, September 13, 2012 9:00:54 PM

Originally posted by patrickhlauke:

Being a hip and happening developer, you may be thinking "XML? HTML5 is where it's at!"...so it may come as a shock to you that in the vast reaches of the web, there are still a sizeable number of sites that use XML/XHTML.


This seems to imply that that "sizeable number of sites" are somehow behind the times and "out of touch" with HTML5, whereas anyone familiar with the spec. will be well aware XML is a *part* of HTML5.

Patrick H. Lauke (patrickhlauke) Thursday, September 13, 2012 9:14:26 PM

Lucideer...not quite. There is XHTML5 - the XML serialization of HTML5. But to say that XML is part of HTML5 is an incorrect synecdoche. And there are still a large number of sites written to, say, XHTML 1.0 (and written way before HTML5 was even a "thing"), without any HTML5 DOCTYPE. So again, to say that these sites are actually HTML5 is incorrect.

lucideer Saturday, September 15, 2012 1:25:03 AM

Originally posted by patrickhlauke:

There is XHTML5 - the XML serialization of HTML5. But to say that XML is part of HTML5 is an incorrect synecdoche


I'm not sure what you mean by this. The XML Spec. may not be - in its entirety - included within or subsumed by HTML5, obviously, but the HTML5 specification is itself entitled "A vocabulary and associated APIs for HTML and XHTML" - XML is most definitely a considered part of the HTML5 spec.

Originally posted by patrickhlauke:

There is XHTML5 ... XHTML1.0 ... without any HTML5 DOCTYPE. So again, to say that these sites are actually HTML5 is incorrect.


This is technically incorrect; XHTML5 is a buzzword rather than a separate specification in itself - the term isn't referenced anywhere in any spec. that I've read so far - (although one could certainly argue that HTML5 itself is a buzzword). The XML serialization of HTML5 (which is within the same HTML5 spec.) explicitly states that a DOCTYPE (of any kind) is neither required nor recognised - as XML processors aren't required to query or parse it.

Patrick H. Lauke (patrickhlauke) Saturday, September 15, 2012 9:14:42 AM

sigh. fine, so let's agree that sites that were created in XHTML 1.0 about 10 years or so ago and are still prevalently used - as well as sites that are required to use XML-ish markup because they're aimed at, say, HbbTV as well - are also developed by hip and happening developers. and that of course they've been doing HTML5 all along, some of them even without knowing it...because HTML5 grandfathered its own inclusion of old XHTML.
