karlcow

Opening The Web one bug at a time

Wrong To Be Right - application/xhtml+xml

Update 30 March 2011: The issue with the Starbucks Web site has been fixed. Other sites still exhibit the issue. A fix has been proposed by Rohan Singh (see the comments).

I have always been experimenting with XHTML. When properly served as application/xhtml+xml, it becomes a powerful tool to check the quality of your markup: the browser throws an error if the code is not well-formed. It doesn’t solve accessibility or semantics issues, but it does enforce a minimum level of markup quality. On the other hand, it introduces scripting issues if you decide to change the Content-Type at the end of the chain.

Accept and Content-Type in HTTP

A browser (client) and a server exchange messages on how to handle a specific piece of information identified by a URI. So when a client requests http://www.opera.com, it sends along some HTTP headers. One of these headers specifies the formats the browser is able to process and the order in which it prefers them. This header is Accept:. Each browser has a slightly different Accept: header.

Table of Accept headers for some HTTP clients
Browser | Accept header
Opera Desktop 11.0 | text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Opera Mobile Emulator | text/html, application/xml;q=0.9, application/xhtml+xml, multipart/mixed, application/vnd.wap.multipart.mixed, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Firefox 4 | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Safari 5 | application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
curl | */*
IE9 | image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*

Usually the server replies with the most appropriate format matching what the client is requesting. So if the client asks for application/xhtml+xml and the server has a representation of this resource in this format, it will send it along with the appropriate Content-Type, in this case application/xhtml+xml.
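To make the expected behaviour concrete, here is a minimal sketch in Python of how a server could pick a representation from the Accept: header. The function names (parse_accept, quality, negotiate) are made up for this illustration. This is not the code IIS or ASP.NET runs, and it ignores media type parameters other than q.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q the client assigns to media_type (0.0 if not acceptable).

    The most specific matching range wins: type/subtype, then type/*, then */*.
    """
    best_specificity, best_q = -1, 0.0
    main_type = media_type.split("/")[0]
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, available):
    """Pick the available type with the highest q; ties go to the server's own order.

    Returns None when nothing the server has is acceptable to the client.
    """
    ranges = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, ranges)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type


if __name__ == "__main__":
    firefox4 = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    print(negotiate(firefox4, ["text/html", "application/xhtml+xml"]))
    print(negotiate("application/xhtml+xml", ["text/html"]))
```

With the Firefox 4 header both types have the same q, so the server's own ordering decides. With Accept: application/xhtml+xml and a server that only has text/html, the sketch returns None: the server has nothing acceptable to offer, and the decision is never based on the User-Agent.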

The issue

Some high-profile Web sites, all running Microsoft IIS, are behaving strangely. Let’s use curl, as we did previously, and its ability to send a specific User-Agent: string and Accept: HTTP header. We will run our tests on the Starbucks Web site.

Curl Accept: */*

Let’s start with a very simple one. By default, curl sends Accept: */*. It means “send me anything you have for this URI and I will do my best to understand it.”

curl -sI http://www.starbucks.com

The server returns a Content-Type: application/xhtml+xml. Fair enough, we said we were accepting anything.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=40pk0kfikyv2lnr3t10yckjl; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 16:58:38 GMT

Curl Accept: text/html

Let’s be more precise. We will tell the server that we want text/html only.

curl -sI -H "Accept: text/html" http://www.starbucks.com/

Neat! The server returns the right Content-Type: text/html. Everything is good! This server is behaving quite well so far.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xm0t31ebosa1symnb5xn4gkm; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:03:35 GMT

Opera 11.01

I will be using Opera 11.01 with these two parameters:

  • User-Agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01
  • Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

We can use Opera Dragonfly to check the headers or use curl by mocking Opera.

curl -sI -H "Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/

We then receive from the server Content-Type: application/xhtml+xml. As I said before, it is ok, because we said it was one of the formats we accepted.

HTTP/1.1 200 OK 
Cache-Control: private
Content-Type: application/xhtml+xml; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xdtworhtqlte5mxur2zeulay; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:09:43 GMT

The big issue is that the server is sending Opera a file which is obviously not well-formed XHTML. Because the server said it was Content-Type: application/xhtml+xml, Opera hands it to the XML parser, which fails on markup that is not well-formed XML. That is a major usability issue for Opera users. Let’s continue our testing a bit.
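As a small illustration of why the XML parser gives up, here is a sketch using Python's standard xml.etree.ElementTree module. The two markup strings are invented for the example; they are not taken from the Starbucks page. The helper name is_well_formed is also just for this sketch.

```python
import xml.etree.ElementTree as ET

# Invented sample markup (an assumption for illustration, not the real page).
GOOD = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello<br/></p></body></html>'
# Same document, but <br> is never closed: common in HTML, fatal in XML.
BAD = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello<br></p></body></html>'


def is_well_formed(markup):
    """Return True if an XML parser can parse the markup without error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
```

An HTML parser would recover from the unclosed br element without complaint. An XML parser must stop at the first well-formedness error, which is what a browser does when the document is labelled application/xhtml+xml.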

Firefox 4

This time we will be using Firefox 4.

curl -sI -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/

Hmm, amazing: this time the server answers with Content-Type: text/html. This is starting to look fishy. Note that it is still a valid answer from the server. The client said it could receive both, and the server decided to send HTML.

HTTP/1.1 200 OK
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:06:43 GMT

Stricter Firefox 4 with only “application/xhtml+xml”

This time we will ask the server to send only application/xhtml+xml.

curl -sI -H "Accept: application/xhtml+xml" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/

Oh, surprise. This time the answer of the server is wrong. We asked for application/xhtml+xml only, and the server sent back Content-Type: text/html without respecting the HTTP contract. Fishy. Is the server doing user agent sniffing?

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=q41qfc3adwjsidjgtevpxnnk; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:24:48 GMT

Opera 11 with only “text/html”

Let’s confirm the intuition about user agent sniffing. We send the Opera user agent string and ask for text/html only.

curl -sI -H "Accept: text/html" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/

The server sends to Opera… Content-Type: application/xhtml+xml. This time we can be sure: the server does user agent sniffing.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=vfzwujvflotbibe4qet0foxb; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:27:51 GMT
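The reasoning behind the six requests above can be written down as a short Python sketch. The observations are the ones recorded in this article, with the charset parameter dropped from the Content-Type. The user agent labels and the helper names (accepts, contract_violations, varies_by_user_agent) are assumptions made for the illustration, and accepts is only a rough check that does not handle every corner of the Accept: syntax.

```python
from collections import defaultdict

OPERA_ACCEPT = ("text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
                "image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1")
FIREFOX_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# (user agent label, Accept header sent, Content-Type received)
OBSERVATIONS = [
    ("curl", "*/*", "application/xhtml+xml"),
    ("curl", "text/html", "text/html"),
    ("Opera 11.01", OPERA_ACCEPT, "application/xhtml+xml"),
    ("Firefox 4", FIREFOX_ACCEPT, "text/html"),
    ("Firefox 4", "application/xhtml+xml", "text/html"),
    ("Opera 11.01", "text/html", "application/xhtml+xml"),
]


def accepts(accept_header, content_type):
    """Rough check: is content_type covered by any media range with q > 0?"""
    main_type = content_type.split("/")[0]
    for part in accept_header.split(","):
        pieces = [p.strip().lower() for p in part.split(";")]
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if q > 0 and pieces[0] in (content_type, main_type + "/*", "*/*"):
            return True
    return False


def contract_violations(observations):
    """Responses whose Content-Type the client never said it accepted."""
    return [obs for obs in observations if not accepts(obs[1], obs[2])]


def varies_by_user_agent(observations):
    """Accept headers for which different user agents received different types."""
    by_accept = defaultdict(dict)
    for ua, accept, ctype in observations:
        by_accept[accept][ua] = ctype
    return {accept: by_ua for accept, by_ua in by_accept.items()
            if len(set(by_ua.values())) > 1}


if __name__ == "__main__":
    for ua, accept, ctype in contract_violations(OBSERVATIONS):
        print("violation:", ua, "asked for", accept, "and got", ctype)
    for accept, by_ua in varies_by_user_agent(OBSERVATIONS).items():
        print("same Accept, different answers:", accept, by_ua)
```

Two responses break the contract (the strict Firefox 4 request and the strict Opera request), and the same Accept: text/html header gets a different Content-Type depending on who is asking. That second point is what indicates user agent sniffing.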

Why is this an issue?

Opera gets a different Content-Type than other browsers, which would be fine if the server were sending well-formed XML. Opera is right to fail on content that is not well-formed. But unfortunately, because nobody cared to test that the markup was well-formed, Opera looks as if it were wrong compared to other browsers. That is why I call it “wrong to be right”. The consequences are terrible: Opera users can’t access these sites, and they are penalized because we do the right thing.

What should we do?

  • We try to contact the owners of these Web sites (starbucks, spanair, phenomblue, leisurepro, mcafee, teavana, etc.).
  • We try to identify the library which is in charge of the user-agent sniffing. We have a lead: an unsupported library called MDBF.
  • We could PATCH for these specific Web sites, but then we lose the benefit of pressuring the owners to fix their sites, because they will have the impression it is working.

Really there is no perfect solution, but in the end, the Opera users do not have the freedom of choice.


Comments

João Eiras (xErath) Thursday, March 3, 2011 6:08:32 PM

How about automatically fallback to the html parser if there is either a "ASP.Net" cookie or IIS in the Server name and post to the error console that the webmaster is a twat ?

Spaceman Spiff (spfv) Thursday, March 3, 2011 6:17:19 PM

Hum, and sending multiple representations based on Accept (and possibly User-Agent) without a Vary: header in the response ? The server explicitly disallow caching the document in public caches, but a private cache could in theory be shared by multiple browsers, and send the "wrong" version to the client...

Mike Taylor (miketaylr) Thursday, March 3, 2011 7:18:02 PM

I kind of like João's suggestion.

Rohan Singh (rohanrsingh) Wednesday, March 16, 2011 1:35:24 AM

Hey Karl,

Thanks for the heads up about this happening on Starbucks.com. We tracked it down to an issue in the ASP.NET Browser Capabilities functionality.

As it turns out, if you supply ASP.NET with a browsercaps file (database of browser capabilities), it sets the response's Content-Type to the user agent's preferred Content-Type.

Of course, this doesn't make a lot of sense, since HTML isn't going to magically change into XHTML or WML or anything else simply because it's the preferred type.

Anyway, we've worked around this behavior and are testing the fix. Should be out soon.

Thanks again!

Karl Dubost (karlcow) Wednesday, March 16, 2011 11:23:02 AM

That is just excellent news, Rohan Singh. Tell me when you have deployed. Feel free to send me an email too, because this behavior is a pattern we have seen on a few high profile site, and having a detailed fix would be super helpful.

Deep bow and respect!

Martin Rauscher (Hades32) Wednesday, March 16, 2011 4:18:14 PM

A nice way of working around this problem would be to send a HEAD request with a Firefox UA string as soon as you get XHTML from an ASP.NET server.
If that request gives HTML... well... you know what to do ;)

Zi Bin Cheah (zibin) Saturday, April 16, 2011 8:13:04 AM

Originally posted by Hades32:

A nice way of working around this problem would be to send a HEAD request with a Firefox UA string as soon as you get XHTML from an ASP.NET server.

Doing this will cause too much overhead. Microsoft should just change their browsercaps file.
