New adventures in compatibility testing
Wednesday, 2. December 2009, 13:12:03
The bug I'm investigating is a small and ugly one hiding in the document.getElementsByName() implementation - getElementsByName('someID') will find an element with id="someID".
This is of course bad behaviour. That method has nothing to do with IDs and should find elements by name only.
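To make the difference concrete, here is a minimal sketch that models the two behaviours on plain objects rather than a real DOM. The function names and the element shape (objects with optional id and name properties) are made up for illustration; this is not how any browser implements the method.

```javascript
// Hypothetical model: each "element" is just an object with optional id and name.
var elements = [
  { tag: 'input', name: 'someID' },
  { tag: 'div', id: 'someID' },
  { tag: 'span', id: 'other', name: 'other' }
];

// What the method should do: match on the name attribute only.
function byNameCorrect(list, name) {
  return list.filter(function (el) { return el.name === name; });
}

// The bug-compatible behaviour described above: an id match also counts.
function byNameBugCompatible(list, name) {
  return list.filter(function (el) { return el.name === name || el.id === name; });
}

console.log(byNameCorrect(elements, 'someID').length);       // 1
console.log(byNameBugCompatible(elements, 'someID').length); // 2
```

The extra hit in the second call is the div that only has an id, which is exactly the kind of element a site might be relying on without knowing it.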
The good news is that it's trivial to fix. The bad news is that it's there for a reason, and the reason is called Internet Explorer. We've been bug-compatible on purpose, and while we'd like to remove the bug, we have no idea how many sites will break if we do!
So, I'd like an answer to questions like these:
- How many sites use getElementsByName() to find elements with an ID?
- Do these sites break if we fix the bug?
- Do they have alternate code paths for browsers doing it right? If yes, how do they figure out what code to use?
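As for the last question, one way a site could pick a code path is a small feature probe. The following is only a sketch of that idea, not something I have seen on a real site: the function name and the probe id prefix are invented, and whether a div probe behaves the same in every browser is an assumption.

```javascript
// Returns true if doc.getElementsByName() also matches elements by id.
// "doc" is passed in so the probe can be tried against any document-like object.
function matchesIdsByName(doc) {
  var probeId = 'gebn_probe_' + new Date().getTime(); // hypothetical unique id
  var probe = doc.createElement('div');
  probe.id = probeId;                                  // id only, no name attribute
  var root = doc.documentElement;
  root.appendChild(probe);
  var found = doc.getElementsByName(probeId).length > 0;
  root.removeChild(probe);                             // leave the document as we found it
  return found;
}

// Only run against the real document when there is one (i.e. in a browser).
if (typeof document !== 'undefined') {
  console.log('getElementsByName matches ids:', matchesIdsByName(document));
}
```

A site doing something like this would keep working whether or not the bug gets fixed, which is the kind of thing the logging below should help reveal.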
Tools at our disposal: the MAMA web code search engine (an internal Opera project), User JavaScript, and two ad-hoc Opera Unite services.
MAMA tracks sites that might be using document.getElementsByName(). It knows about roughly 45 000 sites where it has seen the string "getElementsByName" in script source code, and it generously provides 5000 random ones in a text file on my request. Naturally, MAMA only does static analysis of the scripts, so it can't tell whether the method is actually called or what it is used for.
That information is a piece of cake to get with User JavaScript. A trivial custom script, trackGEBNabuse.js, overwrites the getElementsByName() method with one that will do a bit of debugging and logging on our behalf. And I'm playing with Opera Unite for the first time, with one logging service and one URL player that keeps track of which of the 5000 URLs were already visited and sends Opera to the next one.
(Opera Unite actually rocks! It's fun to write backend-type logic in JavaScript rather than PHP, and it's less hassle while developing to keep all the information, URL lists, log files and scripts locally on the hard drive. I've been undecided about Unite, not sure if it was more important than all the other things we should be spending time on - now I see it's maturing and making itself useful. Nice.)
To walk you through the main logic, here's the user JS that overwrites the native method to do logging, with comments:
(function(gebn){ /* "gebn" is a reference to the actual, native function */
    document.getElementsByName=function(name){ /* overwrite the real one */
        var elementList=gebn.apply(this, arguments); /* call the native function, record the list it returns */
        /* we want to know if anything in the elementList is there due to a matching id rather than a matching name */
        var abuse=[];
        for(var i=0,elm;elm=elementList[i];i++){ /* go through all returned elements */
            if( elm.getAttribute('name')!==name )abuse.push(elm.outerHTML); /* we found one that's probably in the list because of an ID attribute! */
        }
        if(abuse.length>0){
            /* log errors to some server... */
            (new Image()).src='http://hr-opera.hallvors.operaunite.com/logger/logGEBN/?data='+encodeURIComponent(abuse.join(', '))+'&href='+encodeURIComponent(location.href);
        }
        return elementList; /* don't forget to return the list of elements to the waiting script */
    }
})(document.getElementsByName); /* this is where we pass the real method as an argument to the function */
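If you want to poke at the wrapping logic without a browser, here is a rough standalone sketch of the same check. Everything in it that is not in the script above is made up for the purpose: wrapWithAbuseCheck, the report callback that stands in for the logging request, and the fake elements that implement just enough of the DOM interface.

```javascript
// Wrap any getElementsByName-like function and report id-only matches.
// "report" is a hypothetical callback standing in for the logging request.
function wrapWithAbuseCheck(nativeFn, report) {
  return function (name) {
    var elementList = nativeFn.apply(this, arguments);
    var abuse = [];
    for (var i = 0; i < elementList.length; i++) {
      var elm = elementList[i];
      // Same test as the user JS: the name attribute does not match the argument.
      if (elm.getAttribute('name') !== name) abuse.push(elm.outerHTML);
    }
    if (abuse.length > 0) report(abuse);
    return elementList; // the calling script still gets the untouched list
  };
}

// Stand-in elements with only getAttribute() and outerHTML.
function fakeElement(attrs, html) {
  return {
    outerHTML: html,
    getAttribute: function (key) {
      return Object.prototype.hasOwnProperty.call(attrs, key) ? attrs[key] : null;
    }
  };
}

var fakeList = [
  fakeElement({ name: 'q' }, '<input name="q">'),
  fakeElement({ id: 'q' }, '<div id="q"></div>')
];
var reported = [];
var wrapped = wrapWithAbuseCheck(
  function () { return fakeList; },                       // pretend native lookup
  function (abuse) { reported = reported.concat(abuse); } // pretend logger
);

var result = wrapped('q');
console.log(result.length); // 2
console.log(reported);      // [ '<div id="q"></div>' ]
```

Note that the strict comparison also flags elements when a caller passes a non-string argument, so a log entry is a candidate for the bug rather than proof of it. The real script above reports over the network instead of through a callback.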
As you see, it uses the oldest trick in the book - new Image() - to ping the Unite service with some data. The data is then stored in the folder I told Opera Unite to use when installing the widget.
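For the curious, the ping boils down to building a URL with two encoded query parameters and assigning it to an image's src. Here is a small sketch of that step on its own; buildLogUrl and the example host names are placeholders I made up, and the real logger URL is the one in the script above.

```javascript
// Build the URL for the logging ping. Both values are percent-encoded so that
// HTML fragments and page URLs survive the trip as query parameters.
function buildLogUrl(loggerBase, abuse, pageHref) {
  return loggerBase +
    '?data=' + encodeURIComponent(abuse.join(', ')) +
    '&href=' + encodeURIComponent(pageHref);
}

var url = buildLogUrl(
  'http://logger.example/logGEBN/',   // placeholder endpoint, not the real service
  ['<div id="q"></div>'],
  'http://site.example/page?a=1'
);
console.log(url);

// In a browser, setting src makes it request the URL with a plain GET.
if (typeof Image !== 'undefined') {
  new Image().src = url;
}
```

The response is never looked at. All that matters is that the request reaches the service with the data in its query string.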
The only other interesting part is the code that requests the next URL from the URL player - as trivial as doing this from a load event listener:
if(location.hostname!='hr-opera.hallvors.operaunite.com')
    setTimeout( function(){ location.href='http://hr-opera.hallvors.operaunite.com/urldriver/nexturl?'+Math.random(); }, 500 );
The urldriver service also accepts a "urllist=somefile.txt" query string argument, so a different user script could play URLs from a different file (though not at the same time, since the index of the URL it has reached is not stored per file). That's obviously a bug in my Unite service. Keep in mind that these are ad-hoc throwaway services done in 30 minutes of cutting, pasting and typing last night, so don't expect QA and polish :-p
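Fixing that would mostly be a matter of keeping one index per list. A minimal sketch of the idea in plain JavaScript follows. It does not use the Opera Unite API, it is not the code of my service, and createUrlPlayer and the example file names are invented. Reading the files and answering the HTTP request are left out.

```javascript
// Hypothetical URL player that remembers a separate position for each list.
function createUrlPlayer(lists) {
  var positions = Object.create(null);     // list name -> index of the next URL
  return {
    nextUrl: function (listName) {
      var urls = Object.prototype.hasOwnProperty.call(lists, listName) ? lists[listName] : [];
      var index = positions[listName] || 0;
      if (index >= urls.length) return null; // list exhausted or unknown
      positions[listName] = index + 1;
      return urls[index];
    }
  };
}

var player = createUrlPlayer({
  'a.txt': ['http://one.example/', 'http://two.example/'],
  'b.txt': ['http://three.example/']
});
console.log(player.nextUrl('a.txt')); // http://one.example/
console.log(player.nextUrl('b.txt')); // http://three.example/
console.log(player.nextUrl('a.txt')); // http://two.example/
console.log(player.nextUrl('b.txt')); // null
```

With the positions keyed by file name, two user scripts playing different lists no longer step on each other's index.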
And the results? I left an Opera 10.10 instance to surf on its own in 5 different tabs overnight, which generated this log file listing 6 unique sites and the HTML of the elements that a getElementsByName() call returned because of this bug. Analysing 6 out of 5000 URLs manually is certainly doable.












