Automated regression testing of the browser core
By Wilhelm Joys Andersen. Tuesday, 13. October 2009, 14:13:15
The cornerstone of all testing done on the core of the Opera browser is our automated regression testing system, named SPARTAN. The system consists of a central server and about 50 test machines running our 120 000 automated tests on all core reference builds. The purpose of this system is to help us discover any new bugs we introduce as early as possible, so that we can fix them before they cause any trouble for our users.
Step one: Preparing a build
Before SPARTAN can test anything, it needs a build to test. Our build system automatically creates builds every night and pings SPARTAN when they are ready. Developers and testers can also request their own builds from the build system, using any build tag they want, to test changes from their own experimental branches before these are eventually merged into the stable mainline we base our products on.
Unlike other browser vendors, we ship our browser on a variety of different platforms. So our core build packages do not contain just one binary, but several: one for each general product category. Each of these profiles has the same feature set and memory constraints as the platform it corresponds to. The whole set of tests is run on each of these profiles.
Step two: Testing
When the SPARTAN server is informed about the existence of a new build, it will add this build to its testing queue and distribute a few hundred tests to each of the test machines the next time they ask for more work. Each test machine works independently with its assigned tests. It will download the Opera binaries it has been told to use, and run its assigned tests on them. Once it has finished its batch of tests, it will pass the test results back to the SPARTAN server, and again ask what to do next.
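The pull-based distribution described above can be sketched roughly as follows. This is a minimal illustration, not SPARTAN's actual code: the `Server` class, its method names, the batch size of 300 and the build label are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for the central server; not SPARTAN's real API."""
    batch_size: int = 300  # "a few hundred tests" per request (assumed value)
    queue: deque = field(default_factory=deque)   # pending (build, test) pairs
    results: dict = field(default_factory=dict)   # (build, test) -> "PASS"/"FAIL"

    def add_build(self, build, tests):
        # Called when the build system announces a new build.
        self.queue.extend((build, test) for test in tests)

    def next_batch(self):
        # Hand out up to batch_size pending tests to whichever machine asks.
        count = min(self.batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]

    def report(self, batch_results):
        # Store the verdicts a machine sends back.
        self.results.update(batch_results)


def run_machine(server, run_test):
    """One test machine: ask for work, run it, report, repeat until idle."""
    batches = 0
    while True:
        batch = server.next_batch()
        if not batch:
            return batches
        server.report({(build, test): run_test(build, test) for build, test in batch})
        batches += 1


server = Server(batch_size=300)
server.add_build("nightly-example", [f"test-{i}" for i in range(1000)])
batches_run = run_machine(server, lambda build, test: "PASS")
print(batches_run, len(server.results))
```

Because the machines ask for work rather than being pushed work, a slow or offline machine simply stops asking, and the remaining machines drain the queue without any scheduling logic on the server.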
If SPARTAN ever runs out of new builds to test, for example during the weekend, it will look back at older builds and run any newly added tests on them too. This is to ensure that we have a full history for each test, and can determine at any time when a fix or regression was first introduced without having to manually test things again.
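As a rough sketch of why that backfilled history is useful, assume results are stored as a mapping from (build, test) to a verdict. The function names and the data layout below are hypothetical, chosen only for illustration.

```python
def backfill_jobs(builds, tests, results):
    """Return (build, test) pairs with no recorded result, newest build first."""
    return [
        (build, test)
        for build in reversed(builds)   # builds are listed oldest to newest
        for test in tests
        if (build, test) not in results
    ]


def first_regressed_build(builds, test, results):
    """Return the first build where `test` went from PASS to FAIL, or None."""
    previous = None
    for build in builds:
        verdict = results.get((build, test))
        if previous == "PASS" and verdict == "FAIL":
            return build
        if verdict is not None:
            previous = verdict
    return None


builds = ["b1", "b2", "b3"]
results = {
    ("b1", "old-test"): "PASS",
    ("b2", "old-test"): "PASS",
    ("b3", "old-test"): "FAIL",
    ("b3", "new-test"): "PASS",  # new-test was added after b1 and b2 were tested
}
jobs = backfill_jobs(builds, ["old-test", "new-test"], results)
print(jobs)
print(first_regressed_build(builds, "old-test", results))
```

With a complete history, finding the build that introduced a regression becomes a lookup rather than a round of manual retesting.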
We have several different types of tests:
- Unit tests
- Unit tests (or selftests), written by the same developers who write the running code, test individual functions and APIs.
- JavaScript tests
- Our JavaScript tests test a wide array of different features on a functional level. This includes, for example, tests for the Selectors API, tests for common JavaScript frameworks, and tests for any other feature we can interact with through JavaScript.
- Watir tests
- Many tests require some sort of user interaction. To test forms, for example, one must click buttons or checkboxes or type in text fields. To avoid having to do this manually, we have implemented support for the cross-browser Watir API. While others use this API to test their web applications, we use it to test the browser itself.
- Visual tests
- To test our stylesheet and graphics code, we need to test that our visual test cases look right. Some of our visual tests automatically compare two pages or two elements to determine whether they are the same. On other tests, the test machines will take a screenshot of the rendered page and pass it back to the SPARTAN server. If the SPARTAN server has seen this screenshot before, it will know whether that particular rendering means PASS or FAIL. If SPARTAN has never seen it before, the screenshot must be labeled as PASS or FAIL by a human. This is labour-intensive work we intend to further reduce through reftests.
- Performance tests
- A modern browser must not only pass tests for all its different features. It must also be fast. SPARTAN runs a number of different performance tests, both internally and externally developed, on our builds. If Opera becomes slower at any particular test, this will be flagged as a regression.
- Crash tests
- We create test cases for every single bug we analyze and fix, and SPARTAN runs most of these. Among our bug-based test cases are crash tests. If Opera can load these tests without crashing, the test has passed. If it crashes, we have reintroduced an old crash, and must fix it.
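The screenshot-based visual tests in the list above amount to a lookup of previously labeled renderings. Below is a minimal sketch of that idea. It assumes a rendering is recognized by an exact hash of the image bytes, which the description above does not specify, and the `ScreenshotOracle` class and its methods are hypothetical.

```python
import hashlib


class ScreenshotOracle:
    """Hypothetical store of human-labeled renderings, keyed by test and image hash."""

    def __init__(self):
        self.known = {}         # (test, digest) -> "PASS" or "FAIL"
        self.needs_review = []  # (test, digest) pairs awaiting a human label

    @staticmethod
    def digest(image_bytes):
        # Assumption: identical renderings produce byte-identical screenshots.
        return hashlib.sha256(image_bytes).hexdigest()

    def judge(self, test, image_bytes):
        """Return the stored verdict, or queue the screenshot for human review."""
        key = (test, self.digest(image_bytes))
        if key in self.known:
            return self.known[key]
        if key not in self.needs_review:
            self.needs_review.append(key)
        return "UNKNOWN"

    def label(self, test, image_bytes, verdict):
        """Record a human's PASS/FAIL decision for this exact rendering."""
        key = (test, self.digest(image_bytes))
        self.known[key] = verdict
        if key in self.needs_review:
            self.needs_review.remove(key)


oracle = ScreenshotOracle()
first = oracle.judge("css-border", b"pixels-A")   # never seen: needs a human
oracle.label("css-border", b"pixels-A", "PASS")   # the human labels it once
second = oracle.judge("css-border", b"pixels-A")  # seen before: answered automatically
third = oracle.judge("css-border", b"pixels-B")   # a different rendering: needs a human
print(first, second, third)
```

Each distinct rendering only has to be labeled once, which is why the manual work grows with the number of new renderings rather than with the number of builds.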
All in all, we currently run about 120 000 tests on each configuration in each build, but this number changes daily. We continuously write new test cases for bugs or test suites for new or old features, and we also copy any publicly available test suites we find useful. Right now we are also working on automating many of our previously manual tests, including memory tests.
Step three: Human intervention
Once the machines are done with their part of the job on a particular build, they will send an email to a human who will continue the work. SPARTAN will generate a report of changes between this build and the previous build. In most builds there are some tests that go from FAIL to PASS because we have fixed something. But there are also often regressions (tests that go from PASS to FAIL) because we accidentally broke something while fixing something else. This is expected, and is the reason we do regression testing. We know there will always be regressions, and we need to find them as quickly as possible in order to fix them before they can cause any trouble for users or customers.
The human tester will analyze each regressed test. If a hundred different tests started failing at the same time, they could all have broken because of one regression, or there could be several different ones. For each unique regression identified, the human tester will report a new bug and assign it to the developer responsible for the code that broke. Once a fix is ready, we will run all our tests again.
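The report of changes between two builds can be pictured as a simple comparison of verdicts. This is only a sketch under the assumption that each build's results are available as a mapping from test name to verdict; the function name `diff_builds` is made up for the example.

```python
def diff_builds(previous, current):
    """Compare two {test: verdict} mappings and return (fixed, regressed) test lists."""
    fixed, regressed = [], []
    # Only tests present in both builds can change state between them.
    for test in sorted(previous.keys() & current.keys()):
        before, after = previous[test], current[test]
        if before == "FAIL" and after == "PASS":
            fixed.append(test)
        elif before == "PASS" and after == "FAIL":
            regressed.append(test)
    return fixed, regressed


previous = {"a": "PASS", "b": "FAIL", "c": "PASS"}
current = {"a": "FAIL", "b": "PASS", "c": "PASS", "d": "FAIL"}
fixed, regressed = diff_builds(previous, current)
print("fixed:", fixed)
print("regressed:", regressed)
```

In this sketch a test that is new in the current build, such as `d`, is left out of both lists, since it has no earlier verdict to compare against.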


Tamil # 13. October 2009, 14:25
Originally posted by wilhelmja:
Why are there more Linux test machines than Windows test machines?
Thanks.
lucideer # 13. October 2009, 14:29
Originally posted by Tamil:
A wider variety of Linux distributions? Plus more architectures (x86-64).
wilhelmja # 13. October 2009, 14:37
Originally posted by Tamil:
The virtual Linux machines, all running Debian, are really easy to set up and maintain. As the browser core is platform-independent, it doesn't really matter all that much which platform we use, so we go for the easiest option. Although most of the testing is on Linux, we do test nightly core builds on Windows too. The Windows test machines are however primarily used by the desktop team and other platform teams for testing their builds.
Tamil # 13. October 2009, 14:40
zoquete # 13. October 2009, 14:56
ouzoWTF # 13. October 2009, 15:23
Thanks for that deep look into the testing routines! Keep up your good work!
Chas4 # 13. October 2009, 15:37
ddrum # 13. October 2009, 16:07
serious # 13. October 2009, 19:03
SouthernCross # 13. October 2009, 20:48
DanielHendrycks # 14. October 2009, 02:21
Chas4 # 14. October 2009, 03:06
fearphage # 14. October 2009, 11:04
Originally posted by SouthernCross:
Fail is the opposite of pass. If they all said pass, there would be no core bugs... assuming that every core bug is programmatically testable
Originally posted by core:
Shouldn't regressions be the exception and not the rule? I've done QA for a long time. I've never worked at a place where I expected there to be regressions. I've never worked for a browser vendor either. That's why I personally view regression testing as the boring part. It usually involves running through the same steps over and over with the same results in my experience. Is there a reason that regressions are more plentiful in opera and/or browsers?
That being said. A regression for you:
DSK-268040: Regression - image elements don't fire the error event more than once when the source is reassigned. This regressed build 1810. It was working fine in build 1799. This was previously fixed in v10.00.1491 on May 8, 2009.
netwolf # 14. October 2009, 14:24
I mean, 1) ebay is a very popular site 2) the bug is very obvious for the naked eye...
Interesting to read about things running in background, thank you for insights!
Purdi # 14. October 2009, 15:26
Originally posted by fearphage:
You mean apart from browsers being insanely complex and having to handle an ever-changing web?
Purdi # 14. October 2009, 15:29
Originally posted by netwolf:
What "obvious" bug? I don't have the problem.
By the way, you even participated in a thread showing that the problem was that eBay set some cookie. Deleting the cookie removed the problem. Why are you assuming that it's a bug or regression in Opera rather than an eBay bug or regression? It sure looks like they are using some silly sniffing script to send different content depending on different criteria!
fearphage # 14. October 2009, 19:19
Originally posted by Purdi:
Well, I notice (because I file them) a lot of regressions in Opera. I use Firefox, Chromium, and Safari daily but Opera is my main browser. It's possible that they all have as many regressions as Opera but I don't use them as much. I don't know if Opera has a ton of regressions or if browser vendors in general have lots of regressions. Do you think it's the latter? I know Opera has a lot of regressions but I have no way to know if it's more or less than any other browsers.
Originally posted by Purdi:
Can you please link me to that thread? I didn't know that was the source of the problem. I've been living with it for a while. I presume other browsers handle the erroneous cookie in the exact same way? Opera could have resolved this on a global scale via browser.js.
d.i.z. # 14. October 2009, 22:18
Originally posted by fearphage:
The fix was backed out as it caused ugly problem with mouseovers. Proper fix will be in next core version because it's a bit too complicated to integrate in current codebase...
wilhelmja # 14. October 2009, 22:53
Originally posted by ddrum:
OperaWatir depends on core functionality that is not yet in any public build. You'll hear from us again when it is. (c:
Originally posted by DanielHendrycks:
It was an acronym at some point in time.
Originally posted by fearphage:
By the time the core code becomes available in a finished product? Yes, definitely. That is the goal.
But building a browser engine is extremely complex. The specifications the Web is built upon are often rather crappy, leaving entire areas such as error handling completely undefined. There are hardly any official test suites browser vendors can make use of when implementing something, and determining what is "correct" behavior often comes down to trial and error. And something that was "correct" yesterday might not be "correct" today, depending on what web developers and other browser vendors do.
Our thousands of regression tests make up a collective, infallible memory of all the thousands of small design decisions we've made over the last decade. No single human can keep track of all that information, so when we fix one bug, changing the behavior in a general area ever so slightly, the new code might unintentionally reintroduce a bug we fixed years ago, one that no human can remember we fixed. But SPARTAN knows. It discovers that a test started failing again, and tells us. Then we can fix it.
Some regressions are hard to catch, though. Perhaps they require several steps of human interaction that are difficult to automate or automatically verify. Or perhaps the regression is so trivial, so obscure that neither our hundred thousand tests nor any of the hundreds of billions of sites out there are affected by it. Until one day, when Facebook starts to depend on just that behavior. And it breaks.
Ideally, we'd have an infinite number of test cases, covering everything the Web and its billions of sites depend on. But we're not quite there yet. Until then, we'll have to depend on both our automated testing and human testers inside and outside of Opera Software. Thanks for finding those bugs for us. (c;
edvakf # 15. October 2009, 06:33
haavard # 15. October 2009, 08:14
fearphage # 15. October 2009, 09:52
For clarity's sake, is the js engine the only piece of core being entirely rewritten? I'm mostly interested in whether the HTML parser will be rewritten. I was told that was a requirement to fix 278053. I understand this is getting off-topic. Just anxious for cores beyond 2.2... :)
Sterkrig # 16. October 2009, 13:13
Hope there'll be posts about actual developing: SCM/VCS in use, developers' hierarchy etc. (-:E
Ockendorf # 20. October 2009, 05:24
BS-Harou # 29. November 2009, 21:05
netwolf # 30. November 2009, 11:01
Originally posted by BS-Harou:
Yes, more facts (less rumors) please
fearphage # 30. November 2009, 17:30
Originally posted by BS-Harou:
I'm tired of hearing people talk about it. I'd like to use some moderately current technology already. HTML5 called and asked why Opera couldn't come out to play
lucideer # 30. November 2009, 18:37
Originally posted by fearphage:
You mean things like HTML5 Server-sent Events or Web Forms? "Current technologies" are defined by what people choose to use - which is defined by what browsers are popular. People don't choose browsers based on the features they support, people choose what features they use based on what their own default browser supports. This is why SVG has not taken off as much as it could have, and why very few web apps use persistent HTTP push, i.e. Comet.
Robert90 # 30. November 2009, 19:17
Originally posted by lucideer:
I don't agree, "current technologies" are defined by what is used on the web (sounds logical, doesn't it?), and what's getting used on the web is what browsers support. The only problem is some browser from Redmond which lags ages behind. And because of the market share of that browser from Redmond, the "current technologies" also lag ages behind.
The user doesn't care how the web works, if it works it's fine. The user doesn't care if it takes a programmer/designer hours of (extra) work to make a site look good, or what hacks they have to use before the site works. The only thing they care about is that it works.
And there is a lot of (simple) stuff in HTML5 and CSS which can be used without breaking the web. Things like border-radius, the new input fields from HTML, et cetera. If Opera supports those new things (which it does in the case of the new form stuff), Opera users can benefit from it, but users of other browsers won't notice anything has changed. (<input type="date" /> just changes into <input type="text"/>, so the web doesn't look better in that browser, it's just still the same. You can easily check if a browser supports type="date"; when it doesn't, you can add some "click here" calendar beside the text field (default fallback), and it doesn't change the user experience.)
lucideer # 30. November 2009, 19:43
Chas4 # 30. November 2009, 23:53
But I think that browsers using HTML 5 is a good thing as they can test what works and what does not
Robert90 # 1. December 2009, 07:36
And before the spec. can become final, isn't it a criterion that the spec. is already implemented by some browsers?
fearphage # 1. December 2009, 15:49
Originally posted by lucideer:
Opera's server-sent events implementation was incomplete when they originally added it. It is now WAY outdated compared to the current spec. I'm aware of what in html5 Opera does and does not support. Before you try to school me, you should educate yourself.
Originally posted by lucideer:
I'm not talking about "people". I'm talking about how the browser I want to use can't be used on some sites because Opera simply doesn't support the technology. Bespin, for instance, cannot work in Opera because it supports no (0, none, nada) canvas text APIs (filed as DSK-246923).
lucideer # 1. December 2009, 16:44
Originally posted by fearphage:
As are Mozilla's implementations of HTML5 features - DOMStorage for example is MASSIVELY so.