Opera Core Concerns

Official blog for Core developers at Opera

Automated regression testing of the browser core

The cornerstone of all testing done on the core of the Opera browser is our automated regression testing system, named SPARTAN. The system consists of a central server and about 50 test machines running our 120 000 automated tests on all core reference builds. The purpose of this system is to help us discover any new bugs we introduce as early as possible, so that we can fix them before they cause any trouble for our users.

Step one: Preparing a build

Before SPARTAN can test anything, it needs a build. Our build system automatically creates builds every night and pings SPARTAN when they are ready. Developers and testers can also request their own builds from the build system, using any build tag they want, to test changes on their own experimental branches before these are eventually merged into the stable mainline we base our products on.

Unlike other browser vendors, we ship our browser on a variety of different platforms, so our core build packages contain not just one binary but several: one for each general product category. Each of these profiles has the same feature set and memory constraints as the platform it corresponds to. The whole set of tests is run on each of these profiles.

Step two: Testing

When the SPARTAN server is informed that a new build exists, it adds the build to its testing queue and distributes a few hundred tests to each test machine the next time that machine asks for more work. Each test machine works independently on its assigned tests. It downloads the Opera binaries it has been told to use and runs its assigned tests on them. Once it has finished its batch of tests, it passes the results back to the SPARTAN server and again asks what to do next.
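To make the pull-based flow above concrete, here is a minimal Python sketch. It is not SPARTAN's actual code: the class name, method names, batch size and build tag are all hypothetical, and the real server certainly handles far more (several profiles per build, machines that disappear mid-batch, and so on).

```python
from collections import deque


class ServerSketch:
    """Hypothetical stand-in for the central server (not the real SPARTAN)."""

    def __init__(self, all_tests, batch_size=300):
        self.all_tests = list(all_tests)
        self.batch_size = batch_size
        self.pending = deque()  # (build, [tests]) batches waiting for a machine
        self.results = {}       # (build, test) -> "PASS" or "FAIL"

    def notify_new_build(self, build):
        # Called when the build system pings the server:
        # split the whole suite into batches of a few hundred tests.
        for start in range(0, len(self.all_tests), self.batch_size):
            batch = self.all_tests[start:start + self.batch_size]
            self.pending.append((build, batch))

    def request_work(self):
        # A test machine asks what to do next; None means nothing is queued.
        return self.pending.popleft() if self.pending else None

    def submit_results(self, build, outcomes):
        for test, outcome in outcomes.items():
            self.results[(build, test)] = outcome


def run_machine(server, run_test):
    """One test machine: fetch a batch, run it, report back, ask again."""
    while True:
        work = server.request_work()
        if work is None:
            break
        build, batch = work
        outcomes = {test: run_test(build, test) for test in batch}
        server.submit_results(build, outcomes)


# Usage: one build, 1000 made-up tests, one machine that passes everything.
server = ServerSketch([f"test-{i}" for i in range(1000)], batch_size=300)
server.notify_new_build("nightly-1")
run_machine(server, lambda build, test: "PASS")
print(len(server.results))
```

Because the machines pull work rather than having it pushed to them, a slow machine simply takes fewer batches, and adding another machine requires no change on the server side.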

If SPARTAN ever runs out of new builds to test, for example during the weekend, it looks back at older builds and runs any newly added tests on them too. This ensures that we have a full history for each test and can at any time determine when a fix or regression was first introduced, without having to test things manually again.
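The two ideas in this paragraph, filling in missing results on older builds and reading a test's history to find where it first broke, can be sketched in a few lines of Python. The function names and the shape of the data (a dictionary keyed by build and test) are assumptions made for illustration only.

```python
def missing_runs(builds, tests, results):
    """List (build, test) pairs with no recorded result, newest build first.

    `builds` is ordered oldest to newest; `results` maps (build, test)
    to "PASS" or "FAIL". Both shapes are assumed for this sketch.
    """
    return [(b, t) for b in reversed(builds) for t in tests
            if (b, t) not in results]


def first_regression(history):
    """Return the first build where a test went from PASS to FAIL, else None.

    `history` is a list of (build, outcome) pairs in build order.
    """
    previous = None
    for build, outcome in history:
        if previous == "PASS" and outcome == "FAIL":
            return build
        previous = outcome
    return None


# Usage with made-up data: test t2 was added after build b1 had been tested.
builds = ["b1", "b2"]
tests = ["t1", "t2"]
results = {("b1", "t1"): "PASS", ("b2", "t1"): "PASS", ("b2", "t2"): "FAIL"}
backlog = missing_runs(builds, tests, results)

history = [("b1", "PASS"), ("b2", "PASS"), ("b3", "FAIL"), ("b4", "FAIL")]
culprit = first_regression(history)
print(backlog, culprit)
```

With a complete history, finding the offending build is a simple scan; without the backfill, the gaps would have to be closed by hand.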

We have several different types of tests:

Unit tests
Unit tests (or selftests), written by the same developers who write the running code, test individual functions and APIs.
JavaScript tests
Our JavaScript tests cover a wide array of features on a functional level. This includes, for example, tests for the Selectors API, tests for common JavaScript frameworks, and tests for any other feature we can interact with through JavaScript.
Watir tests
Many tests require some sort of user interaction. To test forms, for example, one must click buttons or checkboxes or type in text fields. To avoid having to do this manually, we have implemented support for the cross-browser Watir API. While others use this API to test their web applications, we use it to test the browser itself.
Visual tests
To test our stylesheet and graphics code, we need to test that our visual test cases look right. Some of our visual tests automatically compare two pages or two elements to determine whether they are the same. On other tests, the test machines will take a screenshot of the rendered page and pass it back to the SPARTAN server. If the SPARTAN server has seen this screenshot before, it will know whether that particular rendering means PASS or FAIL. If SPARTAN has never seen it before, the screenshot must be labeled as PASS or FAIL by a human. This is labour-intensive work we intend to further reduce through reftests.
Performance tests
A modern browser must not only pass tests for all its different features. It must also be fast. SPARTAN runs a number of different performance tests, both internally and externally developed, on our builds. If Opera becomes slower at any particular test, this will be flagged as a regression.
Crash tests
We create test cases for every single bug we analyze and fix, and SPARTAN runs most of these. Among our bug-based test cases are crash tests. If Opera can load these tests without crashing, the test has passed. If it crashes, we have reintroduced an old crash, and must fix it.
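The screenshot handling described under visual tests boils down to a lookup of previously labeled renderings. The sketch below shows one way this could work, assuming screenshots are recognized by an exact byte-for-byte match via a SHA-256 digest. That matching rule, the class name and the test name are assumptions; the post does not say how SPARTAN actually recognizes a screenshot it has seen before.

```python
import hashlib


class ScreenshotVerdicts:
    """Hypothetical store of human-labeled renderings, keyed by test and digest."""

    def __init__(self):
        self.known = {}          # (test, digest) -> "PASS" or "FAIL"
        self.needs_review = []   # renderings waiting for a human label

    @staticmethod
    def _key(test, png_bytes):
        # Assumption: identical bytes means identical rendering.
        return (test, hashlib.sha256(png_bytes).hexdigest())

    def classify(self, test, png_bytes):
        key = self._key(test, png_bytes)
        if key in self.known:
            return self.known[key]
        if key not in self.needs_review:
            self.needs_review.append(key)
        return "NEEDS_REVIEW"

    def label(self, test, png_bytes, verdict):
        # A human decides once; the decision is reused for every later build.
        key = self._key(test, png_bytes)
        self.known[key] = verdict
        if key in self.needs_review:
            self.needs_review.remove(key)


# Usage with placeholder bytes instead of a real PNG.
verdicts = ScreenshotVerdicts()
shot = b"fake-png-bytes"
first = verdicts.classify("css-borders-001", shot)   # never seen before
verdicts.label("css-borders-001", shot, "PASS")      # human labels it
second = verdicts.classify("css-borders-001", shot)  # now answered automatically
print(first, second)
```

The cost of such a scheme is that any change in rendering, even a harmless one, produces an unknown screenshot that a human has to label, which is why comparing two pages against each other is attractive.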

All in all, we currently run about 120 000 tests on each configuration in each build, but this number changes daily. We continuously write new test cases for bugs or test suites for new or old features, and we also copy any publicly available test suites we find useful. Right now we are also working on automating many of our previously manual tests, including memory tests.

Step three: Human intervention

Once the machines are done with their part of the job for a particular build, they send an email to a human, who continues the work. SPARTAN generates a report of changes between this build and the previous build. In most builds, some tests go from FAIL to PASS because we have fixed something. But there are often also regressions (tests that go from PASS to FAIL) because we accidentally broke something while fixing something else. This is expected, and it is the reason we do regression testing. We know there will always be regressions, and need to find them as quickly as possible in order to fix them before they can cause any trouble for users or customers.
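A change report of this kind is essentially a comparison of two result sets. The following Python sketch illustrates the idea; the function name, the bucket names and the test names are hypothetical, and the real report presumably carries much more detail.

```python
def diff_builds(previous, current):
    """Bucket the tests whose outcome changed between two builds.

    Both arguments map a test name to "PASS" or "FAIL".
    """
    report = {"fixed": [], "regressed": [], "new": []}
    for test, outcome in sorted(current.items()):
        before = previous.get(test)
        if before is None:
            report["new"].append(test)        # no result in the previous build
        elif before == "FAIL" and outcome == "PASS":
            report["fixed"].append(test)      # FAIL -> PASS
        elif before == "PASS" and outcome == "FAIL":
            report["regressed"].append(test)  # PASS -> FAIL
    return report


# Usage with made-up results for two consecutive builds.
previous = {"a": "PASS", "b": "FAIL", "c": "PASS"}
current = {"a": "FAIL", "b": "PASS", "c": "PASS", "d": "PASS"}
report = diff_builds(previous, current)
print(report)
```

Tests whose outcome did not change are left out on purpose, so that the human reading the report only sees what needs attention.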

The human tester will analyze each regressed test. If a hundred different tests started failing at the same time, they could all have broken because of one regression, or there could be several different ones. For each unique regression identified, the human tester will report a new bug and assign it to the developer responsible for the code that broke. Once a fix is ready, we will run all our tests again.

Comments

Tamil Tuesday, October 13, 2009 2:25:12 PM

Originally posted by wilhelmja:

120 000 automated tests

bigeyes

Why Linux test machines is more compared to Windows test machines?

Thanks.

lucideer Tuesday, October 13, 2009 2:29:32 PM

Originally posted by Tamil:

Why Linux test machines is more compared to Windows test machines?


More various Linux distribution? Plus more architectures (x86-64).

Wilhelm Joys Andersen (wilhelmja) Tuesday, October 13, 2009 2:37:45 PM

Originally posted by Tamil:

Why Linux test machines is more compared to Windows test machines?


The virtual Linux machines, all running Debian, are really easy to set up and maintain. As the browser core is platform-independent, it doesn't really matter all that much which platform we use, so we go for the easiest option. Although most of the testing is on Linux, we do test nightly core builds on Windows too. The Windows test machines are however primarily used by the desktop team and other platform teams for testing their builds.

Tamil Tuesday, October 13, 2009 2:40:46 PM

Thank you. smile

zoquete Tuesday, October 13, 2009 2:56:23 PM

when can I download the first nightly?

ouzowtf (ouzoWTF) Tuesday, October 13, 2009 3:23:17 PM

I'm update addicted, give me the nightly builds!!!! bigsmile
Thanks for that deep look into the testing routines! Keep up your good work! smile

Charles Schloss (Chas4) Tuesday, October 13, 2009 3:37:08 PM

cool

dreamdrummer (ddrum) Tuesday, October 13, 2009 4:07:48 PM

Any news on when you're releasing operawatir? There have been couple of mentions of it already on these blogs but nothing specific on releasing this tool for larger community to use.

serious Tuesday, October 13, 2009 7:03:03 PM

amazing, thx for the insight bigsmile

Andrew Nguyen (SouthernCross) Tuesday, October 13, 2009 8:48:17 PM

I love how it says "Fail".

Daniel Hendrycks (DanielHendrycks) Wednesday, October 14, 2009 2:21:17 AM

Very cool. Is SPARTAN an acronym?

Charles Schloss (Chas4) Wednesday, October 14, 2009 3:06:46 AM

This is.... I am guessing

MyOpera team, please fix this! (fearphage) Wednesday, October 14, 2009 11:04:02 AM

Originally posted by SouthernCross:

I love how it says "Fail".

Fail is the opposite of pass. If they all said pass, there would be no core bugs... assuming that every core bug is programmatically testable

Originally posted by core:

We know there will always be regressions, and need to find them as quickly as possible in order to fix them before they can cause any trouble for users or customers.

Shouldn't regressions be the exception and not the rule? I've done QA for a long time. I've never worked at a place where I expected there to be regressions. I've never worked for a browser vendor either. That's why I personally view regression testing as the boring part. It usually involves running through the same steps over and over with the same results in my experience. Is there a reason that regressions are more plentiful in opera and/or browsers?

That being said. A regression for you:

DSK-268040: Regression - image elements don't fire the error event more than once when the source is reassigned. This regressed build 1810. It was working fine in build 1799. This was previously fixed in v10.00.1491 on May 8, 2009.

netwolf Wednesday, October 14, 2009 2:24:47 PM

I wonder how so annoying and obvious bugs like e.g. the eBay one (with its large white void) can slip through those tests (since several versions).
I mean, 1) ebay is a very popular site 2) the bug is very obvious for the naked eye...

Interesting to read about things running in background, thank you for insights!

Purdi Wednesday, October 14, 2009 3:26:27 PM

Originally posted by fearphage:

Is there a reason that regressions are more plentiful in opera and/or browsers?


You mean apart from browsers being insanely complex and having to handle an ever-changing web?

Purdi Wednesday, October 14, 2009 3:29:00 PM

Originally posted by netwolf:

I wonder how so annoying and obvious bugs like e.g. the eBay one (with its large white void) can slip through those tests (since several versions).


What "obvious" bug? I don't have the problem.

By the way, you even participated in a thread showing that the problem was that eBay set some cookie. Deleting the cookie removed the problem. Why are you assuming that it's a bug or regression in Opera rather than an eBay bug regression? It sure looks like they are using some silly sniffing script to send different content depending on different criteria!

MyOpera team, please fix this! (fearphage) Wednesday, October 14, 2009 7:19:03 PM

Originally posted by Purdi:

Originally posted by fearphage:

Is there a reason that regressions are more plentiful in opera and/or browsers?

You mean apart from browsers being insanely complex and having to handle an ever-changing web?

Well i notice (because i file them) a lot of regressions in opera. I use firefox, chromium, and safari daily but Opera is my main browser. It's possible that they all have as many regressions as Opera but I don't use them as much. I don't know if Opera has a ton of regressions or if browser vendors in general have lots of regressions. Do you think it's the latter? I know Opera has a lot of regressions but I have no way to know if it's more or less than any other browsers.

Originally posted by Purdi:

By the way, you even participated in a thread showing that the problem was that eBay set some cookie. Deleting the cookie removed the problem.

Can you please link me to that thread? I didn't know that was the source of problem. I've been living with it for a while. I presume other browsers handle the erroneous cookie in the exact same way? Opera could have resolved this on a global scale via browser.js.

Rafald.i.z. Wednesday, October 14, 2009 10:18:55 PM

Originally posted by fearphage:

DSK-268040: Regression - image elements don't fire the error event more than once when the source is reassigned. This regressed build 1810. It was working fine in build 1799. This was previously fixed in v10.00.1491 on May 8, 2009.


The fix was backed out as it caused ugly problem with mouseovers. Proper fix will be in next core version because it's a bit too complicated to integrate in current codebase...

Wilhelm Joys Andersen (wilhelmja) Wednesday, October 14, 2009 10:53:02 PM

Originally posted by ddrum:

Any news on when you're releasing operawatir?



OperaWatir depends on core functionality that is not yet in any public build. You'll hear from us again when it is. (c:

Originally posted by DanielHendrycks:

Very cool. Is SPARTAN an acronym?



It was an acronym at some point in time.

Originally posted by fearphage:

Shouldn't regressions be the exception and not the rule?



By the time the core code becomes available in a finished product? Yes, definitely. That is the goal.

But building a browser engine is extremely complex. The specifications the Web is built upon are often rather crappy, leaving entire areas such as error handling completely undefined. There are hardly any official test suites browser vendors can make use of when implementing something, and determining what is "correct" behavior often comes down to trial and error. And something that was "correct" yesterday might not be "correct" today, depending on what web developers and other browser vendors do.

Our thousands of regression tests make up a collective, infallible memory of all the thousands of small design decisions we've made over the last decade. No single human can keep track of all that information, so when we fix one bug, changing the behavior in a general area ever so slightly, the new code might unintentionally reintroduce a bug we fixed years ago, but no human can remember we fixed. But SPARTAN knows. It discovers that a test started failing again, and tells us. Then we can fix it.

Some regressions are hard to catch, though. Perhaps they require several steps of human interaction that are difficult to automate or automatically verify. Or perhaps the regression is so trivial, so obscure that neither our hundred thousand tests nor any of the hundreds of billions of sites out there are affected by it. Until one day, when Facebook starts to depend on just that behavior. And it breaks.

Ideally, we'd have an infinite number of test cases, covering everything the Web and its billions of sites depend on. But we're not quite there yet. Until then, we'll have to depend on both our automated testing and human testers inside and outside of Opera Software. Thanks for finding those bugs for us. (c;

edvakf Thursday, October 15, 2009 6:33:12 AM

If I file a bug report with a 100% reproducible bug case (which I do sometimes), does it become a test case when a QA receives it? Or it won't until a developer works on it?

Haavard (haavard) Thursday, October 15, 2009 8:14:34 AM

The eBay issue is something they promised to fix on their end, so as far as I know, we are just waiting for them. Regression testing doesn't necessarily pick up when a site changes its code, only when a change in Opera causes problems.

MyOpera team, please fix this! (fearphage) Thursday, October 15, 2009 9:52:07 AM

Thanks for the info gents. I understand that I probably don't/can't appreciate the complexity of browser development until I've done it.

For clarity's sake, is the js engine the only piece of core being entirely rewritten? I'm mostly interested in whether the HTML parser will be rewritten. I was told that was a requirement to fix 278053. I understand this is getting off-topic. Just anxious for cores beyond 2.2... :)

Шуйский Николай [krigstask, Ŝtérkrìg] (Sterkrig) Friday, October 16, 2009 1:13:40 PM

Very interesting post indeed, esp. for software developers.

Hope there'll be posts about actual developing: SCM/VCS in use, developers' hierarchy etc. (-:E

Kai Ockendorf (Ockendorf) Tuesday, October 20, 2009 5:24:42 AM

Thanks for the look inside jester

netwolf Wednesday, December 2, 2009 10:10:20 AM

Originally posted by haavard:

The eBay issue is something they promised to fix on their end, so as far as I know, we are just waiting for them.


After a couple of months without eBay fixing this issue, do you still think they are willing to?
I know, it's not Opera's fault, but it IS Opera's problem, so maybe it's time for Opera to fix it on their end...

BarryMah Wednesday, February 17, 2010 5:05:52 PM

a bit late... but...

I left Opera after many years of loyalty because of a crash caused by an inability to read index.in made it impossible to reload. I tried to get help but to no avail. Just wondered whether the SPARTAN app has ever tested this??

TangoDeltaDelta Tuesday, July 20, 2010 2:16:36 PM

Are we any closer to getting either operawatir or a webdriver that will work with Opera? (Please, some update on this??)

People/companies can't really say their complex web apps support the Opera browser without being able to run automated regression tests against the browser. . . . running huge suites of app regression tests manually for each browser and OS combination just isn't going to happen.
