
Opera Core Concerns

Official blog for Core developers at Opera

Automated regression testing of the browser core


The cornerstone of all testing done on the core of the Opera browser is our automated regression testing system, named SPARTAN. The system consists of a central server and about 50 test machines running our 120 000 automated tests on all core reference builds. The purpose of this system is to help us discover any new bugs we introduce as early as possible, so that we can fix them before they cause any trouble for our users.

Step one: Preparing a build

Before SPARTAN can test anything, it needs a build to test. Our build system automatically creates builds every night and pings SPARTAN when they are ready. Developers and testers can also request their own builds from the build system, using any build tag they want, to test work from their own experimental branches before it is eventually merged into the stable mainline we base our products on.

Unlike other browser vendors, we ship our browser on a variety of different platforms, so our core build packages contain not just one binary but several: one for each general product category. Each of these profiles has the same feature set and memory constraints as the platform it corresponds to. The whole set of tests is run on each of these profiles.

Step two: Testing

When the SPARTAN server is informed about the existence of a new build, it will add this build to its testing queue and distribute a few hundred tests to each of the test machines the next time they ask for more work. Each test machine works independently with its assigned tests. It will download the Opera binaries it has been told to use, and run its assigned tests on them. Once it has finished its batch of tests, it will pass the test results back to the SPARTAN server, and again ask what to do next.
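The pull-based loop described above can be sketched roughly as follows. This is a minimal illustration, not SPARTAN's actual code: the `Server` class, its method names, the `run_machine` helper and the batch size of 300 are all assumptions made for the example.

```python
from collections import deque

BATCH_SIZE = 300  # assumption: stands in for "a few hundred tests" per request


class Server:
    """Hypothetical stand-in for the SPARTAN server's work queue."""

    def __init__(self, batch_size=BATCH_SIZE):
        self.batch_size = batch_size
        self.queue = deque()  # pending (build, test) pairs, oldest build first
        self.results = {}     # (build, test) -> "PASS" or "FAIL"

    def add_build(self, build, tests):
        """Called when the build system announces a new build."""
        self.queue.extend((build, test) for test in tests)

    def request_work(self):
        """Hand the next batch to whichever machine asks for more work."""
        size = min(self.batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(size)]

    def submit(self, batch_results):
        """Store the results a machine reports for its finished batch."""
        self.results.update(batch_results)


def run_machine(server, run_test):
    """One test machine: ask for work, run it, report back, repeat until idle.

    Returns the number of batches this machine processed.
    """
    batches = 0
    while True:
        batch = server.request_work()
        if not batch:
            return batches
        server.submit({(build, test): run_test(build, test) for build, test in batch})
        batches += 1
```

Because each machine asks for work instead of being assigned it, a slow or offline machine simply takes fewer batches, and the server never needs to know how many machines exist.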

If a test machine ever runs out of new builds to test, for example during the weekend, it will look back at older builds and run any newly added tests on them too. This ensures that we have a full history for each test, and can at any time determine when a fix or regression was first introduced without having to manually test things again.
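As a rough sketch of that backfilling idea, assuming results are kept in a dictionary keyed by build and test (the post does not describe the real storage), the two helpers below find the gaps in the history and then use the history to locate the build where a test changed. Both function names are made up for this example.

```python
def backfill_work(builds, all_tests, results):
    """Return (build, test) pairs that have no recorded result yet.

    builds    -- build identifiers, oldest first
    all_tests -- every test currently in the suite
    results   -- dict mapping (build, test) to "PASS" or "FAIL"
    """
    missing = []
    # Assumption: walk backwards from the newest build, so recent history
    # is filled in before older history.
    for build in reversed(builds):
        for test in all_tests:
            if (build, test) not in results:
                missing.append((build, test))
    return missing


def first_change(builds, test, results):
    """Return the first build whose result differs from the previous known one."""
    previous = None
    for build in builds:
        outcome = results.get((build, test))
        if outcome is None:
            continue  # no result for this build yet, so skip the gap
        if previous is not None and outcome != previous:
            return build
        previous = outcome
    return None
```

With a complete history, `first_change` answers the question of when a fix or regression arrived by a simple lookup, with no manual retesting.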

We have several different types of tests:

Unit tests
Unit tests (or selftests), written by the same developers who write the running code, test individual functions and APIs.
JavaScript tests
Our JavaScript tests test a wide array of different features on a functional level. This includes for example tests for the Selectors API, tests for common JavaScript frameworks, or any other feature we can interact with through JavaScript.
Watir tests
Many tests require some sort of user interaction. To test forms, for example, one must click buttons or checkboxes or type in text fields. To avoid having to do this manually, we have implemented support for the cross-browser Watir API. While others use this API to test their web applications, we use it to test the browser itself.
Visual tests
To test our stylesheet and graphics code, we need to test that our visual test cases look right. Some of our visual tests automatically compare two pages or two elements to determine whether they are the same. On other tests, the test machines will take a screenshot of the rendered page and pass it back to the SPARTAN server. If the SPARTAN server has seen this screenshot before, it will know whether that particular rendering means PASS or FAIL. If SPARTAN has never seen it before, the screenshot must be labeled as PASS or FAIL by a human. This is labour-intensive work we intend to further reduce through reftests.
Performance tests
A modern browser must not only pass tests for all its different features. It must also be fast. SPARTAN runs a number of different performance tests, both internally and externally developed, on our builds. If Opera becomes slower at any particular test, this will be flagged as a regression.
Crash tests
We create test cases for every single bug we analyze and fix, and SPARTAN runs most of these. Among our bug-based test cases are crash tests. If Opera can load these tests without crashing, the test has passed. If it crashes, we have reintroduced an old crash, and must fix it.
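The screenshot lookup described under visual tests could look something like the sketch below. The post does not say how SPARTAN recognises a screenshot it has seen before. Hashing the exact image bytes with SHA-256 is an assumption, and so are the `ScreenshotOracle` class and the "UNKNOWN" verdict for images awaiting a human label.

```python
import hashlib


class ScreenshotOracle:
    """Hypothetical store of human verdicts keyed by screenshot content."""

    def __init__(self):
        self.verdicts = {}  # (test name, image digest) -> "PASS" or "FAIL"

    @staticmethod
    def _digest(image_bytes):
        # Assumption: two screenshots count as the same only if byte-identical.
        return hashlib.sha256(image_bytes).hexdigest()

    def classify(self, test, image_bytes):
        """Return the stored verdict, or "UNKNOWN" if a human must label it."""
        return self.verdicts.get((test, self._digest(image_bytes)), "UNKNOWN")

    def label(self, test, image_bytes, verdict):
        """Record a human's PASS/FAIL decision for this exact rendering."""
        if verdict not in ("PASS", "FAIL"):
            raise ValueError("verdict must be PASS or FAIL")
        self.verdicts[(test, self._digest(image_bytes))] = verdict
```

Each distinct rendering only needs to be labeled once. After that, every build that produces the same image gets its verdict automatically.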

All in all, we currently run about 120 000 tests on each configuration in each build, but this number changes daily. We continuously write new test cases for bugs or test suites for new or old features, and we also copy any publicly available test suites we find useful. Right now we are also working on automating many of our previously manual tests, including memory tests.

Step three: Human intervention

Once the machines are done with their part of the job for a particular build, they will send an email to a human who will continue the work. SPARTAN will generate a report of changes between this build and the previous build. In most builds there are some tests that go from FAIL to PASS because we have fixed something. But there are also often regressions (tests that go from PASS to FAIL) because we accidentally broke something while fixing something else. This is expected, and is the reason why we do regression testing. We know there will always be regressions, and need to find them as quickly as possible in order to fix them before they can cause any trouble for users or customers.
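A report of changes between two builds boils down to comparing two result sets. Here is a minimal sketch, assuming each build's results are available as a plain mapping from test name to "PASS" or "FAIL". The function name and the report layout are invented for illustration.

```python
def diff_report(previous, current):
    """Compare two {test name: "PASS" | "FAIL"} mappings for consecutive builds.

    Returns the tests that were fixed (FAIL to PASS) and the tests that
    regressed (PASS to FAIL). Tests that are new in the current build are
    ignored, since they have no earlier result to compare against.
    """
    report = {"fixed": [], "regressed": []}
    for test in sorted(current):
        before, after = previous.get(test), current[test]
        if before == "FAIL" and after == "PASS":
            report["fixed"].append(test)
        elif before == "PASS" and after == "FAIL":
            report["regressed"].append(test)
    return report
```

The "regressed" list is what the human tester would start from when analyzing a build.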

The human tester will analyze each regressed test. If a hundred different tests started failing at the same time, they could all have broken because of one regression, or there could be several different ones. For each unique regression identified, the human tester will report a new bug and assign it to the developer responsible for the code that broke. Once a fix is ready, we will run all our tests again.


Comments

Tamil 13. October 2009, 14:25

Originally posted by wilhelmja:

120 000 automated tests

:eyes:

Why Linux test machines is more compared to Windows test machines?

Thanks.

lucideer 13. October 2009, 14:29

Originally posted by Tamil:

Why Linux test machines is more compared to Windows test machines?


More various Linux distribution? Plus more architectures (x86-64).

wilhelmja 13. October 2009, 14:37

Originally posted by Tamil:

Why Linux test machines is more compared to Windows test machines?


The virtual Linux machines, all running Debian, are really easy to set up and maintain. As the browser core is platform-independent, it doesn't really matter all that much which platform we use, so we go for the easiest option. Although most of the testing is on Linux, we do test nightly core builds on Windows too. The Windows test machines are however primarily used by the desktop team and other platform teams for testing their builds.

Tamil 13. October 2009, 14:40

Thank you. :smile:

zoquete 13. October 2009, 14:56

when can I download the first nightly?

ouzoWTF 13. October 2009, 15:23

I'm update addicted, give me the nightly builds!!!! :D
Thanks for that deep look into the testing routines! Keep up your good work! :smile:

Chas4 13. October 2009, 15:37

:cool:

ddrum 13. October 2009, 16:07

Any news on when you're releasing operawatir? There have been couple of mentions of it already on these blogs but nothing specific on releasing this tool for larger community to use.

serious 13. October 2009, 19:03

amazing, thx for the insight :D

SouthernCross 13. October 2009, 20:48

I love how it says "Fail".

DanielHendrycks 14. October 2009, 02:21

Very cool. Is SPARTAN an acronym?

Chas4 14. October 2009, 03:06

This is.... I am guessing

fearphage 14. October 2009, 11:04

Originally posted by SouthernCross:

I love how it says "Fail".

Fail is the opposite of pass. If they all said pass, there would be no core bugs... assuming that every core bug is programmatically testable

Originally posted by core:

We know there will always be regressions, and need to find them as quickly as possible in order to fix them before they can cause any trouble for users or customers.

Shouldn't regressions be the exception and not the rule? I've done QA for a long time. I've never worked at a place where I expected there to be regressions. I've never worked for a browser vendor either. That's why I personally view regression testing as the boring part. It usually involves running through the same steps over and over with the same results in my experience. Is there a reason that regressions are more plentiful in opera and/or browsers?

That being said. A regression for you:

DSK-268040: Regression - image elements don't fire the error event more than once when the source is reassigned. This regressed build 1810. It was working fine in build 1799. This was previously fixed in v10.00.1491 on May 8, 2009.

netwolf 14. October 2009, 14:24

I wonder how such annoying and obvious bugs, e.g. the eBay one (with its large white void), can slip through those tests (for several versions now).
I mean, 1) ebay is a very popular site 2) the bug is very obvious for the naked eye...

Interesting to read about things running in background, thank you for insights!

Purdi 14. October 2009, 15:26

Originally posted by fearphage:

Is there a reason that regressions are more plentiful in opera and/or browsers?


You mean apart from browsers being insanely complex and having to handle an ever-changing web?

Purdi 14. October 2009, 15:29

Originally posted by netwolf:

I wonder how such annoying and obvious bugs, e.g. the eBay one (with its large white void), can slip through those tests (for several versions now).


What "obvious" bug? I don't have the problem.

By the way, you even participated in a thread showing that the problem was that eBay set some cookie. Deleting the cookie removed the problem. Why are you assuming that it's a bug or regression in Opera rather than an eBay bug regression? It sure looks like they are using some silly sniffing script to send different content depending on different criteria!

fearphage 14. October 2009, 19:19

Originally posted by Purdi:

Originally posted by fearphage:

Is there a reason that regressions are more plentiful in opera and/or browsers?

You mean apart from browsers being insanely complex and having to handle an ever-changing web?

Well i notice (because i file them) a lot of regressions in opera. I use firefox, chromium, and safari daily but Opera is my main browser. It's possible that they all have as many regressions as Opera but I don't use them as much. I don't know if Opera has a ton of regressions or if browser vendors in general have lots of regressions. Do you think it's the latter? I know Opera has a lot of regressions but I have no way to know if it's more or less than any other browsers.

Originally posted by Purdi:

By the way, you even participated in a thread showing that the problem was that eBay set some cookie. Deleting the cookie removed the problem.

Can you please link me to that thread? I didn't know that was the source of problem. I've been living with it for a while. I presume other browsers handle the erroneous cookie in the exact same way? Opera could have resolved this on a global scale via browser.js.

d.i.z. 14. October 2009, 22:18

Originally posted by fearphage:

DSK-268040: Regression - image elements don't fire the error event more than once when the source is reassigned. This regressed build 1810. It was working fine in build 1799. This was previously fixed in v10.00.1491 on May 8, 2009.


The fix was backed out as it caused ugly problem with mouseovers. Proper fix will be in next core version because it's a bit too complicated to integrate in current codebase...

wilhelmja 14. October 2009, 22:53

Originally posted by ddrum:

Any news on when you're releasing operawatir?



OperaWatir depends on core functionality that is not yet in any public build. You'll hear from us again when it is. (c:

Originally posted by DanielHendrycks:

Very cool. Is SPARTAN an acronym?



It was an acronym at some point in time.

Originally posted by fearphage:

Shouldn't regressions be the exception and not the rule?



By the time the core code becomes available in a finished product? Yes, definitely. That is the goal.

But building a browser engine is extremely complex. The specifications the Web is built upon are often rather crappy, leaving entire areas such as error handling completely undefined. There are hardly any official test suites browser vendors can make use of when implementing something, and determining what is "correct" behavior often comes down to trial and error. And something that was "correct" yesterday might not be "correct" today, depending on what web developers and other browser vendors do.

Our thousands of regression tests make up a collective, infallible memory of all the thousands of small design decisions we've made over the last decade. No single human can keep track of all that information, so when we fix one bug, changing the behavior in a general area ever so slightly, the new code might unintentionally reintroduce a bug we fixed years ago, but no human can remember we fixed. But SPARTAN knows. It discovers that a test started failing again, and tells us. Then we can fix it.

Some regressions are hard to catch, though. Perhaps they require several steps of human interaction that are difficult to automate or automatically verify. Or perhaps the regression is so trivial, so obscure, that neither our hundred thousand tests nor any of the hundreds of billions of sites out there are affected by it. Until one day, when Facebook starts to depend on just that behavior. And it breaks.

Ideally, we'd have an infinite number of test cases, covering everything the Web and its billions of sites depend on. But we're not quite there yet. Until then, we'll have to depend on both our automated testing and human testers inside and outside of Opera Software. Thanks for finding those bugs for us. (c;

edvakf 15. October 2009, 06:33

If I file a bug report with a 100% reproducible bug case (which I do sometimes), does it become a test case when QA receives it? Or not until a developer works on it?

haavard 15. October 2009, 08:14

The eBay issue is something they promised to fix on their end, so as far as I know, we are just waiting for them. Regression testing doesn't necessarily pick up when a site changes its code, only when a change in Opera causes problems.

fearphage 15. October 2009, 09:52

Thanks for the info gents. I understand that I probably don't/can't appreciate the complexity of browser development until I've done it.

For clarity's sake, is the js engine the only piece of core being entirely rewritten? I'm mostly interested in whether the HTML parser will be rewritten. I was told that was a requirement to fix 278053. I understand this is getting off-topic. Just anxious for cores beyond 2.2... :)

Sterkrig 16. October 2009, 13:13

Very interesting post indeed, esp. for software developers.

Hope there'll be posts about actual developing: SCM/VCS in use, developers' hierarchy etc. (-:E

Ockendorf 20. October 2009, 05:24

Thanks for the look inside :jester:

BS-Harou 29. November 2009, 21:05

Please! Write some article about Presto 2.4 (2.5), Carakan and Vega .. how it works, whats the progress and when we should expect first build with that)

netwolf 30. November 2009, 11:01

Originally posted by BS-Harou:

Please! Write some article about Presto 2.4 (2.5), Carakan and Vega .. how it works, whats the progress and when we should expect first build with that)


Yes, more facts (less rumors) please :smile:

fearphage 30. November 2009, 17:30

Originally posted by BS-Harou:

Please! Write some article about Presto 2.4 (2.5), Carakan and Vega .. how it works, whats the progress and when we should expect first build with that)

I'm tired of hearing people talk about it. I'd like to use some moderately current technology already. HTML5 called and asked why Opera couldn't come out to play :worried:

lucideer 30. November 2009, 18:37

Originally posted by fearphage:

HTML5 called and asked why Opera couldn't come out to play :worried:


You mean things like HTML5 Server-sent Events or Web Forms? "Current technologies" are defined by what people choose to use - which is defined by what browsers are popular. People don't choose browsers based on the features they support, people choose what features they use based on what their own default browser supports. This is why SVG has not taken off as much as it could have, and why very few web apps use persistent HTTP push, i.e. Comet.

Robert90 30. November 2009, 19:17

Originally posted by lucideer:

are defined by what people choose to use


I don't agree, "current technologies" are defined by what is used on the web (sounds logic, doesn't it?), and what's getting used on the web, is what browsers support. The only problem is some browser from Redmond which lags ages behind. And because of the market share of that browser from Redmond, the "current technologies" also lag ages behind.

The user doesn't care how the web works, if it works it's fine. The user doesn't care if it takes a programmer/designer hours of (extra) work to make a site look good, or what hacks they have to use before the site works. The only thing they care about is that it works.

And there is a lot of (simple) stuff in HTML5 and CSS which can be used without breaking the web: things like border-radius, the new input fields from HTML5, et cetera. If Opera supports those new things (which it does in the case of the new form stuff), Opera users can benefit from them, but users of other browsers won't notice anything has changed. (<input type="date" /> just falls back to <input type="text" />, so the page doesn't look better in those browsers, it's just the same as before. You can easily check whether a browser supports type="date"; when it doesn't, you can add a "click here" calendar beside the text field (the default fallback), and the user experience doesn't change.)

lucideer 30. November 2009, 19:43

I don't see where you disagreed with me Robert90 - I agree with everything in your post.

Chas4 30. November 2009, 23:53

I wonder if people know that HTML 5 is not a final spec yet

But I think that browsers using HTML 5 is a good thing as they can test what works and what does not

Robert90 1. December 2009, 07:36

HTML 5 isn't a final spec. yet, but some parts of the spec. are already finished (like the forms module). The same for the CSS 3 spec. the spec. isn't final yet, but modules like "background & border" are already finished.

And before the spec. can become final, isn't it a criterion that the spec. is already implemented by some browsers?

fearphage 1. December 2009, 15:49

Originally posted by lucideer:

You mean things like HTML5 Server-sent Events or Web Forms?

Opera's server-sent events was incomplete when they originally added it. It is now WAY outdated from the current spec. I'm aware of what in html5 Opera does and does not support. Before you try to school me, you should educate yourself.

Originally posted by lucideer:

People don't choose browsers based on the features they support, people choose what features they use based on what their own default browser supports.

I'm not talking about "people". I'm talking about the fact that the browser I want to use can't be used on some sites because Opera simply doesn't support the technology. Bespin, for instance, cannot work in Opera because Opera supports no (0, none, nada) canvas text APIs (filed as DSK-246923).

lucideer 1. December 2009, 16:44

Originally posted by fearphage:

Opera's server-sent events was incomplete when they originally added it. It is now WAY outdated from the current spec. I'm aware of what in html5 Opera does and does not support. Before you try to school me, you should educate yourself.


As are Mozilla's implementations of HTML5 features - DOMStorage for example is MASSIVELY so.
