
My Opera News

Behind the scenes at My Opera

Posts tagged with "servers"

Back to our primary master database


My Opera is now back to our more powerful primary master database.
In fact, since the switch we had been running on our secondary (acting as primary). :smile:
Now our old primary, which had become the secondary, is the primary again. :smile:

We officially switched part of the My Opera database to InnoDB. This has been on the wishlist for such a long time... Now it's finally done. But it's not over. We still need to carry out the conversion on all our slaves. This will happen "in the background"...

The failover procedure was really smooth. We're getting better at it, and we plan to automate it completely. I'm not sure if we will ever achieve this, but it doesn't seem to be really hard.
We already have a switch-master script that we can launch against each slave database, and it will safely switch the master db, checking that everything is in order, and it worked quite well today.
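
To give an idea of what "checking that everything is in order" can mean, here is a minimal sketch in Python. It is not our actual switch-master script: the function names and the lag threshold are hypothetical, and it only assumes a dictionary shaped like one row of MySQL's `SHOW SLAVE STATUS` output.

```python
def safe_to_switch(status, max_lag_seconds=0):
    """Decide whether a slave looks healthy enough to be re-pointed.

    `status` is assumed to be a dict shaped like one row of
    SHOW SLAVE STATUS. Returns (ok, reasons).
    """
    reasons = []
    if status.get("Slave_IO_Running") != "Yes":
        reasons.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        reasons.append("SQL thread is not running")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        reasons.append("replication lag is unknown")
    elif lag > max_lag_seconds:
        reasons.append(f"slave is {lag}s behind the master")
    return (not reasons, reasons)


def change_master_sql(host, log_file, log_pos):
    """Build the statement that points a slave at a new master.

    The host and binlog coordinates are supplied by the caller;
    the values used in any example are made up.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )
```

A real script would also stop and restart the slave threads around the `CHANGE MASTER TO` statement and verify the binlog coordinates on the new master. That part is left out here.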

Someone asked about Postgres on this blog the other day. I really like Pg, and I used it for many years. MySQL can be good too. We'll see... :smile:

My Opera now running on a master to master replication setup


The scheduled database maintenance was completed on Saturday. Well, it wasn't exactly maintenance, but rather an important change of setup. My Opera is now running on a master-to-master replication setup, instead of the classic master-slave setup that had been running for some years.

There are at least two good reasons why we chose that setup:
  • it allows for database master maintenance with practically no downtime, since you can have the site run on your secondary master while the primary is taken offline. When maintenance is over, you can switch back to the primary.
  • in case of disaster or failure of the primary master, the secondary can kick in (manually for now) in a rather short time. Last time we had a primary master db crash, it took us around 4-6 hours to get back online.


So, currently My Opera is running on the secondary master database, while the primary is being prepared. "Being prepared" in this case means that we're converting some of our heavy duty tables to the InnoDB storage engine. This was really needed because MyISAM table-level locking poses some hard limits on concurrency and scalability of your database, even if it's very fast. Add that we have more user activity, APIs and services than ever before, and you have the whole picture.
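
For readers who haven't done this before, the conversion itself boils down to one `ALTER TABLE ... ENGINE=InnoDB` statement per table. The sketch below only generates those statements from a list of (table, engine) pairs. The helper name and the table names in the usage example are hypothetical, not our real schema.

```python
def innodb_conversion_statements(tables):
    """Return ALTER TABLE statements for every MyISAM table.

    `tables` is an iterable of (table_name, engine) pairs, for example
    taken from information_schema.TABLES. Tables that are not MyISAM
    are skipped.
    """
    return [
        f"ALTER TABLE `{name}` ENGINE=InnoDB;"
        for name, engine in tables
        if engine.lower() == "myisam"
    ]


# Hypothetical table names, for illustration only.
for statement in innodb_conversion_statements(
    [("blog_posts", "MyISAM"), ("sessions", "InnoDB"), ("comments", "MyISAM")]
):
    print(statement)
```

Keep in mind that such an `ALTER TABLE` rebuilds the whole table, which is why we run it on a master that is not serving the site.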

Of course, all this work should also improve the raw performance of the site. Once we get back to our primary master server, which we will probably do in the next few days, the site will definitely be faster. Not ludicrous speed yet, but faster :smile:

Scheduled downtime for database maintenance


We are planning a database maintenance operation today, starting at 19:00 UTC (20:00 CET). Estimated downtime is around 1 hour.

We tested this already on other db systems and we definitely saw good results, so cross your fingers.

We're looking forward to having a faster My Opera.

EDIT (2009/11/14 00:13:37): There has been no downtime so far.
We're continuing the maintenance operation tomorrow. Still 1 hour of estimated downtime. That will probably happen during the afternoon or early evening.

Static resources on lighttpd


It's been a while since we deployed the first static server for My Opera.
It was a really necessary step, because we had, and we continue to have, lots of static resources to be served.
A while back, most of these resources were served by applications, causing much more load on servers than necessary.
Now most of the heaviest ones have already been moved to the static servers.

Some months ago we also added support for partitioning user resources in our storage software layer. This means that if we need to scale the serving of resources across machines, we can either replicate the entire content on several of them, or split the content over 2 or more machines. That part has worked nicely so far. That's why at some point there was a static.myopera.com and a static02.myopera.com.

Now there's also static03.myopera.com :smile:
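
As an illustration of the "split the content over 2 or more machines" option, here is a minimal sketch of hash-based partitioning. This is an assumption made for the example, not a description of how our storage layer actually assigns users to hosts. The function name is hypothetical, and only the three host names come from this post.

```python
import zlib

STATIC_HOSTS = [
    "static.myopera.com",
    "static02.myopera.com",
    "static03.myopera.com",
]


def static_host_for(username, hosts=STATIC_HOSTS):
    """Pick the static host that serves a user's resources.

    crc32 is used instead of Python's built-in hash() because it gives
    the same value on every run and on every machine.
    """
    bucket = zlib.crc32(username.encode("utf-8")) % len(hosts)
    return hosts[bucket]


# The same user always maps to the same host.
print(static_host_for("some_user"))
```

The drawback of a plain modulo scheme is that adding a host moves most users to a different machine, so a real setup would need a migration step or a lookup table.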

We changed our setup again, transparently (for you users), to consolidate the previously recycled/temporary hardware into a new shiny machine with more disk space. On this machine, we installed lighttpd instead of our usual Apache setup. We wanted to try out this software. For us it was the first time we tried it on a production setup.

Edoardo played with it for a while, prepared the setup and installed it on static03. As of today, it has been running perfectly for nearly 1 month with a really low load, and peaks of 250 accesses per second. It's serving around 14M hits per day for avatars, user pictures, skin thumbnails, etc...
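
To put those two numbers side by side, here is the back-of-the-envelope arithmetic, assuming that "hits" and "accesses" count the same thing:

```python
SECONDS_PER_DAY = 24 * 60 * 60

hits_per_day = 14_000_000   # figure quoted above
peak_per_second = 250       # figure quoted above

average_per_second = hits_per_day / SECONDS_PER_DAY
peak_to_average = peak_per_second / average_per_second

print(f"average: {average_per_second:.0f} hits/s")          # average: 162 hits/s
print(f"peak is {peak_to_average:.2f}x the average")        # peak is 1.54x the average
```

So the peaks are only about one and a half times the daily average.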

files.myopera.com outage this morning


This morning around 8 AM CEST we found out that no file on files.myopera.com was accessible anymore.
If you were using My Opera at that time, you certainly saw lots of 403 (Forbidden) errors.

The server couldn't access any file on the filesystem.
All pictures gone, all user css gone, every user file missing.

Connecting to the machine, and trying to get directory contents, we only got back I/O errors.
We tried to remount the folder, with no luck. We also tried to stop all services, unmount and remount. Didn't work. Then we rebooted the server, hoping for a clean mount of the filesystem.

It worked. Except that it was down again 2 hours later.



Our mighty sysadmins started working on the issue, and by lunch time, it was fixed.
Then we restarted the services, and everything looked good. At around 12:15 CEST, the files were back online. No data loss reported. Just panic for a while :-)

What happened is that a tiny little fibre channel connector, also known as GBIC, decided to leave this cruel world this morning.

Right on time to celebrate our 3,000,000 users!

Opera Unite release day


Tuesday 16th of June was a special day for My Opera too.

Opera Unite was released at 09:00 CEST, and just after that, the amount of traffic and visitors to My Opera increased quickly, to reach the highest peak around 18:00 UTC.

This was also the highest peak ever reached for My Opera. On that day, we finally broke the barrier of 2 million page views per day, which adds up to more than 100M hits and 600,000 unique visitors on that day alone.

We're very happy about this. Yes, the site was not completely usable, and sometimes it was serving pages very slowly, but still we managed to keep it running. We hacked together some quick and dirty fixes to achieve this. Some of those hacks have been removed now, and we're working on applying them permanently to the site.

For the curious/interested, I'm referring to what we call the "User" module, that provides most of the user-related information. When you open a page on My Opera, we call the user module at least for:
  • the visiting user (you), and
  • the user "owner" of the page (the my.opera.com/<this_bit_here>)


It's no coincidence that this is the most-called module throughout the whole website, so we want to optimize it and make it really fast, ideally responding within a few milliseconds.

Here are a few munin charts. First, the database connections (this is a weekly chart, so look around the "16"):



And of one of our backends, with a rough timeline of what happened:



Of course, there's still a lot of work to do, and we know which are the slower parts of the site. That might change while the community grows bigger and bigger every day. We will keep up. :-)

Disappearing user pictures


Lately we have noticed two different problems, one with user pictures and another one with default avatars.

Sometimes user pictures just come up broken. For some reason, they don't show. This is a bit weird, and difficult to reproduce. Every time you try to reload the broken pictures, they mysteriously show up again.

We think we found out why. Analyzing static.myopera.com access logs, we found that some requests were being served as 503, which, according to the HTTP spec, is:

10.5.4 503 Service Unavailable

   The server is currently unable to handle the request due to a
   temporary overloading or maintenance of the server. The implication
   is that this is a temporary condition which will be alleviated after
   some delay. If known, the length of the delay MAY be indicated in a
   Retry-After header. If no Retry-After is given, the client SHOULD
   handle the response as it would for a 500 response.

      Note: The existence of the 503 status code does not imply that a
      server must use it when becoming overloaded. Some servers may wish
      to simply refuse the connection.

So, 503s are sent to clients when the server can't serve more requests, or when there are bandwidth or requests per second limits. In fact, some months ago we set some per-client bandwidth and connection limits, to allow fair usage of resources for all our users. Now we have raised the limits a bit, and we're keeping the access log monitored, trying to minimize "503" errors.
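
If you want to do the same kind of analysis on your own logs, here is a minimal sketch that counts 503 responses per client. It assumes the standard Common or Combined Log Format. The function name is hypothetical, and the sample lines are made up, not taken from our logs.

```python
import re
from collections import Counter

# client, identd, user, [timestamp], "request line", status, ...
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def count_503_by_client(lines):
    """Return a Counter mapping client address to its number of 503s.

    Lines that do not match the expected format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "503":
            counts[match.group("client")] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2009:13:55:36 +0200] "GET /a.png HTTP/1.1" 503 0',
    '192.0.2.1 - - [10/Oct/2009:13:55:37 +0200] "GET /b.png HTTP/1.1" 200 512',
    '192.0.2.7 - - [10/Oct/2009:13:55:38 +0200] "GET /c.png HTTP/1.1" 503 0',
]
print(count_503_by_client(sample).most_common())
```

Clients with many 503s are the ones hitting the per-client limits, which is a hint about whether a limit is set too low.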

Hopefully this will eliminate all the "missing-pictures" problems. Shout if you still see broken pictures.
The avatar problem might be a bit trickier: avatars are not served from static.myopera.com, so it cannot have the same cause. So, is anyone in the community able to reproduce the "default-avatar-missing" problem with any browser?