
Implementer's notes

What might get caught in the gears under the hood?

Posts tagged with "HTTP"

Refreshed Internet Drafts


I've once again refreshed my HTTP Cookie and Cache related Internet Drafts. The drafts have been discussed here a couple of times before, but to summarize:

draft-pettersen-dns-cookie-validate-04.txt (archive)

This Draft describes Opera's current heuristic approach to avoid sending cookies to registry-like domains such as co.uk (the "Cookie Monster Bug"). First discussed here.

draft-pettersen-subtld-structure-04.txt (archive)

This Draft describes an improved approach to handling the "Cookie Monster Bug", using an online black list of registry-like domains. Also first discussed here.

The Mozilla team's work on "Effective TLDs" is based on an early version of this suggestion. The result of this work is now available from PublicSuffix.org, and is AFAIK currently used for one or more features by Chrome Beta, FF3, and IE8 Beta.

To reduce the complexity of the specification, and to avoid excluding possible solutions, I have now removed the previous suggestions for how the repository should be generated, and only briefly mention some possibilities in the appendices.

draft-pettersen-cookie-v2-03.txt (archive)

This Draft describes the ideal solution to the "Cookie Monster Bug": that everybody starts using a new cookie format that completely removes the problem. First discussed here.

draft-pettersen-cache-context-03.txt (archive)

This Draft describes a way for sites to tell the client that a group of web pages is related, which can be used to better organize logouts. First discussed here.

draft-pettersen-dns-cookie-validate-04.txt
draft-pettersen-subtld-structure-04.txt
draft-pettersen-cookie-v2-03.txt
draft-pettersen-cache-context-03.txt


Refreshed Internet Drafts


I've just refreshed my HTTP Cookie and Cache related Internet Drafts. The drafts have been discussed here a couple of times before, but to summarize:

draft-pettersen-dns-cookie-validate-03.txt (archive)

This Draft describes Opera's current heuristic approach to avoid sending cookies to registry-like domains such as co.uk (the "Cookie Monster Bug"). First discussed here.

draft-pettersen-subtld-structure-03.txt (archive)

This Draft describes an improved approach to handling the "Cookie Monster Bug", using an online black list of registry-like domains. Also first discussed here.

The Mozilla team's work on "Effective TLDs" is based on an early version of this suggestion.

draft-pettersen-cookie-v2-02.txt (archive)

This Draft describes the ideal solution to the "Cookie Monster Bug": that everybody starts using a new cookie format that completely removes the problem. First discussed here.

draft-pettersen-cache-context-02.txt (archive)

This Draft describes a way for sites to tell the client that a group of web pages is related, which can be used to better organize logouts. First discussed here.

draft-pettersen-dns-cookie-validate-03.txt
draft-pettersen-subtld-structure-03.txt
draft-pettersen-cookie-v2-02.txt
draft-pettersen-cache-context-02.txt


New Cookie and Cache Internet-Drafts


I've just submitted updated versions of my HTTP Cookie and Cache related Internet-Drafts to the IETF.

I've covered the background for these drafts previously in these articles: Cookie-1, Cookie-2, Cache Context.

If you have comments, suggestions, corrections, etc., feel free to discuss them here, or directly with me. General discussion of the proposal should take place on the IETF's HTTP Work Group mailing list.

In a couple of days the drafts should become available in the IETF Internet-Draft repository at these locations:

draft-pettersen-dns-cookie-validate-02.txt

draft-pettersen-subtld-structure-02.txt

draft-pettersen-cookie-v2-01.txt

draft-pettersen-cache-context-01.txt


Archive copies can be found here:

draft-pettersen-dns-cookie-validate-02.txt

draft-pettersen-subtld-structure-02.txt

draft-pettersen-cookie-v2-01.txt

draft-pettersen-cache-context-01.txt

Introducing Cache Contexts, or: Why the browser does not know you are logged out


One of the Opera features I often see mentioned as "good" is our history navigation where you can navigate back and forwards in the history very quickly, and without making any revalidation/refetches of the content from the server.

This is made possible by our adherence to the principle stated in RFC 2616 (HTTP 1.1) section 13.13 about history list navigation.

13.13 History Lists

[...]History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.

By default, an expiration time does not apply to history mechanisms. If the entity is still in storage, a history mechanism SHOULD display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.[...]



Unfortunately, now and then (such as just last week) there's somebody reporting this as a bug or even a "security issue". They are in particular using the Security Issue tag when the navigation happens on a site handling sensitive data, such as online banking, after the user has logged out.

Well, I can understand that online banking sites and other sites handling sensitive information do not want the client to show information retrieved from the site while the user was logged in, after the user has logged out. It's just that they forget a simple fact, inherent in HTTP: The browser does not know the user has (been) logged out, and currently there is no way for the browser to find out!

Somebody will probably now say "but cookies/no-cache/must-revalidate can be used for that". Not really. Let's look at each of these options:

  • Cookies: More precisely known as "HTTP State Management Cookies", these are used to keep information about what is going on with respect to a given client and user. However, the browser does not know anything about what a cookie means; it just stores it and sends it back to the server(s) that are supposed to receive it. That a cookie is deleted (such as during logout) does not mean anything special to the client, and in fact a site need not delete a cookie to mark a user as logged out; it can just change a flag in its own database, without telling the browser.

  • no-cache: This Cache-Control response directive may very well be the most misunderstood parameter in all of HTTP, quite likely because of its name, and its use in requests. There seems to be a significant belief that a client must never store a document served with this flag, or produce it as a result of clicking on a link. That is not the function of this directive. RFC 2616 sec. 14.9.1 states:

    no-cache

    [...] a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. [...]


    This only means that when you click on a link, the client must first ask the server "Has this been modified since I loaded it?", and if the server says "No, it is not modified", then it may show it to you.

    This does not apply to history navigation, because "no-cache" defines the "expiration time" of the web page, nothing else.

  • must-revalidate: If "no-cache" is the most misunderstood parameter, "must-revalidate" may well be the most abused. According to RFC 2616 section 14.9.4:

    must-revalidate

    [...] When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. [...]



    Opera may not quite follow the letter of this specification, but that is because of how the directive will normally be used. We treat this directive's presence as an indication that 1) the resource is expired (same as with "no-cache") and 2) during history navigation of secure sites a web page is revalidated before it is displayed to the user.

    You probably noticed the "secure site" condition. The reason for this is the aforementioned abuse of the "must-revalidate" directive. Quite a lot of sites (I have personally seen a lot of PHP powered Wikis with this problem) are sending the cache-directive combination "no-cache, no-store, must-revalidate", which to Opera means "check every time the user asks for it, even during history navigation, and do not store it on disk, only in RAM while we have it".

    What this means is that when you visit such a site (and all the directives are obeyed), the browser will ask the server to validate the content (even static content) of each page (in some cases even images) every time, which results in very slow navigation. When we introduced "must-revalidate" the forums overflowed with "slow navigation" posts, and long-time Opera supporter non-troppo actually had to patch his Opera Wiki server to keep the problem at bay.

    This abuse is the reason why must-revalidate is only obeyed for secure sites.
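
To make the difference between link navigation and history navigation concrete, here is a minimal Python sketch of the decisions described in the list above. It is an illustration only, not Opera's actual code: the names CachedPage and needs_revalidation are made up for this example, "secure site" is assumed to mean an https URL, and ordinary freshness calculations (Expires, max-age) are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedPage:
    """A stored response and the Cache-Control directives it was served with."""
    url: str
    directives: frozenset  # e.g. frozenset({"no-cache", "must-revalidate"})

    @property
    def is_secure(self) -> bool:
        # Assumption for this sketch: "secure site" means an https URL.
        return self.url.startswith("https://")


def needs_revalidation(page: CachedPage, history_navigation: bool) -> bool:
    """Return True if the server must be asked before the page is shown."""
    # Both directives are treated as "this entry is already expired".
    expired = bool(page.directives & {"no-cache", "must-revalidate"})
    if not history_navigation:
        # Following a link: an expired entry is revalidated first.
        return expired
    # History navigation: show what the user saw, except for
    # must-revalidate pages on secure sites.
    return "must-revalidate" in page.directives and page.is_secure
```

With this sketch, a plain-http wiki page served with "no-cache, no-store, must-revalidate" is revalidated when a link is followed but not when the user goes back in history, while an https banking page served with "must-revalidate" is revalidated in both cases.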


As indicated above, some web service providers consider failure to revalidate after logout to be a potential vulnerability for their system. This has caused them to invest quite a lot of effort into ensuring that the browser behaves as they want it to. They have not just used the above mentioned cache directives, but also various scripting technologies.

The major drawback of all the current systems is that they will most often increase traffic to the website using them, thus increasing the load on the servers, which means that to stay operative they need even more servers, which means higher cost of operating the site.

All of it (trouble, wasted money, complaints, etc.) because they are not able to tell the browser that the user has been logged out.

So, how can this be solved in a more economic and predictable manner?

A solution should have the following requirements:

  • The user should be able to navigate history as normal while logged in, no revalidation should take place.
  • Already loaded resources should not be refetched unless the proper functioning of the site requires it (for example, the account page should display the current amounts when the user clicks on a link going to the accounts summary page).
  • When the user is logged out, all sensitive documents should be removed, and if they are still in history, they should be revalidated with the server before being displayed to the user.

This set of requirements indicates that the server should have a way to tell the browser

  1. That a page is part of the sensitive document group.
  2. When the user is logged out (or should be logged out).


I've recently written, and submitted to the IETF, a document describing a system that tries to solve these problems: Cache Contexts.

In this new system the server uses a Cache Context directive to tell the browser that the document(s) served are part of a specific group of documents, a "context", and when the context has ended its usefulness, the server can discard it, which will also tell the browser that the documents in the context should no longer be displayed to the user, unless they have been confirmed by the server.

The server can tell the client how long it should let the documents in the context live, and it may also connect the lifetime of a context to the lifetime of a cookie; when the cookie is deleted, all the documents in the context are deleted, too.
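
The bookkeeping this implies on the client side can be sketched roughly as follows. This is a toy model under my own assumptions, not the mechanism or syntax defined in the draft: the class ContextStore, its method names, and the context and cookie names are all hypothetical, and context lifetimes are left out.

```python
class ContextStore:
    """Toy client-side store that groups cached documents by context name."""

    def __init__(self):
        self._documents = {}     # context name -> {url: body}
        self._cookie_links = {}  # cookie name -> set of context names

    def store(self, context, url, body, linked_cookie=None):
        """Remember a document as part of a context, optionally tied to a cookie."""
        self._documents.setdefault(context, {})[url] = body
        if linked_cookie is not None:
            self._cookie_links.setdefault(linked_cookie, set()).add(context)

    def discard_context(self, context):
        """The server ended the context: drop every document in it."""
        self._documents.pop(context, None)

    def cookie_deleted(self, cookie_name):
        """Deleting a linked cookie ends every context tied to it."""
        for context in self._cookie_links.pop(cookie_name, set()):
            self.discard_context(context)

    def can_display_without_revalidation(self, context, url):
        """True while the document is still held in a live context."""
        return url in self._documents.get(context, {})
```

While the user is logged in, pages in the "bank-session" context can be shown from history without asking the server. Once the linked cookie is deleted, calling cookie_deleted("SESSION") empties the context, so those pages must be fetched again, while documents outside the context are unaffected.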

These are just some of the features offered by Cache Contexts.

If you are interested, it will be available from the IETF's Internet Draft repository in a couple of days. If you cannot wait, a copy of it can be found here.

If you have comments, suggestions, corrections, etc., feel free to discuss them here, or directly with me. General discussion of the proposal should take place on the IETF's HTTP Work Group mailing list.

draft-pettersen-cache-context-00.txt

Updated Internet Drafts about HTTP cookie domain validation


A while back I submitted two Internet Drafts that try to fix a problem with limiting which domains a cookie can be set for (known as the "Cookie Monster Bug"), one using DNS to validate the domain (the method Opera is currently using), and one using a new protocol to retrieve information that can be used to validate the domain.

Reception to the drafts was mixed. As mentioned earlier, some in the DNS/Registry community do not like the methods proposed because of the assumptions made about the DNS hierarchy. On the other hand, the Mozilla developers have already started implementing a modified version of the -00 SubTLD draft.

I have now submitted updated drafts to the IETF.

The DNS validate draft is almost unchanged, there's just been some minor tuning.

The SubTLD draft has been updated with a new XML based file-format that makes the format more expandable and more readable. Thanks to Anne van Kesteren for helping me with that.

These drafts, and the Cookie v2 draft I submitted last week, are the alternatives we at Opera have been able to see for how to solve the cookie domain limitation problem. We would like suggestions not just for how to improve our proposals, but also alternative proposals (please consider submitting them as IETF Internet Drafts) that will fix the problem more efficiently than ours do.

While suggestions can be submitted directly to me, I would recommend that discussions be held in the IETF HTTP WG mailing list.

Links to the IETF Internet Drafts repository:

DNS Validate
SubTLD

Archive copies:

DNS Validate: draft-pettersen-dns-cookie-validate-01.txt
SubTLD: draft-pettersen-subtld-structure-01.txt

A new HTTP Cookie recipe


A while ago I presented some thoughts about whether or not a new specification for HTTP cookies is needed to address problems with limiting cookies to a real site's domain, and not to multiple sites within a registry-like domain, such as co.uk and city.state.us, a problem that I've also discussed earlier.

I've just submitted an Internet Draft that proposes some of the suggestions I put forward in my previous article.

Compared to RFC 2965, which my draft is based on, the document removes two of the attributes, replacing them with new attributes that change the domain and path rules of cookies, respectively making the server and default path define the widest domain and path distribution possible.

The draft will be available from the IETF's Internet Draft repository, but I'm also making an archive copy available here.

Comments, suggestions and alternative solutions are welcome either directly to me, or in the IETF HTTP WG mailing list after the draft has been announced there (That is also where broader discussion of the issues should take place).

draft-pettersen-cookie-v2-00.txt

How to make sure the cookies don't burn your fingers?


Cookies, small chunks of data placed on your computer by websites so that they can be returned to the website, have in the past 10+ years become a mostly useful part of online life. They are used "everywhere", from the beneficial login credentials used in these forums and various shopping carts, to the more ambivalent uses for advertisement targeting and other user tracking.

It is the latter point that makes many people a bit wary of cookies, and makes some people almost "paranoid" about them.

Let's get one thing settled right away: Cookies can only be sent to the server that set the cookie, or the group of servers to which that server belongs, as specified by the name of the server and the server itself. A cookie cannot be sent to a server outside that group.

Well, some bright readers are probably already asking: How do you define "group"? And how do you keep the groups from growing too big? Good questions, and that is what the rest of this article will be about.

Let's start with a simple example:

Example 1: www.example.com belongs to two groups, ".example.com" and ".com". The first, ".example.com", is perfectly acceptable, but what about the second one, ".com"? Do we really want sites to be able to send cookies to all .com domains? No, we don't (add really capital letters and a couple of exclamation marks, if you want). So, the target group must contain at least two components (or an internal dot, as the specification says).

Fortunately, the specification already says that for generic domains (like .com) the target domain must contain at least two components. So we are safe there.

Now, (example 2) what about www.example.no? Obviously, here we can use the same rules as for the generic domains (example 1).

And what about (example 3) www.example.co.uk? Obviously, since co.uk is a Top Level-like Domain we cannot permit cookies to be set for co.uk. That would permit a server to send a cookie for all servers in the co.uk area. But how do we reconcile this with example 2?

Or (example 4) what about www.example.suburb.city.state.us? It is equally unacceptable that cookies be set for the domain suburb.city.state.us.

This is starting to look really complicated, to put it mildly.

What we are trying to avoid is accidental, intentional, or even malicious interference with another service (e.g. Bank1 and Bank2 should not necessarily know about each other's customers, and if they want/need to share that information there are better means available to achieve such sharing). Such interference can also have security implications, e.g. a specially crafted cookie might be used to block access to a site, or interfere with its operation.

Further, we want to limit cross-site information gathering. In many cases such information can be connected to a physical person (a few years ago DoubleClick got into serious trouble when they wanted to do just that). When such gathering is performed, the business connections should be out in the open, not hidden, as would be the case if any site could join the network clandestinely, by starting to look for a wide targeted cookie without informing the user.

Netscape, which wrote the initial specification for cookies, specified that the target domains for cookies had to contain at least one internal dot (example.com) in the generic domains like .com, while in all other top level domains (TLDs) the targets had to contain at least two internal dots (example.co.uk). That makes it impossible to set example.no in example 2 as the target domain, while it will permit cookies to be set for city.state.us. Oops! We've got double trouble: A rule that is too restrictive and too relaxed at the same time.
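
The "double trouble" is easy to see if the rule is written out. The following Python sketch applies the internal-dot rule as described in the paragraph above; the function name is made up for this example, and the exact set of generic TLDs is an assumption on my part (the text above only says "generic domains like .com").

```python
# Assumed set of "generic" top level domains for this illustration.
GENERIC_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def netscape_rule_allows(target_domain: str) -> bool:
    """Apply the internal-dot rule to a cookie target such as 'example.co.uk'."""
    labels = target_domain.strip(".").lower().split(".")
    internal_dots = len(labels) - 1
    # One internal dot is enough in generic TLDs, two are required elsewhere.
    required = 1 if labels[-1] in GENERIC_TLDS else 2
    return internal_dots >= required


for target in ("example.com", "com", "example.no", "co.uk", "city.state.us"):
    print(target, netscape_rule_allows(target))
```

The loop shows both failures at once: example.no is rejected even though it is a perfectly ordinary site domain, while city.state.us is accepted even though it is registry-like.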

The second part of the rule, about non-generic domains, was never properly implemented by anyone (including Netscape and Opera), possibly because of example 2 (which is completely valid, from a practical point of view), and the rule was also easy to miss: It was a single sentence inside a larger paragraph. In late 1998 this missing check became notorious as the "Cookie Monster Bug". More about the consequences of that below.

The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that www.example.no can set a cookie for example.no, www.example.co.uk for example.co.uk, but not co.uk, and www.example.suburb.city.state.us can set one for example.suburb.city.state.us, but not suburb.city.state.us.

Now, this looks much better, but ... hmmmm ... what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.us. Ooops, again! Not quite as watertight as we would prefer.
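
Here is the same exercise for the one-level-up rule, again as a simplified Python sketch of the rule as I have described it here, not a full implementation of the RFC 2109/2965 domain matching. The function names are hypothetical.

```python
def parent_domain(host: str) -> str:
    """Return the host name with its first label removed ('' if none is left)."""
    return host.split(".", 1)[1] if "." in host else ""


def one_level_up_allows(server_host: str, target_domain: str) -> bool:
    """A server may target itself or the domain exactly one level above it."""
    host = server_host.lower().rstrip(".")
    target = target_domain.lower().strip(".")
    return target == host or target == parent_domain(host)


# The intended cases work as expected ...
print(one_level_up_allows("www.example.co.uk", "example.co.uk"))
print(one_level_up_allows("www.example.co.uk", "co.uk"))
# ... but a server placed directly on the site domain slips through.
print(one_level_up_allows("example.co.uk", "co.uk"))
```

The last call is the hole: nothing in the rule itself can tell that co.uk is a registry-like domain rather than somebody's site.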

Browser vendors tried several ways to fix the Cookie Monster Bug. One was a black list of some second level domains, like "co", "com", "ac" etc, but they did not (and cannot) get every such TLD-like subdomain.

After the "Cookie Monster Bug" was discovered, Opera initially tried the double-dot-rule, and then the one-level-up approach, but neither worked very well.

The primary problem is: We want to let servers on a site be able to share information with the other servers on the site. But: How do you tell a valid site domain like example.no from the co.uk and suburb.city.state.us-type Top Level-like (non-site) domains?

It is not realistic to use short blacklists: there are too many ways each nation wants to do things, and the names cannot be put into nice convenient patterns (this is type 1, that is type 2), making it impossible to use a heuristic method.

There is only one realistic option: We have to have some means of finding out which type of domain we are dealing with. This option has two alternatives: Either a big, expensive(!) database, or some kind of rule-of-thumb method.

Here at Opera we went for the rule-of-thumb method: When Opera is checking a cookie whose target domain matches certain criteria (e.g. it is not a .com domain), we do a DNS name lookup for the target domain, to see if there is an IP address for that domain. If there is an IP address for the domain (e.g. example.no) we assume that the domain is a normal company domain, not a co.uk-like domain, and therefore safe. If there is no IP address we assume that the domain is co.uk-like and therefore unsafe, and only allow the cookie to be set for the server that sent the cookie.
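
In rough Python terms the rule of thumb looks something like the sketch below. It is a simplification for illustration, not Opera's implementation: the function names are invented, the "certain criteria" are reduced to "anything outside .com", and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket


def has_ip_address(domain: str) -> bool:
    """Default check: True if a DNS lookup for the name yields an address."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return False


def effective_cookie_domain(server_host, target_domain, resolves=has_ip_address):
    """Return the domain the cookie is actually stored for."""
    host = server_host.lower().rstrip(".")
    target = target_domain.lower().strip(".")
    if target == host:
        return host
    # Simplified criterion (an assumption): .com targets skip the lookup.
    if target.rsplit(".", 1)[-1] == "com":
        return target
    # An address for the target suggests an ordinary site domain.
    if resolves(target):
        return target
    # No address: treat the target as co.uk-like and fall back to the server.
    return host
```

Called as effective_cookie_domain("www.example.co.uk", "co.uk", resolves=lambda name: False), the sketch falls back to "www.example.co.uk", which is the restriction described above.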

Unfortunately, this can break sites that do not have an IP address for their domain, but that is quite easy for the webmaster to fix. It is also quite common to let surfers access the website without having to write out the "www." part, and many sites use the short name in their advertising, so this is not an insurmountable problem. Some sites even put their main site on the domain name, rather than at the www name.

Additionally, it is possible to get past this problem by using Opera's cookie filters to add an accept filter for the domain.

A bit more serious is the fact that some co.uk-like domains actually define an IP address for their domain, for example to provide a directory service. Alright, back to the drawing board.

After investigating many of the alternatives to securing the domain specification, only one alternative appears to do the job as well as it is possible to do it: A complete database of all co.uk-like domains, or subTLDs as I call them.
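
Once such a database exists, the client-side check becomes almost trivial, which is part of the attraction. A minimal Python sketch, assuming the database has already been fetched and reduced to a plain set of names; the handful of entries and the function names are hypothetical and only mirror the examples used in this article.

```python
# Hypothetical, hand-picked entries; a real database would be far larger.
SUBTLDS = {"co.uk", "ac.uk", "state.us", "city.state.us", "suburb.city.state.us"}


def is_registry_like(domain: str, subtlds=SUBTLDS) -> bool:
    """True for top level domains and for names listed as subTLDs."""
    name = domain.lower().strip(".")
    return "." not in name or name in subtlds


def cookie_target_allowed(server_host: str, target_domain: str, subtlds=SUBTLDS) -> bool:
    """Allow a target only if the server belongs to it and it is a site domain."""
    host = server_host.lower().rstrip(".")
    target = target_domain.lower().strip(".")
    belongs = host == target or host.endswith("." + target)
    return belongs and not is_registry_like(target, subtlds)
```

Unlike the earlier rules, this check gives the wanted answer for all four examples: example.no and example.co.uk are accepted as targets, while co.uk and suburb.city.state.us are refused no matter where the server is placed.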

The problem is still the same one that shot down the idea the first time: It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.

The best solution appears to be one often used in Computer Science: "Divide and Conquer"1. In this case, spread the workload of building the database as widely as is efficient and possible, and find the people best suited to create and maintain the list. In my opinion those people are the registries that maintain each Top Level Domain.

The task can be performed by others, but these people will have to spend much more effort gathering, organizing, and assuring the quality of the information, than the companies in charge of the policies and the systems.

About three months ago I released an Internet Draft proposing a way of implementing such a database, and how clients like Opera can use it to secure their cookie support. It will also be possible to use the specification for other operations.

At the same time, to document the system, I released another draft describing Opera's implementation of DNS-based validation of cookie domains.

The drafts are available from the IETF's servers: DNS validate and subTLD

There are other issues related to control of cookies, in particular within shared hosting services, but those problems require different solutions. I am working on those as well, but there is still some way to go before that work is ready to be published.

Please note: Some changes are planned, e.g. using another file format, probably XML. It may also turn out that other protocols can be used instead.

Feel free to send me comments and suggestions, either directly or on the IETF's HTTP Work Group mailing list.




1 "Divide and Conquer" is a procedure by which a task is broken into smaller portions that can be handled independently, and then put back together, in a more efficient manner than if one tried to do the whole job in one go. It is often possible to repeat the procedure on the smaller portion. It should not be confused with the more nefarious political and military meaning of the expression.


Update September 21st:

As the drafts have now expired I have uploaded archive copies. The links above have also been updated.

draft-pettersen-dns-cookie-validate-00.txt
draft-pettersen-subtld-structure-00.txt