Cookies, small chunks of data placed on your computer by websites so that they can be returned to the website, have in the past 10+ years become a mostly useful part of online life. They are used "everywhere", from the beneficial login credentials used in these forums and various shopping carts, to the more ambivalent uses for advertisement targeting and other user tracking.
It is the latter point that makes many people a bit wary of cookies, and makes some people almost "paranoid" about them.
Let's get one thing settled right away: Cookies can only be sent to the server that set the cookie, or to the group of servers to which that server belongs, as determined by the name of the server and by what the server itself specifies. A cookie cannot be sent to a server outside that group.
Well, some bright readers are probably already asking: How do you define "group"? And how do you keep the groups from growing too big? Good questions, and that is what the rest of this article will be about.
Let's start with a simple example:
Example 1: www.example.com belongs to two groups, ".example.com" and ".com". The first, ".example.com", is perfectly acceptable, but what about the second one, ".com"? Do we really want sites to be able to send cookies to all .com domains? No, we don't (add really capital letters and a couple of exclamation marks, if you want). So, the target group must contain at least two components (or an internal dot, as the specification says).
Fortunately, the specifications already say that for generic domains (like .com) the target domain must contain at least two components. So we are safe there.
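To make example 1 concrete, here is a minimal Python sketch of the "groups" a server name belongs to and of the internal-dot check. The function names are my own for illustration and do not come from any specification.

```python
def candidate_groups(host):
    """List every parent domain ("group") that a host name belongs to."""
    labels = host.lower().strip(".").split(".")
    return ["." + ".".join(labels[i:]) for i in range(1, len(labels))]


def has_internal_dot(group):
    """True if the group has at least two components, e.g. ".example.com"."""
    return "." in group.strip(".")


def allowed_groups(host):
    """Groups that survive the internal-dot rule for generic domains."""
    return [g for g in candidate_groups(host) if has_internal_dot(g)]


print(candidate_groups("www.example.com"))  # both groups from example 1
print(allowed_groups("www.example.com"))    # ".com" is filtered out
```

Note that this check only looks at the number of components; it knows nothing about who actually controls a domain.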
Now, (example 2) what about www.example.no? Obviously, here we can use the same rules as for the generic domains (example 1).
And what about (example 3) www.example.co.uk? Obviously, since co.uk is a Top Level-like Domain we cannot permit cookies to be set for co.uk. That would permit a server to send a cookie to all servers in the co.uk area. But how do we reconcile this with example 2?
Or (example 4) what about www.example.suburb.city.state.us? It is equally unacceptable that cookies be set for the domain suburb.city.state.us.
This is starting to look really complicated, to put it mildly.
What we are trying to avoid is accidental, intentional, or even malicious interference with another service (e.g. Bank1 and Bank2 should not necessarily know about each other's customers, and if they want/need to share that information there are better means available to achieve such sharing). Such interference can also have security implications, e.g. a specially crafted cookie might be used to block access to a site, or interfere with its operation.
Further, we want to limit cross-site information gathering. In many cases such information can be connected to a physical person (a few years ago DoubleClick got into serious trouble when they wanted to do just that). When such gathering is performed, the business connections should be out in the open, not hidden, as would be the case if any site could join the network clandestinely, by starting to look for a wide targeted cookie without informing the user.
Netscape, which wrote the initial specification for cookies, specified that the target domains for cookies had to contain at least one internal dot (example.com) in the generic domains like .com, while in all other top level domains (TLDs) the targets had to contain at least two internal dots (example.co.uk). That makes it impossible to set example.no in example 2 as the target domain, while it will permit cookies to be set for city.state.us. Oops! We've got double trouble: A rule that is too restrictive and too relaxed at the same time.
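The double trouble is easy to reproduce. The following Python sketch applies the dot-counting rule as described above; the set of generic TLDs is the seven "special" domains I recall from the Netscape specification and should be read as an assumption, as should the function name.

```python
# Assumed list of generic TLDs (the seven "special" domains of the
# original Netscape specification, as far as I recall).
GENERIC_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def netscape_rule_allows(target):
    """Apply the internal-dot rule to a cookie target domain."""
    labels = target.lower().strip(".").split(".")
    internal_dots = len(labels) - 1
    required = 1 if labels[-1] in GENERIC_TLDS else 2
    return internal_dots >= required


for domain in ("example.com", "example.no", "co.uk", "city.state.us"):
    print(domain, netscape_rule_allows(domain))
```

The rule rejects example.no (too restrictive) but accepts city.state.us (too relaxed), exactly the two failures described above.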
The second part of the rule, about non-generic domains, was never properly implemented by anyone (including Netscape and Opera), possibly because of example 2 (which is completely valid, from a practical point of view), and the rule was also easy to miss: It was a single sentence inside a larger paragraph. In late 1998 this missing check became notorious as the "Cookie Monster Bug". More about the consequences of that below.
The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that www.example.no can set a cookie for example.no, www.example.co.uk for example.co.uk, but not co.uk, and www.example.suburb.city.state.us for example.suburb.city.state.us, but not suburb.city.state.us.
Now, this looks much better, but ... hmmmm ... what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.us. Ooops, again! Not quite as watertight as we would prefer.
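A simplified Python sketch of the one-level-up rule shows both the improvement and the hole. It is my own reading of the rule as summarized above (the part of the host name in front of the target domain must not contain a dot), not a complete implementation of either RFC.

```python
def one_level_up_allows(host, target):
    """Allow the host itself, or its immediate parent domain."""
    host = host.lower().strip(".")
    target = target.lower().strip(".")
    if host == target:
        return True
    if "." not in target:
        return False  # a bare TLD such as "com" is never acceptable
    suffix = "." + target
    if not host.endswith(suffix):
        return False  # the target is not a parent of the host at all
    prefix = host[: -len(suffix)]
    return "." not in prefix  # only one level up is permitted


print(one_level_up_allows("www.example.co.uk", "example.co.uk"))
print(one_level_up_allows("www.example.co.uk", "co.uk"))
print(one_level_up_allows("example.co.uk", "co.uk"))
```

The last call is the problem case: a server named example.co.uk is exactly one level below co.uk, so the rule lets it through.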
Browser vendors tried several ways to fix the Cookie Monster Bug. One was a blacklist of some second level domains, like "co", "com", "ac" etc., but such a list did not (and cannot) cover every TLD-like subdomain.
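As an illustration of why such a list falls short, here is a hypothetical Python version of a second-level blacklist. The three entries are the ones named above; the exact matching logic real browsers used is not documented here, so the two-letter country-code test is my assumption.

```python
# Hypothetical short blacklist of second-level labels.
SECOND_LEVEL_BLACKLIST = {"co", "com", "ac"}


def blacklist_allows(target):
    """Reject bare TLDs and blacklisted second-level names under ccTLDs."""
    labels = target.lower().strip(".").split(".")
    if len(labels) < 2:
        return False
    is_country_code = len(labels[-1]) == 2  # assumption: two letters = ccTLD
    if len(labels) == 2 and is_country_code and labels[0] in SECOND_LEVEL_BLACKLIST:
        return False
    return True


print(blacklist_allows("co.uk"))                 # caught by the list
print(blacklist_allows("suburb.city.state.us"))  # slips through
```

Every structure the list's author did not think of, such as the deep .us hierarchy, is silently accepted.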
After the "Cookie Monster Bug" was discovered, Opera initially tried the double-dot-rule, and then the one-level-up approach, but neither worked very well.
The primary problem is: We want to let servers on a site be able to share information with the other servers on the site. But: How do you tell a valid site domain like example.no from the co.uk and suburb.city.state.us-type Top Level-like (non-site) domains?
It is not realistic to use short blacklists: there are too many ways each nation wants to do things, and the names cannot be put into nice convenient patterns (this is type 1, that is type 2), making it impossible to use a heuristic method.
There is only one realistic option: We have to have some means of finding out which type of domain we are dealing with. This option comes in two variants: Either a big, expensive(!) database, or some kind of rule-of-thumb method.
Here at Opera we went for the rule-of-thumb method: When Opera is checking a cookie whose target domain matches certain criteria (e.g. it is not a .com domain), we do a DNS name lookup for the target domain, to see if there is an IP address for that domain. If there is an IP address for the domain (e.g. example.no) we assume that the domain is a normal company domain, not a co.uk-like domain, and therefore safe. If there is no IP address we assume that the domain is co.uk-like and therefore unsafe, and only allow the cookie to be set for the server that sent the cookie.
Unfortunately, this can break sites that do not have an IP address for their domain. That is quite easy for the webmaster to fix, though, and it is already quite common to let surfers reach the website without having to write out the "www." part; many sites also use the short name in their advertising. So this is not an insurmountable problem. Some sites even put their main site on the domain name, rather than at the www name.
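The rule of thumb can be sketched in a few lines of Python. This is not Opera's actual code: the function names are invented, the ".com is exempt" test simply follows the example given above, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket


def has_ip_address(domain):
    """Default resolver: True if a DNS lookup yields an address."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def effective_cookie_domain(host, target, resolves=has_ip_address):
    """Return the domain the cookie is actually stored for."""
    host = host.lower().strip(".")
    target = target.lower().strip(".")
    if target == host or not host.endswith("." + target):
        return host  # restrict to the sending server
    tld = target.rsplit(".", 1)[-1]
    if tld == "com" and "." in target:
        return target  # assumed exempt from the DNS check
    if "." in target and resolves(target):
        return target  # has an address: treat as a normal site domain
    return host  # no address: treat as co.uk-like


# Stand-in resolver for a dry run: only example.no "has" an address.
fake = lambda domain: domain in {"example.no"}
print(effective_cookie_domain("www.example.no", "example.no", fake))
print(effective_cookie_domain("www.example.co.uk", "co.uk", fake))
```

With the stand-in resolver the first cookie keeps its example.no target, while the second is narrowed to www.example.co.uk.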
Additionally, it is possible to get past this problem by using Opera's cookie filters to add an accept filter for the domain.
A bit more serious is the fact that some co.uk-like domains actually define an IP address for their domain, for example to provide a directory service. Alright, back to the drawing board.
After investigating many of the alternatives to securing the domain specification, only one alternative appears to do the job as well as it is possible to do it: A complete database of all co.uk-like domains, or subTLDs as I call them.
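Given such a database, the check itself becomes simple. The Python sketch below uses a tiny in-memory set as a stand-in for the database; the entries are the placeholder names from the examples in this article, not real registry data, and the function names are my own.

```python
# Stand-in for the subTLD database (placeholder entries only).
SUBTLDS = {"co.uk", "state.us", "city.state.us", "suburb.city.state.us"}


def is_registry_like(domain):
    """True for bare TLDs and for anything listed as a subTLD."""
    name = domain.lower().strip(".")
    return "." not in name or name in SUBTLDS


def widest_cookie_domain(host):
    """Return the widest parent domain a cookie may safely target."""
    labels = host.lower().strip(".").split(".")
    # Walk from the TLD towards the full host name.
    for i in range(len(labels) - 1, -1, -1):
        candidate = ".".join(labels[i:])
        if not is_registry_like(candidate):
            return candidate
    return None  # the host itself is a TLD or subTLD


for host in ("www.example.no", "www.example.co.uk",
             "www.example.suburb.city.state.us"):
    print(host, "->", widest_cookie_domain(host))
```

All four examples from the start of the article come out right, but only as long as the set is complete and kept up to date, which is the hard part.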
The problem is still the same one that shot down the idea the first time: It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.
The best solution appears to be one often used in Computer Science: "Divide and Conquer" (see note 1).
In this case, spread the workload of building the database as widely as is efficient and possible, and find the people best suited to create and maintain the list. In my opinion those people are the registries that maintain each Top Level Domain.
The task can be performed by others, but these people will have to spend much more effort gathering, organizing, and assuring the quality of the information, than the companies in charge of the policies and the systems.
About three months ago I released an Internet Draft proposing a way of implementing such a database, and how clients like Opera can use it to secure their cookie support. It will also be possible to use the specification for other operations.
At the same time, to document the system, I released another draft describing Opera's implementation of DNS validation of cookie domains.
The drafts are available from the IETF's servers:
DNS validate and subTLD
There are other issues related to control of cookies, in particular within shared hosting services, but those problems require different solutions. I am working on those as well, but there is still some way to go before that work is ready to be published.
Please note: Some changes are planned, e.g. using another file format, probably XML. It may also turn out that other protocols can be used instead.
Feel free to send me comments and suggestions, either directly or on the IETF's HTTP Working Group mailing list.
Note 1: "Divide and Conquer" is a procedure by which a task is broken into smaller portions that can be handled independently, and then put back together, in a more efficient manner than if one tried to do the whole job in one go. It is often possible to repeat the procedure on the smaller portions. It should not be confused with the more nefarious political and military meaning of the expression.
Update September 21st:
As the drafts have now expired I have uploaded archive copies. The links above have also been updated.
draft-pettersen-dns-cookie-validate-00.txt
draft-pettersen-subtld-structure-00.txt