How to make sure the cookies don't burn your fingers?
Tuesday, 23. May 2006, 21:27:28
Cookies, small chunks of data placed on your computer by websites so that they can be returned to the website, have in the past 10+ years become a mostly useful part of online life. They are used "everywhere", from the beneficial login credentials used in these forums and various shopping carts, to the more ambivalent uses for advertisement targeting and other user tracking.
It is the latter point that makes many people a bit wary of cookies, and makes some people almost "paranoid" about them.
Let's get one thing settled right away: Cookies can only be sent to the server that set the cookie, or to the group of servers to which that server belongs, as defined by the server's name and by the server itself when it sets the cookie. A cookie cannot be sent to a server outside that group.
Well, some bright readers are probably already asking: How do you define "group"? And how do you keep the groups from growing too big? Good questions, and that is what the rest of this article will be about.
Let's start with a simple example:
Example 1: www.example.com belongs to two groups, ".example.com" and ".com". The first, ".example.com", is perfectly acceptable, but what about the second one, ".com"? Do we really want sites to be able to send cookies to all .com domains? No, we don't (add really capital letters and a couple of exclamation marks, if you want). So, the target group must contain at least two components (or an internal dot, as the specification says).
Fortunately, the specifications already say that for generic domains (like .com) the target domain must contain at least two components. So we are safe there.
Now, (example 2) what about www.example.no? Obviously, here we can use the same rules as for the generic domains (example 1).
And what about (example 3) www.example.co.uk? Obviously, since co.uk is a Top Level-like Domain we cannot permit cookies to be set for co.uk. That would permit a server to send a cookie to all servers in the co.uk area. But how do we reconcile this with example 2?
Or (example 4) what about www.example.suburb.city.state.us? It is equally unacceptable that cookies be set for the domain suburb.city.state.us.
This is starting to look really complicated, to put it mildly.
What we are trying to avoid is accidental, intentional, or even malicious interference with another service (e.g. Bank1 and Bank2 should not necessarily know about each other's customers, and if they want/need to share that information there are better means available to achieve such sharing). Such interference can also have security implications, e.g. a specially crafted cookie might be used to block access to a site, or interfere with its operation.
Further, we want to limit cross-site information gathering. In many cases such information can be connected to a physical person (a few years ago DoubleClick got into serious trouble when they wanted to do just that). When such gathering is performed, the business connections should be out in the open, not hidden, as would be the case if any site could join the network clandestinely, by starting to look for a widely targeted cookie without informing the user.
Netscape, which wrote the initial specification for cookies, specified that the target domains for cookies had to contain at least one internal dot (example.com) in the generic domains like .com, while in all other top level domains (TLDs) the targets had to contain at least two internal dots (example.co.uk). That makes it impossible to set example.no in example 2 as the target domain, while it will permit cookies to be set for city.state.us. Oops! We've got double trouble: A rule that is too restrictive and too relaxed at the same time.
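The dot-counting rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Netscape's code: the function name is made up, and the set of generic TLDs is an assumption about which domains counted as "generic".

```python
# Assumed set of "generic" TLDs; the text above only names .com as an example.
GENERIC_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def netscape_rule_allows(target_domain: str) -> bool:
    """Hypothetical check: does the dot-counting rule accept this target?"""
    labels = target_domain.strip(".").lower().split(".")
    internal_dots = len(labels) - 1
    # One internal dot is enough in generic TLDs, two are needed elsewhere.
    required = 1 if labels[-1] in GENERIC_TLDS else 2
    return internal_dots >= required


for name in ["example.com", "com", "example.no", "co.uk", "city.state.us"]:
    print(name, netscape_rule_allows(name))
```

With these assumptions example.no is rejected even though it is a perfectly normal site domain, while city.state.us is accepted: the same rule is too strict and too lax at once.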
The second part of the rule, about non-generic domains, was never properly implemented by anyone (including Netscape and Opera), possibly because of example 2 (which is completely valid, from a practical point of view), and the rule was also easy to miss: It was a single sentence inside a larger paragraph. In late 1998 this missing check became notorious as the "Cookie Monster Bug". More about the consequences of that below.
The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that www.example.no can set a cookie for example.no, www.example.co.uk for example.co.uk, but not co.uk, and www.example.suburb.city.state.us for example.suburb.city.state.us, but not suburb.city.state.us.
Now, this looks much better, but ... hmmmm ... what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.us. Ooops, again! Not quite as watertight as we would prefer.
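A rough Python sketch of the one-level-up idea makes the hole easy to see. It is a simplified reading of the rule as described here, not the full RFC 2109/2965 domain-matching algorithm, and the function name is hypothetical.

```python
def one_level_up_allows(request_host: str, target_domain: str) -> bool:
    """Hypothetical check: may request_host set a cookie for target_domain?"""
    host = request_host.lower().rstrip(".")
    domain = target_domain.lower().strip(".")
    if "." not in domain:
        return False  # a bare TLD such as "com" is never acceptable
    if host == domain:
        return True  # simplification: a host may always target itself
    if not host.endswith("." + domain):
        return False  # the target is not a parent of the sending host
    # Whatever precedes the target must be a single label (one level up).
    prefix = host[: -(len(domain) + 1)]
    return "." not in prefix


print(one_level_up_allows("www.example.co.uk", "co.uk"))  # two levels up
print(one_level_up_allows("example.co.uk", "co.uk"))      # only one level up
```

The first call is refused, but the second is accepted: moving the server one level closer to the subTLD is all it takes.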
Browser vendors tried several ways to fix the Cookie Monster Bug. One was a black list of some second level domains, like "co", "com", "ac" etc, but they did not (and cannot) get every such TLD-like subdomain.
After the "Cookie Monster Bug" was discovered, Opera initially tried the double-dot-rule, and then the one-level-up approach, but neither worked very well.
The primary problem is: We want to let servers on a site be able to share information with the other servers on the site. But: How do you tell a valid site domain like example.no from the co.uk and suburb.city.state.us-type Top Level-like (non-site) domains?
It is not realistic to use short blacklists: there are too many ways each nation wants to do things, and the names cannot be put into nice convenient patterns (this is type 1, that is type 2), making it impossible to use a pattern-based heuristic.
There is only one realistic option: We have to have some means of finding out which type of domain we are dealing with. This option has two alternatives: Either a big, expensive(!) database, or some kind of rule-of-thumb method.
Here at Opera we went for the rule-of-thumb method: When Opera is checking a cookie whose target domain matches certain criteria (e.g. it is not a .com domain), we do a DNS name lookup for the target domain, to see if there is an IP address for that domain. If there is an IP address for the domain (e.g. example.no) we assume that the domain is a normal company domain, not a co.uk-like domain, and therefore safe. If there is no IP address we assume that the domain is co.uk-like and therefore unsafe, and only allow the cookie to be set for the server that sent the cookie.
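As a rough illustration, the rule of thumb might look like this in Python. This is a sketch under stated assumptions, not Opera's actual code: the function names are invented, the "certain criteria" are reduced to the single ".com needs no lookup" example from the text, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket
from typing import Callable


def has_ip_address(domain: str) -> bool:
    """Return True if a DNS lookup finds at least one address for domain."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def cookie_scope(request_host: str, target_domain: str,
                 resolves: Callable[[str], bool] = has_ip_address) -> str:
    """Hypothetical helper: the domain the cookie ends up being stored for."""
    host = request_host.lower().rstrip(".")
    domain = target_domain.lower().strip(".")
    if domain == host:
        return host
    if "." not in domain or not host.endswith("." + domain):
        raise ValueError("target domain is not an acceptable parent of the sender")
    if domain.endswith(".com"):
        return domain  # assumption: generic TLD, accepted without a lookup
    # An address suggests a normal site domain; none suggests a co.uk-like one,
    # so the cookie is restricted to the server that sent it.
    return domain if resolves(domain) else host


def fake_dns(name: str) -> bool:
    """Stand-in resolver for the demonstration: only example.no has an address."""
    return name == "example.no"


print(cookie_scope("www.example.no", "example.no", fake_dns))
print(cookie_scope("www.example.co.uk", "co.uk", fake_dns))
```

With the stand-in resolver, the first cookie is stored for example.no and the second is restricted to www.example.co.uk.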
Unfortunately, this can break sites that do not have an IP address for their domain, but that is quite easy for the webmaster to fix. It is also quite common to let surfers access the website without having to write out the "www." part, and many sites use this shorter name in their advertising, so this is not an insurmountable problem. Some sites even put their main site on the domain name, rather than at the www name.
Additionally, it is possible to get past this problem by using Opera's cookie filters to add an accept filter for the domain.
A bit more serious is the fact that some co.uk-like domains actually define an IP address for their domain, for example to provide a directory service. Alright, back to the drawing board.
After investigating many of the alternatives to securing the domain specification, only one alternative appears to do the job as well as it is possible to do it: A complete database of all co.uk-like domains, or subTLDs as I call them.
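To show how a client could use such a database, here is a minimal Python sketch. The tiny SUBTLDS set is a hypothetical stand-in built from the placeholder names used in the examples above, and the function name is invented; it is not the format or algorithm proposed in the drafts.

```python
from typing import Optional

# Hypothetical stand-in for the database, using the article's placeholder names.
SUBTLDS = {"co.uk", "ac.uk", "state.us", "city.state.us", "suburb.city.state.us"}


def widest_cookie_domain(host: str, subtlds=SUBTLDS) -> Optional[str]:
    """Return the shortest parent of host that is neither a TLD nor a subTLD."""
    labels = host.lower().strip(".").split(".")
    # Walk from the right: "uk", then "co.uk", then "example.co.uk", ...
    for start in range(len(labels) - 1, -1, -1):
        candidate = ".".join(labels[start:])
        is_bare_tld = start == len(labels) - 1
        if not is_bare_tld and candidate not in subtlds:
            return candidate
    return None  # the host itself is a TLD or a subTLD


for name in ["www.example.no", "www.example.co.uk",
             "www.example.suburb.city.state.us", "co.uk"]:
    print(name, "->", widest_cookie_domain(name))
```

A cookie whose target domain is wider than the returned name would be refused. The check is only as good as the list behind it, which is why building and maintaining the list is the hard part.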
The problem is still the same one that shot down the idea the first time: It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.
The best solution appears to be one often used in Computer Science: "Divide and Conquer"1. In this case, spread the workload of building the database as widely as is efficient and possible, and find the people best suited to create and maintain the list. In my opinion those people are the registries that maintain each Top Level Domain.
The task can be performed by others, but these people will have to spend much more effort gathering, organizing, and assuring the quality of the information, than the companies in charge of the policies and the systems.
About three months ago I released an Internet Draft proposing a way of implementing such a database, and how clients like Opera can use it to secure their cookie support. It will also be possible to use the specification for other operations.
At the same time, to document the system, I released another draft describing Opera's implementation of DNS validation of cookie domains.
The drafts are available from the IETF's servers: DNS validate and subTLD
There are other issues related to control of cookies, in particular within shared hosting services, but those problems require different solutions. I am working on those as well, but there is still some way to go before anything is ready to be published.
Please note: Some changes are planned, e.g. using another file format, probably XML. It may also turn out that other protocols can be used instead.
Feel free to send me comments and suggestions, either directly or on the IETF's HTTP Working Group mailing list.
1 "Divide and Conquer" is a procedure by which a task is broken into smaller portions that can be handled independently, and then put back together, in a more efficient manner than if one tried to do the whole job in one go. It is often possible to repeat the procedure on the smaller portion. It should not be confused with the more nefarious political and military meaning of the expression.
Update September 21st:
As the drafts have now expired I have uploaded archive copies. The links above have also been updated.
draft-pettersen-dns-cookie-validate-00.txt
draft-pettersen-subtld-structure-00.txt

_Grey_ # 27. May 2006, 23:10
Could you please explain the following paragraph again? I try hard to understand it, but I seem to fail.
yngve # 28. May 2006, 00:01
What it means is that, with only that rule, the problem is still just as bad, even if the range of hostnames that can behave maliciously has been restricted.
(BTW: Fixed a typo in the example)
_Grey_ # 28. May 2006, 02:19
Almost all sites are put at the root domain, though ...
In this case you could check against the first part (of the uri) being an indication for a server or a domain. In the latter case you can act accordingly (e.g. restricting access to example.co.uk instead of .co.uk).
On a sidenote: A rule based entirely on such physical criteria as "dots" doesn't make any sense. The rule should be formulated in other means ... actually the means of it, not the visual markings of it. Just to state that, a "dot-rule" sounds like nonsense, like rubbish in my ears. Unfortunately I can't come up with a nice browser analogy as I had planned, but nevertheless.
Aux # 29. May 2006, 12:17
cookie-group: somesite.co.uk, www.somesite.co.uk, www.somesite.com
Then if we visit www.somesite.com it should also send such a string so we can check if this group is correct. If the group matches then everything is fine. If not, then hasta la vista, cookie! I think this would do the trick without any heuristics and buggy methods.
yngve # 29. May 2006, 17:42
I am also afraid that your suggestion may have the same problem as the current cookie specifications: If you have a group specification "example.co.uk, www.example.co.uk", how do you know that example.co.uk is a valid domain, and not a subTLD? We can't trust the information sent to us from the server unless it is signed by a trusted third party.
Even if the group specification had to match or overlap before sending cookies, using that method would slow down the process as much as using a HEAD request to a server would. A single OPTIONS request might work, though, but that may still require serious modifications to the servers. And it would also not stop malicious multisite cooperation, or abuse of opportunities due to bad configuration of the specification.
yngve # 29. May 2006, 17:56
Originally posted by _Grey_:
Sorry, I am not quite sure I understand what you mean _Grey_.
If you mean that we should try prepending "www." on the domain to see if it resolves, please try this URL: http://www.co.uk/
If OTOH you mean we should check for an IP address on the target domain, that is what Opera does at the moment, and is what is described in the DNS-validate draft.
If you mean a rule based on X number of dots in a name, I agree, it does not work. Unfortunately, the original cookie specification was written at a time when the structure of the internet namespace was still in flux, and the people writing it may also have been more familiar with the American based domain space (.com etc.), but at least they did try to put a policy in place, it just did not work very well.
_Grey_ # 29. May 2006, 22:01
If there is such a prefix, one could append the "rule with the dots" and everything should work fine. In the case that there isn't, only allow cookies to be set for the current domain, no "parent" or "reach" or whatever. Although this might be too restrictive to some ...
http://www.co.uk doesn't resolve anything. Can't seem to understand you. (anyone else seeing the irony? ;) )
But now I think you're right. What you suggested might be the only option for not being too restrictive and at the same time being secure. That involves further engagement from quite a few people, though. Maybe you/Opera/someone else should start a petition that can be handed over to Committees, Working Groups, Domain Registries, and so on. There needs to be a simple explanation of the problem, a presentation of the solution and a FAQ for people having questions like the ones I asked or that are likely to be asked.
Maybe, this just goes too far, though. *g*
yngve # 29. May 2006, 23:03
Aux # 30. May 2006, 10:08
yngve # 30. May 2006, 12:37
OTOH, it could be that you are seeing possibilities I don't see. Maybe you should flesh it out and submit it to the IETF as an Internet Draft?
Aux # 1. June 2006, 10:22
ResearchWizard # 24. July 2006, 22:31
Originally posted by yngve:
This approach sounds not too bad for me, just a little bit short-sighted, as you pointed out the problem with missing prefix www (what is usual for subdomains). www is not really a prefix, but just treat it like one and define it is superfluous. Instead of 'parent' domain the domain itself would be the valid target. There has to be a tweak for the www[nn] (ie www2) which should be easily catchable with regular expressions (to be in the line of RFC 2109, RFC 2965 you can remove www[nn] if any and virtually add www again to use the parent domain - it is just harmonise / normalise in a formal sense).
With this first rule www.co.uk would be invalid (maybe the rules for co.uk changed meanwhile??). It should not be too difficult to have an additional white list with all possible www[nn] in front of a TLD. Speaking of TLD in this cases doesn't mean the entry behind the last dot, but what is logically the TLD: co.uk is a TLD in this sense. Still the number is very limited and not object for frequent changes.
But I suppose it is not that easy. Maybe RFC 2109 and RFC 2965 are not part of the real world and subdomain cookies should be valid for the domain itself? Therefore subdomain1.example.com, subdomain2.example.com and example.com should belong to the same group. But of course example.suburb.city.state.us should be treated differently, as a single group. In this case it is really bad and I wish you and all the other responsible people success and luck for a working solution.
puzanov # 17. March 2009, 17:49
You wrote that Opera will not set a cross-domain cookie for websites with domains that do not have an IP address (such as .example.local). But what if I am a web developer and I simply want to test my cross-site auth (which works great in other browsers)? E.g. when I log in at test1.example.local I set the cookie for the group example.local, but my other site with the domain test2.example.local can't access it.
I've populated the /etc/hosts file on my local machine with the needed records so now I can resolve test1.example.com and test2.example.com, but it seems that Opera still doesn't want to share a cookie. Maybe my local dev domains should be resolved to the 127.0.0.1 address?
Any advice?
yngve # 18. March 2009, 01:48
BTW, do you mean Cross-domain, or domain-wide? "Cross-domain", to me, means that example.org tries to set (in a response from example.org) a cookie for example.com, which is not allowed.
Domain-wide cookies are allowed if the cookie is set for a second-level domain in a generic TLD, or one level up in other TLDs (except on the second level); anything else has to pass the IP address test, or be assigned a filter.
puzanov # 18. March 2009, 05:34
Thanks for the answer.