How to make sure the cookies don't burn your fingers?
Tuesday, May 23, 2006 9:27:28 PM
Cookies, small chunks of data placed on your computer by websites so that they can be returned to the website, have in the past 10+ years become a mostly useful part of online life. They are used "everywhere", from the beneficial login credentials used in these forums and various shopping carts, to the more ambivalent uses for advertisement targeting and other user tracking.
It is the latter point that makes many people a bit wary of cookies, and makes some people almost "paranoid" about them.
Let's get one thing settled right away: Cookies can only be sent to the server that set the cookie, or to the group of servers to which that server belongs, as determined by the name of the server and by what the server itself specifies. A cookie cannot be sent to a server outside that group.
Well, some bright readers are probably already asking: How do you define "group"? And how do you keep the groups from growing too big? Good questions, and that is what the rest of this article will be about.
Let's start with a simple example:
Example 1: www.example.com belongs to two groups, ".example.com" and ".com". The first, ".example.com", is perfectly acceptable, but what about the second one, ".com"? Do we really want sites to be able to send cookies to all .com domains? No, we don't (add really capital letters and a couple of exclamation marks, if you want). So, the target group must contain at least two components (or an internal dot, as the specification says).
Fortunately, the specifications already say that for generic domains (like .com) the target domain must contain at least two components. So we are safe there.
Now, (example 2) what about www.example.no? Obviously, here we can use the same rules as for the generic domains (example 1).
And what about (example 3) www.example.co.uk? Obviously, since co.uk is a Top Level-like Domain we cannot permit cookies to be set for co.uk. That would permit a server to send a cookie to all servers in the co.uk area. But how do we reconcile this with example 2?
Or (example 4) what about www.example.suburb.city.state.us? It is equally unacceptable that cookies be set for the domain suburb.city.state.us.
This is starting to look really complicated, to put it mildly.
What we are trying to avoid is accidental, intentional, or even malicious interference with another service (e.g. Bank1 and Bank2 should not necessarily know about each other's customers, and if they want/need to share that information there are better means available to achieve such sharing). Such interference can also have security implications, e.g. a specially crafted cookie might be used to block access to a site, or interfere with its operation.
Further, we want to limit cross-site information gathering. In many cases such information can be connected to a physical person (a few years ago DoubleClick got into serious trouble when they wanted to do just that). When such gathering is performed, the business connections should be out in the open, not hidden, as would be the case if any site could join the network clandestinely, by starting to look for a widely targeted cookie without informing the user.
Netscape, which wrote the initial specification for cookies, specified that the target domains for cookies had to contain at least one internal dot (example.com) in the generic domains like .com, while in all other top level domains (TLDs) the targets had to contain at least two internal dots (example.co.uk). That makes it impossible to set example.no in example 2 as the target domain, while it will permit cookies to be set for city.state.us. Oops! We've got double trouble: A rule that is too restrictive and too relaxed at the same time.
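The rule as described above can be sketched in a few lines of Python. This is only an illustration, not Netscape's code: the function name `netscape_rule_allows` is hypothetical, and the set of generic TLDs is assumed to be the seven special domains listed in the original Netscape note.

```python
# Assumption: the "generic" TLDs are the seven special domains from the
# original Netscape cookie note.
GENERIC_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def netscape_rule_allows(target: str) -> bool:
    """Return True if the dot-counting rule would accept this target domain."""
    labels = target.lower().strip(".").split(".")
    internal_dots = len(labels) - 1
    # One internal dot is enough in a generic TLD, two are needed elsewhere.
    required = 1 if labels[-1] in GENERIC_TLDS else 2
    return internal_dots >= required


for domain in ("example.com", "example.no", "co.uk", "city.state.us"):
    print(domain, netscape_rule_allows(domain))
```

Under these assumptions the sketch rejects example.no (too restrictive) and accepts city.state.us (too relaxed), which is exactly the double trouble described above.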
The second part of the rule, about non-generic domains, was never properly implemented by anyone (including Netscape and Opera), possibly because of example 2 (which is completely valid, from a practical point of view), and the rule was also easy to miss: It was a single sentence inside a larger paragraph. In late 1998 this missing check became notorious as the "Cookie Monster Bug". More about the consequences of that below.
The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that www.example.no can set a cookie for example.no, www.example.co.uk for example.co.uk, but not co.uk, and www.example.suburb.city.state.us for example.suburb.city.state.us, but not suburb.city.state.us.
Now, this looks much better, but ... hmmmm ... what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.us. Ooops, again! Not quite as watertight as we would prefer.
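A minimal sketch of this one-level-up rule, and of the loophole in it, might look as follows. The function name `parent_rule_allows` is made up for illustration, and the sketch ignores everything in the RFCs except the parent-domain check and the requirement of an internal dot.

```python
def parent_rule_allows(host: str, target: str) -> bool:
    """Return True if host may set a cookie for target under a one-level-up rule."""
    host = host.lower().strip(".")
    target = target.lower().strip(".")
    if target == host:
        return True  # a server may always set a cookie for itself
    parent = host.partition(".")[2]  # everything after the first label
    # The target must be the immediate parent and contain an internal dot.
    return target == parent and "." in target


print(parent_rule_allows("www.example.co.uk", "example.co.uk"))  # allowed
print(parent_rule_allows("www.example.co.uk", "co.uk"))          # blocked
print(parent_rule_allows("example.co.uk", "co.uk"))              # the loophole
```

The last call is the problem case: a server placed directly at example.co.uk has co.uk as its parent, so the rule lets it through.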
Browser vendors tried several ways to fix the Cookie Monster Bug. One was a black list of some second level domains, like "co", "com", "ac" etc, but they did not (and cannot) get every such TLD-like subdomain.
After the "Cookie Monster Bug" was discovered, Opera initially tried the double-dot-rule, and then the one-level-up approach, but neither worked very well.
The primary problem is: We want to let servers on a site be able to share information with the other servers on the site. But: How do you tell a valid site domain like example.no from the co.uk and suburb.city.state.us-type Top Level-like (non-site) domains?
It is not realistic to use short blacklists: there are too many ways each nation wants to do things, and the names cannot be put into nice convenient patterns (this is type 1, that is type 2), making it impossible to use a heuristic method.
There is only one realistic option: We have to have some means of finding out which type of domain we are dealing with. This option has two alternatives: Either a big, expensive(!) database, or some kind of rule-of-thumb method.
Here at Opera we went for the rule-of-thumb method: When Opera is checking a cookie whose target domain matches certain criteria (e.g. it is not a .com domain), we do a DNS name lookup for the target domain, to see if there is an IP address for that domain. If there is an IP address for the domain (e.g. example.no) we assume that the domain is a normal company domain, not a co.uk-like domain, and therefore safe. If there is no IP address we assume that the domain is co.uk-like and therefore unsafe, and only allow the cookie to be set for the server that sent the cookie.
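As a rough sketch of this rule of thumb (not Opera's actual code), the check could be written like this in Python. The names `has_ip_address` and `effective_cookie_domain` are hypothetical, the ".com is exempt" test stands in for the unspecified "certain criteria", and a mismatched target simply falls back to the sending host, where a real client would more likely reject the cookie.

```python
import socket
from typing import Callable


def has_ip_address(domain: str) -> bool:
    """Return True if a DNS lookup finds at least one address for domain."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def effective_cookie_domain(
    host: str, target: str, resolves: Callable[[str], bool] = has_ip_address
) -> str:
    """Return the domain the cookie is actually stored for."""
    host = host.lower().strip(".")
    target = target.lower().strip(".")
    # The target must be a dotted parent domain of the sending host;
    # otherwise fall back to the host itself (simplification, see lead-in).
    if target == host or "." not in target or not host.endswith("." + target):
        return host
    # Assumption: .com targets are exempt from the DNS check.
    if target.endswith(".com"):
        return target
    # With an IP address the target is treated as a normal site domain;
    # without one it is treated as co.uk-like and the cookie is host-only.
    return target if resolves(target) else host
```

The `resolves` parameter is there only so the DNS lookup can be replaced by a stub when trying the logic out offline, e.g. `effective_cookie_domain("www.example.co.uk", "co.uk", lambda d: False)`.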
Unfortunately, this can break sites that do not have an IP address for their domain, but that is quite easy for the webmaster to fix. It is also quite common to allow surfers to access the website without having to write out the "www." part, and many also use this name in their advertising, so this is not an insurmountable problem. Some sites even put their main site on the domain name, rather than at the www name.
Additionally, it is possible to get past this problem by using Opera's cookie filters to add an accept filter for the domain.
A bit more serious is the fact that some co.uk-like domains actually define an IP address for their domain, for example to provide a directory service. Alright, back to the drawing board.
After investigating many of the alternatives to securing the domain specification, only one alternative appears to do the job as well as it is possible to do it: A complete database of all co.uk-like domains, or subTLDs as I call them.
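To illustrate how a client might use such a database, here is a small Python sketch. The entries in `SUBTLDS` are just the examples from this article plus a few obvious neighbours, not real registry data, and the function name `is_allowed_target` is hypothetical.

```python
# Illustrative entries only; a real list would come from the registries.
SUBTLDS = {
    "co.uk",
    "ac.uk",
    "state.us",
    "city.state.us",
    "suburb.city.state.us",
}


def is_allowed_target(target: str, subtlds: set = SUBTLDS) -> bool:
    """Return True if a cookie may be set for target, given the subTLD list."""
    name = target.lower().strip(".")
    # A bare TLD has no internal dot; a listed subTLD is never a valid target.
    return "." in name and name not in subtlds


for domain in ("example.no", "example.co.uk", "co.uk", "suburb.city.state.us"):
    print(domain, is_allowed_target(domain))
```

With a complete list the check itself is trivial; the hard part, as discussed next, is building and maintaining the list.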
The problem is still the same one that shot down the idea the first time: It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.
The best solution appears to be one often used in Computer Science: "Divide and Conquer"1. In this case, spread the workload of building the database as widely as is efficient and possible, and find the people best suited to create and maintain the list. In my opinion those people are the registries that maintain each Top Level Domain.
The task can be performed by others, but these people will have to spend much more effort gathering, organizing, and assuring the quality of the information, than the companies in charge of the policies and the systems.
About three months ago I released an Internet Draft proposing a way of implementing such a database, and how clients like Opera can use it to secure their cookie support. It will also be possible to use the specification for other operations.
At the same time, to document the system, I released another draft describing Opera's implementation of DNS validation of cookie domains.
The drafts are available from the IETF's servers: DNS validate and subTLD
There are other issues related to control of cookies, in particular within shared hosting services, but those problems require different solutions. I am also working on solving them, but there is still some way to go before anything is ready to be published.
Please note: Some changes are planned, e.g. using another file format, probably XML. It may also turn out that other protocols can be used instead.
Feel free to send me comments and suggestions, either directly or on the IETF's HTTP Work Group mailing list.
1 "Divide and Conquer" is a procedure by which a task is broken into smaller portions that can be handled independently, and then put back together, in a more efficient manner than if one tried to do the whole job in one go. It is often possible to repeat the procedure on the smaller portion. It should not be confused with the more nefarious political and military meaning of the expression.
Update September 21st:
As the drafts have now expired I have uploaded archive copies. The links above have also been updated.
draft-pettersen-dns-cookie-validate-00.txt
draft-pettersen-subtld-structure-00.txt

_Grey_ # Saturday, May 27, 2006 11:10:01 PM
Could you please explain the following paragraph again? I try hard to understand it, but I seem to fail.
Yngve Nysæter Pettersenyngve # Sunday, May 28, 2006 12:01:00 AM
What it means is that, with only that rule, the problem is still just as bad, even if the range of hostnames that can behave maliciously has been restricted.
(BTW: Fixed a typo in the example)
_Grey_ # Sunday, May 28, 2006 2:19:25 AM
Almost all sites are put at the root domain, though ...
In this case you could check against the first part (of the uri) being an indication for a server or a domain. In the latter case you can act accordingly (e.g. restricting access to example.co.uk instead of .co.uk).
On a sidenote: A rule based entirely on such physical criteria as "dots" doesn't make any sense. The rule should be formulated in other means ... actually the means of it, not the visual markings of it. Just to state that, a "dot-rule" sounds like nonsense, like rubbish in my ears. Unfortunately I can't come up with a nice browser analogy as I had planned, but nevertheless.
Aux # Monday, May 29, 2006 12:17:19 PM
cookie-group: somesite.co.uk, www.somesite.co.uk, www.somesite.com
Then if we visit www.somesite.com it should also send such a string so we can check if this group is correct. If the group matches then everything is fine. If not - then hastalavista cookie! I think this would do the trick without any heuristics and buggy methods.
Yngve Nysæter Pettersenyngve # Monday, May 29, 2006 5:42:11 PM
I am also afraid that your suggestion may have the same problem as the current cookie specifications: If you have a group specification "example.co.uk, www.example.co.uk", how do you know that example.co.uk is a valid domain, and not a subTLD? We can't trust the information sent to us from the server unless it is signed by a trusted third party.
Even if the group specification had to match or overlap before sending cookies, using that method would slow down the process as much as using a HEAD request to a server would. A single OPTIONS request might work, though, but that may still require serious modifications to the servers. And it would also not stop malicious multisite cooperation, or abuse of opportunities due to bad configuration of the specification.
Yngve Nysæter Pettersenyngve # Monday, May 29, 2006 5:56:57 PM
Originally posted by _Grey_:
Sorry, I am not quite sure I understand what you mean _Grey_.
If you mean that we should try prepending "www." on the domain to see if it resolves, please try this URL: http://www.co.uk/
If OTOH you mean we should check for an IP address on the target domain, that is what Opera does at the moment, and is what is described in the DNS-validate draft.
If you mean a rule based on X number of dots in a name, I agree, it does not work. Unfortunately, the original cookie specification was written at a time when the structure of the internet namespace was still in flux, and the people writing it may also have been more familiar with the American based domain space (.com etc.), but at least they did try to put a policy in place, it just did not work very well.
_Grey_ # Monday, May 29, 2006 10:01:29 PM
If there is such a prefix, one could append the "rule with the dots" and everything should work fine. In the case that there isn't, only allow cookies to be set for the current domain, no "parent" or "reach" or whatever. Although this might be too restrictive to some ...
http://www.co.uk doesn't resolve anything. Can't seem to understand you. (anyone else seeing the irony? ;) )
But now I think you're right. What you suggested might be the only option for not being too restrictive and at the same time being secure. That involves further engagement from quite a few people, though. Maybe you/Opera/someone else should start a petition that can be handed over to Committees, Working Groups, Domain Registries, and so on. There needs to be a simple explanation of the problem, a presentation of the solution and a FAQ for people having questions like the ones I asked or that are likely to be asked.
Maybe, this just goes too far, though. *g*
Yngve Nysæter Pettersenyngve # Monday, May 29, 2006 11:03:02 PM
Aux # Tuesday, May 30, 2006 10:08:46 AM
Yngve Nysæter Pettersenyngve # Tuesday, May 30, 2006 12:37:01 PM
OTOH, it could be that you are seeing possibilities I don't see. Maybe you should flesh it out and submit it to the IETF as an Internet Draft?
Aux # Thursday, June 1, 2006 10:22:55 AM
ChristianResearchWizard # Monday, July 24, 2006 10:31:23 PM
Originally posted by yngve:
This approach sounds not too bad for me, just a little bit short-sighted, as you pointed out the problem with missing prefix www (what is usual for subdomains). www is not really a prefix, but just treat it like one and define it is superfluous. Instead of 'parent' domain the domain itself would be the valid target. There has to be a tweak for the www[nn] (ie www2) which should be easily catchable with regular expressions (to be in the line of RFC 2109, RFC 2965 you can remove www[nn] if any and virtually add www again to use the parent domain - it is just harmonise / normalise in a formal sense).
With this first rule www.co.uk would be invalid (maybe the rules for co.uk changed meanwhile??). It should not be too difficult to have an additional white list with all possible www[nn] in front of a TLD. Speaking of TLD in this cases doesn't mean the entry behind the last dot, but what is logically the TLD: co.uk is a TLD in this sense. Still the number is very limited and not object for frequent changes.
But I suppose it is not that easy. Maybe RFC 2109, RFC 2965 are not part of the real world and subdomain-cookies should be valid for the domain itself? Therefore subdomain1.example.com, subdomain2.example.com and example.com should belong to the same group. But of course example.suburb.city.state.us should be treated differently as a single group. In this case it is really bad and I wish you and all the other responsible people success and luck for a working solution.
Oleg Puzanovpuzanov # Tuesday, March 17, 2009 5:49:44 PM
You wrote that Opera will not set a cross-domain cookie for websites whose domains do not have an IP address (such as .example.local). But what if I am a web developer and simply want to test my cross-site auth (which works great in other browsers)? E.g. when I log in at test1.example.local I set the cookie for the group example.local, but my other site, test2.example.local, can't access it.
I've populated the /etc/hosts file on my local machine with the needed records so now I can resolve test1.example.com and test2.example.com, but it seems that Opera still doesn't want to share the cookie. Maybe my local dev domains should resolve to the 127.0.0.1 address?
Any advice?
Yngve Nysæter Pettersenyngve # Wednesday, March 18, 2009 1:48:48 AM
BTW, do you mean Cross-domain, or domain-wide? "Cross-domain", to me, means that example.org tries to set (in a response from example.org) a cookie for example.com, which is not allowed.
Domain-wide cookies are allowed if the cookie is set for a second-level domain in a generic TLD, or one level up in other TLDs, except on the second level; anything else has to pass the IP address test, or be assigned a filter.
Oleg Puzanovpuzanov # Wednesday, March 18, 2009 5:34:24 AM
Thanks for the answer.
egcrosser # Friday, May 28, 2010 9:25:52 PM
Yngve Nysæter Pettersenyngve # Friday, May 28, 2010 9:57:27 PM
Also, DNS is not necessarily available if you are behind a firewall that only allows access to external networks through a proxy. The only things you can reasonably rely on being able to use in such a case are HTTP and HTTPS; DNS is out as an option.
There are even platforms where the client cannot do DNS lookups at all when establishing sockets: you have to tell the OS the name of the server you want to connect to, and you won't know until the connection fails whether there really is a server with that name.
John Jardinemrgprime # Friday, May 28, 2010 10:50:51 PM
http://v1.lscache1.c.youtube.com/crossdomain.xml
What this says is any of these URLs are allowed to access this site. All others will be denied access. (Note, this is for access through the Flash player)
Imagine if something like this could be applied to cookies. If I own bob.co.uk and I want to set a cookie for bob.co.uk from hi.my.name.is.bob.co.uk, then I simply have to add some sort of XML file at bob.co.uk that says this URL is allowed to access it:
<cross-domain-policy>
<allow-access-from domain="*.bob.co.uk" secure="true" />
</cross-domain-policy>
This file would be hosted at http://bob.co.uk/cookies.xml, so there is no way we could grant ourselves access to read or write to cookies from ".co.uk"
This could even allow the flexibility to share cookies across totally separate domains, even with different TLDs:
http://bob.co.uk/cookies.xml
<cross-domain-policy>
<allow-access-from domain="*.bob.co.uk" />
<allow-access-from domain="*.bob.com" />
</cross-domain-policy>
I understand this would never happen, but hey.. it's nice to dream
Yngve Nysæter Pettersenyngve # Saturday, May 29, 2010 1:26:56 AM
Benjamin Smithmcrbids # Saturday, May 29, 2010 4:49:03 AM
I run a fairly extensive cluster of application servers all secured by wildcard SSL. Whenever I need to "jump" from one subdomain to another, I simply "pre-set" the session on the target server via a secured connection from the source server and then hand a header redirect to the target server to the end-user's browser. Since the target server is expecting the hit, it sets the cookie as soon as it recognizes the browser and the hand-off is complete. The process is nearly instantaneous and invisible to the end user, and it only requires a post on a private backplane network to enable it.
I never even bothered to *try* cross-domain cookies, and I think the idea is fundamentally flawed, as this post articulates.
Yngve Nysæter Pettersenyngve # Saturday, May 29, 2010 10:08:49 AM
gojomohr # Monday, May 31, 2010 8:55:15 AM
Yngve Nysæter Pettersenyngve # Monday, May 31, 2010 9:56:06 AM
gojomohr # Tuesday, June 1, 2010 5:28:29 AM
In particular, to the extent the publicsuffix.org list is widely used, and doesn't seem to have any fatal flaws, it seems a strong counterexample to the suggestion, "It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it."
Yngve Nysæter Pettersenyngve # Tuesday, June 1, 2010 9:53:47 AM
Originally posted by gojomohr:
Work on the Mozilla list did not start until after the above article was written.
The Mozilla list is crowdsourced, they spent several years producing it, and they have AFAIK been contacting every TLD registrar to ask them to help assure the quality of the list.
And the list will soon have to be massively extended, since we have started to get IDN TLDs (TLDs using non-Latin scripts, such as Cyrillic and Indian scripts), and ICANN is about to open the Generic TLD floodgates.
My point in the article and the IDs is that for a third party to stay on top of such changes, as well as changes in the existing domain structure, will be costly, whether you count money or person hours spent, and time consuming, and it will likely be lagging behind actual events. It would be far more efficient if the registrars themselves maintained the information and made it available through some common repository.