Tuesday, 27. February 2007, 17:23:52
One of the Opera features I often see mentioned as "good" is our history navigation: you can move back and forward in the history very quickly, without any revalidation or refetching of the content from the server.
This is made possible by our adherence to the principle stated in RFC 2616 (HTTP 1.1) section 13.13 about history list navigation.
13.13 History Lists
[...]History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.
By default, an expiration time does not apply to history mechanisms. If the entity is still in storage, a history mechanism SHOULD display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.[...]
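To make the distinction in the quoted section concrete, here is a minimal Python sketch of the two lookups. The names `StoredEntity`, `cache_lookup`, `history_lookup` and `refresh_expired` are made up for illustration and are not taken from Opera's code or from the RFC; the sketch only models the expiration-time rule, nothing else.

```python
from dataclasses import dataclass


@dataclass
class StoredEntity:
    body: str
    expires_at: float  # hypothetical field: expiry time in seconds


def cache_lookup(entity, now):
    """Cache view: only a fresh entity may be used without asking the server.

    Returns the body, or None to signal "revalidate with the origin first".
    """
    return entity.body if now < entity.expires_at else None


def history_lookup(entity, now, refresh_expired=False):
    """History view: show what the user saw, even if it has expired.

    `refresh_expired` stands in for the user setting mentioned in the RFC
    text; it is an assumed name, off by default.
    """
    if refresh_expired and now >= entity.expires_at:
        return None
    return entity.body


page = StoredEntity(body="account summary", expires_at=10.0)
print(cache_lookup(page, now=20.0))    # expired for the cache
print(history_lookup(page, now=20.0))  # still shown by history
```

The same stored entity gives two different answers depending on which mechanism asks, which is exactly the behaviour that surprises people after a logout.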
Unfortunately, now and then (such as just last week) there's somebody reporting this as a bug or even a "security issue". They are in particular using the Security Issue tag when the navigation happens on a site handling sensitive data, such as online banking, after the user has logged out.
Well, I can understand that online banking sites and other sites handling sensitive information do not want the client to show information retrieved from the site while the user was logged in, after the user has logged out. It's just that they forget a simple fact, inherent in HTTP: The browser does not know the user has (been) logged out, and currently there is no way for the browser to find out!
Somebody will probably now say "but cookies/no-cache/must-revalidate can be used for that". Not really. Let's look at each of these options:
- Cookies: Cookies, more precisely known as "HTTP State Management Cookies", are used to keep information about what is going on with respect to a given client and user. However, the browser does not know anything about what a cookie means; it just stores it and sends it back to the server(s) that are supposed to receive it. That a cookie is deleted (such as during logout) does not mean anything special to the client, and in fact a site need not delete a cookie to mark a user as logged out; it can just change a flag in its own database, without telling the browser.
- no-cache: This Cache-Control response directive may very well be the most misunderstood parameter in all of HTTP, quite likely because of its name, and its use in requests. There seems to be a significant belief that a client must never store a document served with this flag, or produce it as a result of clicking on a link. That is not the function of this directive. RFC 2616 sec. 14.9.1 states:
no-cache
[...] a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. [...]
This only means that when you click on a link, the client must first ask the server "Has this been modified since I loaded it?", and if the server says "No, it is not modified", then it may show it to you.
This does not apply to history navigation, because "no-cache" defines the "expiration time" of the web page, nothing else.
- must-revalidate: If "no-cache" is the most misunderstood parameter, "must-revalidate" may well be the most abused. According to RFC 2616 section 14.9.4:
must-revalidate
[...] When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. [...]
Opera may not quite follow the letter of this specification, but that is because of how the directive will normally be used. We treat this directive's presence as an indication that 1) the resource is expired (same as with "no-cache") and 2) during history navigation of secure sites a web page is revalidated before it is displayed to the user.
You probably noticed the "secure site" condition. The reason for this is the aforementioned abuse of the "must-revalidate" directive. Quite a lot of sites (I have personally seen a lot of PHP powered Wikis with this problem) are sending the cache-directive combination "no-cache, no-store, must-revalidate", which to Opera means "check every time the user asks for it, even during history navigation, and do not store it on disk, only in RAM while we have it".
What this means is that when you visit such a site (and all the directives are obeyed), the browser will ask the server to validate the content of each page (even static content, and in some cases even images) every time, which results in very slow navigation. When we introduced "must-revalidate" the forums overflowed with "slow navigation" posts, and long-time Opera supporter non-troppo actually had to patch his Opera Wiki server to keep the problem at bay.
This abuse is the reason why must-revalidate is only obeyed for secure sites.
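The rules described above can be summed up in a small decision function. This is only a sketch of the behaviour as described in this post, not Opera's actual code: the function names, the `navigation` values "link" and "history", and the `secure` flag (assumed here to mean the page came from a secure site) are all made up for illustration. Ordinary freshness lifetimes are ignored; the function only answers whether a directive forces revalidation.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into lowercase directive names.

    Arguments such as the "3600" in "max-age=3600" are dropped, since
    this sketch only looks at which directives are present.
    """
    return {
        part.split("=", 1)[0].strip().lower()
        for part in header.split(",")
        if part.strip()
    }


def forces_revalidation(header, navigation, secure):
    """Return True if the page must be revalidated before it is shown."""
    directives = parse_cache_control(header)
    if navigation == "link":
        # Normal request: both directives are treated as "already expired".
        return "no-cache" in directives or "must-revalidate" in directives
    if navigation == "history":
        # History navigation: only must-revalidate on a secure site counts.
        return secure and "must-revalidate" in directives
    raise ValueError(f"unknown navigation type: {navigation!r}")


wiki = "no-cache, no-store, must-revalidate"
print(forces_revalidation(wiki, "history", secure=False))
print(forces_revalidation(wiki, "history", secure=True))
```

With the directive combination those Wikis send, the non-secure case keeps history navigation fast, while the same header on a secure site triggers a round trip to the server on every step back or forward.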
As indicated above, some web service providers consider failure to revalidate after logout to be a potential vulnerability for their system. This has caused them to invest quite a lot of effort into ensuring that the browser behaves as they want it to. They have not just used the above mentioned cache directives, but also various scripting technologies.
The major drawback of all the current systems is that they will most often increase traffic to the website using them. That increases the load on the servers, which means the site needs even more servers to stay operational, which in turn means a higher cost of operating the site.
All of it (trouble, wasted money, complaints, etc.) because they are not able to tell the browser that the user has been logged out.
So, how can this be solved in a more economic and predictable manner?
A solution should meet the following requirements:
- The user should be able to navigate history as normal while logged in, no revalidation should take place.
- Already loaded resources should not be refetched unless the proper functioning of the site requires it (for example, the account page should display the current amounts when the user clicks on a link going to the accounts summary page).
- When the user is logged out, all sensitive documents should be removed, and if they are still in history, they should be revalidated with the server before being displayed to the user.
This set of requirements indicates that the server should have a way to tell the browser
- That a page is part of the sensitive document group.
- When the user is logged out (or should be logged out).
I've recently written, and submitted to the IETF, a document describing a system that tries to solve these problems: Cache Contexts.
In this new system the server uses a Cache Context directive to tell the browser that the document(s) served are part of a specific group of documents, a "context", and when the context has ended its usefulness, the server can discard it, which will also tell the browser that the documents in the context should no longer be displayed to the user, unless they have been confirmed by the server.
The server can tell the client how long it should let the documents in the context live, and it may also connect the lifetime of a context to the lifetime of a cookie; when the cookie is deleted, all the documents in the context are deleted, too.
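As a rough illustration of the idea, here is a toy Python model of the client side. It is not the syntax or the processing rules from the draft; the class `ContextStore` and all of its method names are hypothetical, and the model covers only the two behaviours described above: a discarded context stops its documents from being shown without confirmation, and deleting a linked cookie deletes the documents.

```python
class ContextStore:
    """Toy client-side store that groups documents into named contexts."""

    def __init__(self):
        self._docs = {}          # url -> (body, context name or None)
        self._ended = set()      # contexts the server has discarded
        self._cookie_links = {}  # cookie name -> set of context names

    def store(self, url, body, context=None):
        self._docs[url] = (body, context)

    def link_cookie(self, cookie_name, context):
        """Tie the lifetime of a context to the lifetime of a cookie."""
        self._cookie_links.setdefault(cookie_name, set()).add(context)

    def end_context(self, context):
        """The server has discarded the context (for example at logout)."""
        self._ended.add(context)

    def cookie_deleted(self, cookie_name):
        """Delete every document in the contexts linked to this cookie."""
        for context in self._cookie_links.pop(cookie_name, set()):
            self._docs = {
                url: doc for url, doc in self._docs.items() if doc[1] != context
            }

    def history_lookup(self, url):
        """Return the body if it may be shown as-is, else None.

        None means "ask the server first" (or the document is gone).
        """
        entry = self._docs.get(url)
        if entry is None:
            return None
        body, context = entry
        return None if context in self._ended else body


store = ContextStore()
store.store("/accounts", "account summary", context="bank")
store.store("/about", "public page")
store.end_context("bank")
print(store.history_lookup("/accounts"))  # needs the server again
print(store.history_lookup("/about"))     # unaffected
```

While the context is alive, history navigation works as usual with no revalidation; only the pages in the ended context are affected, and everything else in history stays fast.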
These are just some of the features offered by Cache Contexts.
If you are interested, the draft will be available from the IETF's Internet Draft repository in a couple of days. If you cannot wait, a copy of it can be found here.
If you have comments, suggestions, corrections, etc., feel free to discuss them here, or directly with me. General discussion of the proposal should take place on the IETF's HTTP Working Group mailing list.
draft-pettersen-cache-context-00.txt