HTTP Cookies and Hard Problems in Computer Science

24 Feb 2011

There are only two hard things in Computer Science: cache invalidation and naming things

or


There are 2 hard problems in computer science: caching, naming, and off-by-1 errors




With HTTP cookies we get all three. Upgrading cookie metadata such as the domain and security attributes can frequently lead to duplicate values being passed in and weird data-shadowing problems (delete one value, and another shines through). And upgrading the format of the data in the cookie value is dangerous, since there is no rollback without severe data loss. However, by treating the cookie format and metadata as a row in an MVCC database, a lot of pain can go away. The analogy isn't quite exact, but it helps illustrate the ideas.




Cache Invalidation -- Upgrading Cookies




Cookies are "the most distributed user database." Upgrading is somewhat painful, as you don't know when someone is going to come back to your site with an old, outdated version, or with a crappy browser. (OK, this isn't quite cache invalidation, but close enough. I can't directly clean out everyone's old cookies on demand. I have to wait for them to show up.)




Lots of people put a version number in the cookie value to aid in doing an upgrade. This is certainly useful, but if there is a screwup (frequently caused by one of the domain issues described below), you have just overwritten your old data. There's no going back without data loss.




Naming -- The Asymmetric Protocol




The HTTP Cookie protocol we all know and love is woefully asymmetric. As a server, you use Set-Cookie and provide the name/value, the expiration, the domain, the path, whether it's secure or not, and whether it's http-only or not, but the incoming Cookie header only has the name and value. You lose all the other stuff.

To make it more interesting, you can get multiple values back for the same name. How so?




Set-Cookie: foo=bar; domain=www.client9.com; path=/
Set-Cookie: foo=goo; domain=.client9.com; path=/




can result in the client returning:




Cookie: foo=bar; foo=goo




(and perhaps not in this order). To make it even more interesting, most web platforms provide the application with an associative array of cookies (for instance PHP's $_COOKIE), meaning the platform picks one of those cookies to present back to you. You have no way of knowing which one you are getting.
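To make the shadowing concrete, here is a minimal Python sketch. It is not how any particular platform parses cookies; the parse_cookie_header helper is made up for illustration, and "keep the last value seen" is just one possible collapsing policy.

```python
from collections import defaultdict


def parse_cookie_header(header: str) -> dict[str, list[str]]:
    """Hypothetical helper: keep every value sent for each cookie name."""
    values = defaultdict(list)
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            values[name].append(value)
    return dict(values)


header = "foo=bar; foo=goo"  # the value of the Cookie header shown above
all_values = parse_cookie_header(header)

# An associative array has room for one value per name. This assumed policy
# keeps the last one seen; a real platform may choose differently.
collapsed = {name: vals[-1] for name, vals in all_values.items()}

print(all_values)  # both values are visible
print(collapsed)   # only one survives
```

With the full list you can at least detect that a duplicate exists, which the associative array hides from you.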




The same issue can also occur with the cookie path. However, I strongly recommend you just set the path to "/". Path-specific cookies create needless complication, and fine-grained logic is best handled by the application. (And another: while I'm not certain, a different type of problem occurs when switching a cookie from "plain" to secure, SSL-only.)




If you got this far, you might be thinking, "yeah, but why would you ever use the same name twice for overlapping domains?" It happens. You switch your canonical domain name from "www.client9.com" to "client9.com" or vice versa. Or the site is growing fast and people are slapping cookies everywhere, sometimes using different cookie domains, and then at some point you need to consolidate. Or you start using subdomains for something unanticipated. It happens.




And if it happens, it's tricky to fix. You get two (or more) cookies with the same name, and you aren't sure how to delete them unless you know exactly how they were originally set. By now that information is probably lost, or buried somewhere in your source control log.
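When the original attributes are lost, about the best you can do is guess. The sketch below emits one expiring Set-Cookie header per candidate domain and path; the deletion_headers function and the candidate list are assumptions for illustration, and a browser only drops a cookie when the domain and path match the ones it was stored with.

```python
from itertools import product

# A date in the past, so the browser treats the cookie as expired.
EPOCH = "Thu, 01 Jan 1970 00:00:00 GMT"


def deletion_headers(name, domains, paths=("/",)):
    """Hypothetical helper: one expiring Set-Cookie per guessed domain/path."""
    headers = []
    for domain, path in product(domains, paths):
        headers.append(
            f"Set-Cookie: {name}=; Domain={domain}; Path={path}; "
            f"Max-Age=0; Expires={EPOCH}"
        )
    return headers


# Guess at every domain the cookie might have been set on.
for line in deletion_headers("foo", ["www.client9.com", ".client9.com"]):
    print(line)
```

This is guesswork by design, which is the reason to record the metadata somewhere you can read it back.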




Solution: Duplicate your Metadata




The solution to all of this is oddly simple, though it looks wasteful and looks like it violates the DRY principle. The cookie protocol is so wacky that it's hard to see that a cookie is really a database row and that you can do MVCC on it (well, almost).




To fix the upgrade pain, you add a version to the name of the cookie. You are free to put another version in the value of the cookie too, of course, but having it in the name is what really matters. When you upgrade the cookie value format, or when you change any of the metadata of the cookie (secure or not secure, httponly, the domain), you increment the version. This is just like writing a new row in an MVCC database.

On update you can leave the old cookie alone, and write the code to do the upgrade. If you launch and fail, no problem: you can roll back and the old data is still there. If it works, you can write your VACUUM code, which deletes the old cookie (or the broken new cookie) when it sees it. Or at some point they'll expire on their own.
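Here is a rough Python sketch of that upgrade-then-vacuum flow. Everything in it is hypothetical: the names foo-v1 and foo-v2, the stand-in upgrade function, and the idea that cookies arrive as a plain dict.

```python
CURRENT_VERSION = 2  # assumed: the version this release of the app writes


def cookie_name(base: str, version: int) -> str:
    return f"{base}-v{version}"


def upgrade(old_value: str) -> str:
    """Stand-in for a real format migration."""
    return old_value.upper()


def read_and_upgrade(cookies: dict, base: str = "foo"):
    """Return (value, cookies_to_set). The old cookie is never touched."""
    new_name = cookie_name(base, CURRENT_VERSION)
    old_name = cookie_name(base, CURRENT_VERSION - 1)
    if new_name in cookies:
        return cookies[new_name], {}
    if old_name in cookies:
        value = upgrade(cookies[old_name])
        return value, {new_name: value}  # write a new "row", keep the old one
    return None, {}


def vacuum(cookies: dict, base: str = "foo") -> list:
    """Names of older versions that can be deleted once the new one exists."""
    if cookie_name(base, CURRENT_VERSION) not in cookies:
        return []  # no new row yet, so keep the old data around
    prefix = f"{base}-v"
    stale = []
    for name in cookies:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit() and int(suffix) < CURRENT_VERSION:
            stale.append(name)
    return sorted(stale)
```

Rolling back just means deploying the old code again: it only ever looks at foo-v1, which is still there.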




Now to fix the duplicate values. Here we add to the cookie name the domain the cookie was set on, whether it's secure, and whether it's http-only. We add all this because it segments the namespace, so (as long as the path is always "/") it's impossible to get duplicate cookies. It looks gross, but cookies aren't here to win beauty contests.




Set-Cookie: foo-v1-0-1-.client9.com; ....




There isn't anything special about this format, but here foo is the original name, v1 is the version, the next two bits say whether it was set with secure (in this case no) and httponly (in this case yes), and finally comes the domain. Keeping track of secure and httponly isn't strictly necessary, as the version could tell you that indirectly, but I find it useful for at-a-glance inspection. It also completely self-describes the cookie and provides everything you need to be able to delete the cookie correctly. Expiration is not encoded, but that's OK.
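A small Python sketch of that naming scheme follows. The CookieMeta type and the helper names are invented for illustration, and the parser assumes the base name itself contains no "-" (the domain may).

```python
from typing import NamedTuple


class CookieMeta(NamedTuple):
    base: str
    version: int
    secure: bool
    http_only: bool
    domain: str


def encode_name(meta: CookieMeta) -> str:
    return (f"{meta.base}-v{meta.version}-{int(meta.secure)}-"
            f"{int(meta.http_only)}-{meta.domain}")


def decode_name(name: str) -> CookieMeta:
    # maxsplit=4 keeps any "-" inside the domain intact.
    base, version, secure, http_only, domain = name.split("-", 4)
    return CookieMeta(base, int(version[1:]), secure == "1", http_only == "1", domain)


def deletion_header(name: str) -> str:
    """Rebuild the attributes from the name alone and expire the cookie."""
    meta = decode_name(name)
    attrs = [f"{name}=", f"Domain={meta.domain}", "Path=/", "Max-Age=0"]
    if meta.secure:
        attrs.append("Secure")
    if meta.http_only:
        attrs.append("HttpOnly")
    return "Set-Cookie: " + "; ".join(attrs)


name = encode_name(CookieMeta("foo", 1, False, True, ".client9.com"))
print(name)                   # foo-v1-0-1-.client9.com
print(deletion_header(name))
```

The point of the design is that the deletion header needs nothing except the name that came back in the Cookie header.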




Off-By-One: If you are using PHP...




And of course if you are using PHP, an "undocumented feature" turns "." into "_" in the cookie name. So the cookie above will show up in $_COOKIE as




foo-v1-0-1-_client9_com
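If you need to predict the key, the renaming is easy to mimic. This Python sketch only models the dot-to-underscore substitution described here; php_cookie_key is a made-up helper, not a PHP function.

```python
def php_cookie_key(name: str) -> str:
    """Hypothetical helper: the key a dotted cookie name ends up under."""
    return name.replace(".", "_")


print(php_cookie_key("foo-v1-0-1-.client9.com"))  # foo-v1-0-1-_client9_com
```

Note the mapping is one-way: from the key alone you can't tell whether an underscore started out as a dot.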




Sigh.