Basic cookie handling

Cookies are poorly understood so let's take a bit of time to understand what cookies are and examine some of the implications of poor cookie management. We're going to spend a fair amount of time on how cookies are implemented and the security implications of this because it's very important that you understand what is going on. Cookies are very, very easy to get wrong.

As you know by this point in the course, HTTP is what is known as a stateless protocol. This means that each request is completely independent of other requests. The server handling the requests really has no way of knowing if the request coming in is a follow-up on a previous request. That's kind of like clicking "save" in a program and not having the program know what it's supposed to be saving.

With RFC 2109, an HTTP State Management Mechanism based on two new headers, Cookie and Set-Cookie, was introduced. (This was in turn based on an earlier Netscape proposal, which is why you sometimes still hear the term "Netscape Magic Cookie".) These two headers allowed the user-agent (again referred to as the browser for the rest of the lesson) and server to exchange a bit of text which carried state information. This has been both a blessing and a curse.

In simpler terms, this just means that when you log into a web site, it now has a way of "remembering" who you are. Now let's look at the actual header before we consider the security implications.

Only servers set cookies. The header line looks something like this (see the RFC for far more information than you ever wanted to know):

Set-Cookie: name=value; expires=date; domain=domain; path=path; secure
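To make the layout of that header concrete, here is a minimal sketch that assembles one from its parts using only core Perl. The helper names build_set_cookie and cookie_date are made up for this illustration, and the sketch does no escaping or validation of the values, so treat it as a picture of the format rather than production code.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as a GMT cookie date.
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d-%s-%04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: join the attributes in the order shown above.
# Only name and value are required; everything else is optional.
sub build_set_cookie {
    my (%args) = @_;
    my @parts = ("$args{name}=$args{value}");
    push @parts, 'expires=' . cookie_date($args{expires}) if defined $args{expires};
    push @parts, "domain=$args{domain}"                   if defined $args{domain};
    push @parts, "path=$args{path}"                       if defined $args{path};
    push @parts, 'secure'                                 if $args{secure};
    return 'Set-Cookie: ' . join('; ', @parts);
}

print build_set_cookie(
    name    => 'our_user',
    value   => 'abc123',
    expires => time + 3600,    # one hour from now
    path    => '/',
    secure  => 1,
), "\n";
```

In practice you would let a module such as CGI.pm build this header for you, which is what the script at the end of this lesson does.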

Once the server sends a cookie, the client receives it and probably stores it. (For security reasons, there are several cases where the client may refuse to store a cookie. Please see the RFC.) Where it's stored varies quite a bit depending upon the browser. When the browser visits the domain again, it consults its stored cookies and, if it finds one which matches the domain and the path and has not yet expired, it returns the cookie in a header like this (there's more, but those are the high points):

Cookie: name=value; name=value;...

You'll note that there is more than one name/value pair in this example. This is because a server may set more than one cookie at a time. However, because browsers are not required to store more than 300 cookies, this is considered bad manners as cookies which are not used as frequently will get bumped when you exceed your cookie limit. This is why those who surf the web heavily often find that sites they thought they had logged into have forgotten them.
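On the server side, that single Cookie header has to be split back into its name/value pairs. The following is a rough sketch of how that might be done in plain Perl. The helper name parse_cookie_header and the second cookie (theme) are invented for the example, and real modules such as CGI.pm also handle URL-decoding and other edge cases that are ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Cookie header value into name => value pairs.
sub parse_cookie_header {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, $header) {
        next unless length $pair;
        # Split on the first "=" only, since values may contain "=" themselves.
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $value;    # skip malformed pairs
        $name =~ s/^\s+|\s+$//g;
        # If a name is repeated, keep the first occurrence.
        $cookies{$name} = $value unless exists $cookies{$name};
    }
    return %cookies;
}

my %cookies = parse_cookie_header(
    'our_user=6a204bd89f3c8348afd5c77c717a097; theme=dark;'
);
print "$_ => $cookies{$_}\n" for sort keys %cookies;
```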

Once the server receives the cookie(s) from the browser, it can use the name/value combinations to determine who the browser is. So let's see how this is typically handled. First, you visit the web site and the server (or more likely some process running on the server) sees that you don't have a cookie, so it creates a difficult-to-guess ID which is set in a cookie (broken over two lines for readability):

Set-Cookie: our_user=6a204bd89f3c8348afd5c77c717a097;
expires=Sunday, 01-Jan-06 00:00:00 GMT; path=/

Since the domain was not specified, it defaults to the host/domain of the server. When the user returns, her browser sends back the following cookie:

Cookie: our_user=6a204bd89f3c8348afd5c77c717a097;

Now the server looks up the ID and sees that this user was here before. The value might be a key in a database table which also tells the user preferences, where on the site the user has been or other information which those running the server wish to maintain.
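As a rough illustration of that lookup, the sketch below uses an in-memory hash as a stand-in for the database table. The helper name lookup_user and the stored fields (theme, last_page) are invented for the example. Note that the cookie carries nothing but the key; everything else stays on the server.

```perl
use strict;
use warnings;

# Stand-in for a database table keyed by the ID from the cookie.
# The stored fields are illustrative only.
my %user_table = (
    '6a204bd89f3c8348afd5c77c717a097' => {
        theme     => 'dark',
        last_page => '/news',
    },
);

# Hypothetical helper: return the stored record for an ID, or undef.
sub lookup_user {
    my ($id) = @_;
    # The cookie comes from the client, so reject anything that is not
    # a plain hex string before using it as a key.
    return undef unless defined $id && $id =~ /\A[0-9a-f]{16,64}\z/;
    return $user_table{$id};
}

my $record = lookup_user('6a204bd89f3c8348afd5c77c717a097');
print $record
    ? "Returning visitor, last seen on $record->{last_page}\n"
    : "New visitor\n";
```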

The Dangers

Now that you have a rough idea of how cookies work, here are some horror stories for you. They are all true.

Anecdote 1: a rather large, well-known company decided to create a web portal where people could log in and save their personal data. Unfortunately, this company decided that the value they would send in the cookie would be a number which was incremented by 1 (one) each time. Thus, subsequent cookies would look like this:

Set-Cookie: id=1190872349876;
Set-Cookie: id=1190872349877;
Set-Cookie: id=1190872349878;
Set-Cookie: id=1190872349879;
Set-Cookie: id=1190872349880;

For most people, this was never a problem. However, for a wily cracker, this was an opportunity. All the cracker had to do was change her id up or down a digit or two to hijack someone else's session. The server had no other reliable way of identifying you. Sure, it could identify your browser and your operating system (amongst other things), but that information is easily spoofed.

The fix: Make each id a long string of random characters, so that seeing one id tells an attacker nothing about any other and guessing a valid one is not practical.
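Here is one way such an id might be generated, as a minimal sketch rather than a recommendation. It assumes a Unix-like system that provides /dev/urandom, and the helper name random_session_id is made up for the example. Perl's built-in rand() is deliberately avoided because it is not designed to be unpredictable.

```perl
use strict;
use warnings;

# Hypothetical helper: return random bytes from the operating system
# as a lowercase hex string (two hex characters per byte).
# Assumes a Unix-like system that provides /dev/urandom.
sub random_session_id {
    my ($bytes) = @_;
    $bytes ||= 16;    # 16 bytes = 128 bits = 32 hex characters
    open my $fh, '<:raw', '/dev/urandom'
        or die "Cannot open /dev/urandom: $!";
    my $read = read $fh, my $buffer, $bytes;
    close $fh;
    die "Short read from /dev/urandom"
        unless defined $read && $read == $bytes;
    return unpack 'H*', $buffer;
}

print random_session_id(), "\n";
```

Unlike the incrementing numbers above, two ids produced this way have no relationship to each other, so knowing your own id does not help you find anyone else's.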

Anecdote 2: a certain teenager had a free account on a popular blogging site and had spent a lot of time writing about how her parents don't understand her, and other typical teenager stuff. Her parents found the journal, whereupon they forced her to delete it.

How did they find out about the journal? She had been pretty careful to clear the browser cache and history. She neglected, however, to clear the cookies. The parents were able to determine the web site, since cookies always identify the site they're associated with. This mere fact would have been innocent enough... but there was one little problem: the cookie (in this case) also stored her username! This led the parents right to her journal.

Anecdote 3: given how insecure the web tends to be, one can be excused for being reluctant to do business over the web. But one can also be excused for taking advantage of the convenience of conducting business this way, such as paying a credit card bill online. An online credit card user decided to inspect the information stored in the cookie from the billing site. Here's what it contained:

This is absolutely horrifying. Anyone sniffing that user's traffic could have had a field day with this information. Of course, if someone has access to your computer (either by being physically present, cracking it, or running some sort of spyware on it), they could still potentially get this information. The user immediately sent an email to the bank alerting them to this issue. A couple of months later, the bank sent an apology and a note of thanks for alerting them to the issue, which they had subsequently fixed. They also sent a free toaster.

The fix: Don't do business with idiots like that, and never, ever send sensitive information via a cookie.

Programming cookies

CGI.pm greatly simplifies programming for cookies. First, let's create a simple web form.

<html>
  <head>
    <title>Cookie test</title>
  </head>
  <body>
    <h1>Cookie tracking</h1>
    <form method="post">
      <input type="text"   name="name"    value="Enter your name" /><br />
      <input type="submit" name="save"    value="Save your name" /><br />
      <input type="submit" name="delete"  value="Delete your name" />
    </form>
  </body>
</html>

But now that we have the form, how do we save the sessions securely? You don't want to plow blindly ahead, and at this point we haven't learned anything about databases or how to store the user's information safely in a file. Further, how do we create secure session IDs? Fortunately, most of this is not a serious problem because CGI::Session handles most of it for us transparently. The implementation we will use stores the data in a file, but you can also use DB_File or a proper database if you wish. Read the docs for more information.

The following script is complete. You can drop this into your web server's cgi-bin/ directory or equivalent and it should run just fine, with the possibility that you'll have to change the shebang line.

#!/usr/local/bin/perl -T
use strict;
use warnings;
use CGI;
use CGI::Session;
use HTML::Entities;
use File::Spec;

use constant DEFAULT => 'Enter your name';

my $cgi     = CGI->new;
my $session = CGI::Session->new(
    "driver:File",
    $cgi,
    { Directory => File::Spec->tmpdir }
);
$session->expire('+1h'); # this is for security!

my $cookie = $cgi->cookie(
    -name    => 'CGISESSID',
    -value   => $session->id,
    -expires => '+1h',
);

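if ($cgi->param('delete')) {
    # Reset the form field and forget the stored name.
    $cgi->param(name => DEFAULT);
    $session->clear(['name']);
}
# Prefer a newly submitted name; otherwise fall back to the stored one.
my $name = $cgi->param('name');
if (! $name || DEFAULT eq $name) {
    $name = $session->param('name');
}
$session->param(name => $name) if defined $name;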

my $greeting = $name
    ? $cgi->p("Hello, ", encode_entities($name))
    : '';

print $cgi->header(-cookie => $cookie),
      $cgi->start_html(-title => 'Cookie test'),
      $cgi->h1("Cookie tracking"),
      $greeting,
      $cgi->start_form,
      $cgi->textfield(
        -name    => 'name',
        -default => DEFAULT,
      ),
      $cgi->br,
      $cgi->submit(
        -name  => 'save',
        -value => 'Save your name'
      ),
      $cgi->br,
      $cgi->submit(
        -name  => 'delete',
        -value => 'Delete your name',
      ),
      $cgi->end_form,
      $cgi->p('session name: ',encode_entities($session->param('name') || '')),
      $cgi->p('session id: ',$session->id),
      $cgi->end_html;