Reading Form Data

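As was mentioned in the previous lesson, you want to capture three pieces of information from the login form: the username, the password, and whether or not the user wants to be remembered.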

Grabbing this data is simple: just use the CGI::param() method. For instance, if you have a form element named "foo", you use a statement like "my $form_foo = $q->param( 'foo' );", where $q is a CGI object. Pretty easy, huh? Now, let's take a look at the three form elements we need to capture for Weird Sports:

<input type="text"     size="30" maxlength="30" name="username">
<input type="password" size="30" maxlength="30" name="password">
<input type="checkbox"                          name="remember">

(Here, the attributes are aligned for easy reading, but this is not necessary.)

Here's the program which will read in these values:

#!/usr/bin/perl -wT
use strict;

use CGI qw/:standard/;
my $tainted_username = param( 'username' ) || '';
my $tainted_password = param( 'password' ) || '';
my $tainted_remember = param( 'remember' ) || '';

# the rest of the program goes here.

Now, wasn't that easy? As you recall, lesson 2 showed you how terribly confusing and buggy a hand-rolled CGI parsing routine could get. By using CGI.pm, we don't have to worry about those issues. No more tracking down obscure bugs or worrying about the limitations of our code. We can just get to work.... Almost. We're going to take that little snippet of code and turn this into a working application, but it will take more time than you might guess. Remember, security and convenience are inversely proportional to one another. It takes a little more work up front, but promises greater rewards down the road. It's like car insurance: you may grumble at the cost until you have an accident.

Security Checkpoint: Because we are sending a password, this form should be run over SSL. Remember, the password field does nothing to encrypt your password. Instead, it merely hides the password from prying eyes standing near your computer. This information can still be sniffed if someone has access to part of the network over which your data travels. We'll cover more about this when we get to a later lesson and discuss cookies and session ids.

Using the Form Data

Security Checkpoint: You've probably noticed that all of the variable names in the snippet above are prefixed with $tainted_. This is so there will be no confusion about which values have come straight from the user and have not yet been checked. This does mean that we'll be duplicating some data in our script, but that's okay. This is just for illustrative purposes, and it's far better to duplicate data than inadvertently open a security hole.

Now, how are we going to use this data? Obviously, we want to allow this person to log into the Weird Sports web site. This means we need to authenticate this user. Authentication is merely a fancy way of saying "is this person who they claim to be?" In reality, there is no way to authenticate someone electronically. You're just receiving bits of electricity over a network and hoping that the person at the other end isn't a bad guy. Authentication is merely a method of reducing the risk to an acceptable level. "Acceptable", of course, is highly dependent on the needs of your application. For the Weird Sports web site, the risk is minimal, as "authenticating" someone simply means that they can access "members only" pages. In the real world, security is typically implemented at many layers throughout a system.

After determining that security isn't that big of a deal, and since you don't have a database handy, you've decided to do the following: every user will have a file, named after their user name, in which three pieces of information will be stored: their password, their session ID, and whether or not they want to have a permanent session ID stored on their computer in the form of a cookie. Let's tackle these one at a time.

Note: The actual use of session IDs and cookies will not be covered until lesson 5.

Before we figure out how to save a file with the same name as the user, let's take a look at our file structure for the Apache web server.

    Apache
        bin
        cgi-bin
        conf
        data
        htdocs
            java
            cgi_course

Now, if you're familiar at all with Apache, you know that there are a few other folders involved, but let's just take a look at what we have here. If we want to save files for users which contain data we don't want others to see, we should not put it in the htdocs folder or any subfolders of htdocs. No matter how hard we try to secure these files, sooner or later, someone is going to do something wrong and these files are going to be viewable by the bad guys. While this is not inevitable, as a matter of standard security practice, you should assume that it is.

In this case, let's go ahead and create a new folder ("directory", for those who are fans of real operating systems) called "users" and put it in the data directory. This is where we will store all of our user information.

    Apache
        bin
        cgi-bin
        conf
        data
            users
        htdocs

Now that we have the folder, we need some way of opening the folder to read and store user information. Let's assume that the user is named "Ovid", his password is "youwish", and he doesn't want a permanent cookie. His data file might resemble this (without the line numbers - and remember, the second line is a session id):

[file: ../data/users/Ovid]
1: youwish
2: some_session_id
3: 0

The process of getting that is simple. We try to open a file with this username and, if it exists, grab the data, verify that the password is correct, and check to see if they want a permanent cookie.
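
The steps just described can be sketched as follows. This is only an illustration under stated assumptions: the read_user_record name is hypothetical, and the sample file is written to a temporary directory so that the snippet can run anywhere, not to the real data/users folder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempdir/;
use File::Spec;

# Hypothetical helper: returns a hash ref holding the three stored fields,
# or nothing at all if the user's file cannot be opened.
sub read_user_record {
    my ( $dir, $username ) = @_;
    my $file = File::Spec->catfile( $dir, $username );
    open my $fh, '<', $file or return;
    chomp( my ( $password, $session_id, $remember ) = <$fh> );
    close $fh;
    return {
        password   => $password,
        session_id => $session_id,
        remember   => $remember,
    };
}

# Demonstration only: create a throwaway user file shaped like the one above.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', File::Spec->catfile( $dir, 'Ovid' )
    or die "Cannot write sample file: $!";
print {$out} "youwish\nsome_session_id\n0\n";
close $out;

my $record = read_user_record( $dir, 'Ovid' );
print "Permanent cookie wanted? ", ( $record->{remember} ? 'yes' : 'no' ), "\n";
```

The three-argument form of open with a lexical filehandle is used here so that the filename is never interpreted as anything other than a name. The sketch deliberately leaves out any checking of the username, which is exactly the problem the rest of this lesson deals with.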

Security Checkpoint: The session id is a critically important component of all of this. Some web sites make the mistake of storing the user's password in the cookie. Every time the user goes to a new page, the cookie data is checked against the actual password and, if they match, the user is considered to be authenticated. This is quick and easy, but it's also extremely dangerous. First, the password can be sniffed very easily. Second, even if this is run over a secure connection, if the site gets reconfigured, what happens if the secure connection goes away? This can actually happen. You might have your cookies set up to be sent only over SSL (not a bad idea), but then someone else could go in and disable that without you noticing. One (imperfect) mitigation is to use random session IDs, but there are still other problems with that (session hijacking, for instance). But a discussion of that is beyond the scope of this course. You might argue that it still doesn't matter because you don't have any really sensitive information; but remember that most people reuse passwords. They won't be happy if their password is stolen.

Here's a naïve way of trying to get the information in question.

#!/usr/bin/perl -wT
use strict;

# use the function oriented interface
use CGI qw/:standard/;
use constant USER_DATA => '../data/users/';

my $username = param( 'username' ) || '';
my $password = param( 'password' ) || '';
my $remember = param( 'remember' ) || '';

my $userfile = USER_DATA . $username;
my $message = 'Bad password';
open USER, "< $userfile"
    or display_page( "No user named $username was found" ), exit;
# $stored_remember keeps the value in the file apart from the form's $remember
chomp ( my ( $real_password, $sessionID, $stored_remember ) = <USER> );
close USER;

if ( $password eq $real_password )
{
   $message = "Hello, $username.  You gave me a good password";
}

display_page( $message );
exit;

sub display_page
{
    my $message = shift;
    print
        header,
        start_html,
        p( $message ),
        end_html;
}

You could save that in cgi-bin as bad_password.cgi (after changing the shebang line to point to your perl interpreter) and then request the following URL:

http://localhost/cgi-bin/bad_password.cgi?username=myname;password=youwish;remember=0

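Now, all in all, this doesn't look too bad. However, it has several serious security flaws. Yet programs like this are being used all the time! Here's a rundown of the flaws, which the rest of this lesson works through in order:

  1. The error messages differ depending on whether the username or the password was wrong, which tells an attacker which usernames exist.
  2. The username goes straight from the form into the filename passed to open, with no check on what it contains.
  3. The password is stored in the user's file in plain text.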

Let's go ahead and fix these problems. The first is fairly simple. Change all error messages to the following:

Your username and password information did not match.  Check to
see that you do not have Caps Lock on, hit the back button, and try
again.

The username problem is a bit trickier. Generally, it is a bad idea to let user data near the shell. If you must do this, make sure that you explicitly state what you will allow. In this case, we assume that our users know they can only use letters, digits, and underscores in their username. We create a regular expression which will match this data and try again.

#!/usr/bin/perl -wT
use strict;

# use the function oriented interface
use CGI qw/:standard/;
use constant USER_DATA => '../data/users/';

my $tainted_username = param( 'username' ) || '';
my $password = param( 'password' ) || '';
my $remember = param( 'remember' ) || '';

my $username = '';
# Note the trailing space, so the two sentences don't run together
my $message = 'Your username and password information did not match. '
            . 'Make sure your Caps Lock is off and try again.';

# Note the regular expression which states very explicitly what we will allow
if ( $tainted_username =~ /^([a-zA-Z\d_]+)$/ )
{
    $username = $1;
}
else
{
    display_page( $message );
    exit;
}

my $userfile = USER_DATA . $username;

open USER, "< $userfile" or display_page( $message ), exit;
# $stored_remember keeps the value in the file apart from the form's $remember
chomp ( my ( $real_password, $sessionID, $stored_remember ) = <USER> );
close USER;

if ( $password eq $real_password )
{
    $message = "Hello, $username.  You gave me a good password";
}

display_page( $message );
exit;

sub display_page
{
    my $message = shift;
    print
        header,
        start_html,
        p( $message ),
        end_html;
}
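
To see what such a pattern accepts and rejects, here is a small sketch that pulls the check out into a subroutine. The untaint_username name is hypothetical and is not part of the program above. The sketch also anchors the pattern with \A and \z instead of ^ and $, a slightly stricter choice, because $ will also match just before a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the validated username, or undef if the
# input contains anything other than letters, digits, and underscores.
sub untaint_username {
    my $tainted = shift;
    return undef unless defined $tainted;
    return $tainted =~ /\A([a-zA-Z0-9_]+)\z/ ? $1 : undef;
}

for my $candidate ( 'Ovid', 'weird_sports_42', '../../etc/passwd', 'ls|', '' ) {
    my $clean = untaint_username($candidate);
    printf "%-20s => %s\n", "'$candidate'",
        ( defined $clean ? 'accepted' : 'rejected' );
}
```

Only the first two candidates are accepted. Anything containing a slash, a dot, a pipe, or whitespace is turned away before it ever gets near a filename.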

That's much better, but we still need to deal with the password. The technique we'll show you for handling passwords does not do much to improve security with the current program. However, as you delve deeper into programming and start storing passwords in databases, it can be quite a help. We're going to use the Digest::MD5 module to create an apparently random 'digest' (that is, a condensation, or distillation) of the password. This digest will be stored in the file and a new digest will be created from the user supplied password and these digests will be compared. If they match, we know we have a good password. You'll want to read the Digest::MD5 documentation for full information about this module.

A basic method of using Digest::MD5 is as follows:

use Digest::MD5 qw/md5_base64/;
my $digest = md5_base64( $password, $salt );

The first line asks that the md5_base64 function be exported to your namespace. The second line takes the password and a salt and generates a 22-character digest of that information. The digest will always be the same for a given set of information and it is mathematically unlikely that it will be duplicated with different sets of information.
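
A quick way to convince yourself of these properties is the sketch below. The password and salt values are made up for the example and have nothing to do with the Weird Sports configuration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw/md5_base64/;

# Made-up values, for illustration only.
my ( $password, $salt ) = ( 'example_password', 'example_salt' );

my $first  = md5_base64( $password, $salt );
my $second = md5_base64( $password, $salt );
my $other  = md5_base64( 'Example_password', $salt );    # one letter changed

print "Length:             ", length($first), "\n";
print "Repeatable:         ", ( $first eq $second ? 'yes' : 'no' ), "\n";
print "Changes with input: ", ( $first ne $other  ? 'yes' : 'no' ), "\n";
```

Note that md5_base64 simply joins its arguments together before digesting them, so passing the password and the salt as two arguments gives the same result as passing them as one concatenated string.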

The salt is something with which people sometimes have difficulty. The salt should be a random sequence of characters which does not change. One programmer asked why his password function wasn't working; it turned out that he was generating a random salt every time! Now generating a different salt for every password is not a bad idea (though it's difficult to manage this data), but he was generating a different salt every time he tried to access the same password. Needless to say, this does not work.

So what's the salt for? If a cracker manages to grab your password file, he or she should never see the passwords. If you are on a UNIX-like system, take a look at your /etc/passwd or, if you're using shadow passwords, the /etc/shadow file. You won't see a single password because they typically use a similar method to create one-way digests for passwords. This digest cannot be reversed to get the original data. However, if a cracker gets the file, the cracker might take a program which generates digests (they're pretty easy to write) for common passwords and see if any of them match. The salt ensures that there is a very difficult-to-guess component of the digest, thus making such hacks difficult to pull off. Of course, if someone can actually grab the password files, you have plenty of other problems to worry about, but security is defense in depth. Don't skimp on it!
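
The guessing attack described above, and the way a salt gets in its way, can be sketched like this. Everything here is illustrative: the word list, the "stolen" digests, and the salt value are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw/md5_base64/;

my @common_passwords = qw/password letmein youwish 123456/;

# The cracker precomputes digests for common passwords, with no salt.
my %precomputed;
$precomputed{ md5_base64($_) } = $_ for @common_passwords;

# What a stolen file might hold for the password "youwish".
my $salt            = 'example-salt-value';    # hypothetical secret salt
my $unsalted_digest = md5_base64('youwish');
my $salted_digest   = md5_base64( 'youwish', $salt );

for my $stolen ( $unsalted_digest, $salted_digest ) {
    my $guess = $precomputed{$stolen};
    print( defined $guess
        ? "Cracked: $guess\n"
        : "No match in the precomputed table\n" );
}
```

The unsalted digest is found in the table straight away, while the salted one is not. The cracker would have to learn the salt and rebuild the table before the same trick could work.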

How do we get the salt? Well, let's create another folder in the "data" directory and call it "config". This is where we will store program configuration data.

    Apache
        bin
        cgi-bin
        conf
        data
            config
            users
        htdocs

We're not going to store this data in the "conf" folder because that is Apache configuration. Instead, we create a special file called "weird_sports" and we'll put our configuration information there. To get this information, we're going to use a special form of the do function which allows us to "do" a file. This will evaluate the Perl code in the file and return the results. It's similar to use or require, but it's fairly lightweight. First, we want to create a data structure in the config file. We'll use a hash ref and store the results in a scalar. Here's the config file:

{
    salt  => ')a8*!--&',
    users => '/Program Files/Apache Group/Apache/data/users/'
}

Note how apparently random the salt is. You'll also see that the path to the user information is here because, if we're going to use a configuration file, we may as well use it. If we ever feel the need to move the users' information, we merely update this file and all programs will have the new configuration information. To use it, we merely include "my $config = do( $config_file );" in each program. Also, before we change our program, we need to generate the digest for the password. The following one-liner does the job:

perl -MDigest::MD5=md5_base64 -e 'print md5_base64("youwish",")a8*!--&")'

This generates the following digest: +W4GC34W9VKFXY5J3PFGrg. We just put that digest on the first line of the user file, in place of the plain-text password, and we're ready to make our program changes.

[file: ../data/users/Ovid]
1: +W4GC34W9VKFXY5J3PFGrg
2: some_session_id
3: 0

Here's the revised login program:

01:  #!C:\Perl\bin\perl.exe -wT
02:  use strict;
03:  use CGI   qw/:standard/;
04:  use CGI::Carp   qw/fatalsToBrowser/;
05:  use Digest::MD5 qw/md5_base64/;
06:
07:  use constant CONFIG => '../data/config/weird_sports';
08:
09:  my $config = do( CONFIG );
10:
11:  my $tainted_username = param( 'username' ) || '';
12:  my $user_digest      = create_digest( param('password'), $config->{salt} );
13:  my $remember         = param( 'remember' ) || '';
14:
15:  my $username = '';
16:  my $message = 'Your username and password information did not match. '
17:              . 'Make sure your Caps Lock is off and try again.';
18:
19:  if ( $tainted_username =~ /^([a-zA-Z\d_]+)$/ )
20:  {
21:      $username = $1;
22:  }
23:  else
24:  {
25:      display_page( $message );
26:      exit;
27:  }
28:
29:  my $userfile = $config->{ users } . $username;
30:
31:  open USER, "< $userfile" or display_page( $message ), exit;
32:  chomp ( my ( $real_digest, $sessionID, $stored_remember ) = <USER> );
33:  close USER;
34:
35:  if ( $user_digest eq $real_digest )
36:  {
37:      $message = "Hello, $username.  You gave me a good password";
38:  }
39:
40:  display_page( $message );
41:  exit;
42:
43:  sub display_page
44:  {
45:      my $message = shift;
46:      print
47:          header,
48:          start_html,
49:          p( $message ),
50:          end_html;
51:  }
52:
53:  sub create_digest
54:  {
55:      my $password = shift || '';
56:      my $salt     = shift;
57:      return md5_base64( $password, $salt );
58:  }

Line 9 is where we get our configuration information. Line 12 creates a digest from the user-supplied password, using the salt from the configuration file. Line 29 uses the path from the configuration file to determine the user file and line 35 is the test to see if the computed digest matches the stored digest. Lines 53 through 58 are the subroutine to compute the digest. This is in a separate subroutine because, in reality, this would be in a security module so that it can be shared by many programs.
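
One thing the program does not do is check that the configuration actually loaded. Since do returns undef both when the file cannot be read and when it fails to compile, a slightly more defensive loader might look like the sketch below. The load_config name is hypothetical, and the demonstration writes a throwaway config file to a temporary location instead of touching data/config.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Hypothetical helper: evaluates a config file and insists on getting back
# a hash ref that contains the keys the login program relies on.
sub load_config {
    my $file   = shift;
    my $config = do $file;
    unless ( defined $config ) {
        die "Cannot parse $file: $@" if $@;
        die "Cannot read $file: $!";
    }
    die "$file did not return a hash reference"
        unless ref($config) eq 'HASH';
    for my $key (qw/salt users/) {
        die "$file is missing '$key'" unless defined $config->{$key};
    }
    return $config;
}

# Demonstration only: a temporary file with the same shape as weird_sports.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "{\n    salt  => 'example-salt',\n    users => '/tmp/example/users/'\n}\n";
close $fh;

my $config = load_config($filename);
print "Users are stored in $config->{users}\n";
```

Dying with a clear message at startup is usually easier to debug than discovering later that $config->{salt} was undefined and every digest was computed without a salt.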

Exercises

  1. If the passwords can't be seen by anyone over the 'net, why do we care about whether or not they are stored in plain text?
  2. What is a 'salt' used for?
  3. Why do we store user information in a separate folder which is not in the htdocs path?
  4. Write the code which will read in the following form data. Be careful! This one is a bit tricky. But if you've read all of the CGI.pm documentation, it's pretty easy.
    <form>
    <input type="text" name="foo" value="bar" />
    <select name="sport">
      <option>Water Balloon Shot Put</option>
    
      <option>17-legged race</option>
      <option>Cement overcoat swimming</option>
    </select>
    <p>Choose your favorite types of wrestling:</p>
    
    <input type="checkbox" name="wrestling" value="mud" /> Mud<br />
    <input type="checkbox" name="wrestling" value="jello" /> Jello<br />
    <input type="checkbox" name="wrestling" value="natural" /> Au naturale<br />
    
    </form>