Basic CGI Security

Note about terminology: In the following, we use the term cracker to refer to the bad guys who try to break (into) your system. This kind of person is often called a hacker, particularly in the popular press. While we have no strong objection to the use of the word "hacker" in this more narrow, we use "cracker" simply to avoid the ambiguity.

CGI programming presents some unique security concerns not present in other forms of programming. Since this lesson deals primarily with security issues, you should infer the security icon on every paragraph here!

The first rule of thumb to remember about CGI security is to assume that nothing on your system offers any safeguards except what you build into your script. Here are some common security misconceptions, along with their corrections:

The person who administers the box the web server runs on is a genius and a security guru.
Reality: Ok... What if that person is fired? Quits? Dies? What if your company decides to switch from Apache to IIS and your security guru doesn't know squat about IIS? What if a junior admin changes some settings and doesn't tell his/her boss? You get the picture.
No one can get the source code to my script.
Reality: Sure, and that disgruntled employee down the hall is upset at being passed over for promotion. (Incidentally, the majority of corporate espionage incidents are inside jobs, committed by the company's own employees.) Maybe someone moves your script and creates a link from the old location to the new one. Oopsie! Someone with a browser might request the new location directly and a poorly configured web server can dump the source code. Maybe someone else wrote an insecure script which will allow a cracker to access the source code of your script. There's also the null byte hack (explained below) which is particularly insidious.
My script is password protected.
Reality: Sigh.

Password protection is the area of the one of the most common security weaknesses. Let's say you have a password field in your HTML. If it's not on a secure server, the password is sent in plain text and can be obtained with a packet sniffer. If you're using a .htaccess file with Basic authentication, your browser sends the username and password joined with a colon and then base 64 encoded. This data is then sent over in the Authorization: header. While this has the advantage of not sending the password in plain text, base 64 encoding is not terribly secure and can be cracked by a determined cracker. Additionally, some web servers have a bug which allows anyone to view the contents of a .htaccess file simply by entering a URL similar to the following:
http://somedomain.com/cgi-bin/path/to/script/.htaccess
Further, many users choose easily guessable passwords or can be tricked into revealing their password. At best, this could lead to having your account terminated. When a phisher acquires your login credentials, they can use your account to send spam. Naturally, ISPs take a very dim view of such activity, and it can be very hard to prove that you are not the perpetrator.
I only send sensitive data through hidden fields.
Reality:This attempt at security is so weak as to be no security at all, yet it seems to be a perennial favorite of people who don't know any better. Go to Altavista and type in type=hidden name=price (for some reason, Altavista finds better results here than Google) and see what pops up. Bingo! A whole bunch of shopping cart scripts which embed the price of an item in a hidden field. It's a trivial matter to copy the source and change the value of that hidden field to whatever price you want. Most scripts which allow prices to be changed like this don't verify that price! If, for some reason, you must pass sensitive data in hidden fields, you can't hide this data (unless you first encrypt it), but you can use a digest. Basically, you create a digest using all of the data which you are passing in the form, plus an extremely random key. Then, when the data is returned, recompute the digest with the same random key. If they match, the data has probably passed through unchanged. Of course, the security of this depends upon the randomness of the key.
The data I'm sending is more secure because I use POST instead of GET.
Reality: No, it's not. The only advantage to using a POST is that the POST data is not recorded in the web server logs, but GET data is. Both GET and POST are equally vulnerable to sniffing.

POST and GET have two different uses and shouldn't be used interchangeably. GET is for getting information (e.g. a search engine). GET requests should not alter information on the server. POST is for posting information, such as to a guest book. In fact, many browsers recognize the difference between the two. If you hit the refresh button on your browser, the browser may silently reissue a GET request but prompt you for confirmation on a POST, since a POST may very well alter the data on the server.

The above list is hardly intended to be exhaustive, but it does list some of the most frequently encountered misconceptions. To get a better handle on the issues involved with web security, check out the World Wide Web Security FAQ. This also has detailed information about safe CGI scripting and safe scripting in Perl. Now, on with the show!

Denial of Service Attacks

If you have rights to modify CGI.pm, there are a couple of quick changes to make to the module which will help avoid denial of service (DoS) attacks. Open the source code and look for lines like the following:

# Set this to a positive value to limit the size of a
POSTing # to a certain number of bytes: $POST_MAX = -1;

# Change this to 1 to disable uploads entirely:
$DISABLE_UPLOADS = 0;

The first thing to do is disable uploads. Change the appropriate line to $DISABLE_UPLOADS = 1;. Most of the scripts you will write do not require file uploads. However, CGI.pm allows them by default. File uploads are saved to a temporary file and it's relatively simple for a cracker to upload large files to your server if this is not disabled. This can have the effect of using up your disk and eventually crashing the server when it runs out of swap space. The above change will disable this. Then, if a particular script needs to process file uploads, you can include $CGI::DISABLE_UPLOADS = 0; to re-enable them. This still presents security concerns, but we'll cover those in a later lesson dealing with file uploads.

$POST_MAX is a bit trickier. This sets the maximum size of a POST in bytes (the maximum size of a query string used with the GET method is generally handled by the web server). You'll need to figure out how large you realistically want this amount to be and set this appropriately. For example, if you want to limit all posts to 50K, change this line to read $POST_MAX = 51_200; (50 * 1024). If necessary, you can override this in your scripts on a case-by-case basis by including $CGI::POST_MAX = $max_post;, where $max_post is the maximum posting size in bytes.

Unfortunately, for a variety of reasons you may not have direct access to CGI.pm. In that case, you can still do this by including the following two lines in the top of every script you write:

$CGI::DISABLE_UPLOADS = 1;         # Disable uploads
$CGI::POST_MAX        = $max_post; # Maximum number of bytes per post

Note that these values must be set prior to the instantiation of the CGI object (when using the object oriented interface) or before the first call to a CGI.pm method, when using the function oriented interface.

use CGI;

my $q = CGI->new; # WRONG!!!  CGI.pm has now parsed the data, but didn't see what you
                  # had intended for handling uploads and POST limits

$CGI::DISABLE_UPLOADS = 1;         # Disable uploads
$CGI::POST_MAX        = $max_post; # Maximum number of bytes per POST

User Input

Here's a good rule to remember: Always trust your users. Never trust their input. Always trust your users means just that: they're good folk. Most people who use your scripts actually have no malicious intent. They genuinely mean well and just want to go about what they are doing with no thought to how you've handled it in the background. Unfortunately, even well-meaning users can cause problems.

A user anecdote:

I once was testing my company's web site. In one field I entered data similar to the following: "Tom's Toys". Seems innocuous enough, but I crashed the CGI script which was running that page. It turned out that the script was using the data inside a generated SQL statement; in SQL, apostrophe in "Tom's" looked like the end of a field value. The SQL interpreter thought that "s Toys" was the continuation of the SQL, but recognized it as invalid. Further examination revealed that I could append my own SQL after any apostrophe and do lots of damage to the database.

The initial error was simply an example of a user entering unexpected data, but a wily cracker could use this knowledge for nefarious purposes. The moral: Never trust user input!

How do we deal with user input? Remember the -T switch which we said must be included at the beginning of all CGI scripts? That's what we're going to cover here. This puts Perl in "taint checking" mode. While this generally thought of as a tool to prevent damaging data from crackers from entering the system, it also allows you to better control your input and makes it less likely that a casual user might break your system. We'll cover how to handle taint checking later in this lesson.

Security Through Obscurity

One word: STUPID.

Security Through Obscurity (STO) relies on the following premise: "Since crackers don't know what security holes might be present on my system, they can't exploit them." Normally, analogies are dangerous in that they tend to oversimplify situations; but here is an analogy which explains this concept perfectly:

Robbers will ignore my house because the front door is closed and they don't know it's not locked.

Does that sound really stupid? Good. Do everything you can to make security as tight as possible. Do due diligence. In the post mortem analysis of a security incident, you don't want your work to be what they point the finger at.

One point of clarification: STO does not mean that you should just let people have any information they want. STO attempts to rely on luck. You hope that crackers don't find the loopholes. By paying proper attention to security, you can remain relatively confident that the loopholes are not there, but there is no need to give out too much information in the event that you missed something, or that a new vulnerability is discovered.

Theoretically, in a perfectly secure environment, you could let everyone know what web server you are running, the OS it is running on, the Perl version you are using, a full list of user names, etc. But that would be a Bad Thing. Play your cards close to your chest. One mistake you'll often find in CGI scripts is the following:

use CGI::Carp qw(fatalsToBrowser);

What that line does is display all errors to your browser in addition to writing them out to the log. You can use it liberally when you're writing and debugging CGI programs. However, when the program is ready and it's time to move it into production, some programmers do not remove it. The problem with that line is that if a would-be cracker manages to crash your script, all sorts of information shows up in the browser window. Since bugs, by definition, are unplanned, you have no idea what information you would be giving out. Lock your scripts down tight, but don't help potential crackers by giving users information they don't need. Even if your script is perfectly solid, you often have little control over what happens to it in the future.

Don't Do Things You Don't Understand

This is a biggie. Here's a common mistake in code where the programmer didn't appreciate the benefit of using strict. The programmer wrote a CGI script with a lot of subroutines in it. While many of the subs called one another, many of them could be accessed directly by the browser. The programmer set up a param called "process" and had the value of process set to the name of the subroutine to be executed. This is not necessarily a bad way to go about things, but the programmer decided to take a shortcut (example simplified for clarity):

my $process = $q->param( 'process' );
{
    no strict 'subs';
    &{$process}();
}

Rather than create a long list of sub calls to be executed depending upon the value of process, the programmer used a variable for a subroutine name. Many modules were included in this script and if a cracker figured out what was going on, he or she could have called any subroutine at will. There were literally hundreds of subs involved and stamping out the security holes would have been horrendous. When this was explained to the programmer, he was Clearly, he didn't understand this well enough to see the pitfall. Even the best of programmers can make such mistakes.

The code was fixed with something similar to the following:

my $unsafeProcess = $q->process( 'process' );
my $safeProcess   = $1 if $unsafeProcess =~ /^(\w+)$/;

securityLog( 'process', $unsafeProcess ) if ! defined safeProcess;

main()      if $safeProcess eq 'main';
order()     if $safeProcess eq 'order';
checkUser() if $safeProcess eq 'checkUser';

# and so on...

There were about 20 valid subs in the main script and it took a little more work to do it this way, but it was safe. It was also sloppy, but retrofitting security often is. It also illustrates another point: Security and convenience are inversely proportional to one another. You're just going to have to get used to that.

Don't Let User Data Near the Shell

This one get's a lot of people tripped up. Sometimes, it's almost impossible to avoid letting user data near the shell. This can result in terrible security holes. Imagine that you want to write a script allowing a user to view the contents of any directory. For some reason, rather than using opendir with readdir or File::Find, you've decided to open a pipe directly to the 'ls' utility. You have the following lines of code in your program:

my $LS  = '/usr/bin/ls';
my $dir = $query->param( 'dir' );

local *PIPE;
open PIPE, "$LS $dir |" or die "Cannot open pipe to $LS: $!";
my @stuff = <PIPE>;

If everyone plays by the rules and makes no mistakes, you have no problem. If a cracker stumbles across this, the cracker could easily enter something like `rm -fr /` for the "dir" parameter and you're looking for a new job.

Here's a slightly more insidious example. Imagine your boss comes to you and says "We need to let everyone have the ability to read the contents of .dat files from a browser."

You're conscientious, you know security, but you have a problem. A quick examination of file names reveals that just about every character ever invented has been used. Since you have little to no control over the file names, you'll have to allow just about anything through. However, you've decided that, to make it secure, the end user supplies the name of the file and you'll tack a .dat on the end. This way, the only files which can be opened and read are those which have that extension.

Consider the following code (the following illustrates what is known as "The Poison Null Byte"):

1.  #!/usr/bin/perl -wT
2.  use strict;
3.  use CGI;
4.
5.  # Do not run this script on a server connected to the 'Net
6.  # It is supplied as a bad example
7.
8.  my $cgi  = CGI->new();
9.  my $file = $cgi->param( 'file' );
10.
11. # Bad taint checking!
12. # This is, amongst other things, a deliberately incomplete list
13. # of shell metacharacters
14. my $data = $1 if $file =~ m#([^./\\`$"'&]+\.?[^./\\`$"'&]+)$#;
15.
16. $data .= '.dat';
17. my $userInfo;
18.
19. open FILE, "<$data" or die "Cannot open $data: $!\n";
20. {
21.     local $/;
22.     $userInfo = <FILE>;
23. }
24. close FILE;
25.
26. print $cgi->header,
27.       $cgi->start_html,
28.       $cgi->pre( $userInfo ),
29.       $cgi->end_html;

Hmm.... what's wrong with it? The author uses warnings, strict, and taint checking. Users need to be able to get at the contents of .dat files on the server, but filenames can be very unpredictable. Obviously, we need to let user data near the shell. The author, however, knows this is a security hole, so he conventiently supplies a list of shell metacharacters which he doesn't want. That's on line 14, which is a deliberately bad example of taint checking, (which we'll cover in the next section). Further, he appends a ".dat" (line 16) extension to guarantee that the user can only view files with said extension. Some people might think this looks pretty secure.

The list of shell metacharacters was deliberately left incomplete so you wouldn't be tempted to use this technique.

So what's the problem? Perl, as many of you know, is written in C. C recognizes the null byte (ASCII zero) as the end of a string. Perl does not. If Perl tries to make a system call (such as open FILE, $somefile) with an embedded null byte in the data, the data is passed to the C internals, which happily ignore any information from the null byte on. So how does that affect our script?

The script was renamed insecure.cgi, and tested through a local browser with the following URL:

http://localhost/cgi-bin/insecure.cgi?file=insecure.cgi%00loser!

See that embedded %00? Hmm... it's right after the script name. It passes our taint checking and Perl thinks it's trying to open insecure.cgi%00loser!.dat. However, the C underbelly tries to open insecure.cgi, and does! Then, the rest of the script happily dumps its source code to your browser.

The quick fix for this is to add a line after line 14:

$data =~ s/\x00//g;

That removes all null bytes from the string. However, the real fix is to avoid letting user data near the shell. How do we do that? One way would be to read a list of all .dat files from our target directory. Stuff those file names into a hash. Check to see if a hash key matching the user-supplied name exists and if so, open it. Yes, that's jumping through a lot of hoops to avoid letting user data near the shell. It's probably overkill in this case. But honestly, would you have known about the null byte problem? Now you do.

You may encounter circumstances when you are unavoidably forced to pass user data to the shell. We'll cover these as needed. You are strongly advised to read through the WWW Security FAQ to ensure that you understand the issues involved with this.

Taint Checking

You should be somewhat familiar with regular expressions. You'll need them for taint checking. You don't need to be an expert, but you should be familiar with the basics. If you are not, check out the regex tutorial in the standard Perl documentation, and perhaps follow that up with these tutorials on PerlMonks.

Perl automatically enters "taint mode" when you run a setuid or setgid program. You can also explicitly turn on taint mode by using the -T switch as we do in all of our examples. If a scalar contains any data which has been obtained from outside of the program, Perl considers the data 'tainted' until you untaint it. But what does this do? To quote from the security faq:

Once a variable is tainted, Perl won't allow you to use it in a system(), exec(), piped open, eval(), backtick command, or any function which affects something outside the program (such as unlink).

Actually, the above quote, while verbatim, is not quite accurate. Variables are never tainted. The data they contain are tainted. Consider the following code:

#!/usr/bin/perl -wT
use strict;
use CGI qw/:standard/;

my @files = qw/ revenue.csv gambling.txt cheat_the_irs.doc /;
$files[2] = param( 'irs' );

foreach ( @files ) {
    open OUT, ">> $_" or die "Cannot open $_ for appending: $!";
    # do stuff here
}

Perl will have absolutely no problem opening the first two files (if they exist), but the program will die on the third file because Perl received the filename from outside of the program and therefore tainted the data. In trying to open the file in append mode, Perl realizes that you are using tainted data to affect something outside of the program, and kills the script. The first two elements of the array are not tainted because they have been explicitly set with data from within the program. However, if you try to assign a tainted value to another scalar, the new scalar will also be tainted.

A scalar can only be untainted by extracting it as a substring with a regular expression. Untainting a scalar is simple, but can be filled with pitfalls. Let's consider a simple situation. You want a user to be able to open a file based upon a filename the user submits. If you limit this to a particular directory and specify that filenames can only contain letters, numbers, underscores and one period, the following regular expression can untaint the supplied filename:

my $unsafeName = $q->param( 'filename' );
my $safeName   = $1 if $unsafeName =~ /(\w+\.\w+)$/;
if ( defined $safeName ) {
    # proceed as normal
}

And a somewhat cleaner way of writing the untainting routine:

my ( $safeName ) = ( $unsafeName =~ /(\w+\.\w+)$/ );

Here's how that snippet works. An assignment to a scalar is generally in "scalar context" and an assignment to an array or a hash is in "list context." However, if you put parentheses around a scalar name when assinging, the assignment is once again in "list" context.

my $name   = param('name');  # scalar context
my @name   = param('name');  # list context
my ($name) = param('name');  # still list context!

That means, if whatever is doing the assigning is aware of context, it can return appropriate results. Regular expressions are aware of context and in list context the entire expression will return a list of values the backreferences (the parts with parentheses around them: (\w+\.\w+)) matched. By putting parentheses on both sides of the assignment, both sides get evaluated in list context, thus ensuring that the data is passed to $safeName.

With (\w+\.\w+), we've ensured that only the characters we specify are allowed to be in a filename. Further, we've used the end of line anchor ($) to make sure that we're not getting path data. That's really not terribly important in this example and could probably be left out, but we want to ensure that the regex matches exactly what we want. For example, the user could enter C:\windows\data.3\somefile_3.txt and the $safeName will be set to somefile_3.txt. Without the end of line anchor, we would have data.3 as the name of the file and have unpredictable results.

Don't be tempted to do the following:

my $unsafeName = $q->param( 'filename' );
$unsafeName    =~ /(\w+\.\w+)$/;

# Hey, how do we know that the previous regex successfully matched?

my $safeName   = $1;
if ( defined $safeName ) {
    # proceed as normal
}

This is a very common, and unfortunate, error. If you have already successfully untainted a variable, $1 will be defined. If $unsafeName =~ /(\w+\.\w+)$/ does not return a match, then $safeName will be set to the previous value of $1. That's probably not what you wanted and could be a large security problem.

You might be wondering what's wrong with the following regular expression, which was used above:

my $data = $1 if $file =~ m#([^./\\`$"'&]+\.?[^./\\`$"'&]+)$#;

Aside from the fact that it uses an incomplete list of shell metacharacters, it is also using negated character classes to figure out what it's trying to match. This is almost never a good idea. In case you don't recall, a negated character class has a caret (^) after the opening bracket and means "don't match anything in the brackets." For example, [^\da-z] means "match anything except digits and lower case letters."

The reason negated character classes are such a bad idea is you're trying to specify what you don't want. Usually, this is a far larger list than what you initially think. It only takes overlooking one character to put you in hot water. In the shell metacharacter regex above, the pipe | metacharacter (amongst others) has been left out. Whoops! If a file name has a pipe on the end, an open command will try to execute the file instead of open it. That's what led to the "read contents of directory" security hole shown above. Another example of the pipe problem is discussed in this PerlMonks thread.

It's far easier to specify what you will allow than what you won't allow. If you are too strict, you can evaluate your options and loosen up. If you specify what you won't allow, you only need to miss one little thing (like a null byte!) to have your site cracked.

There are certainly cases where you won't need to untaint incoming data, but we'll cover those as they arise. A safe rule of thumb: if you're not sure, untaint it.

Exercises

Here's an easy one to start with. What's wrong with the following snippet of HTML:
<input type="hidden" password="BOB">
Why would you use the following line in a CGI script:
```
$CGI::DISABLE_UPLOADS = 1;
```

Is the following safe or not? Please explain:

my $user     = $query->param( 'username' );
my $safeUser = $1 if $user =~ /(.+)$/;

How would you fix the following code:

my $unsafeName = $q->param( 'filename' );
$unsafeName    =~ /(\w+\.\w+)$/;
my $safeName   = $1;

Answers to Lesson 3 Exercises

Next Lesson: Creating Web Pages