What is CGI?

If you need to know how to get CGI scripts up and running, see Appendix One: Getting your scripts to execute. That's usually the first hurdle most aspiring programmers need to overcome.

A simple answer: The Common Gateway Interface, or CGI, is a standard way for a web server to pass information about a request to an external program (a CGI program) and to get that program's output back. The Hypertext Transfer Protocol (HTTP) is how your web browser and the web server tell each other what's going on while you're surfing; CGI is how the server hands its side of that conversation to your program. Perl is not CGI. Perl is just another programming language capable of interpreting CGI information. Many other languages, even COBOL, can be used to write CGI programs. Perl is exceptionally well suited for this since the web, at its heart, is text-based, and Perl excels at text processing and manipulation. However, the presumption here is that you want to get up to speed quickly; therefore, we won't go into detail on how CGI works under the hood. Suffice it to say that if you are serious about CGI programming, you will need to learn this sooner or later. It's not really difficult, so check it out.
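Just to make "CGI information" a little less abstract, here is a minimal sketch. You don't need to type it in yet. It assumes the server hands the request to the program through environment variables; REQUEST_METHOD and QUERY_STRING are names from the CGI standard, while the describe_request subroutine is made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise two of the values a web server
# places in the environment before it runs a CGI program.
sub describe_request {
    my (%env) = @_;
    my $method = defined $env{REQUEST_METHOD} ? $env{REQUEST_METHOD} : 'unknown';
    my $query  = defined $env{QUERY_STRING}   ? $env{QUERY_STRING}   : '';
    return "Method: $method; query string: '$query'";
}

# A header, a blank line, then the document itself.
print "Content-type: text/plain\n\n";
print describe_request(%ENV), "\n";
```

Run from the command line, where no server has set those variables, the sketch simply reports an unknown method and an empty query string.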

Your First CGI Script

I thought about providing you with downloadable examples of the scripts, but I didn't. Get used to a lot of typing. You can read all you want, but until you actually get in there and start doing it, it's meaningless.

A bad example

Let's just jump right in! The first program you are about to see is coded in a style which is, unfortunately, all too common on the web. Here's what your first CGI script should not be:

#!/usr/bin/perl

# Do not use this script!  It has no security holes,
# but it's bad programming practice.
# It should have the -w switch, use strict,
# and probably use the -T switch (not necessary
# in this example, but a good habit to get into)

print "Content-type: text/html\n\n";
print "<html><head><title>My First CGI Script</title></head>";
print "<body bgcolor=\"#ffffcc\">";
print "<h1>This is a pretty lame web page</h1>";
print "<p>";
print "Who is this Ovid guy, anyway?";
print "</body></html>";

This should produce a web page similar to the following:

[Image: Output of first CGI program]

Actually, there's nothing really wrong with this program, but it's bad programming style. Throughout this course, we'll concentrate on doing things the right way. This means that we'll occasionally have things in our programs which, strictly speaking, may not be necessary. For instance, we'll use the -T switch on the shebang line of every script. (The shebang is the first line of the program. It typically begins with '#!', known as a sharp and a bang; hence, shebang.) The -T switch turns on a feature called taint checking. We'll discuss taint checking more thoroughly later, but for now, just know that it's used to keep damaging data from outside your program from being used in dangerous ways.

But wait a minute! We don't have any information coming into our program. Why bother with taint checking? Simple: it's good practice. How many times have you needed some quick code and just grabbed another program and modified it? Grab that code above and quickly modify it to process form data and you might just kiss your web site (or job) good-bye.

Learn this now and learn it well: they're coming for you. They want to destroy you and your site. They want to get you fired, they're going to steal your credit card data, and they'll probably make your milk turn sour. Are you paranoid yet? Good. Always, always, always pay attention to security concerns. You only need to overlook them once for all to be lost.

A good example

Now here's the script the way you'll usually see it written in this course:

#!/usr/bin/perl -wT
use strict;
use CGI;

my $query = CGI->new();

print $query->header( "text/html" ),
  $query->start_html(-title   => "My First CGI Script",
                     -bgcolor => "#ffffcc" ),
  $query->h1( "This is a pretty lame web page" ),
  $query->p( "Who is this Ovid guy, anyway?" ),
  $query->end_html;

What is all of that stuff? There's no HTML and it doesn't do much of anything.

In fact, this teaches you quite a bit about what you will be continuously exposed to throughout the rest of this course: good CGI programming practices. I'm going to break this script down line by line, so by the time we're done, you'll know what was happening.

The Analysis

Let's take a look at the shebang. It has some stuff in it which you may find unfamiliar.

#!/usr/bin/perl -wT

The #!/usr/bin/perl line is fairly standard (on a Windows box, it will resemble something like #!C:/perl/bin/perl.exe -wT). I've added two switches which should be on every CGI program: -w and -T.

If you've been programming in Perl for a while, the -w should be familiar to you: it turns on warnings. These warnings won't stop your program at any point, but they will point out many potential problems with your code. Perl will warn you if a variable is only mentioned once (you probably mistyped a variable name), if you attempt to write to a read-only file handle, if you use arrays as scalars, and about many other things. This switch will save you many hours of debugging and should be used on all Perl programs, not just CGI programs.
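Here is a small sketch of the kind of mistake warnings catch. Two assumptions to note: it uses the warnings pragma (use warnings;) rather than the -w switch so that it can be run as a plain snippet, and it installs a __WARN__ handler only so the warning text can be collected and shown. The %form hash and its contents are invented for the example.

```perl
use strict;
use warnings;    # the pragma form of warnings, scoped to this file

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my %form = ( name => 'Ovid' );              # made-up form data
my $greeting = "Hello, " . $form{nmae};     # misspelled key, so the value is undef

print "$greeting\n";
print "Perl warned: $_" for @warnings;
```

The program keeps running and prints a truncated greeting, but the warning about an uninitialized value points right at the misspelled key.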

Security Checkpoint: The -T switch may be new to you. This switch enables what is known as "taint checking". Under certain circumstances, Perl will turn on taint checking without you asking it to do so. Don't rely on this. Most CGI programs will require taint checking and this switch should be explicitly set. Taint checking, in a nutshell, makes Perl assume that all data entering the program from outside of itself is tainted. If Perl uses this "tainted" data to perform any actions which it considers dangerous, it will kill the script. To use this data, you will have to "untaint" it. More on this later.
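To give you a feel for what untainting looks like before we cover it properly: Perl treats a value captured by a regular expression match as clean, so the usual approach is to match the input against a strict pattern and keep only what was captured. The sketch below is an illustration, not this course's final word on the subject. The untaint_username subroutine and its rule (letters, digits and underscores, at most 32 characters) are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value if it looks like a simple
# username, otherwise return undef. Under -T, the captured $1 is
# considered untainted.
sub untaint_username {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A([A-Za-z0-9_]{1,32})\z/ ? $1 : undef;
}

for my $input ( 'ovid_42', 'ovid; rm -rf /' ) {
    my $clean = untaint_username($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

The point of the design is that the pattern lists what is allowed rather than what is forbidden. A pattern that accepts everything would silence taint checking without making anything safer.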

Security Checkpoint: Incidentally, if you try to run scripts from the command line by typing perl somescript.cgi or ./somescript.cgi and you have the -T switch on your shebang line, Perl may refuse to run the script with an error such as 'Too late for "-T" switch at somescript.cgi line 1' (the exact wording varies between Perl versions). If this occurs, you will need to pass the -T switch directly to the Perl interpreter to run it from the command line:

perl -T somescript.cgi

use strict;

This instructs perl to use the strict.pm module. Where the -w switch was your best friend whispering to you that you're unzipped, the strict module is the nun in math class who raps your knuckles with a ruler when she sees you counting on your fingers. strict, as its name implies, is unforgiving. It does not allow you to use what it considers to be "unsafe" constructs. It instructs the compiler to terminate your program if it encounters any of these constructs. Because of the power of strict, I'll give a brief overview of what it does, as you're likely to face this issue repeatedly during your programming.

The strict pragma (that's Perlish for "compiler directive") covers three things: vars, subs, and refs.

  1. use strict 'vars';

    You must predeclare a variable, fully qualify it (e.g.: $package::foo), or import it. In practice, you'll find predeclaring variables to be your most significant issue.

  2. use strict 'subs';

    This prohibits you from using a bareword to call a subroutine (e.g., using "somesub" instead of "&somesub" or "somesub()") unless the subroutine has already been declared. Personally, I find the major effect of this is to inhibit Perl Poetry. However, some people will call a sub simply by using a bareword. Without going into too much detail, call subs as follows: sub_name(optional argument list); Use those parentheses and strict will leave your knuckles alone. There are other ways to call subs and we'll discuss these as appropriate.

  3. use strict 'refs';

    This pragma prohibits you from using symbolic references. If you already know what this means, I apologize for having you wade through all of this strict stuff. If you don't know what this means, suffice it to say that we'll deal with it as it comes up.

If, for some reason, you only wanted to use one of the strict pragmas listed above, you would simply type in the appropriate use strict 'whatever'; line. However, we'll have use strict; in all of our programs, which turns on all three of the pragmas above. I've met many programmers who complain about these restrictions, but rarely have I met a good programmer who complains about them. These will force you to write tighter, more robust code.
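To see strict rapping some knuckles, here is a hedged sketch of two of the three checks. The variable names are invented, and both mistakes are wrapped in eval only so that the error messages can be printed instead of ending the demonstration. In a real script you would simply see the program refuse to run.

```perl
use strict;
use warnings;

# strict 'vars': compile the snippet with a string eval so the
# compile-time failure can be reported rather than ending this demo.
my $snippet = q{
    use strict;
    my $total = 10;
    $totl = $total + 1;    # typo: $totl was never declared
    1;
};
my $compiled   = eval $snippet;
my $vars_error = $@;

# strict 'refs': using a plain string as if it were a reference
# (a symbolic reference) is caught at run time.
my $name       = 'counter';
my $refs_ok    = eval { my $value = ${$name}; 1 };
my $refs_error = $@;

print "strict 'vars' said: $vars_error";
print "strict 'refs' said: $refs_error";
```

Without strict, the first mistake would silently create a second variable and the second would quietly read a global named $counter, which is exactly the sort of bug that takes hours to find.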

use CGI;

This uses Lincoln Stein's CGI.pm module. Don't write CGI scripts without this module.

Ever.

Occasionally, you'll hear people complain about how large this module is and how it slows things down. If you get to the point where this is a worry, you'll be looking into other options like mod_perl or FastCGI. Your CGI scripts themselves will probably be getting complicated enough that trying to write your own version of this module would be disastrous. People who try invariably fail. Again, we'll get into this more later.

What this module does is allow you to focus on writing your programs and it handles all of the details of the CGI interface. Taking away this level of detail and handling it for you is known as "abstraction" and is one of the greatest strengths of this module. You just write your code and let CGI.pm handle the ugly stuff. We'll discuss CGI.pm more in the next lesson.

my $query = CGI->new();

This instantiates a new CGI object called $query. If this sounds intimidating, don't let it worry you. Basically, we'll use $query to access the methods and properties of the CGI module. It's really pretty simple and you won't have to learn object-oriented programming to use it. You can actually use CGI.pm without using the object method. For instance,

print $query->header( "text/html" );

is the same as

print header( "text/html" );

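If you prefer the non-object (standard) method, you'll need to make some changes to the above script: import the function names, typically with use CGI qw(:standard);, drop the my $query = CGI->new(); line, and remove the $query-> prefix from each call.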
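A sketch of how the function-oriented version might look follows. This is a reconstruction under stated assumptions, not the course's original listing: it assumes CGI.pm is installed and that the :standard import tag is used. The shebang line with -wT stays exactly as before in a real script; it is left off here (with use warnings; standing in for -w) so the snippet can be run directly with perl. The page is built into a $page variable first, purely to make it easy to inspect.

```perl
use strict;
use warnings;
use CGI qw(:standard);    # import header, start_html, h1, p, end_html, ...

# Same calls as the object-oriented script, minus the $query-> prefix.
my $page = header( "text/html" )
         . start_html( -title   => "My First CGI Script",
                       -bgcolor => "#ffffcc" )
         . h1( "This is a pretty lame web page" )
         . p( "Who is this Ovid guy, anyway?" )
         . end_html();

print $page;
```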

However, I recommend against this when you are first getting into CGI programming, or if you work with someone who may not be familiar with this module. One of the main benefits of using the object oriented method is that it's extremely clear where the method is located. For example, if you are debugging a large program and run across "print header( "image/pjpeg");", you may want to know what it does. However, if you see "print $query->header( "image/pjpeg" );", you'll know instantly that this is in the CGI module and from there it's merely a matter of reading a man page (or rereading this course!). If you are very familiar with CGI.pm and its methods, feel free to use the object-oriented method or the standard method, as you will. However, we'll use the object-oriented method almost exclusively in this course.

If you've heard a little bit about Perl's handling of object orientation, you may be concerned that this slows down your script. Typically, calling a method on an object runs about 30% to 50% slower than just calling a subroutine. So why are we using it here (we'll skip the usual OO pro/con arguments)? Because CGI.pm has to jump through a lot of hoops to export its functions. This often results in the object-oriented method running as fast or faster than the non-object oriented one.

A further advantage of the object-oriented method is that you can create subroutines with the same name as CGI.pm methods and not worry about collisions. While I don't recommend that you run out and write sub header {} any time soon, what if you're working with someone else's code? It's not that uncommon a problem.
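The collision point can be sketched without CGI.pm at all. In the example below, TinyPage is a hypothetical stand-in for a module with a header method, and the local header subroutine is equally made up. The only thing being shown is that a method call and a same-named subroutine in your own code do not get in each other's way.

```perl
use strict;
use warnings;

package TinyPage;    # hypothetical stand-in for a module such as CGI.pm
sub new    { my ($class) = @_; return bless {}, $class }
sub header { return "Content-type: text/html\n\n" }

package main;

# A local subroutine that happens to share the method's name.
sub header { return "<h1>Site banner</h1>" }

my $page        = TinyPage->new;
my $http_header = $page->header;    # resolved in TinyPage
my $banner      = header();         # resolved in main

print $http_header, $banner, "\n";
```

Had the module's header been imported as a plain function instead, the two names would have competed for the same slot in your program.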

print $query->header( "text/html" ),

Here, we are creating the header which is sent back to the web browser. The browser uses this header to determine what type of document you have sent. You can also set cookies, expiration dates for the page and other things with this. We'll cover some of these issues later. Also, you can skip the "text/html" part if you wish. That's the default for the header method.

For those who want to get under the hood a little, you've probably seen standard HTML documents. In fact, browsers will often render HTML documents which aren't wrapped in <HTML></HTML> tags. How does the browser know it's a web document? Because the first information which is sent to the browser is the header. It often looks like this:

Status: 200 OK
Content-type: text/html

<html><head><title>My First CGI Script</title></head>

<body bgcolor="#ffffcc">
Some html here...

That first line is the status code. This is information which the browser uses to determine what to do next. You can use this to redirect the user's browser to another URL, not update the page (useful if you want to submit information and keep the same page), or just let the browser know that everything is fine. There are other status codes and we'll cover them as necessary.

You can specify your own status, but it's not necessary. Your web server will happily provide it for you. The second line is the content type which tells the browser what type of document to expect. This is followed by two newlines. These two newlines let the browser know that the headers are over and the actual document (if there is one) is next. Headers often get more complicated, but we'll leave it at that for now.
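If you'd like to see the role of that blank line for yourself, here is a minimal sketch that glues a header and a document together by hand and then pulls them apart again. The build_response subroutine is a made-up helper for illustration, and the Status line is left out on the assumption, as noted above, that the web server will supply it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the text a CGI program sends back.
sub build_response {
    my ($content_type, $document) = @_;
    my $headers = "Content-type: $content_type\n";    # one header line
    return $headers . "\n" . $document;               # blank line ends the headers
}

my $response = build_response( "text/html",
    "<html><body>Some html here...</body></html>\n" );

# Splitting at the first blank line recovers the two parts.
my ($headers, $document) = split /\n\n/, $response, 2;

print $response;
```

This is the same structure that the bad example at the start of the lesson produced with print "Content-type: text/html\n\n";.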

$query->start_html(-title => "My First CGI Script" ... $query->end_html;

Yup, we're just slurping up all of the rest of that, as we'll go into using CGI.pm more in the next lesson. For now, just trust us, it works. What follows is an alternate method of generating that web page using what is known as a 'here' document.

#!/usr/bin/perl -wT
use strict;
use CGI;

my $query = new CGI;

print $query->header( "text/html" );
print <<END_HERE;
<html><head><title>My First CGI Script</title></head>
<body bgcolor="#ffffcc">

<h1>This is a pretty lame web page</h1>
<p>
Who is this Ovid guy, anyway?
</body></html>
END_HERE

This has the same result as the "improved" script above. First, we print the header, then everything between print <<END_HERE; and the final END_HERE is printed (be sure to remember to leave the semi-colon off that closing END_HERE line!). Many people prefer this and it does offer the advantage that it doesn't require learning a new syntax. However, it mixes HTML and code, which usually becomes messy after a while. Many web sites consist of scripts containing 90% HTML mixed in with a little bit of code. For just a small HTML document, a literal string (such as a here document) is probably okay. In fact, in one of the exercises below, you'll create a CGI script using a here document. All you'll need to do is cut and paste the above code (possibly changing the shebang line to fit your system) and modify the HTML in the here document.
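One more thing worth knowing about here documents: with a bare terminator such as <<END_HERE, the text behaves like a double-quoted string, so variables inside it are interpolated. The sketch below shows this with a made-up $title variable. It also stores the text in a variable instead of printing it directly, which is just one possible variation.

```perl
use strict;
use warnings;

my $title = "My First CGI Script";    # hypothetical value for the example

# The closing END_HERE must sit on a line by itself, with no
# leading spaces and no semi-colon.
my $html = <<END_HERE;
<html><head><title>$title</title></head>
<body bgcolor="#ffffcc">
<h1>$title</h1>
</body></html>
END_HERE

print "Content-type: text/html\n\n", $html;
```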

Later in this course, we will discuss the pros and cons of several different methods of producing HTML. We will demonstrate those different methods and occasionally change little things here and there so you can get used to some variations which you are likely to see in other programs.

Exercises

  1. What's wrong with the following script?
    #!/usr/bin/perl -wT
    use strict;
    use CGI;
    
    print $query->header( "text/html" ),
      $query->start_html(-title   => "My First CGI Script",
                         -bgcolor => "#ffffcc" ),
      $query->h1( "This is a pretty lame web page" ),
      $query->p( "Who is this Ovid guy, anyway?" ),
      $query->end_html;
  2. Why should you use the -w switch on the shebang line?
  3. This one's a little trickier: write a web page which displays the current time.

Answers to Lesson 1 Exercises

Next Lesson: Why use CGI.pm?