Creating Web Pages

Generally, Perl and HTML should be kept separate. Often, this is accomplished using a template system such as Template Toolkit. However, we occasionally need a small, self-contained program which can be distributed anywhere. This is usually seen in software packages offered by "script archives". These programs are usually easy to implement and often do not require other modules to be installed. The downside is that they are usually terribly written. One problem is that they print the HTML directly or use HERE documents to format it. The following will show you how to create a simple login form for an email list and take full advantage of the HTML generating functions of CGI.pm.

Whether you are creating a small program or a large, robust web application, care should be taken to design the HTML pages you wish to implement. Rather than go through the entire design process here, we'll simply list the basic requirements of the web page and write some HTML. Once that is done, we can create the actual program which will reproduce this page.

We're going to start out with something small. Here, we'll just have a simple login page. The user enters his or her username/password combination and there will be a checkbox for "remembering the user" so the user has the option of skipping the login page in the future. We're not going to implement the actual login functionality here. We're just going to create a small script which generates the page and walks us through everything step-by-step.

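You want to capture the following information:

  1. The user's username
  2. The user's password
  3. Whether the user wants to be remembered on this computer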

Security Checkpoint: In addition to discussing how to create this web page using CGI.pm's HTML functions, this lesson also discusses some very basic security issues with this type of login screen. This means that it may seem a little disjointed at times. But the need for security in web programming cannot be overemphasized.

Here's what a simple web page which would capture this information might look like:

Log in to your account

Here you can log in to the Weird Sports mailing list archives. In theory, you will be able to change your user settings. In reality, however, you can't because this is just an example from a stupid CGI programming course.

User Name:
Password:
Remember my ID on this computer.


The HTML for this page might resemble the following:

<html>
  <head>
    <title>Log in to Weird Sports</title>
  </head>
  <body style="background-color: #ffffff;">
    <div style="color:#000000;
         font-family: Tahoma, helvetica, arial;"
         align="center">
      <h1>Log in to your account</h1>
      <p>Here you can log in to the Weird Sports mailing list
      archives.  In theory, you will be able to change your user
      settings.  In reality, however, you can't because this is
      just an example from a
      <a href="index.html">stupid
      CGI programming course</a>.</p>
      <form action="login.cgi"
            method="post"
            enctype="application/x-www-form-urlencoded">
        <table cellspacing="1"
               border="0"
               cellpadding="2"
               bgcolor="#000000"
               style="font:
               10pt;">
          <tr style="background-color:#CCCCCC">
            <td><strong>User Name:</strong></td>
            <td><input type="text"
                       size="30"
                       maxlength="30"
                       name="username"></td>
          </tr>
          <tr style="background-color:#CCCCCC">
            <td><strong>Password:</strong></td>
            <td><input type="password"
                       size="30"
                       maxlength="30"
                       name="password"></td>
          </tr>
          <tr>
            <td colspan="2"
                style="background-color:#CCCCCC">
              <input type="checkbox"
                     name="remember"> Remember my ID on this computer.
            </td>
          </tr>
        </table>
        <p><input type="submit"
                  value="Login"> <input
                  type="reset"></p>
      </form>
    </div>
  </body>
</html>

Let's go over this HTML carefully as there are a variety of tags with which you may not be familiar. First, let's take a close look at the <form> tag:

<form action="login.cgi"
      method="post"
      enctype="application/x-www-form-urlencoded">

The form tag tells the browser that this is the start of a form which will be used to submit data to a web server. The 'action' attribute's value tells the browser which resource the form is being submitted to. This can be a relative or absolute link, just like the links in an anchor tag, an img tag, or any of a number of tags which identify a resource to the browser.

The 'method' attribute's value, for this form, is POST. For web-based forms, you will generally use a POST or a GET method. As mentioned previously in this course, the GET method will send the form's data in the query string (embedded in the URL) and the POST method will send the data in the body of the request, after the headers. If POST data is sent, the form of the data depends upon the 'enctype' attribute, which we will discuss next.

The 'enctype' attribute's value, for this form, is 'application/x-www-form-urlencoded'. For this enctype, you can actually leave off the attribute, as this is the default encoding type for a web form. When this type is specified for POST data, the data is encoded just as it would be for the query string (name/value pairs with special characters converted to their hexadecimal equivalents, each name joined to its value by an equals sign, with respective pairs joined by an ampersand). Then, after the headers are sent, the body of the request carries the form data. There is another common 'enctype', however, which is 'multipart/form-data'. This is usually used for file uploads, as trying to encode the file data would be difficult, at best, and would grossly bloat the size of the data sent. We'll cover file uploads in a later lesson.
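To make the encoding described above concrete, here is a minimal sketch in plain Perl. The helper names form_escape and encode_form are hypothetical (they are not part of CGI.pm), and the sketch assumes single-byte characters. It only illustrates what the browser does; your script should never need to do this by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one name or value roughly the way a browser
# does for application/x-www-form-urlencoded data.
# Assumption: the text contains only single-byte characters.
sub form_escape {
    my ($text) = @_;
    # Convert everything except letters, digits, a few safe punctuation
    # characters, and spaces to a percent sign plus two hex digits.
    $text =~ s/([^A-Za-z0-9_.\-* ])/sprintf( '%%%02X', ord($1) )/ge;
    # Spaces are sent as plus signs.
    $text =~ tr/ /+/;
    return $text;
}

# Hypothetical helper: join name/value pairs into one encoded string.
sub encode_form {
    my @pairs = @_;    # flat list: name, value, name, value, ...
    my @encoded;
    while ( my ( $name, $value ) = splice @pairs, 0, 2 ) {
        push @encoded, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @encoded;
}

print encode_form( username => 'ovid', password => 'p&ss w=rd' ), "\n";
# prints: username=ovid&password=p%26ss+w%3Drd
```

Note how the ampersand and equals sign inside the password are converted, so they cannot be confused with the characters that separate the pairs.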

Now, let's look at the 'input' tag:

<input type="text" size="30" maxlength="30" name="username">

The above tag will create a single-line text input box.

The 'type' attribute tells the browser the type of input box to display. The 'size' and 'maxlength' attributes specify the size of the box and the maximum number of characters which may be typed in it. The 'name' attribute is the name which will be used to reference this data when it is sent to the CGI script.

The next tag is for the password:

<input type="password" size="30" maxlength="30" name="password">

That produces another input box.

The main difference is that 'type' is now 'password'. This will cause the text which the user types to be obscured (usually displayed as asterisks, '*').

Security Checkpoint: There are a couple of security issues which some programmers fail to take into account when dealing with the above form elements. First, the 'maxlength' attribute prevents the user from typing more than that many characters. However, a clever cracker can easily save the form, remove or increase the 'maxlength' attribute, and resubmit the form with as much data as he likes. If you are entering this data into a database and only 30 characters are allowed in the appropriate database field, you may very well cause the CGI script to break and behave unpredictably unless proper error checking is done.
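The error checking mentioned above might look something like the following sketch. The helper name length_ok and the variable $MAX_LENGTH are hypothetical; in a real script the value being tested would come from CGI.pm's param function rather than from a hard-coded list.

```perl
use strict;
use warnings;

# Assumption: 30 is the limit enforced by the database field.
my $MAX_LENGTH = 30;

# Hypothetical helper: true only when the value is defined, non-empty,
# and no longer than the limit.
sub length_ok {
    my ( $value, $max ) = @_;
    return 0 unless defined $value;
    my $length = length $value;
    return ( $length > 0 && $length <= $max ) ? 1 : 0;
}

# Try one acceptable value and one that is a character too long.
for my $candidate ( 'ovid', 'x' x 31 ) {
    printf "%-4s (%d characters)\n",
        ( length_ok( $candidate, $MAX_LENGTH ) ? 'ok' : 'bad' ),
        length $candidate;
}
```

The point is that the check runs on the server, where the user cannot remove it, no matter what the 'maxlength' attribute in the form says.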

Security Checkpoint: For the password box, it's important to remember that the only thing the 'password' type does is ensure that someone peering over your shoulder cannot read the password you type in. The password is still sent in clear text to the server, unless the form data is sent over a secure connection. Further, if someone is looking over your shoulder, they can still determine the number of characters in your password, thus making a brute force crack much easier.

The next form element is the checkbox:

<input type="checkbox" name="remember">

That line produces a single checkbox.

By now, you should have a general understanding of what these attributes mean. There is an important difference in how these form elements are handled. For most form elements, if no value is supplied, the browser will send the name, followed by an equals sign, but with no value. For checkboxes, the name of the form element is not sent at all if the box is not checked. Further, if the box is checked and no value is specified, the value of 'on' will automatically be supplied.
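Because an unchecked box sends nothing at all, the script has to test whether the parameter exists rather than look at its value. Here is a minimal sketch of that test. The helper name wants_remember is hypothetical, and the query strings passed to CGI->new simply simulate what a browser would send.

```perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: true when the 'remember' checkbox was ticked.
sub wants_remember {
    my ($cgi) = @_;
    # An unchecked box is never sent, so param() returns undef for it.
    return defined( scalar $cgi->param('remember') ) ? 1 : 0;
}

# Simulate two submissions by handing CGI.pm a query string directly.
my $checked   = CGI->new('username=ovid&remember=on');
my $unchecked = CGI->new('username=ovid');

print "checked:   ", wants_remember($checked),   "\n";
print "unchecked: ", wants_remember($unchecked), "\n";
```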

The last two form elements are the 'Submit' and 'Reset' buttons:

<input type="submit" value="Login"> <input type="Reset">

That code produces two buttons: a submit button labeled 'Login' and a reset button.

The submit button is what actually causes the form data to be encoded and sent to the server. The reset button merely resets the form to its default state and plays no role in CGI programming.

Converting the HTML to CGI.pm functions

Clearly, the simple way of converting this document to a 'dynamic' web page is to put in a literal string (such as a HERE document):

#!C:/perl/bin/perl.exe -wT
use strict;
use CGI qw/:standard/;

print header;
print <<END_OF_HTML;
<html>
  <head>
    <title>Log in to Weird Sports</title>
  </head>

  <body style="background-color: #ffffff;">

  [ HTML goes here ]

  </body>
</html>
END_OF_HTML

You will hear arguments about whether it's good to use CGI.pm's HTML generating functions. Certainly, when building large web-based applications, they are inappropriate. For such things, literal HTML strings are also inappropriate. But what about for a small, standalone application? Should you bother with CGI.pm's HTML shortcuts? Yes, for a few reasons:

Some programmers balk at using these tags, and it's not really an "essential of the faith", so we won't spend much time on them. Learning them is not required to benefit from the rest of this course. But since we are of the opinion that they are a Good Thing at least in some common situations, we give a brief overview on their use.

The HTML functions can be used in a variety of ways. Let's take another look at the first input tag:

<input type="text" size="30" maxlength="30" name="username">

The name of this tag is "input" and it has several attributes: "type", "size", "maxlength" and "name". These attributes have values associated with them: "text", "30", "30" and "username", respectively. To print this tag with one of the HTML shortcuts, simply call the function with the same name as the tag and pass the attributes and their values as an anonymous hash reference (a list surrounded by curlies '{}'). Attribute names should be preceded by a dash:

print input({
    -type      => "text",
    -size      => "30",
    -maxlength => "30",
    -name      => "username"
});

It looks a bit strange at first, but it's pretty easy once you get the hang of it. Incidentally, the above code was using the function oriented interface to CGI.pm. If you are using the object oriented interface and have a reference to a CGI object in $cgi, the above code becomes:

print $cgi->input({
    -type      => "text",
    -size      => "30",
    -maxlength => "30",
    -name      => "username"
});

For clarity in this lesson, we'll stick with the function oriented interface. Now let's look at a paragraph tag:

<p>Are you still reading this?  Go out and play!</p>

Generating that tag is pretty easy:

print p( "Are you still reading this?  Go out and play!" );

Okay, but what if we want to print a paragraph tag which also has attributes? A good rule of thumb to remember is that when you have opening and closing tags with text between them and the opening tag has attributes, you simply supply the attributes, as an anonymous hash reference, as the first argument. For example, to print the following tag:

<p class="notice">Are you still reading this?  Go out and play!</p>

Use this syntax:

print p( { -class => "notice" }, "Are you still reading this?  Go out and play!" );

You can also nest tags, if you like. To create this:

<em><strong>Are you still reading this?  Go out and play!</strong></em>

Use this:

print em( strong( "Are you still reading this?  Go out and play!" ) );
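Nesting also combines with another feature described in the CGI.pm documentation: if you pass a shortcut an array reference, the tag is distributed across each element. The following sketch assumes a made-up list of sports and a made-up class name, and builds an unordered list from them:

```perl
use strict;
use warnings;
use CGI qw/:standard/;

# Example data invented for this sketch.
my @sports = ( 'Cheese rolling', 'Extreme ironing', 'Toe wrestling' );

# li() receives an array reference, so it emits one <li> per element;
# the resulting tags are then wrapped in a single <ul>.
my $list_html = ul( { -class => 'sports' }, li( \@sports ) );
print $list_html, "\n";
```

This saves you from writing one li() call per item, which is handy when the items come from a database or a file.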

Once again, nothing here is terribly complicated. Of course, we know that HTML can get very complicated. "td" tags are nested in "tr" tags which are nested in "table" tags which in turn might be nested in "form" tags. This can be very difficult to keep track of and this is why many programmers eschew the HTML shortcuts and stick with literal strings. "After all", they argue, "I already know HTML."

Given that, you won't have to know them in order to use the rest of this course. Even so, you will see them used in some code snippets. This overview has been given so that you will understand what you are looking at when you are reading through the rest of the lessons. If you would like to learn more about these functions, read about them in the CGI.pm documentation. If you wish to use these functions, make sure you read the documentation carefully. Some tags are not "self-closing" ("body" and "form" tags, for example): their opening and closing tags are printed by separate functions, such as start_form and end_form. Other tags have capitalized function names to avoid conflict with built-in Perl functions (these tags are Select, Tr, Link, Delete, Accept and Sub).
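Both of those quirks can be seen in one short sketch. It assumes the same "login.cgi" action used earlier in this lesson, and the variable name $html is ours:

```perl
use strict;
use warnings;
use CGI qw/:standard/;

my $html = join '',
    # "form" is opened and closed by two separate functions, so other
    # output can be printed between them.
    start_form( { -action => 'login.cgi', -method => 'post' } ),
    # Tr() is capitalized because tr/// is already a Perl operator.
    table(
        Tr(
            td( strong('User Name:') ),
            td( input( { -type => 'text', -name => 'username' } ) ),
        ),
    ),
    end_form();

print $html, "\n";
```

The exact attributes CGI.pm writes into the opening form tag can vary between versions, so check the output of your own installation rather than relying on a particular layout.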

Without further ado, here is the full program which will print the "Log in to Weird Sports" page:

#!C:/perl/bin/perl.exe -wT
use strict;
use CGI qw/:standard/;

print header,
      start_html( "-title" => "Log in to Weird Sports"),
      div( { -align => "center",
             -style => "color:#000000; font-family: Tahoma, helvetica, arial;"},
        h1( "Log in to your account" ),
        p( "Here you can log in to the Weird Sports mailing list archives. " .
           "In theory, you will be able to change your user settings. In reality, " .
           "however, you can't because this is just an example from a " .
           a( { -href => "index.html" },
                "stupid CGI programming course" ), "." ),
        start_form( { -action  => "login.cgi",
                      -enctype => "application/x-www-form-urlencoded",
                      -method  => "post" } ),
        table( { -bgcolor     => "#000000",
                 -border      => "0",
                 -cellpadding => "2",
                 -cellspacing => "1",
                 -style       => "font: 10pt;" },
          Tr( { -style => "background-color:#CCCCCC" },
            td( strong( "User Name:" ) ),
            td( input( { -maxlength => "30",
                         -name      => "username",
                         -size      => "30",
                         -type      => "text"} )
            ) # end td
          ), # end Tr
          Tr( { -style => "background-color:#CCCCCC"},
            td( strong( "Password:" ) ),
            td( input( { -maxlength => "30",
                         -name      => "password",
                         -size      => "30",
                         -type      => "password"} )
            ) # end td
          ), # end Tr
          Tr(
            td( { -colspan => "2",
                  -style   => "background-color:#CCCCCC" },
                input ( { -name => "remember",
                          -type => "checkbox"} ),
                " Remember my ID on this computer. ",
            ) # end td
          ), # end Tr
        ), # End table
        p( input( { -type  => "submit",
                    -value => "Login"} ),
           " ",
           input( { -type  => "reset"} ),
        ), # end p
        end_form,
      ), # End div
      end_html;

Don't be dismayed by the apparent length of the code; it only looks long because of the rather liberal use of line breaks in the formatting. And the actual HTML generated by this program will be mostly on one line. This is useful because omitting linefeeds from the transmitted content makes the document about 10 to 20% smaller, thus saving bandwidth and processing time. If, for debugging purposes, you want the transmitted HTML to be prettily formatted, try changing the "use CGI" line to the following:

use CGI::Pretty qw/:standard/;

CGI::Pretty is a subclass of CGI.pm which nicely formats your HTML. If you have problems getting it to work properly, make sure that you upgrade to the latest version of CGI.pm (the bundle from CPAN includes CGI::Pretty). Older versions of CGI::Pretty have been a bit buggy and have been known to strip attributes, amongst other things.

In Lesson four, part 2, we will take this form and actually make it do something!

Exercises

  1. Setting the 'maxlength' attribute of an input tag to '20' will not guarantee that a maximum of 20 characters will be sent to your CGI script for that form element. Why?
  2. What is CGI::Pretty used for?
  3. Write the CGI.pm HTML shortcut which will reproduce the following HTML (use object oriented syntax for this answer):
    <h1>Log in to my web site</h1>
    <p>Enter your username and password:</p>
  4. This one is a bit more difficult, but not terribly hard. Write the HTML shortcuts for this (use the function oriented syntax):
    <table border="1">
      <tr>
        <td>This is a table cell.</td>
        <td>This is another one.</td>
      </tr>
      <tr>
        <td>Are we there yet?</td>
        <td>I'm getting hungry!</td>
      </tr>
    </table>

Answers to Lesson 4 Exercises

Next Lesson: Reading Form Data