Serving Perl Pages to the Web - Interacting with MySQL Using Perl - Learning MySQL (2007)

Learning MySQL (2007)

Part V. Interacting with MySQL Using Perl

Chapter 18. Serving Perl Pages to the Web

In this chapter, we’ll see how to write simple web database clients using Perl. Web applications written using Perl can take advantage of the Apache module mod_perl and the HTTP::Mason library for robust and high-performance web sites. We don’t discuss these in detail in this book. We should note that unlike Perl, PHP was designed from the start to be used for scripting web pages and is probably a better choice for any major new web application.

In Chapter 2, we saw how to install the Apache web server on a Linux system, and the XAMPP package on Windows and Mac OS X. We can make our Perl scripts accessible from a web server by placing them in a location that the web server can access.

On a Linux system, the scripts should be placed in the cgi-bin directory under the web server document root, typically /var/www/cgi-bin or /usr/local/apache/htdocs/cgi-bin. For a Windows system, the scripts should be placed in C:\Program Files\xampp\cgi-bin. For a Mac OS X system using XAMPP, use /Applications/xampp/htdocs.

Depending on the system configuration, you may need to have system root or administrator privileges to write files to this location. You can assume these privileges under Linux by typing su -, under Windows by logging in as a user with Windows Administrator privileges, or under Mac OS X prefacing commands with the sudo keyword.

Let’s write a small script to generate the following HTML page:

<html>

<head>

<title>My first Perl CGI script</title>

</head>

<body>

<b>Hello, World!</b><br />

This is <b>very</b> interesting!

</body>

</html>

We can use the print comand to generate the HTML page. Since the web server sends the script output directly to the browser, we need to add a bit of extra information to tell the web browser to expect HTML text, rather than, say, a JPEG image:

print "Content-type: text/html; charset=UTF-8\n\n";

The charset tells the browser that the server will send text using the UTF-8 character set.

Example 18-1 is the complete script to generate the HTML page.

Example 18-1. A first CGI Perl script that just displays some text

#!/usr/bin/perl

print "Content-type: text/html; charset=UTF-8\n\n";

print "

<html>

<title>My first Perl CGI script</title>

<body>

<b>Hello, World!</b><br />

This is <b>very</b> interesting!

</body>

</html>

";

For CGI scripts to work properly, they must print the Content-type line before sending any other data to the browser. Save this file as HelloWorld.cgi.pl in the appropriate location under the web server document root as discussed earlier.

The CGI approach requires the web server to execute the program and display the results; this means that the web server should have the necessary access permissions to read and execute the Perl scripts. On Linux or Mac OS X systems, permissions are assigned by three categories:

§ The user who owns the file (user)

§ Users in the group that the file is in (group)

§ Everyone else (other)

The Apache web server typically runs as the user nobody, who isn’t in any group, so if you want the web server to be able to run a file, you’ll need to give permission for everybody (the o permission, which stands for “other”) to read and execute the script.

For example, you can set appropriate permissions for the file HelloWorld.cgi.pl by typing the command:

$ chmod u=rwx,g=rx,o=rx HelloWorld.cgi.pl

We discuss permission settings in Restricting access to files and directories” in Chapter 2. The permissions for the XAMPP web server directories on Windows are less stringent, and a nonprivileged user can place files in the cgi-bin directory for delivery by the server.

Ensure your web server is running, then open the file in a web browser. On a Linux system or a Windows system using XAMPP, the script can be accessed at http://hostname/cgi-bin/HelloWorld.cgi.pl, while on a Mac OS X system using XAMPP, the address to use ishttp://hostname/xampp/HelloWorld.cgi.pl. The hostname is the name of the computer containing the scripts. If the web server is on the same computer as your web browser, you can use the value localhost for the hostname, so the address to use would be http://localhost/cgi-bin/HelloWorld.cgi.pl on a Linux system, and http://localhost/xampp/HelloWorld.cgi.pl on a Windows or Mac OS X system using XAMPP.

If your browser reports that you don’t have authorization to access the page, you should check the permission settings for the file or the directory it’s in. It’s often helpful to check the Apache error logfile; we describe how to find this file in The Apache Error Log” in Chapter 2. Open the error logfile in a text editor and look near the bottom; you might find a line such as this one:

[Thu Jun 29 02:26:35 2006] [error] [client 127.0.0.1]

Options ExecCGI is off in this directory:

/var/www/cgi-bin/mysql.cgi.animals.popup.pl

This means that Apache has not been configured to allow CGI scripts to be run in this directory. The solution is to put the scripts in a directory where CGI scripts are allowed or to configure your server to allow CGI scripts to run in the directory you want. To do the latter, you need to create a new entry in the Apache configuration file; we describe how to locate this file in The Apache Configuration File” in Chapter 2. For example, to allow CGI scripts to be executed in the directory /var/www/cgi-bin, you would write:

<Directory "/var/www/cgi-bin">

AllowOverride All

Options ExecCGI

Order allow,deny

Allow from all

AddHandler cgi-script.pl

</Directory>

Back to our script. This first web page was not dynamic; for a slightly more complex example, let’s write a CGI script that connects to the MySQL database and lists the animals in our pet database, as shown in Example 18-2.

Example 18-2. A CGI Perl script that lists animals from the MySQL database

#!/usr/bin/perl

use strict;

# Connect to the MySQL server, run the query, store the result

use DBI;

my $dbh=DBI->connect("DBI:mysql:host=localhost;database=AnimalDB",

"the_username",

"the_password",

{PrintError=>0, RaiseError=>1});

my $results = $dbh->selectall_hashref('SELECT * FROM Animals', 'Name');

$dbh->disconnect();

my $result=

"Content-Type: text/html; charset=UTF-8\n\n".

"<html>".

"<head>".

"<title>List of Animals</title>".

"</head><body>".

"<h1>Pet roll call</h1>".

"<table border='true'>".

"<tr align='CENTER' valign='TOP' bgcolor='LIGHTGREEN'>".

"<th>Name</th><th>Count</th></tr>";

foreach my $Name (keys %$results)

{

$result.=

"<tr>".

"<th align='left' bgcolor='SKYBLUE'>$Name</th>".

"<td bgcolor='PINK'>".$results->{$Name}->{Count}."</td>".

"</tr>";

}

$result.=

"</table>".

"</body></html>";

print $result;

Here, we build up the HTML content as a string in the $result variable and print this string out at the end of the script. We’ve made liberal use of whitespace to help make the script clearer. If it’s still daunting to you, try replacing the variable $Name with an animal’s name, and $results->{$Name}->{Count} with a number. The results will then be essentially HTML markup, with some Perl wrapping—for example:

<tr>

<th align='left' bgcolor='SKYBLUE'>cats</th>

<td bgcolor='PINK'>2</td>

</tr>

The Perl CGI Module

The Perl CGI module has some helpful functions to simplify generating common snippets of HTML. We can enable all these by modifying our use statement to:

use CGI ':all'

We can then generate HTML elements by calling the corresponding function. For example, we can include text within a level-one heading tag pair (<h1>text</h1>) by writing h1("text").

Many of these functions take attributes that are reflected in the generated HTML. For example, to generate the tag <th align="LEFT" bgcolor="SKYBLUE">, we would write:

th({-align=>"LEFT", -bgcolor=>"SKYBLUE"}, $Name).

Example 18-3 rewrites our our previous example using CGI functions.

Example 18-3. The CGI Animal list script rewritten with CGI functions

#!/usr/bin/perl

use strict;

# Connect to the MySQL server, run the query, store the result

use DBI;

my $dbh=DBI->connect("DBI:mysql:host=localhost;database=AnimalDB",

"the_username",

"the_password",

{PrintError=>0, RaiseError=>1});

my $results = $dbh->selectall_hashref('SELECT * FROM Animals', 'Name');

$dbh->disconnect();

# Prepare and display the results in HTML format

use CGI ':all';

my @AnimalsDataArray;

foreach my $Name (keys %$results)

{

my $AnimalsDataArrayRow =

th({-align=>"LEFT", -bgcolor=>"SKYBLUE"}, $Name).

td({-bgcolor=>"PINK"}, [$results->{$Name}->{Count}]);

push @AnimalsDataArray, $AnimalsDataArrayRow;

}

my $result=

header(-type=>"text/html", -charset=>'UTF-8').

start_html(-title=>"List of Animals", -encoding => 'UTF-8').

h1("Pet roll call").

table(

{-border=>'true'},

Tr({-align=>"CENTER",-valign=>"TOP", -bgcolor=>"LIGHTGREEN"},

[th(['Name','Count'])]),

Tr([@AnimalsDataArray])

).

end_html;

print $result;

Instead of printing out the table rows one by one, this example defines the array $AnimalsDataArray, and uses the push command to append each row of data to this array. We build up the $result string using functions provided by the CGI module and place @AnimalsDataArray in a table row. Finally, we print the whole $result string.

The charset parameter of the header() function and the encoding parameter of the start_html() function are optional, and are used to specify the character encoding that’s used in the document. Together, they help the browser determine the appropriate fonts to use to correctly display the page. If you omit these parameters, the default value ISO-8859-1 will be used.

Note that the CGI function to create a table row is Tr() rather than tr(); this is to avoid confusion with a completely different “transliteration replacement” function tr that is available by default in Perl.

Processing User Input

User interaction with a web application is typically a two-way affair; the system displays content to the user, who provides input, such as through a web form, that the system uses in further processing. For example, an online store application displays items for sale; the user chooses items and enters purchase information, and the system displays a receipt.

For our pet roll call example, we could allow the user to enter the name of an animal. The system could then process this information and return the number of animals of that type. This requires two web pages. First, a form is displayed to accept user input. When this form is submitted, a second page is displayed with the query result.

You can generate an HTML form that passes the submitted values to the process_form.pl file using the HTTP POST method by writing:

print

start_form(-action=>"process_form.pl", -method=>'POST').

...form content....

end_form;

Example 18-4 generates a web form with a simple text input field.

Example 18-4. A CGI Perl script that generates a simple form

#!/usr/bin/perl

use strict;

use CGI ':all';

print

header(-type=>"text/html", -charset=>'UTF-8').

start_html(-title=>"Search Page", -encoding => 'UTF-8');

if(param())

{

print "<br />The string you entered was: '".param('query')."'";

print "<br />Dumping all the submitted data...";

foreach my $Name (param())

{

print "<br /><b>$Name: $Name);

}

}

else

{

#start_form can take -action, -method...

print

start_form.

"Query String: ".

textfield(-name=>'query', -value=>'type your query here').

submit(-name=>'submit_button', -value=>'Submit query').

end_form;

}

print end_html;

The param() function can access data submitted through a web form. Our script checks to see whether the param() function returns a nonzero value—that is, whether any data has been received. If the function returns a nonzero value, we know that the form has been submitted and can then access the submitted data. Here, we print out the value from the form’s query field, which we can access as param('query'). For good measure, we then iterate through each submitted item using the foreach loop and print the name and value of the field.

The generated web page would look something like this (we’ve added some whitespace for readability):

<!DOCTYPE html

PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

<head>

<title>Search Page</title>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

</head>

<body>

<br />The string you entered was: 'type your query here'<br />

Dumping all the submitted data...<br />

<b>query:</b> type your query here<br /><b>submit_button:</b> Submit query

</body>

</html>

We’ve started a new line before Dumping and <b>query:</b> for readability; you won’t see these in the output of the program if you run it.

If you have a relatively small number of possible values, it’s probably more helpful for the user if you generate a drop-down list populated with values from the database. The CGI module has a useful popup_menu function that generates a drop-down list. To use this, you need to first load the list entries into an array:

@results = @{ $dbh->selectcol_arrayref ("SELECT Name FROM Animals")};

and then pass this array to the popup_menu function’s values parameter:

print

start_form(-action=>"process_form.pl", -method=>'POST').

"Animal Name: ".

popup_menu(-name => "Name", -values => \@results).

submit.

end_form;

The resulting form would look like this:

<form method="post" action="http://localhost/cgi-bin/mysql.cgi.animals.popup.pl"

enctype="multipart/form-data">

Animal Name:

<select name="Name" tabindex="1">

<option value="cats">cats</option>

<option value="dogs">dogs</option>

<option value="emus">emus</option>

<option value="fish">fish</option>

</select>

<br />

<input type="submit" tabindex="2" name="submit_button" value="Submit query" />

</form>

Figure 18-1 shows a screenshot of this form.

The animals form using the CGI pop-up menu

Figure 18-1. The animals form using the CGI pop-up menu

Using One Script for the Form and for Processing

It’s convenient to write a single script to generate both the form and the results web pages. The script tests whether there is any form data. If there isn’t any form data, the form is displayed; otherwise, the results are shown. In such a script, the value used for the form action field will be the name of the script itself. For example, the script query.pl would have:

start_form(-action=>"query.pl", -method=>'POST')

Rather than hardcoding the filename into the program code, we can use the url() function to automatically provide the address of the current script:

start_form(-action=>url(), -method=>'POST')

Note that the url() function provides the absolute URL of the current script, for example:

http://localhost/cgi-bin/query.pl

while the related self_url() function returns the URL of the script, together with the query string that preserves entered form information—for example:

http://localhost/cgi-bin/popup_menu.pl?Username=saied;Host=wombat.cs.rmit.edu.au

Example 18-5 is a single script to display a drop-down list of animal names, and display the count of the animal that the user selects.

Example 18-5. A CGI Perl script that generates a form with a drop-down list

#!/usr/bin/perl

use strict;

# Connect to the MySQL server, run the query, store the result

use DBI;

my $dbh=DBI->connect("DBI:mysql:host=localhost;database=AnimalDB",

"the_username",

"the_password",

{PrintError=>0, RaiseError=>1});

# Prepare and display the results in HTML format

use CGI ':all';

print header(-type=>"text/html", -charset=>'UTF-8');

my @results;

if(!param())

{

@results = @{ $dbh->selectcol_arrayref

("SELECT Name FROM Animals ORDER BY Name")};

print

start_html(-title=>"Animal Selection Page", -encoding => 'UTF-8').

h1("Select Animal").

p("Select the animal name from the list: ").

start_form(-action=>url(), -method=>'POST').

"Animal Name: ".

popup_menu(-name => "Name", -values => \@results).

br().

submit(-name=>'submit_button', -value=>'Submit query').

end_form;

}

else

{

my $Query="SELECT Count FROM Animals where Name='".param("Name")."'";

@results = @{ $dbh->selectcol_arrayref ($Query)}

or die("Problem with query $Query: ".$DBI::errstr);

print

start_html(-title=>"Animal Counts Page", -encoding => 'UTF-8').

h1("Query result").

"The count of ".

b(param("Name")).

" is: ".

@results;

}

$dbh->disconnect();

print

hr().

a({-href=>"http://www.ora.com"}, "O'Reilly Media").

end_html;

The bulk of this script consists of two blocks executed on different runs of the script. The first block follows the if(!param()) statement; this block runs when the user first calls up the URL without parameters, and it simply displays the form with the initial animal listing. The second block follows the else statement; this block runs after the user has filled out the form and submitted it.

The page includes a horizontal rule (hr) and link to the O’Reilly Media home page at the bottom.

A Note on mod_perl

The mod_perl Apache module moves the processing of Perl scripts into the Apache web server itself. This has two advantages. First, it’s more efficient, because the Perl interpreter doesn’t need to be started each time a script is called. Second, you don’t need to include the path to the Perl interpreter on an initial #! line at the top of each file.

We won’t go into the details of mod_perl here, but we recommend you use it for any production site that uses Perl CGI scripts running on an Apache web server. You can find more information on this at perl.apache.org and in particular on thehttp://perl.apache.org/docs/2.0/user/intro/start_fast.html web page.

Perl Security

Programmers often assume that their script will be used in a particular way and that users will behave as expected. When writing a script, you should always keep in mind that everybody makes mistakes, and some people deliberately try to break things. For example, if your script expects the number 2 but the user types two, what will happen? This is particularly important if you make your scripts available via the Web. You should never trust user input and use it directly for sensitive operations such as opening files or running commands on the server.

Perl has a taint mode that warns you if the script injects user input directly into a sensitive operation. You can turn on the Perl taint mode by adding the -T switch after the path to the Perl interpreter at the top of your script, for example:

#!/usr/bin/perl -T

Unfortunately, this taint mode does not recognize variables passed to the script from a web form via the param() function, so you’ll need to manually check that the user input is what you expect. This is typically performed using regular expressions, where we match a string against a template.

For example, we can ensure that the form variable Age is a number between 10 and 99:

if(param())

{

if(!(param('Age')=~/^([1-9][0-9])$/))

{

print

font({-color=>'red'}, 'Invalid age: must be between 10 and 99 inclusive.');

exit;

}

my $user_age = "$1";

print $user_age;

}

Perl offers some cryptic syntax for regular expressions, but they make the expressions very easy to integrate into a program. The =~ operator means “check the item on the left against the regular expression on the right.” The regular expression itself is enclosed in forward slashes, although you can choose another character if you find this inconvenient (for instance, if the regular expression itself contains a slash). Thus, the if statement just shown checks whether the Age parameter matches the regular expression /^([1-9][0-9])$/. The expression itself is simpler than it at first appears.

The numbers in the brackets express a range. Here, we want two digits: the first between one and nine, and the second between zero and nine. The parentheses delimit the part of the string that matches these two characters. The caret (^) symbol marks the start of the string, and the dollar ($) symbol marks the end of the string; together, these two anchors ensure that there’s nothing before or after the two digits. After the regular expression check, the substring enclosed in the first set of parentheses is available as $1, the substring enclosed between the second set of parentheses is available as $2, and so on. We can assign this to a variable and use it; here, we’re just printing it on the screen. If the check fails, we print an error message and stop the program with the exit keyword. Note that the die() function won’t produce the display you want in a web environment because its message is sent to the system standard error location, rather than to standard output.

As another example, we can check that an entered name is less than 10 characters long and that only its first letter is capitalized; we allow an initial character in the range A-Z, and between 0 and 9 characters in the range a-z:

if(param())

{

if(!(param('Name')=~/^([A-Z][a-z]{0,9})$/))

{

print

font({-color=>'red'},

'Invalid name: must comprise only letters, '.

'be at most ten characters long, ',

'and have only the first letter capitalized.');

exit;

}

my $Name = "$1";

print $Name;

}

The square brackets and braces have very different meanings in regular expressions. The square brackets around [a-z] mean “any character in the range from a to z,” while the braces around {0,9} means “where the preceding item appears zero to nine times.”

Resources

§ For more on the CGI module, read the documentation at http://search.cpan.org/dist/CGI.pm/CGI.pm.

§ For more on web scripting security, read the WWW security FAQ at http://www.w3.org/Security/faq.

§ Regular expressions are a powerful tool. There are many good books on regular expressions, but try to choose one that focuses on Perl (there are slight variations in the regular expression syntax used in different tools). We recommend you read Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly).

Exercises

1. How is a Perl CGI script different from a command-line one?

2. What does the Perl CGI module do?

3. Write a regular expression that matches any word starting with “tele”; the match should not be case-sensitive.