MySQL Cookbook (2007)

Chapter 19. Processing Web Input with MySQL

Introduction

The previous chapter describes how to retrieve information from MySQL and display it in web pages using various types of HTML constructs such as tables or hyperlinks. That’s a use of MySQL to send information in one direction (from web server to user), but web-based database programming is also useful for collecting information sent in the other direction (from user to web server), such as the contents of a submitted form. For example, if you’re processing a survey form, you might store the information for later use. If the form contains search keywords, you’d use them as the basis for a query that searches the database for information the user wants to see.

MySQL comes into these activities in a fairly obvious way, as the repository for storing information or as the source from which search results are drawn. But before you can process input from a form, you have to create the form and send it to the user. MySQL can help with this, too, because it’s often possible to use information stored in your database to generate form elements such as radio buttons, checkboxes, pop-up menus, or scrolling lists:

§ You can select a set of items from a table that lists countries, states, or provinces and convert them into a pop-up menu in a form that collects address information.

§ You can use the list of legal values for an ENUM column that contains allowable salutations (Mr., Mrs., and so forth) to generate a set of radio buttons.

§ You can use lists of available colors, sizes, or styles stored in an inventory database to construct fields for a clothing ordering form.

§ If you have an application that enables the user to pick a database or table, you can query the MySQL server for a list of databases or tables and use the resulting names to create a list element.

By using database content to generate form elements, you lessen the amount of explicit knowledge your programs must have about table structure and content, and you enable them to determine what they need automatically. A script that uses database content to figure out for itself how to generate form elements will also adaptively handle changes to the database. To add a new country, create a new row in the table that stores the list of countries. To add a new salutation, change the definition of the ENUM column. In each case, you change the set of items in a form element by updating the database, not by modifying the script; the script adapts to the change automatically, without additional programming.

The first part of this chapter covers the following topics relating to web input processing:

Generating forms and form elements

One way to use database content for form construction is by selecting a list of items from a table and using them to create the options in a list element. But metadata can be used as well. There is a natural correspondence between ENUM columns and single-pick form elements like radio button sets or pop-up menus. In both cases, only one from a set of possible values may be chosen. There is a similar correspondence between SET columns and multiple-pick elements like checkbox groups; any or all of the possible values may be chosen. To construct metadata-based form elements, obtain the column description from the table metadata stored in INFORMATION_SCHEMA, extract the set of legal ENUM or SET values, and use them for the items in the form element.

Initializing forms using database contents

In addition to using the database to create structural elements of forms, you can also use it to initialize forms. For example, to enable a user to modify an existing record, retrieve it from the database and load it into a form’s fields before sending the form to the user for editing.

Processing input gathered over the Web

This includes input not only from form fields, but also the contents of uploaded files, or parameters that are present in URLs. Regardless of how you obtain the information, you’ll face a common set of issues in dealing with it: extracting and decoding the information, performing constraint or validity checking on it, and re-encoding the information for statement construction to avoid generating malformed statements or storing information inaccurately.

The second part of the chapter illustrates several ways to apply the techniques developed in the first part. These include applications that show how to use MySQL to present a web-based search interface, create paged displays that contain next-page and previous-page links, implement per-page hit counting and logging, and perform Apache logging to a database.

Scripts to create the tables used in this chapter are located in the tables directory of the recipes distribution. The scripts for the examples can be found under the directories named for the servers used to run them. For Perl, Ruby, PHP, and Python examples, look under the apache directory. Utility routines used by the example scripts are found in files located in the lib directory. (See Using Apache to Run Web Scripts for information on configuring Apache so that scripts can be run by it and find their library files.) For Java (JSP) examples, look under the tomcat directory; you should already have installed these examples in the process of setting up the mcb application context (Using Tomcat to Run Web Scripts).

Note that although the scripts in this chapter are intended to be invoked from your browser after they have been installed, many of them (JSP pages excepted) can also be invoked from the command line if you want to see the raw HTML they produce. (See Using Apache to Run Web Scripts.)

To provide a concrete context for discussion, many of the form-processing examples in this chapter are based on the following scenario. In the lucrative field of “construct-a-cow” business endeavors, you run an operation that manufactures build-to-order ceramic bovine figurines, and you want to design an online ordering application that lets customers make selections for several aspects of the product. For each order, it’s necessary to collect several types of information:

Cow color

The particular list of colors available at any particular time changes occasionally, so for flexibility, the values can be stored in a database table. To change the set of colors that customers can choose from, just update the table.

Cow size

There is a fixed set of sizes that doesn’t change often (small, medium, large), so the values can be represented as elements of an ENUM column.

The all-important cow accessory items

These include a bell, horns, a sporty-looking tail ribbon, and a nose ring. Accessories can be represented in a SET column, because a customer may want to select more than one of them. In addition, you know from past experience that most customers order horns and a cow bell, so it’s reasonable to use those for the column’s default value.

Customer name and address (street, city, state)

The possible state names are already stored in the states table, so we can use them as the basis for the corresponding form element.

Given the preceding discussion, a cow_order table can be designed like this:

CREATE TABLE cow_order

(

id INT UNSIGNED NOT NULL AUTO_INCREMENT,

# cow color, figurine size, and accessory items

color VARCHAR(20),

size ENUM('small','medium','large') DEFAULT 'medium',

accessories SET('cow bell','horns','nose ring','tail ribbon')

DEFAULT 'cow bell,horns',

# customer name, street, city, and state (abbreviation)

cust_name VARCHAR(40),

cust_street VARCHAR(40),

cust_city VARCHAR(40),

cust_state CHAR(2),

PRIMARY KEY (id)

);

The id column provides a unique identifier for each row. It’s a good idea to have such a value, and in fact will be necessary when we get to Loading a Database Record into a Form, which shows how to use web forms to edit existing rows. For that type of activity, you must be able to tell which row to update, which is difficult to do without a unique row identifier.

The list of available colors is maintained in a separate table, cow_color:

CREATE TABLE cow_color (color CHAR(20));

For purposes of illustration, assume that the cow_color table contains the following rows:

+---------------+

| color |

+---------------+

| Black |

| Black & White |

| Brown |

| Cream |

| Red |

| Red & White |

| See-Through |

+---------------+

An application can use the tables just described to generate list elements in an order entry form, making it unnecessary for the application to have a lot of specialized built-in knowledge about the available options. The next several recipes describe how to do this, and how to process the input that you obtain when a user submits a form.

Writing Scripts That Generate Web Forms

Problem

You want to write a script that gathers input from a user.

Solution

Create a form from within your script and send it to the user. The script can arrange to have itself invoked again to process the form’s contents when the user fills it in and submits it.

Discussion

Web forms are a convenient way to enable your visitors to submit information such as a set of search keywords, a completed survey result, or a response to a questionnaire. Forms are also beneficial for you as a developer because they provide a structured way to associate data values with names by which to refer to them.

A form begins and ends with <form> and </form> tags. Between those tags, you can place other HTML constructs, including special elements that become input fields in the page that the browser displays. The <form> tag that begins a form should include two attributes, action andmethod. The action attribute tells the browser what to do with the form when the user submits it. This will be the URL of the script that should be invoked to process the form’s contents. The method attribute indicates to the browser what kind of HTTP request it should use to submit the form. The value will be either get or post, depending on the type of request you want the form submission to generate. Collecting Web Input discusses the difference between these two request methods; for now, we’ll always use post.

Most of the form-based web scripts shown in this chapter share some common behaviors:

§ When first invoked, the script generates a form and sends it to the user to be filled in.

§ The action attribute of the form points back to the same script, so that when the user completes the form and submits it, the web server invokes the script again to process the form’s contents.

§ The script determines whether it’s being invoked by a user for the first time or whether it should process a submitted form by checking its execution environment to see what input parameters are present. For the initial invocation, the environment will contain none of the parameters named in the form.

This approach isn’t the only one you can adopt, of course. One alternative is to place a form in a static HTML page and have it point to the script that processes the form. Another is to have one script generate the form and a second script process it.

If a form-creating script wants to have itself invoked again when the user submits the form, it should determine what its own pathname within the web server tree is and use that value for the action attribute of the opening <form> tag. For example, if a script is installed as /cgi-bin/myscriptin your web tree, you could write the <form> tag like this:

However, each of our language APIs provides a way for a script to obtain its own pathname. That means no script needs to have its pathname hardwired into it, which gives you greater latitude to install the script where you want.

Perl

In Perl scripts, the CGI.pm module provides three methods that are useful for creating <form> elements and constructing the action attribute. start_form() and end_form() generate the opening and closing form tags, and url() returns the script’s own pathname. Using these methods, a script generates a form like this:

print start_form (-action => url (), -method => "post");

# ... generate form elements here ...

print end_form ();

start_form() supplies a default request method of post, so you can omit the method argument if you’re constructing post forms.

Ruby

In Ruby scripts, create a cgi object, and use its form method to generate a form. The method arguments provide the <form> tag attributes, and the block following the method call provides the form content. To get the script pathname, use the SCRIPT_NAME member of the ENV hash:

cgi.out {

cgi.form("action" => ENV["SCRIPT_NAME"], "method" => "post") {

# ... generate form elements here ...

}

The form method supplies a default request method of post, so you can omit the method argument if you’re constructing post forms.

The script pathname is also available from the cgi.script_name method.

PHP

In PHP, a script’s pathname can be obtained from the PHP_SELF member of the $HTTP_SERVER_VARS array or the $_SERVER array. Unfortunately, checking multiple sources of information is a lot of fooling around just to get the script pathname in a way that works reliably for different versions and configurations of PHP, so a utility routine to get the path is useful. The following function, get_self_path(), shows how to use $_SERVER if it’s available and fall back to $HTTP_SERVER_VARS otherwise. The function thus prefers the most recently introduced language features, but still works for scripts running under older versions of PHP:

function get_self_path ()

{

global $HTTP_SERVER_VARS;

$val = NULL;

if (isset ($_SERVER["PHP_SELF"]))

$val = $_SERVER["PHP_SELF"];

else if (isset ($HTTP_SERVER_VARS["PHP_SELF"]))

$val = $HTTP_SERVER_VARS["PHP_SELF"];

return ($val);

}

$HTTP_SERVER_VARS is a global variable, but must be declared as such explicitly using the global keyword if used in a nonglobal scope (such as within a function). $_SERVER is a “superglobal” array and is accessible in any scope without being declared as global.

The get_self_path() function is part of the Cookbook_Webutils.php library file located in the lib directory of the recipes distribution. If you install that file in a directory that PHP searches when looking for include files, a script can obtain its own pathname and use it to generate a form as follows:

include "Cookbook_Webutils.php";

$self_path = get_self_path ();

print ("<form action=\"$self_path\" method=\"post\">\n");

# ... generate form elements here ...

print ("</form>\n");

Python

Python scripts can get the script pathname by importing the os module and accessing the SCRIPT_NAME member of the os.environ object:

import os

print "<form action=\"" + os.environ["SCRIPT_NAME"] + "\" method=\"post\">"

# ... generate form elements here ...

print "</form>"

Java

In JSP pages, the request path is available through the implicit request object that the JSP processor makes available. Use that object’s getRequestURI() method as follows:

<form action="<%= request.getRequestURI () %>" method="post">

<%-- ... generate form elements here ... --%>

</form>

See Also

The examples shown in this section have an empty body between the opening and closing form tags. For a form to be useful, you’ll need to create body elements that correspond to the types of information that you want to obtain from users. It’s possible to hardwire these elements into a script, but Recipes and describe how MySQL can help you create the elements on the fly based on information stored in your database.

Creating Single-Pick Form Elements from Database Content

Problem

A form needs to present a field that offers several options but enables the user to select only one of them.

Solution

Use a single-pick list element. These include radio button sets, pop-up menus, and scrolling lists.

Discussion

Single-pick form elements enable you to present multiple choices from which a single option can be selected. Our construct-a-cow scenario involves several sets of single-pick choices:

§ The list of colors in the cow_color table. These can be obtained with the following statement:

§ mysql>SELECT color FROM cow_color ORDER BY color;

§ +---------------+

§ | color |

§ +---------------+

§ | Black |

§ | Black & White |

§ | Brown |

§ | Cream |

§ | Red |

§ | Red & White |

§ | See-Through |

+---------------+

Note that some of the colors contain a & character, which is special in HTML. This means they will need HTML-encoding when placed into list elements. (We’ll perform encoding for all the list elements in the form as a matter of course, but those values illustrate why it’s a good idea to get in that habit.)

§ The list of legal figurine sizes in the size column of the cow_order table. The column is represented as an ENUM, so the possible values and the default value can be obtained from INFORMATION_SCHEMA:

§ mysql>SELECT COLUMN_TYPE, COLUMN_DEFAULT

§ -> FROM INFORMATION_SCHEMA.COLUMNS

§ -> WHERE TABLE_SCHEMA='cookbook' AND TABLE_NAME='cow_order'

§ -> AND COLUMN_NAME='size';

§ +--------------------------------+----------------+

§ | COLUMN_TYPE | COLUMN_DEFAULT |

§ +--------------------------------+----------------+

§ | enum('small','medium','large') | medium |

+--------------------------------+----------------+

§ The list of state names and abbreviations. These are available from the states table:

§ mysql>SELECT abbrev, name FROM states ORDER BY name;

§ +--------+----------------+

§ | abbrev | name |

§ +--------+----------------+

§ | AL | Alabama |

§ | AK | Alaska |

§ | AZ | Arizona |

§ | AR | Arkansas |

§ | CA | California |

§ | CO | Colorado |

§ | CT | Connecticut |

...

The number of choices varies for each of the lists just described. As shown, there are 3 figurine sizes, 7 colors, and 50 states. The differing numbers of choices lead to different decisions about how to represent the lists in a form:

§ The figurine size values are best represented as a set of radio buttons or a pop-up menu; a scrolling list is unnecessary because the number of choices is small.

§ The set of colors can reasonably be displayed using any of the single-pick element types; it’s small enough that a set of radio buttons wouldn’t take a lot of space, but large enough that you may want to enable scrolling—particularly if you make additional colors available.

§ The list of states is likely to have more items than you’d want to present as a set of radio buttons, so it’s most suitable for presentation as a pop-up menu or scrolling list.

The following discussion describes the HTML syntax for these types of elements and then shows how to generate them from within scripts:

Radio buttons

A group of radio buttons consists of <input> elements of type radio, all with the same name attribute. Each element also includes a value attribute. A label to display can be given after the <input> tag. To mark an item as the default initial selection, add a checked attribute. The following radio button group displays the possible cow figurine sizes, using checked to mark medium as the initially selected value:

<input type="radio" name="size" value="small" />small

<input type="radio" name="size" value="medium" checked="checked" />medium

<input type="radio" name="size" value="large" />large

Pop-up menus

A pop-up menu is a list that begins and ends with <select> and </select> tags, with each item in the menu enclosed within <option> and </option> tags. Each <option> element has a value attribute, and its body provides a label to be displayed. To indicate a default selection, add a selected attribute to the appropriate <option> item. If no item is so marked, the first item becomes the default, as is the case for the following pop-up menu:

<option value="Black">Black</option>

<option value="Black & White">Black & White</option>

<option value="Brown">Brown</option>

<option value="Cream">Cream</option>

<option value="Red & White">Red & White</option>

<option value="See-Through">See-Through</option>

</select>

Scrolling lists

A scrolling list is displayed as a set of items in a box. The list may contain more items than are visible in the box, in which case the browser displays a scrollbar that the user can use to bring the other items into view. The HTML syntax for scrolling lists is similar to that for pop-up menus, except that the opening <select> tag includes a size attribute indicating how many rows of the list should be visible in the box. By default, a scrolling list is a single-pick element; Creating Multiple-Pick Form Elements from Database Content discusses how to allow multiple picks.

The following single-pick scrolling list includes an item for each U.S. state, of which six will be visible at a time:

<option value="AL">Alabama</option>

<option value="AK">Alaska</option>

<option value="AZ">Arizona</option>

<option value="AR">Arkansas</option>

<option value="CA">California</option>

...

<option value="WV">West Virginia</option>

<option value="WI">Wisconsin</option>

<option value="WY">Wyoming</option>

</select>

Radio button sets, pop-up menus, and scrolling lists have several things in common:

A name for the element

When the user submits the form, the browser associates this name with whatever value the user selected.

A set of values, one for each item in the list

These determine which values are available to be selected.

A set of labels, one for each item

These determine what the user sees when the form is displayed.

An optional default value

This determines which item in the list is selected initially when the browser displays the list.

To produce a list element for a form using database content, issue a statement that selects the appropriate values and labels, encode any special characters they contain, and add the HTML tags that are appropriate for the kind of list you want to display. Should you desire to indicate a default selection, add a checked or selected attribute to the proper item in the list.

Let’s consider how to produce form elements for the color and state lists first. Both of these are produced by fetching a set of column values from a table. Then we’ll construct the figurine size list, which takes its values from a column’s definition rather than its contents.

In JSP, you can display a set of radio buttons for the colors using JSTL tags as follows. The color names are used as both the values and the labels, so you print them twice:

<sql:query dataSource="${conn}" var="rs">

SELECT color FROM cow_color ORDER BY color

</sql:query>

<c:forEach items="${rs.rows}" var="row">

<input type="radio" name="color"

value="<c:out value="${row.color}"/>"

/><c:out value="${row.color}"/><br />

</c:forEach>

<c:out> performs HTML entity encoding, so the & character that is present in some of the color values is converted to & automatically and does not cause display problems in the resulting web page.

To display a pop-up menu instead, the retrieval statement is the same, but the row-fetching loop is different:

<sql:query dataSource="${conn}" var="rs">

SELECT color FROM cow_color ORDER BY color

</sql:query>

<c:forEach items="${rs.rows}" var="row">

<c:out value="${row.color}"/></option>

</c:forEach>

</select>

The pop-up menu can be changed easily to a scrolling list. All you need to do is add a size attribute to the opening <select> tag. For example, to make three colors visible at a time, generate the list like this:

<sql:query dataSource="${conn}" var="rs">

SELECT color FROM cow_color ORDER BY color

</sql:query>

<c:forEach items="${rs.rows}" var="row">

<c:out value="${row.color}"/></option>

</c:forEach>

</select>

Generating a list element for the set of states is similar, except that the labels are not the same as the values. To make the labels more meaningful to customers, display the full state names. But the value that is returned when the form is submitted should be an abbreviation, because that is what gets stored in the cow_order table. To produce a list that way, select both the abbreviations and the full names, and then insert them into the proper parts of each list item. For example, to create a pop-up menu, do this:

<sql:query dataSource="${conn}" var="rs">

SELECT abbrev, name FROM states ORDER BY name

</sql:query>

<c:forEach items="${rs.rows}" var="row">

<c:out value="${row.name}"/></option>

</c:forEach>

</select>

The preceding JSP examples use an approach that prints each list item individually. List element generation in CGI.pm-based Perl scripts proceeds on a different basis: extract the information from the database first, and then pass it all to a function that returns a string representing the form element. The functions that generate single-pick elements are radio_group(), popup_menu(), and scrolling_list(). These have several arguments in common:

name

What you want to call the element.

values

The values for the items in the list. This should be a reference to an array.

labels

The labels to associate with each value. This argument is optional; if it’s missing, CGI.pm uses the values as the labels. Otherwise, the labels argument should be a reference to a hash that associates each value with its corresponding label. For example, to produce a list element for cow colors, the values and labels are the same, so no labels argument is necessary. However, to produce a state list, labels should be a reference to a hash that maps each state abbreviation to its full name.

default

The initially selected item in the element. This argument is optional. For a radio button set, CGI.pm automatically selects the first button by default if this argument is missing. To defeat that behavior, provide a default value that is not present in the values list. (This value cannot beundef or the empty string.)

Some of the functions take additional arguments. For radio_group(), you can supply a linebreak argument to specify that the buttons should be displayed vertically rather than horizontally. scrolling_list() takes a size argument indicating how many items should be visible at a time. (The CGI.pm documentation describes additional arguments that are not used here at all. For example, there are arguments for laying out radio buttons in tabular form, but we’re not going to be that fancy.)

To construct a form element using the colors in the cow_color table, begin by retrieving them into an array:

my $color_ref = $dbh->selectcol_arrayref (

"SELECT color FROM cow_color ORDER BY color");

selectcol_arrayref() returns a reference to the array, which is the kind of value needed for the values argument of the CGI.pm functions that create list elements. To create a group of radio buttons, a pop-up menu, or a single-pick scrolling list, invoke the functions as follows:

print radio_group (-name => "color",

-values => $color_ref,

-linebreak => 1); # display buttons vertically

print popup_menu (-name => "color",

-values => $color_ref);

print scrolling_list (-name => "color",

-values => $color_ref,

-size => 3); # display 3 items at a time

The values and the labels for the color list are the same, so no labels argument need be given; CGI.pm will use the values as labels by default. Note that we haven’t HTML-encoded the colors here, even though some of them contain an & character. CGI.pm functions for generating form elements automatically perform HTML-encoding, unlike its functions for creating nonform elements.

To produce a list of states for which the values are abbreviations and the labels are full names, we do need a labels argument. It should be a reference to a hash that maps each value to the corresponding label. Construct the value list and label hash as follows:

my @state_values;

my %state_labels;

my $sth = $dbh->prepare ("SELECT abbrev, name

FROM states ORDER BY name");

$sth->execute ();

while (my ($abbrev, $name) = $sth->fetchrow_array ())

{

push (@state_values, $abbrev); # save each value in an array

$state_labels{$abbrev} = $name; # map each value to its label

}

Pass the resulting list and hash by reference to popup_menu() or scrolling_list(), depending on which kind of list element you want to produce:

print popup_menu (-name => "state",

-values => \@state_values,

-labels => \%state_labels);

print scrolling_list (-name => "state",

-values => \@state_values,

-labels => \%state_labels,

-size => 6); # display 6 items at a time

The Ruby cgi module also has methods for generating radio buttons, pop-up menus, and scrolling lists. You can examine the form_element.rb script to see how to use them. However, I am not going to discuss them here because I find them awkward to use, particularly when it’s necessary to ensure that values are properly escaped or that certain group members are selected by default.

If you’re using an API that doesn’t provide a ready-made set of functions for producing form elements (or those functions are inconvenient to use), you may elect either to print HTML as you fetch list items from MySQL, or write utility routines that generate the form elements for you. The following discussion considers how to implement both approaches, using PHP and Python.

In PHP, the list of values from the cow_color table can be presented in a pop-up menu as follows using a fetch-and-print loop like this:

print ("<select name=\"color\">\n");

$stmt = "SELECT color FROM cow_color ORDER BY color";

$result =& $conn->query ($stmt);

if (!PEAR::isError ($result))

{

while (list ($color) = $result->fetchRow ())

{

$color = htmlspecialchars ($color);

print ("<option value=\"$color\">$color</option>\n");

}

$result->free ();

}

print ("</select>\n");

Python code to do the same is similar:

stmt = "SELECT color FROM cow_color ORDER BY color"

cursor = conn.cursor ()

cursor.execute (stmt)

print "<select name=\"color\">"

for (color, ) in cursor.fetchall ():

color = cgi.escape (color, 1)

print "<option value=\"%s\">%s</option>" % (color, color)

cursor.close ()

print "</select>"

The state list requires different values and labels, so the code is slightly more complex. In PHP, it looks like this:

print ("<select name=\"state\">\n");

$stmt = "SELECT abbrev, name FROM states ORDER BY name";

$result =& $conn->query ($stmt);

if (!PEAR::isError ($result))

{

while ($row =& $result->fetchRow ())

{

$abbrev = htmlspecialchars ($row[0]);

$name = htmlspecialchars ($row[1]);

print ("<option value=\"$abbrev\">$name</option>\n");

}

$result->free ();

}

print ("</select>\n");

And in Python, like this:

stmt = "SELECT abbrev, name FROM states ORDER BY name"

cursor = conn.cursor ()

cursor.execute (stmt)

print "<select name=\"state\">"

for (abbrev, name) in cursor.fetchall ():

abbrev = cgi.escape (abbrev, 1)

name = cgi.escape (name, 1)

print "<option value=\"%s\">%s</option>" % (abbrev, name)

cursor.close ()

print "</select>"

Radio buttons and scrolling lists can be produced in similar fashion. But rather than doing so, let’s use a different approach and construct a set of functions that generate form elements, given the proper information. The functions return a string representing the appropriate kind of form element. They’re invoked as follows:

make_radio_group (name, values, labels, default, vertical)

make_popup_menu (name, values, labels, default)

make_scrolling_list (name, values, labels, default, size, multiple)

These functions have several arguments in common:

name

The name of the form element.

values

An array or list of values for the items in the element.

labels

Another array that provides the corresponding element label to display for each value. The two arrays must be the same size. (If you want to use the values as the labels, just pass the same array to the function twice.)

default

The initial value of the form element. This should be a scalar value, except for make_scrolling_list(). We’ll write that function to handle either single-pick or multiple-pick lists (and use it for the latter purpose in Creating Multiple-Pick Form Elements from Database Content), so itsdefault value is allowed to be either a scalar or an array. If there is no default, pass a value that isn’t contained in the values array; typically, an empty string will do.

Some of the functions also have additional arguments that apply only to particular element types:

vertical

This applies to radio button groups. If true, it indicates that the items should be stacked vertically rather than horizontally.

size, multiple

These arguments apply to scrolling lists. size indicates how many items in the list are visible, and multiple should be true if the list allows multiple selections.

The implementation of some of these list-generating functions is discussed here, but you can find the code for all of them in the lib directory of the recipes distribution. All of them act like CGI.pm for form element functions in the sense that they automatically perform HTML-encoding of argument values that are incorporated into the list. (The Ruby version of the library file includes utility methods for generating these elements, too, even though the cgi module has methods for creating them; I think the utility methods are easier to use than the cgi methods.)

In PHP, the make_radio_group() function for creating a set of radio buttons is written like this:

function make_radio_group ($name, $values, $labels, $default, $vertical)

{

$str = "";

for ($i = 0; $i < count ($values); $i++)

{

# select the item if it corresponds to the default value

$checked = ($values[$i] == $default ? " checked=\"checked\"" : "");

$str .= sprintf (

"<input type=\"radio\" name=\"%s\" value=\"%s\"%s />%s",

htmlspecialchars ($name),

htmlspecialchars ($values[$i]),

$checked,

htmlspecialchars ($labels[$i]));

if ($vertical)

$str .= "<br />"; # display items vertically

$str .= "\n";

}

return ($str);

}

The function constructs the form element as a string, which it returns. To use make_radio_group() to present cow colors, invoke it after fetching the items from the cow_color table, as follows:

$values = array ();

$stmt = "SELECT color FROM cow_color ORDER BY color";

$result =& $conn->query ($stmt);

if (!PEAR::isError ($result))

{

while ($row =& $result->fetchRow ())

$values[] = $row[0];

$result->free ();

}

print (make_radio_group ("color", $values, $values, "", TRUE));

The $values array is passed to make_radio_group() twice because it’s used both for the values and the labels.

If you want to present a pop-up menu, use the following function instead:

function make_popup_menu ($name, $values, $labels, $default)

{

$str = "";

for ($i = 0; $i < count ($values); $i++)

{

# select the item if it corresponds to the default value

$checked = ($values[$i] == $default ? " selected=\"selected\"" : "");

$str .= sprintf (

"<option value=\"%s\"%s>%s</option>\n",

htmlspecialchars ($values[$i]),

$checked,

htmlspecialchars ($labels[$i]));

}

$str = sprintf (

"<select name=\"%s\">\n%s</select>\n",

htmlspecialchars ($name),

$str);

return ($str);

}

make_popup_menu() has no $vertical parameter, but otherwise you invoke it the same way as make_radio_group():

print (make_popup_menu ("color", $values, $values, ""));

The make_scrolling_list() function is similar to make_popup_menu(), so I won’t show its implementation here. To invoke it to produce a single-pick list, pass the same arguments as for make_popup_menu(), but indicate how many rows should be visible at once, and add amultiple argument of FALSE:

print (make_scrolling_list ("color", $values, $values, "", 3, FALSE));

The state list uses labels that are different from the values. Fetch the labels and values like this:

$values = array ();

$labels = array ();

$stmt = "SELECT abbrev, name FROM states ORDER BY name";

$result =& $conn->query ($stmt);

if (!PEAR::isError ($result))

{

while ($row =& $result->fetchRow ())

{

$values[] = $row[0];

$labels[] = $row[1];

}

$result->free ();

}

Then use the values and labels to generate the type of list you want:

print (make_popup_menu ("state", $values, $labels, ""));

print (make_scrolling_list ("state", $values, $labels, "", 6, FALSE));

Ruby and Python implementations of the utility functions are similar to the PHP versions. For example, the Python version of make_popup_menu() looks like this:

def make_popup_menu (name, values, labels, default):

result_str = ""

# make sure name and default are strings

name = str (name)

default = str (default)

for i in range (len (values)):

# make sure value and label are strings

value = str (values[i])

label = str (labels[i])

# select the item if it corresponds to the default value

if value == default:

checked = " selected=\"selected\""

else:

checked = ""

result_str = result_str + \

"<option value=\"%s\"%s>%s</option>\n" \

% (cgi.escape (value, 1),

checked,

cgi.escape (label, 1))

result_str = "<select name=\"%s\">\n%s</select>\n" \

% (cgi.escape (name, 1), result_str)

return result_str

To present the cow colors in a form, fetch them like this:

values = []

stmt = "SELECT color FROM cow_color ORDER BY color"

cursor = conn.cursor ()

cursor.execute (stmt)

for (color, ) in cursor.fetchall ():

values.append (color)

cursor.close ()

Then convert the list to a form element using one of the following calls:

print make_radio_group ("color", values, values, "", True)

print make_popup_menu ("color", values, values, "")

print make_scrolling_list ("color", values, values, "", 3, False)

To present the state list, fetch the names and abbreviations:

values = []

labels = []

stmt = "SELECT abbrev, name FROM states ORDER BY name"

cursor = conn.cursor ()

cursor.execute (stmt)

for (abbrev, name) in cursor.fetchall ():

values.append (abbrev)

labels.append (name)

cursor.close ()

Then pass them to the appropriate function:

print make_popup_menu ("state", values, labels, "")

print make_scrolling_list ("state", values, labels, "", 6, False)

Something the Ruby and Python utility methods in the lib directory do that their PHP counterparts do not is explicitly convert to string form all argument values that get incorporated into the list. This is necessary because the Ruby CGI.escapeHTML() and Python cgi.escape() methods raise an exception if you try to use them to HTML-encode nonstring values.

We have thus far considered how to fetch rows from the cow_color and states tables and convert them to form elements. Another element that needs to be part of the form for the online cow-ordering application is the field for specifying cow figurine size. The legal values for this field come from the definition of the size column in the cow_order table. That column is an ENUM, so getting the legal values for the corresponding form element is a matter of getting the column definition and parsing it apart. In other words, we need to use the column metadata rather than the column data.

As it happens, a lot of the work involved in this task has already been done in Getting ENUM and SET Column Information, which develops utility routines to get ENUM or SET column metadata. In Perl, for example, invoke the get_enumorset_info() function as follows to get the sizecolumn metadata:

my $size_info = get_enumorset_info ($dbh, "cookbook", "cow_order", "size");

The resulting $size_info value is a reference to a hash that has several members, two of which are relevant to our purposes here:

$size_info->{values}

$size_info->{default}

The values member is a reference to a list of the legal enumeration values, and default is the column’s default value. This information is in a format that can be converted directly to a form element such as a group of radio buttons or a pop-up menu as follows:

print radio_group (-name => "size",

-values => $size_info->{values},

-default => $size_info->{default},

-linebreak => 1); # display buttons vertically

print popup_menu (-name => "size",

-values => $size_info->{values},

-default => $size_info->{default});

The default value is medium, so that’s the value that will be selected initially when the browser displays the form.

The equivalent Ruby metadata-fetching method returns a hash. Use it as follows to generate form elements from the size column metadata:

size_info = get_enumorset_info(dbh, "cookbook", "cow_order", "size")

form << make_radio_group("size",

size_info["values"],

size_info["default"],

true) # display items vertically

form << make_popup_menu("size",

size_info["values"],

size_info["default"])

The metadata function for PHP returns an associative array, which is used in similar fashion:

$size_info = get_enumorset_info ($conn, "cookbook", "cow_order", "size");

print (make_radio_group ("size",

$size_info["values"],

$size_info["default"],

TRUE)); # display items vertically

print (make_popup_menu ("size",

$size_info["values"],

$size_info["default"]));

The Python version of the metadata function returns a dictionary:

size_info = get_enumorset_info (conn, "cookbook", "cow_order", "size")

print make_radio_group ("size",

size_info["values"],

size_info["default"],

True) # display items vertically

print make_popup_menu ("size",

size_info["values"],

size_info["default"])

When you use ENUM values like this to create list elements, the values are displayed in the order they are listed in the column definition. The size column definition lists the values in the proper display order (small, medium, large), but for columns for which you want a different order, sort the values appropriately.

To demonstrate how to process column metadata to generate form elements in JSP pages, I’m going to use a function embedded into the page. A better approach would be to write a custom action in a tag library that maps onto a class that returns the information, but custom tag writing is beyond the scope of this book. The examples take the following approach instead:

1. Use JSTL tags to query INFORMATION_SCHEMA for the ENUM column definition, and then move the definition into page context.

2. Invoke a function that extracts the definition from page context, parses it into an array of individual enumeration values, and moves the array back into page context.

3. Access the array using a JSTL iterator that displays each of its values as a list item. For each value, compare it to the column’s default value and mark it as the initially selected item if it’s the same.

The function that extracts legal values from an ENUM or SET column definition is named getEnumOrSetValues(). Place it into a JSP page like this:

<%@ page import="java.util.*" %>

<%@ page import="java.util.regex.*" %>

<%!

// declare a class method for busting up ENUM/SET values.

// typeDefAttr - the name of the page context attribute that contains

// the columm type definition

// valListAttr - the name of the page context attribute to stuff the

// column value list into

void getEnumOrSetValues (PageContext ctx,

String typeDefAttr,

String valListAttr)

{

String typeDef = ctx.getAttribute (typeDefAttr).toString ();

List values = new ArrayList ();

// column must be an ENUM or SET

Pattern pc = Pattern.compile ("(enum|set)\$(.*)\$",

Pattern.CASE_INSENSITIVE);

Matcher m = pc.matcher (typeDef);

// matches() fails unless it matches entire string

if (m.matches ())

{

// split value list on commas, trim quotes from end of each word

String[] v = m.group (2).split (",");

for (int i = 0; i < v.length; i++)

values.add (v[i].substring (1, v[i].length() - 1));

}

ctx.setAttribute (valListAttr, values);

}

The function takes three arguments:

ctx

The page context object.

typeDefAttr

The name of the page attribute that contains the column definition. This is the function “input.”

valListAttr

The name of the page attribute into which to place the resulting array of legal column values. This is the function “output.”

To generate a list element from the size column, begin by fetching the column metadata. Extract the column value list into a JSTL variable named values and the default value into a variable named default as follows:

<sql:query dataSource="${conn}" var="rs">

SELECT COLUMN_TYPE, COLUMN_DEFAULT

FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = 'cookbook' AND TABLE_NAME = 'cow_order'

AND COLUMN_NAME = 'size'

</sql:query>

<c:set var="typeDef" scope="page" value="${rs.rowsByIndex[0][0]}"/>

<% getEnumOrSetValues (pageContext, "typeDef", "values"); %>

<c:set var="default" scope="page" value="${rs.rowsByIndex[0][1]}"/>

Then use the value list and default value to construct a form element. For example, produce a set of radio buttons like this:

<c:forEach items="${values}" var="val">

<input type="radio" name="size"

value="<c:out value="${val}"/>"

<c:if test="${val == default}">checked="checked"</c:if>

/><c:out value="${val}"/><br />

</c:forEach>

or a pop-up menu like this:

<c:forEach items="${values}" var="val">

<option

value="<c:out value="${val}"/>"

<c:if test="${val == default}">selected="selected"</c:if>

<c:out value="${val}"/></option>

</c:forEach>

</select>

DON’T FORGET TO HTML-ENCODE ALL LIST CONTENT IN FORMS

The Ruby, PHP, and Python utility routines described in this recipe for generating list elements perform HTML-encoding of attribute values for the HTML tags that make up the list, such as the name and value attributes. They also encode the labels. I’ve noticed that many published accounts of list generation do not do this, or they encode the labels but not the values. That is a mistake. If either the label or the value contains a special character like & or <, the browser may misinterpret them, and your application will misbehave. It’s also important to make sure that your encoding function turns double quotes into " entities (or ", which is equivalent), because tag attributes are so often enclosed within double quotes. Failing to convert a double quote to the entity name in an attribute value results in a double quote within a double-quoted string, which is malformed.

If you’re using the Perl CGI.pm module or the JSTL tags to produce HTML for form elements, encoding is taken care of for you. CGI.pm’s form-related functions automatically perform encoding. Similarly, using the JSTL <c:out> tag to write attribute values from JSP pages will produce properly encoded values.

The list-generating methods discussed here are not tied to any particular database table, so they can be used to create form elements for all kinds of data, not just those shown for the cow-ordering scenario. For example, to enable a user to pick a table name in a database administration application, you can generate a scrolling list that contains an item for each table in the database. A CGI.pm-based script might do so like this:

my $stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

my $table_ref = $dbh->selectcol_arrayref ($stmt);

print scrolling_list (-name => "table",

-values => $table_ref,

-size => 10); # display 10 items at a time

Query results need not necessarily even be related to database tables. For example, if you want to present a list with an entry for each of the last seven days from within a JSP page, you can calculate the dates using this statement:

<sql:query dataSource="${conn}" var="rs">

SELECT

CURDATE() - INTERVAL 6 DAY,

CURDATE() - INTERVAL 5 DAY,

CURDATE() - INTERVAL 4 DAY,

CURDATE() - INTERVAL 3 DAY,

CURDATE() - INTERVAL 2 DAY,

CURDATE() - INTERVAL 1 DAY,

CURDATE()

</sql:query>

Then use the dates to generate a list element:

<c:set var="dateList" value="${rs.rowsByIndex[0]}"/>

<c:forEach items="${dateList}" var="date">

<input type="radio" name="date"

value="<c:out value="${date}"/>"

/><c:out value="${date}"/><br />

</c:forEach>

Of course, if your programming language makes it reasonably easy to perform date calculations, it is more efficient to generate the list of dates on the client side without sending a statement to the MySQL server.

Creating Multiple-Pick Form Elements from Database Content

Problem

A form needs to present a field that offers several options and enables the user to select any number of them.

Solution

Use a multiple-pick list element, such as a set of checkboxes or a scrolling list.

Discussion

Multiple-pick form elements enable you to present multiple choices, any number of which can be selected, or possibly even none of them. For our example scenario in which customers order cow figurines online, the multiple-pick element is represented by the set of accessory items that are available. The accessory column in the cow_order table is represented as a SET, so the allowable and default values can be obtained with the following statement:

mysql>SELECT COLUMN_TYPE, COLUMN_DEFAULT FROM INFORMATION_SCHEMA.COLUMNS

-> WHERE TABLE_SCHEMA='cookbook' AND TABLE_NAME='cow_order'

-> AND COLUMN_NAME='accessories';

+---------------------------------------------------+----------------+

| COLUMN_TYPE | COLUMN_DEFAULT |

+---------------------------------------------------+----------------+

| set('cow bell','horns','nose ring','tail ribbon') | cow bell,horns |

+---------------------------------------------------+----------------+

The values listed in the definition can reasonably be represented as either a set of checkboxes or as a multiple-pick scrolling list. Either way, the cow bell and horns items should be selected initially, because each is present in the column’s default value. The following discussion shows the HTML syntax for these elements, and then describes how to generate them from within scripts.

NOTE

The material in this section relies heavily on Creating Single-Pick Form Elements from Database Content , which discusses radio buttons, pop-up menus, and single-pick scrolling lists. I assume that you’ve already read that section.

Checkboxes

A group of checkboxes is similar to a group of radio buttons in that it consists of <input> elements that all have the same name attribute. However, the type attribute is checkbox rather than radio, and you can specify checked for as many items in the group as you want to be selected by default. If no items are marked as checked, none are selected initially. The following checkbox set shows the cow accessory items with the first two items selected by default:

<input type="checkbox" name="accessories" value="cow bell"

checked="checked" />cow bell

<input type="checkbox" name="accessories" value="horns"

checked="checked" />horns

<input type="checkbox" name="accessories" value="nose ring" />nose ring

<input type="checkbox" name="accessories" value="tail ribbon" />tail ribbon

Scrolling lists

A multiple-pick scrolling list has most syntax in common with its single-pick counterpart. The differences are that you include a multiple attribute in the opening <select> tag, and default value selection is different. For a single-pick list, you can add selected to at most one item, and the first item is selected by default in the absence of an explicit selected attribute. For a multiple-pick list, you can add a selected attribute to as many of the items as you like, and no items are selected by default in the absence of selected attributes.

If the set of cow accessories is represented as a multiple-pick scrolling list with cow bell and horns selected initially, it looks like this:

<option value="horns" selected="selected">horns</option>

<option value="tail ribbon">tail ribbon</option>

</select>

In CGI.pm-based Perl scripts, you create checkbox sets or scrolling lists by invoking checkbox_group() or scrolling_list(). These functions take name, values, labels, and default arguments, just like their single-pick cousins. But because multiple items can be selected initially, CGI.pm allows the default argument to be specified as either a scalar value or a reference to an array of values. (It also accepts the argument name defaults as a synonym for default.)

To get the list of legal values for a SET column, we can do the same thing as in Creating Single-Pick Form Elements from Database Content for ENUM columns—that is, call a utility routine that returns the column metadata:

my $acc_info = get_enumorset_info ($dbh, "cookbook", "cow_order", "accessories");

However, the default value for a SET column is not in a form that is directly usable for form element generation. MySQL represents SET default values as a list of zero or more items, separated by commas; for example, the default for the accessories column is cow bell,horns. That doesn’t match the list-of-values format that the CGI.pm functions expect, so it’s necessary to split the default value at the commas to obtain an array. The following expression shows how to do so, taking into account the possibility that the default column value might be undef (NULL):

my @acc_def = (defined ($acc_info->{default})

? split (/,/, $acc_info->{default})

: () );

After splitting the default value, pass the resulting array by reference to whichever of the list-generating functions you want to use:

print checkbox_group (-name => "accessories",

-values => $acc_info->{values},

-default => \@acc_def,

-linebreak => 1); # display buttons vertically

print scrolling_list (-name => "accessories",

-values => $acc_info->{values},

-default => \@acc_def,

-size => 3, # display 3 items at a time

-multiple => 1); # create multiple-pick list

When you use SET values like this to create list elements, the values are displayed in the order they are listed in the column definition. That may not correspond to the order in which you want them to appear; if not, sort the values appropriately.

For Ruby, PHP, and Python, we can create utility functions to generate multiple-pick items. They’ll have the following invocation syntax:

make_checkbox_group (name, values, labels, default, vertical)

make_scrolling_list (name, values, labels, default, size, multiple)

The name, values, and labels arguments to these functions are similar to those of the single-pick utility routines described in Creating Single-Pick Form Elements from Database Content . make_checkbox_group() takes a vertical argument to indicate whether the items should be stacked vertically rather than horizontally. make_scrolling_list() has already been described in Creating Single-Pick Form Elements from Database Content for producing single-pick lists. To use it here, the multiple argument should be true to produce a multiple-pick list. For both functions, the default argument can be an array of multiple values if several items should be selected initially.

make_checkbox_group()looks like this (shown here in Ruby; the PHP and Python versions are similar):

def make_checkbox_group(name, values, labels, default, vertical)

# make sure default is an array (converts a scalar to an array)

default = [ default ].flatten

str = ""

for i in 0...values.length do

# select the item if it corresponds to one of the default values

checked = (default.include?(values[i]) ? " checked=\"checked\"" : "")

str << sprintf(

"<input type=\"checkbox\" name=\"%s\" value=\"%s\"%s />%s",

CGI.escapeHTML(name.to_s),

CGI.escapeHTML(values[i].to_s),

checked,

CGI.escapeHTML(labels[i].to_s))

str << "<br />" if vertical # display items vertically

str << "\n"

end

return str

end

To fetch the cow accessory information and present it using checkboxes, do this:

acc_info = get_enumorset_info(dbh, "cookbook", "cow_order", "accessories")

if acc_info["default"].nil?

acc_def = []

else

acc_def = acc_info["default"].split(",")

end

form << make_checkbox_group("accessories",

acc_info["values"],

acc_def,

true) # display items vertically

To display a scrolling list instead, invoke make_scrolling_list():

form << make_scrolling_list("accessories",

acc_info["values"],

acc_def,

3, # display 3 items at a time

true) # create multiple-pick list

In PHP, fetch the accessory information, and then present checkboxes or a scrolling list as follows:

$acc_info = get_enumorset_info ($conn, "cookbook", "cow_order", "accessories");

$acc_def = explode (",", $acc_info["default"]);

print (make_checkbox_group ("accessories[]",

$acc_info["values"],

$acc_def,

TRUE)); # display items vertically

print (make_scrolling_list ("accessories[]",

$acc_info["values"],

$acc_def,

3, # display 3 items at a time

TRUE)); # create multiple-pick list

Note that the field name in the PHP examples is specified as accessories[] rather than as accessories. In PHP, you must add [] to the name if you want to allow a field to have multiple values. If you omit the [], the user will be able to select multiple items while filling in the form, but PHP will return only one of them to your script. This issue comes up again when we discuss how to process the contents of submitted forms in Collecting Web Input.

In Python, to fetch the cow accessory information and present it using checkboxes or a scrolling list, do this:

acc_info = get_enumorset_info (conn, "cookbook", "cow_order", "accessories")

if acc_info["default"] == None:

acc_def = ""

else:

acc_def = acc_info["default"].split (",")

print make_checkbox_group ("accessories",

acc_info["values"],

acc_def,

True) # display items vertically

print make_scrolling_list ("accessories",

acc_info["values"],

acc_def,

3, # display 3 items at a time

True) # create multiple-pick list

In JSP pages, the getEnumOrSetValues() function used earlier to get the value list for the size column (an ENUM) can also be used for the accessory column (a SET). The column definition and default value can be obtained from INFORMATION_SCHEMA. Query the COLUMNS table, parse the type definition into a list of values named values, and put the default value in defList like this:

<sql:query dataSource="${conn}" var="rs">

SELECT COLUMN_TYPE, COLUMN_DEFAULT

FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = 'cookbook'

AND TABLE_NAME = 'cow_order'

AND COLUMN_NAME = 'accessories'

</sql:query>

<c:set var="typeDef" scope="page" value="${rs.rowsByIndex[0][0]}"/>

<% getEnumOrSetValues (pageContext, "typeDef", "values"); %>

<c:set var="defList" scope="page" value="${rs.rowsByIndex[0][1]}"/>

For a SET column, the defList value might contain multiple values, separated by commas. It needs no special treatment; the JSTL <c:forEach> tag can iterate over such a string, so initialize the default values for a checkbox set as follows:

<c:forEach items="${values}" var="val">

<input type="checkbox" name="accessories"

value="<c:out value="${val}"/>"

<c:forEach items="${defList}" var="default">

<c:if test="${val == default}">checked="checked"</c:if>

</c:forEach>

/><c:out value="${val}"/><br />

</c:forEach>

For a multiple-pick scrolling list, do this:

<c:forEach items="${values}" var="val">

<option

value="<c:out value="${val}"/>"

<c:forEach items="${defList}" var="default">

<c:if test="${val == default}">selected="selected"</c:if>

</c:forEach>

<c:out value="${val}"/></option>

</c:forEach>

</select>

Loading a Database Record into a Form

Problem

You want to display a form but initialize it using the contents of a database record. This enables you to present a record-editing form.

Solution

Generate the form as you usually would, but populate it with database content. That is, instead of setting the form element defaults to their usual values, set them to the values of columns in the database record.

Discussion

The examples in earlier recipes that show how to generate form fields have either supplied no default value or have used the default value as specified in an ENUM or SET column definition as the field default. That’s most appropriate for presenting a “blank” form that you expect the user to fill in. However, for applications that present a web-based interface for record editing, it’s more likely that you’d want to fill in the form using the content of an existing record for the initial values. This section discusses how to do that.

The examples shown here illustrate how to generate an editing form for rows from the cow_order table. Normally, you would allow the user to specify which record to edit. For simplicity, assume the use of the record that has an id value of 1, with the following contents:

mysql>SELECT * FROM cow_order WHERE id = 1\G

*************************** 1. row ***************************

id: 1

color: Black & White

size: large

accessories: cow bell,nose ring

cust_name: Farmer Brown

cust_street: 123 Elm St.

cust_city: Katy

cust_state: TX

To generate a form with contents that correspond to a database record, use the column values for the element defaults as follows:

§ For <input> elements such as radio buttons or checkboxes, add a checked attribute to each list item that matches the column value.

§ For <select> elements such as pop-up menus or scrolling lists, add a selected attribute to each list item that matches the column value.

§ For text fields represented as <input> elements of type text, set the value attribute to the corresponding column value. For example, a 60-character field for cust_name can be presented initialized to Farmer Brown like this:

To present a <textarea> element instead, set the body to the column value. To create a field 40 columns wide and 3 rows high, write it like this:

Farmer Brown

</textarea>

§ In a record-editing situation, it’s a good idea to include a unique value in the form so that you can tell which record the form contents represent when the user submits it. Use a hidden field to do this. Its value is not displayed to the user, but the browser returns it with the rest of the field values. Our sample record has an id column with a value of 1, so the hidden field looks like this:

The following examples show how to produce a form with id represented as a hidden field, color as a pop-up menu, size as a set of radio buttons, and accessories as a set of checkboxes. The customer information values are represented as text input boxes, except that cust_state is a single-pick scrolling list. You could make other choices, of course, such as to present the sizes as a pop-up menu rather than as radio buttons.

The recipes distribution scripts for the examples in this section are named cow_edit.pl, cow_edit.jsp, and so forth.

The following procedure outlines how to load the sample cow_table record into an editing form for a CGI.pm-based Perl script:

1. Retrieve the column values for the record that you want to load into the form:

2. my $id = 1; # select record number 1

3. my ($color, $size, $accessories,

4. $cust_name, $cust_street, $cust_city, $cust_state) =

5. $dbh->selectrow_array (

6. "SELECT

7. color, size, accessories,

8. cust_name, cust_street, cust_city, cust_state

9. FROM cow_order WHERE id = ?",

undef, $id);

10.Begin the form:

print start_form (-action => url ());

11.Generate the hidden field containing the id value that uniquely identifies the cow_order record:

print hidden (-name => "id", -value => $id, -override => 1);

The override argument forces CGI.pm to use the value specified in the value argument as the hidden field value. If override is not true, CGI.pm normally tries to use values present in the script execution environment to initialize form fields, even if you provide values in the field-generating calls. (CGI.pm does this to make it easier to redisplay a form with the values the user just submitted. For example, if you find that a form has been filled in incorrectly, you can redisplay it and ask the user to correct any problems. To make sure that a form element contains the value you specify, it’s necessary to override this behavior.)

12.Create the fields that describe the cow figurine specifications. These fields are generated the same way as described in Recipes and , except that the default values come from the contents of record 1. The code here presents color as a pop-up menu, size as a set of radio buttons, andaccessories as a set of checkboxes. Note that it splits the accessories value at commas to produce an array of values, because the column value might name several accessory items:

13.my $color_ref = $dbh->selectcol_arrayref (

14. "SELECT color FROM cow_color ORDER BY color");

15.

16.print br (), "Cow color:", br ();

17.print popup_menu (-name => "color",

18. -values => $color_ref,

19. -default => $color,

20. -override => 1);

21.

22.my $size_info = get_enumorset_info ($dbh, "cookbook",

23. "cow_order", "size");

24.

25.print br (), "Cow figurine size:", br ();

26.print radio_group (-name => "size",

27. -values => $size_info->{values},

28. -default => $size,

29. -override => 1,

30. -linebreak => 1);

31.

32.my $acc_info = get_enumorset_info ($dbh, "cookbook",

33. "cow_order", "accessories");

34.my @acc_val = (defined ($accessories)

35. ? split (/,/, $accessories)

36. : () );

37.

38.print br (), "Cow accessory items:", br ();

39.print checkbox_group (-name => "accessories",

40. -values => $acc_info->{values},

41. -default => \@acc_val,

42. -override => 1,

-linebreak => 1);

43.Create the customer information fields. These are represented as text input fields, except the state, which is shown here as a single-pick scrolling list:

44.print br (), "Customer name:", br ();

45.print textfield (-name => "cust_name",

46. -value => $cust_name,

47. -override => 1,

48. -size => 60);

49.

50.print br (), "Customer street address:", br ();

51.print textfield (-name => "cust_street",

52. -value => $cust_street,

53. -override => 1,

54. -size => 60);

55.

56.print br (), "Customer city:", br ();

57.print textfield (-name => "cust_city",

58. -value => $cust_city,

59. -override => 1,

60. -size => 60);

61.

62.my @state_values;

63.my %state_labels;

64.my $sth = $dbh->prepare ("SELECT abbrev, name

65. FROM states ORDER BY name");

66.$sth->execute ();

67.while (my ($abbrev, $name) = $sth->fetchrow_array ())

68.{

69. push (@state_values, $abbrev); # save each value in an array

70. $state_labels{$abbrev} = $name; # map each value to its label

71.}

72.

73.print br (), "Customer state:", br ();

74.print scrolling_list (-name => "cust_state",

75. -values => \@state_values,

76. -labels => \%state_labels,

77. -default => $cust_state,

78. -override => 1,

-size => 6); # display 6 items at a time

79.Create a form submission button and terminate the form:

80.print br (),

81. submit (-name => "choice", -value => "Submit Form"),

end_form ();

The same general procedure applies to other APIs. For example, in a JSP page, you can fetch the record to be edited and extract its contents into scalar variables like this:

<c:set var="id" value="1"/>

<sql:query dataSource="${conn}" var="rs">

SELECT

id, color, size, accessories,

cust_name, cust_street, cust_city, cust_state

FROM cow_order WHERE id = ?

<sql:param value="${id}"/>

</sql:query>

<c:set var="row" value="${rs.rows[0]}"/>

<c:set var="id" value="${row.id}"/>

<c:set var="color" value="${row.color}"/>

<c:set var="size" value="${row.size}"/>

<c:set var="accessories" value="${row.accessories}"/>

<c:set var="cust_name" value="${row.cust_name}"/>

<c:set var="cust_street" value="${row.cust_street}"/>

<c:set var="cust_city" value="${row.cust_city}"/>

<c:set var="cust_state" value="${row.cust_state}"/>

Then use the values to initialize the various form elements, such as:

§ The hidden field for the ID value:

§ The color pop-up menu:

§ <sql:query dataSource="${conn}" var="rs">

§ SELECT color FROM cow_color ORDER BY color

§ </sql:query>

§ <br />Cow color:<br />

§ <select name="color">

§ <c:forEach items="${rs.rows}" var="row">

§ <option

§ value="<c:out value="${row.color}"/>"

§ <c:if test="${row.color == color}">selected="selected"</c:if>

§ ><c:out value="${row.color}"/></option>

§ </c:forEach>

</select>

§ The cust_name text field:

§ <br />Customer name:<br />

§ <input type="text" name="cust_name"

§ value="<c:out value="${cust_name}"/>"

size="60" />

For Ruby, PHP, or Python, create the form using the utility functions developed in Recipes and . See the cow_edit.rb, cow_edit.php, and cow_edit.py scripts for details.

Collecting Web Input

Problem

You want to extract the input parameters that were submitted as part of a form or specified at the end of a URL.

Solution

Use the capabilities of your API that provide a means of accessing the names and values of the input parameters in the execution environment of a web script.

Discussion

Earlier recipes in this chapter discussed how to retrieve information from MySQL and use it to generate various forms of output, such as static text, lists, hyperlinks, or form elements. In this recipe, we discuss the opposite problem—how to collect input from the Web. Applications for such input are many. For example, you can use the techniques shown here to extract the contents of a form submitted by a user. You might interpret the information as a set of search keywords, and then run a query against a product catalog to show the matching items to a customer. In this case, you use the Web to collect information from which you can determine the client’s interests. From that you construct an appropriate search statement and display the results. If a form represents a survey, a mailing list sign-up sheet, or a poll, you might just store the values, using the data to create a new database record (or perhaps to update an existing record).

A script that receives input over the Web and uses it to interact with MySQL generally processes the information in a series of stages:

1. Extract the input from the execution environment. When a request arrives that contains input parameters, the web server places the input into the environment of the script that handles the request, and the script queries its environment to obtain the parameters. It may be necessary to decode special characters in the parameters to recover the actual values submitted by the client, if the extraction mechanism provided by your API doesn’t do it for you. (For example, you might need to convert %20 to space.)

2. Validate the input to make sure that it’s legal. You cannot trust users to send legal values, so it’s a good idea to check input parameters to make sure they look reasonable. For example, if you expect a user to enter a number into a field, you should check the value to be sure that it’s really numeric. If a form contains a pop-up menu that was constructed using the allowable values of an ENUM column, you might expect the value that you actually get back to be one of these values. But there’s no way to be sure except to check. Remember, you don’t even know that there is a real user on the other end of the network connection. It might be a malicious script roving your web site, trying to hack into your site by exploiting weaknesses in your form-processing code.

If you don’t check your input, you run the risk of entering garbage into your database or corrupting existing content. It is true that you can prevent entry of values that are invalid for the data types in your table columns by enabling strict SQL mode. However, there might be additional semantic constraints on what your application considers legal, in which case it is still useful to check values in your script before attempting to enter them. Also, by checking in your script, you may be able to present more meaningful error messages to users about problems in the input than the messages returned by the MySQL server when it detects bad data. For these reasons, it might be best to consider strict SQL mode a valuable additional level of protection, but one that is not necessarily sufficient in itself. That is, you can combine strict mode on the server side with client-side validation. See Using the SQL Mode to Control Bad Input Data Handling for additional information about setting the SQL mode for strict input value checking.

3. Construct a statement based on the input. Typically, input parameters are used to add a record to a database, or to retrieve information from the database for display to the client. Either way, you use the input to construct a statement and send it to the MySQL server. Statement construction based on user input should be done with care, using proper escaping to avoid creating malformed or dangerous SQL statements. Use of placeholders is a good idea here.

The rest of this recipe explores the first of these three stages of input processing. Recipes and cover the second and third stages. The first stage (pulling input from the execution environment) has little to do with MySQL, but is covered here because it’s how you obtain the information that is used in the later stages.

Input obtained over the Web can be received in several ways, two of which are most common:

§ As part of a get request, in which case input parameters are appended to the end of the URL. For example, the following URL invokes a PHP script named price_quote.php and specifies item and quantity parameters with values D-0214 and 60:

http://localhost/mcb/price_quote.php?item=D-0214&quantity=60

Such requests commonly are received when a user selects a hyperlink or submits a form that specifies method="get" in the <form> tag. A parameter list in a URL begins with ? and consists of name = value pairs separated by ; or & characters. (It’s also possible to place information in the middle of a URL, but this book doesn’t cover that.)

§ As part of a post request, such as a form submission that specifies method="post" in the <form> tag. The contents of a form for a post request are sent as input parameters in the body of the request, rather than at the end of the URL.

You may also have occasion to process other types of input, such as uploaded files. Those are sent using post requests, but as part of a special kind of form element. Processing File Uploads discusses file uploading.

When you gather input for a web script, you should consider how the input was sent. (Some APIs distinguish between input sent via get and post; others do not.) However, after you have pulled out the information that was sent, the request method doesn’t matter. The validation and statement construction stages do not need to know whether parameters were sent using get or post.

The recipes distribution includes some scripts in the apache/params directory (tomcat/mcb for JSP) that process input parameters. Each script enables you to submit get or post requests, and shows how to extract and display the parameter values thus submitted. Examine these scripts to see how the parameter extraction methods for the various APIs are used. Utility routines invoked by the scripts can be found in the library modules in the lib directory of the distribution.

Web input extraction conventions

To obtain input parameters passed to a script, you should familiarize yourself with your API’s conventions so that you know what it does for you, and what you must do yourself. For example, you should know the answers to these questions:

§ How do you determine which parameters are available?

§ How do you pull a parameter value from the environment?

§ Are values thus obtained the actual values submitted by the client, or do you need to decode them further?

§ How are multiple-valued parameters handled (for example, when several items in a checkbox group are selected)?

§ For parameters submitted in a URL, which separator character does the API expect between parameters? This may be & for some APIs and ; for others. ; is preferable as a parameter separator because it’s not special in HTML like & is, but many browsers or other user agents separate parameters using &. If you construct a URL within a script that includes parameters at the end, be sure to use a parameter-separator character that the receiving script will understand.

Perl. The Perl CGI.pm module makes input parameters available to scripts through the param() function. param() provides access to input submitted via either get or post, which simplifies your task as the script writer. You don’t need to know which method a form used for submitting parameters. You don’t need to perform any decoding, either; param() handles that as well.

To obtain a list of names of all available parameters, call param() with no arguments:

@param_names = param ();

To obtain the value of a specific parameter, pass its name to param(). In scalar context, param() returns the parameter value if it is single-valued, the first value if it is multiple-valued, or undef if the parameter is not available. In array context, param() returns a list containing all the parameter’s values, or an empty list if the parameter is not available:

$id = param ("id");

@options = param ("options");

A parameter with a given name might not be available if the form field with that name was left blank, or if there isn’t any field with that name. Note too that a parameter value may be defined but empty. For good measure, you may want to check both possibilities. For example, to check for anage parameter and assign a default value of unknown if the parameter is missing or empty, you can do this:

$age = param ("age");

$age = "unknown" if !defined ($age) || $age eq "";

CGI.pm understands both ; and & as URL parameter separator characters.

Ruby. For Ruby scripts that use the cgi module, web script parameters are available through the same cgi object that you create for generating HTML elements. Its param method returns a hash of parameter names and values, so you can access this hash or get the parameter names as follows:

params = cgi.params

param_names = cgi.params.keys

The value of a particular parameter is accessible either through the hash of parameter names and values or directly through the cgi object:

id = cgi.params["id"]

id = cgi["id"]

The two access methods differ slightly. The params method returns each parameter value as an array. The array contains multiple entries if the parameter has multiple values, and is empty if the parameter is not present. The cgi object returns a single string. If the parameter has multiple values, only the first is returned. If the parameter is not present, the value is the empty string. For either access method, use the has_key? method to test whether a parameter is present.

The following listing shows how to get the parameter names and loop through each parameter to print its name and value, printing multiple-valued parameters as a comma-separated list:

params = cgi.params

param_names = cgi.params.keys

param_names.sort!

page << cgi.p { "Parameter names:" + param_names.join(", ") }

list = ""

param_names.each do |name|

val = params[name]

list << cgi.li {

"type=#{val.class}, name=#{name}, value=" +

CGI.escapeHTML(val.join(", "))

}

end

page << cgi.ul { list }

The cgi module understands both ; and & as URL parameter separator characters.

PHP. Input parameters can be available to PHP in several ways, depending on your configuration settings:

§ If the track_vars variable is on, parameters are available in the $HTTP_GET_VARS and $HTTP_POST_VARS arrays. For example, if a form contains a field named id, the value will be available as $HTTP_GET_VARS["id"] or $HTTP_POST_VARS["id"], depending on whether the form was submitted via GET or POST. If you access $HTTP_GET_VARS and $HTTP_POST_VARS in a nonglobal scope, such as within a function, you must declare them using the global keyword to make them accessible.

§ As of PHP 4.1, parameters also are available in the $_GET and $_POST arrays if track_vars is on. These are analogous to $HTTP_GET_VARS and $HTTP_POST_VARS except that they are “superglobal” arrays that are automatically available in any scope. (For example, it is unnecessary to declare $_GET and $_POST with global inside of functions.) The $_GET and $_POST arrays are the preferred means of accessing input parameters.

§ If the register_globals variable is on, parameters are assigned to global variables of the same name. In this case, the value of a field named id will be available as the variable $id, regardless of whether the request was sent via GET or POST. This is dangerous, for reasons described shortly.

The track_vars and register_globals settings can be compiled into PHP or configured in the PHP php.ini file. track_vars is always enabled as of PHP 4.0.3, so I’ll assume that this is true for your PHP installation.

register_globals makes it convenient to access input parameters through global variables, but the PHP developers recommend that it be disabled for security reasons. Suppose that you write a script that requires the user to supply a password, which is represented by the $passwordvariable. In the script, you might check the password like this:

if (check_password ($password))

$password_is_ok = 1;

The intent here is that if the password matches, the script sets $password_is_ok to 1. Otherwise $password_is_ok is left unset (which compares false in Boolean expressions). But suppose that someone invokes your script as follows:

http://your.host.com/chkpass.php?password_is_ok=1

If register_globals is enabled, PHP sees that the password_is_ok parameter is set to 1 and sets the corresponding $password_is_ok variable to 1. The result is that when your script executes, $password_is_ok is 1 no matter what password was given, or even if no password was given! The problem with register_globals is that it enables outside users to supply default values for global variables in your scripts. The best solution is to disable register_globals, in which case you need to check the global arrays for input parameter values. If you cannot disableregister_globals, take care not to assume that PHP variables have no value initially. Unless you’re expecting a global variable to be set from an input parameter, it’s best to initialize it explicitly to a known value. The password-checking code should be written as follows to make sure that only $password (and not $password_is_ok) can be set from an input parameter. That way, $password_is_ok is assigned a value by the script itself whatever the result of the test:

$password_is_ok = 0;

if (check_password ($password))

$password_is_ok = 1;

The PHP scripts in this book do not rely on the register_globals setting. Instead, they obtain input through the global parameter arrays.

Another complicating factor when retrieving input parameters in PHP is that they may need some decoding, depending on the value of the magic_quotes_gpc configuration variable. If magic quotes are enabled, any quote, backslash, and NUL characters in input parameter values accessed by your scripts will be escaped with backslashes. I suppose that this is intended to save you a step by allowing you to extract values and use them directly in SQL statement strings. However, that’s only useful if you plan to use web input in a statement with no preprocessing or validity checking, which is dangerous. You should check your input first, in which case it’s necessary to strip out the slashes, anyway. This means that having magic quotes turned on isn’t really very useful.

Given the various sources through which input parameters may be available, and the fact that they may or may not contain extra backslashes, extracting input in PHP scripts can be an interesting problem. If you have control of your server and can set the values of the various configuration settings, you can of course write your scripts based on those settings. But if you do not control your server or are writing scripts that need to run on several machines, you may not know in advance what the settings are. Fortunately, with a bit of effort it’s possible to write reasonably general-purpose parameter extraction code that works correctly with very few assumptions about your PHP operating environment. The following utility function, get_param_val(), takes a parameter name as its argument and returns the corresponding parameter value. If the parameter is not available, the function returns an unset value. (get_param_val() uses a helper function, strip_slash_helper(), which is discussed shortly.)

function get_param_val ($name)

{

global $HTTP_GET_VARS, $HTTP_POST_VARS;

$val = NULL;

if (isset ($_GET[$name]))

$val = $_GET[$name];

else if (isset ($_POST[$name]))

$val = $_POST[$name];

else if (isset ($HTTP_GET_VARS[$name]))

$val = $HTTP_GET_VARS[$name];

else if (isset ($HTTP_POST_VARS[$name]))

$val = $HTTP_POST_VARS[$name];

if (isset ($val) && get_magic_quotes_gpc ())

$val = strip_slash_helper ($val);

return ($val);

}

To use this function to obtain the value of a single-valued parameter named id, call it like this:

$id = get_param_val ("id");

You can test $id to determine whether the id parameter was present in the input:

if (isset ($id))

... id parameter is present ...

else

... id parameter is not present ...

For a form field that might have multiple values (such as a checkbox group or a multiple-pick scrolling list), you should represent it in the form using a name that ends in []. For example, a list element constructed from the SET column accessories in the cow_order table has one item for each allowable set value. To make sure PHP treats the element value as an array, name the field accessories[], not accessories. (See Creating Multiple-Pick Form Elements from Database Content for an example.) When the form is submitted, PHP places the array of values in a parameter named without the []. To access it, do this:

$accessories = get_param_val ("accessories");

The value of the $accessories variable will be an array, whether the parameter has multiple values, a single value, or even no values. The determining factor is not whether the parameter actually has multiple values, but whether you named the corresponding field in the form using []notation.

The get_param_val() function checks the $_GET, $_POST, $HTTP_GET_VARS, and $HTTP_POST_VARS arrays for parameter values. Thus, it works correctly regardless of whether the request was made by GET or POST, or whether register_globals is enabled. The only thing that the function assumes is that track_vars is enabled.

get_param_val() also works correctly regardless of whether magic quoting is enabled. It uses a helper function strip_slash_helper() that performs backslash stripping from parameter values if necessary:

function strip_slash_helper ($val)

{

if (!is_array ($val))

$val = stripslashes ($val);

else

{

foreach ($val as $k => $v)

$val[$k] = strip_slash_helper ($v);

}

return ($val);

}

strip_slash_helper() checks whether a value is a scalar or an array and processes it accordingly. The reason it uses a recursive algorithm for array values is that in PHP it’s possible to create nested arrays from input parameters.

To make it easy to obtain a list of all parameter names, use another utility function:

function get_param_names ()

{

global $HTTP_GET_VARS, $HTTP_POST_VARS;

# construct an array in which each element has a parameter name as

# both key and value. (Using names as keys eliminates duplicates.)

$keys = array ();

if (isset ($_GET))

{

foreach ($_GET as $k => $v)

$keys[$k] = $k;

}

else if (isset ($HTTP_GET_VARS))

{

foreach ($HTTP_GET_VARS as $k => $v)

$keys[$k] = $k;

}

if (isset ($_POST))

{

foreach ($_POST as $k => $v)

$keys[$k] = $k;

}

else if (isset ($HTTP_POST_VARS))

{

foreach ($HTTP_POST_VARS as $k => $v)

$keys[$k] = $k;

}

return ($keys);

}

get_param_names() returns a list of parameter names present in the HTTP variable arrays, with duplicate names removed if there is overlap between the arrays. The return value is an array with its keys and values both set to the parameter names. This way you can use either the keys or the values as the list of names. The following example prints the names, using the values:

$param_names = get_param_names ();

foreach ($param_names as $k => $v)

print (htmlspecialchars ($v) . "<br />\n");

To construct URLs that point to PHP scripts and that have parameters at the end, you should separate the parameters by & characters. To use a different character (such as ;), change the separator by means of the arg_separator configuration variable in the PHP initialization file.

Python. The Python cgi module provides access to the input parameters that are present in the script environment. Import that module, and then create a FieldStorage object:

import cgi

params = cgi.FieldStorage ()

The FieldStorage object contains information for parameters submitted via either GET or POST requests, so you need not know which method was used to send the request. The object also contains an element for each parameter present in the environment. Its key() method returns a list of available parameter names:

param_names = params.keys ()

If a given parameter, name, is single-valued, the value associated with it is a scalar that you can access as follows:

val = params[name].value

If the parameter is multiple-valued, params[name] is a list of MiniFieldStorage objects that have name and value attributes. Each of these has the same name (it will be equal to name) and one of the parameter’s values. To create a list containing all the values for such a parameter, do this:

val = []

for item in params[name]:

val.append (item.value)

You can distinguish single-valued from multiple-valued parameters by checking the type. The following listing shows how to get the parameter names and loop through each parameter to print its name and value, printing multiple-valued parameters as a comma-separated list. This code requires that you import the types module in addition to the cgi module.

params = cgi.FieldStorage ()

param_names = params.keys ()

param_names.sort ()

print "<p>Parameter names:", param_names, "</p>"

items = []

for name in param_names:

if type (params[name]) is not types.ListType: # it's a scalar

ptype = "scalar"

val = params[name].value

else: # it's a list

ptype = "list"

val = []

for item in params[name]: # iterate through MiniFieldStorage

val.append (item.value) # items to get item values

val = ",".join (val) # convert to string for printing

items.append ("type=" + ptype + ", name=" + name + ", value=" + val)

print make_unordered_list (items)

Python raises an exception if you try to access a parameter that is not present in the FieldStorage object. To avoid this, use has_key() to find out if the parameter exists:

if params.has_key (name):

print "parameter " + name + " exists"

else:

print "parameter " + name + " does not exist"

Single-valued parameters have attributes other than value. For example, a parameter representing an uploaded file has additional attributes you can use to get the file’s contents. Processing File Uploads discusses this further.

The cgi module expects URL parameters to be separated by & characters. If you generate a hyperlink to a Python script based on the cgi module and the URL includes parameters, don’t separate them by ; characters.

Java. Within JSP pages, the implicit request object provides access to the request parameters through the following methods:

getParameterNames()

Returns an enumeration of String objects, one for each parameter name present in the request.

getParameterValues(String name)

Returns an array of String objects, one for each value associated with the parameter, or null if the parameter does not exist.

getParameterValue(String name)

Returns the first value associated with the parameter, or null if the parameter does not exist.

The following example shows one way to use these methods to display request parameters:

<%@ page import="java.util.*" %>

<ul>

Enumeration e = request.getParameterNames ();

while (e.hasMoreElements ())

{

String name = (String) e.nextElement ();

// use array in case parameter is multiple-valued

String[] val = request.getParameterValues (name);

out.println ("<li> name: " + name + "; values:");

for (int i = 0; i < val.length; i++)

out.println (val[i]);

out.println ("</li>");

}

</ul>

Request parameters are also available within JSTL tags, using the special variables param and paramValues. param[name] returns the first value for a given parameter and thus is most suited for single-valued parameters:

color value:

<c:out value="${param['color']}"/>

paramValues[name] returns an array of values for the parameter, so it’s useful for parameters that can have multiple values:

accessory values:

<c:forEach items="${paramValues['accessories']}" var="val">

<c:out value="${val}"/>

</c:forEach>

If a parameter name is legal as an object property name, you can also access the parameter using dot notation:

color value:

<c:out value="${param.color}"/>

accessory values:

<c:forEach items="${paramValues.accessories}" var="val">

<c:out value="${val}"/>

</c:forEach>

To produce a list of parameter objects with key and value attributes, iterate over the paramValues variable:

<ul>

<c:forEach items="${paramValues}" var="p">

<li>

name:

<c:out value="${p.key}"/>;

values:

<c:forEach items="${p.value}" var="val">

<c:out value="${val}"/>

</c:forEach>

</li>

</c:forEach>

</ul>

To construct URLs that point to JSP pages and that have parameters at the end, you should separate the parameters by & characters.

Validating Web Input

Problem

After extracting the parameters supplied to a script, you want to check them to be sure they’re valid.

Solution

Web input processing is one form of data import, so after you’ve extracted the input parameters, you can validate them using the techniques discussed in Chapter 10.

Discussion

One phase of web form processing is to extract the input that comes back when the user submits the form. It’s also possible to receive input in the form of parameters at the end of a URL. Either way, if you’re going to store the input in your database, it’s a good idea to check it to be sure that it’s valid.

When clients send input to you over the Web, you don’t really know what they’re sending. If you present a form for users to fill out, most of the time they’ll probably be nice and enter the kinds of values you expect. But a malicious user can save the form to a file, modify the file to allow form options you don’t intend, reload the file into a browser window, and submit the modified form. Your form-processing script won’t know the difference. If you write it only to process the kinds of values that well-intentioned users will submit, the script may misbehave or crash when presented with unexpected input—or perhaps even do bad things to your database. (Storing Web Input in a Database discusses what kinds of bad things can happen.) For this reason, it’s prudent to perform some validity checking on web input before using it to construct SQL statements.

Preliminary checking is a good idea even for nonmalicious users. If you require a field to be filled in and the user forgets to provide a value, you’ll need to remind the user to supply one. This might involve a simple “Is the parameter present?” check, or it might be more involved. Typical validation operations include the following:

§ Checking content format, such as making sure a value looks like an integer or a date. This may involve some reformatting for acceptability to MySQL (for example, changing a date from MM/DD/YY to ISO format).

§ Determining whether a value is a member of a legal set of values. Perhaps the value must be listed in the definition for an ENUM or SET column, or must be present in a lookup table.

§ Filtering out extraneous characters such as spaces or dashes from telephone numbers or credit card numbers.

Some of these operations have little to do with MySQL, except in the sense that you want values to be appropriate for the types of the columns in which you’ll store them or against which you’ll match them. For example, if you’re going to store a value in an INT column, you can make sure that it’s an integer first, using a test like this (shown here using Perl):

$val =~ /^\d+$/

or die "Hey! '" . escapeHTML ($val) . "' is not an integer!\n";

For other types of validation, MySQL is intimately involved. If a field value is to be stored into an ENUM column, you can make sure the value is one of the legal enumeration values by checking the column definition in INFORMATION_SCHEMA.

Having described some of the kinds of web input validation you might want to carry out, I won’t further discuss them here. These and other forms of validation testing are described in Chapter 10. That chapter is oriented largely toward bulk input validation, but the techniques discussed there apply to web programming as well because processing form input or URL parameters is in essence a data import operation.

Storing Web Input in a Database

Problem

Input obtained over the Web cannot be trusted and should not be entered into a database without taking the proper precautions.

Solution

Sanitize data values by using placeholders or a quoting function so that SQL statements you construct are valid and not subject to SQL injection attacks. Also, enabling strict SQL mode causes the MySQL server to reject values that are invalid for column data types.

Discussion

After you’ve extracted input parameter values in a web script and checked them to make sure they’re valid, you’re ready to use them to construct an SQL statement. This is actually the easy part of input processing, although it’s necessary to take the proper precautions to avoid making a mistake that you’ll regret. First, let’s consider what can go wrong, and then see how to prevent problems.

Suppose that you have a search form that contains a keyword field and acts as a frontend to a simple search engine. When a user submits a keyword, you intend to use it to find matching rows in a table by constructing a statement like this:

SELECT * FROM mytbl WHERE keyword = 'keyword_val'

Here, keyword_val represents the value entered by the user. If the value is something like eggplant, the resulting statement is:

SELECT * FROM mytbl WHERE keyword = 'eggplant'

The statement returns all eggplant-matching rows, presumably generating a small result set. But suppose that the user is tricky and tries to subvert your script by entering the following value:

eggplant' OR 'x'='x

In this case, the statement becomes:

SELECT * FROM mytbl WHERE keyword = 'eggplant' OR 'x'='x'

That statement matches every row in the table! If the table is quite large, the input effectively becomes a form of denial-of-service attack, because it causes your system to divert resources away from legitimate requests into doing useless work. This type of attack is known as SQL injectionbecause the user is injecting executable SQL code into your statement where you expect to receive only a nonexecutable data value. Likely results of SQL injection attacks include the following:

§ Extra load on the MySQL server

§ Out-of-memory problems in your script as it tries to digest the result set received from MySQL

§ Extra network bandwidth consumption as the script sends the results to the client

If your script generates a DELETE or UPDATE statement, the consequences of this kind of subversion can be much worse. Your script might issue a statement that empties a table completely or changes all of its rows, when you intended to allow it to affect only a single row.

The implication of the preceding discussion is that providing a web interface to your database opens you up to certain forms of security vulnerabilities. However, you can prevent these problems by means of a simple precaution that you should already be following: don’t put data values literally into statement strings. Use placeholders or an encoding function instead. For example, in Perl you can handle an input parameter by using a placeholder:

$sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = ?");

$sth->execute (param ("keyword"));

# ... fetch result set ...

Or by using quote():

$keyword = $dbh->quote (param ("keyword"));

$sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = $keyword");

$sth->execute ();

# ... fetch result set ...

Either way, if the user enters the subversive value, the statement becomes harmless:

SELECT * FROM mytbl WHERE keyword = 'eggplant\' OR \'x\'=\'x'

As a result, the statement matches no rows rather than all rows—definitely a more suitable response to someone who’s trying to break your script.

Placeholder and quoting techniques for Ruby, PHP, Python, and Java are similar, and have been discussed in Handling Special Characters and NULL Values in Statements . For JSP pages written using the JSTL tag library, you can quote input parameter values using placeholders and the<sql:param> tag (Using Tomcat to Run Web Scripts). For example, to use the value of a form parameter named keyword in a SELECT statement, do this:

<sql:query dataSource="${conn}" var="rs">

SELECT * FROM mytbl WHERE keyword = ?

<sql:param value="${param['keyword']}"/>

</sql:query>

One issue not covered by placeholder techniques involves a question of interpretation: If a form field is optional, what should you store in the database if the user leaves the field empty? Perhaps the value represents an empty string—or perhaps it should be interpreted as NULL. One way to resolve this question is to consult the column metadata. If the column can contain NULL values, interpret an empty field as NULL. Otherwise, take an empty field to mean an empty string.

Placeholders and encoding functions apply only to SQL data values. One issue not addressed by them is how to handle web input used for other kinds of statement elements such as the names of databases, tables, and columns. If you intend to include such values into a statement, you must insert them literally, which means you should check them first. For example, if you construct a statement such as the following, you should verify that $tbl_name contains a reasonable value:

SELECT * FROM $tbl_name;

But what does “reasonable” mean? If you don’t have tables containing strange characters in their names, it may be sufficient to make sure that $tbl_name contains only alphanumeric characters or underscores. An alternative is to issue a statement that determines whether the table actually exists. (You can check INFORMATION_SCHEMA or use SHOW TABLES.) This is more foolproof, at the cost of an additional statement.

A better option is to use an identifier-quoting routine, if you have one. This approach requires no extra statement because it renders any string safe for use in a statement. If the identifier does not exist, the statement simply fails as it should. Handling Special Characters in Identifiers discusses this option further.

For additional protection in your web scripts, combine client-side checking of input values with strict server-side checking. You can set the server SQL mode to be restrictive about accepting input values so that it rejects values that don’t match your table column data types. For discussion of the SQL mode and input value checking, see Using the SQL Mode to Control Bad Input Data Handling.

YOU SHOULD TRY TO BREAK YOUR SCRIPTS

The discussion in this recipe has been phrased in terms of guarding against other users from attacking your scripts. But it’s not a bad idea to put yourself in the place of an attacker and adopt the mindset, “How can I break this application?” That is, consider whether there is some input you can submit to it that the application won’t handle, and that will cause it to generate a malformed statement. If you can cause an application to misbehave, so can other people, either deliberately or accidentally. Be wary of bad input, and write your applications accordingly. It’s better to be prepared than to just hope.

See Also

Several other recipes in this chapter illustrate how to incorporate web input into statements. Processing File Uploads shows how to upload files and load them into MySQL. Performing Searches and Presenting the Results demonstrates a simple search application using input as search keywords. Recipes and process parameters submitted via URLs.

Processing File Uploads

Problem

You want to allow files to be uploaded to your web server and stored in your database.

Solution

Present the user with a web form that includes a file field. When the user submits the form, extract the file and store it.

Discussion

One special kind of web input is an uploaded file. A file is sent as part of a post request, but it’s handled differently from other post parameters, because a file is represented by several pieces of information such as its contents, its MIME type, its original filename on the client, and its name in temporary storage on the web server host.

To handle file uploads, you must send a special kind of form to the user; this is true no matter what API you use to create the form. When the user submits the form, the operations that check for and process an uploaded file are API-specific.

To create a form that enables files to be uploaded, the opening <form> tag should specify the post method and must also include an enctype (encoding type) attribute with a value of multipart/form-data:

If the form is submitted using the application/x-www-form-urlencoded encoding type, file uploads will not work properly.

To include a file upload field in the form, use an <input> element of type file. For example, to present a 60-character file field named upload_file, the element looks like this:

The browser displays this field as a text input box into which the user can enter the name manually. It also presents a Browse button that enables the user to select the file via a standard file-browsing system dialog. When the user chooses a file and submits the form, the browser encodes the file contents for inclusion into the resulting post request. At that point, the web server receives the request and invokes your script to process it. The specifics vary for particular APIs, but file uploads generally work like this:

§ The file will already have been uploaded and stored in a temporary directory by the time your upload-handling script begins executing. All your script has to do is read it. The temporary file is available to your script either as an open file descriptor or the temporary filename, or perhaps both. The size of the file can be obtained through the file descriptor. The API may also make available other information about the file, such as its MIME type. (But note that some browsers may not send a MIME value.)

§ The web server automatically deletes uploaded files when your script terminates. If you want a file’s contents to persist beyond the end of your script’s execution, the script must save the file to a more permanent location, such as in a database or somewhere else in the filesystem. If you save the file in the filesystem, the directory where you store it must be accessible to the web server. (But don’t put it under the document root or any user home directories, or a remote attacker would effectively be able to install scripts and HTML files on your web server.)

§ The API might enable you to control the location of the temporary file directory or the maximum size of uploaded files. Changing the directory to one that is accessible only to your web server may improve security a bit against local exploits by other users with login accounts on the server host.

This recipe discusses how to create forms that include a file upload field. It also demonstrates how to handle uploads using a Perl script, post_image.pl. The script is somewhat similar to the store_image.pl script for loading images from the command line (Storing Images or Other Binary Data). post_image.pl differs in that it enables you to store images over the Web by uploading them, and it stores images only in MySQL, whereas store_image.pl stores them in both MySQL and the filesystem.

This recipe also discusses how to obtain file upload information using PHP and Python. It does not repeat the entire image-posting scenario shown for Perl, but the recipes distribution contains equivalent implementations of post_image.pl for the other languages.

Uploads in Perl

You can specify multipart encoding for a form several ways using the CGI.pm module. The following statements are equivalent:

print start_form (-action => url (), -enctype => "multipart/form-data");

print start_form (-action => url (), -enctype => MULTIPART ());

print start_multipart_form (-action => url ());

The first statement specifies the encoding type literally. The second uses the CGI.pm MULTIPART() function, which is easier than trying to remember the literal encoding value. The third statement is easiest of all, because start_multipart_form() supplies the enctype parameter automatically. (Like start_form(), start_multipart_form() uses a default request method of post, so you need not include a method argument.)

Here’s a simple form that includes a text field that enables the user to assign a name to an image, a file field so that the user can select the image file, and a Submit button:

print start_multipart_form (-action => url ()),

"Image name:", br (),

textfield (-name =>"image_name", -size => 60),

br (), "Image file:", br (),

filefield (-name =>"upload_file", -size => 60),

br (), br (),

submit (-name => "choice", -value => "Submit"),

end_form ();

When the user submits an uploaded file, begin processing it by extracting the parameter value for the file field:

$file = param ("upload_file");

The value for a file upload parameter is special in CGI.pm because you can use it two ways. You can treat it as an open file handle to read the file’s contents or pass it to uploadInfo() to obtain a reference to a hash that provides information about the file such as its MIME type. The following listing shows how post_image.pl presents the form and processes a submitted form. When first invoked, post_image.pl generates a form with an upload field. For the initial invocation, no file will have been uploaded, so the script does nothing else. If the user submitted an image file, the script gets the image name, reads the file contents, determines its MIME type, and stores a new row in the image table. For illustrative purposes, post_image.pl also displays all the information that the uploadInfo() function makes available about the uploaded file.

#!/usr/bin/perl

# post_image.pl - allow user to upload image files via post requests

use strict;

use warnings;

use CGI qw(:standard escapeHTML);

use Cookbook;

print header (), start_html (-title => "Post Image", -bgcolor => "white");

# Use multipart encoding because the form contains a file upload field

print start_multipart_form (-action => url ()),

"Image name:", br (),

textfield (-name =>"image_name", -size => 60),

br (), "Image file:", br (),

filefield (-name =>"upload_file", -size => 60),

br (), br (),

submit (-name => "choice", -value => "Submit"),

end_form ();

# Get a handle to the image file and the name to assign to the image

my $image_file = param ("upload_file");

my $image_name = param ("image_name");

# Must have either no parameters (in which case that script was just

# invoked for the first time) or both parameters (in which case the form

# was filled in). If only one was filled in, the user did not fill in the

# form completely.

my $param_count = 0;

++$param_count if defined ($image_file) && $image_file ne "";

++$param_count if defined ($image_name) && $image_name ne "";

if ($param_count == 0) # initial invocation

{

print p ("No file was uploaded.");

}

elsif ($param_count == 1) # incomplete form

{

print p ("Please fill in BOTH fields and resubmit the form.");

}

else # a file was uploaded

{

my ($size, $data);

# If an image file was uploaded, print some information about it,

# then save it in the database.

# Get reference to hash containing information about file

# and display the information in "key=x, value=y" format

my $info_ref = uploadInfo ($image_file);

print p ("Information about uploaded file:");

foreach my $key (sort (keys (%{$info_ref})))

{

printf p ("key="

. escapeHTML ($key)

. ", value="

. escapeHTML ($info_ref->{$key}));

}

$size = (stat ($image_file))[7]; # get file size from file handle

print p ("File size: " . $size);

binmode ($image_file); # helpful for binary data

if (sysread ($image_file, $data, $size) != $size)

{

print p ("File contents could not be read.");

}

else

{

print p ("File contents were read without error.");

# Get MIME type, use generic default if not present

my $mime_type = $info_ref->{'Content-Type'};

$mime_type = "application/octet-stream" unless defined ($mime_type);

# Save image in database table. (Use REPLACE to kick out any

# old image that has the same name.)

my $dbh = Cookbook::connect ();

$dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)",

undef,

$image_name, $mime_type, $data);

$dbh->disconnect ();

}

print end_html ();

Uploads in PHP

To write an upload form in PHP, include a file field. If you like, you can also include a hidden field preceding the file field that has a name of MAX_FILE_SIZE and a value of the largest file size that you’re willing to accept:

<form method="post" enctype="multipart/form-data"

action="<?php print (get_self_path ()); ?>">

Image name:<br />

<br />

Image file:<br />

</form>

Be aware that MAX_FILE_SIZE is advisory only, because it can be subverted easily. To specify a value that cannot be exceeded, use the upload_max_filesize configuration variable in the php.ini PHP configuration file. There is also a file_uploads variable that controls whether file uploads are allowed at all.

When the user submits the form, file upload information may be obtained as follows:

§ As of PHP 4.1, file upload information from post requests is placed in an array, $_FILES, that has one entry for each uploaded file. Each entry is itself an array with four elements. For example, if a form has a file field named upload_file and the user submits a file, information about it is available in the following variables:

§ $_FILES["upload_file]["name"]

§ $_FILES["upload_file]["tmp_name"]

§ $_FILES["upload_file]["size"]

$_FILES["upload_file]["type"]

These variables represent the original filename on the client host, the temporary filename on the server host, the file size in bytes, and the file MIME type. Be careful here, because there may be an entry for an upload field even if the user submitted no file. In this case, the tmp_name value will be the empty string or the string none.

§ Earlier PHP 4 releases have file upload information in an array, $HTTP_POST_FILES, that has entries that are structured like those in $_FILES. For a file field named upload_file, information about it is available in the following variables:

§ $HTTP_POST_FILES["upload_file]["name"]

§ $HTTP_POST_FILES["upload_file]["tmp_name"]

§ $HTTP_POST_FILES["upload_file]["size"]

$HTTP_POST_FILES["upload_file]["type"]

$_FILES is a superglobal array (global in any scope). $HTTP_POST_FILES must be declared with the global keyword if used in a nonglobal scope, such as within a function.

To avoid fooling around figuring out which array contains file upload information, it makes sense to write a utility routine that does all the work. The following function, get_upload_info(), takes an argument corresponding to the name of a file upload field. Then it examines the$_FILES and $HTTP_POST_FILES arrays as necessary and returns an associative array of information about the file, or a NULL value if the information is not available. For a successful call, the array element keys are "tmp_name", "name", "size", and "type".

function get_upload_info ($name)

{

global $HTTP_POST_FILES, $HTTP_POST_VARS;

# Look for information in PHP 4.1 $_FILES array first.

# Check the tmp_name member to make sure there is a file. (The entry

# in $_FILES might be present even if no file was uploaded.)

if (isset ($_FILES))

{

if (isset ($_FILES[$name])

&& $_FILES[$name]["tmp_name"] != ""

&& $_FILES[$name]["tmp_name"] != "none")

return ($_FILES[$name]);

}

# Look for information in PHP 4 $HTTP_POST_FILES array next.

else if (isset ($HTTP_POST_FILES))

{

if (isset ($HTTP_POST_FILES[$name])

&& $HTTP_POST_FILES[$name]["tmp_name"] != ""

&& $HTTP_POST_FILES[$name]["tmp_name"] != "none")

return ($HTTP_POST_FILES[$name]);

}

return (NULL);

}

See the post_image.php script for details about how to use this function to get image information and store it in MySQL.

The upload_tmp_dir PHP configuration variable controls where uploaded files are saved. This is /tmp by default on many systems, but you may want to override it to reconfigure PHP to use a different directory that’s owned by the web server user ID and thus more private.

Uploads in Python

A simple upload form in Python can be written like this:

print """

Image name:<br />

<br />

Image file:<br />

</form>

""" % (os.environ["SCRIPT_NAME"])

When the user submits the form, its contents can be obtained using the FieldStorage() method of the cgi module (Collecting Web Input). The resulting object contains an element for each input parameter. For a file upload field, you get this information as follows:

form = cgi.FieldStorage ()

if form.has_key ("upload_file") and form["upload_file"].filename != "":

image_file = form["upload_file"]

else:

image_file = None

According to most of the documentation that I have read, the file attribute of an object that corresponds to a file field should be true if a file has been uploaded. Unfortunately, the file attribute seems to be true even when the user submits the form but leaves the file field blank. It may even be the case that the type attribute is set when no file actually was uploaded (for example, to application/octet-stream). In my experience, a more reliable way to determine whether a file really was uploaded is to test the filename attribute:

form = cgi.FieldStorage ()

if form.has_key ("upload_file") and form["upload_file"].filename:

print "<p>A file was uploaded</p>"

else:

print "<p>A file was not uploaded</p>"

Assuming that a file was uploaded, access the parameter’s value attribute to read the file and obtain its contents:

data = form["upload_file"].value

See the post_image.py script for details about how to use this function to get image information and store it in MySQL.

Performing Searches and Presenting the Results

Problem

You want to implement a web-based search interface.

Solution

Present a form containing fields that enable the user to supply search parameters such as keywords. Use the keywords to construct a database query, and then display the query results.

Discussion

A script that implements a web-based search interface provides a convenience for people who visit your web site because they don’t have to know any SQL to find information in your database. Instead, visitors supply keywords that describe what they’re interested in and your script figures out the appropriate statements to run on their behalf. A common paradigm for this activity involves a form containing one or more fields for entering search parameters. The user fills in the form, submits it, and receives back a new page containing the records that match the parameters.

As the writer of such a script, you must handle these operations:

1. Generate the form and send it to the users.

2. Interpret the submitted form and construct an SQL statement based on its contents. This includes proper use of placeholders or quoting to prevent bad input from crashing or subverting your script.

3. Execute the statement and display its result. This can be simple if the result set is small, or more complex if it is large. In the latter case, you may want to present the matching records using a paged display—that is, a display consisting of multiple pages, each of which shows a subset of the entire statement result. Multiple-page displays have the benefit of not overwhelming the user with huge amounts of information all at once. Generating Previous-Page and Next-Page Links discusses how to implement them.

This recipe demonstrates a script that implements a minimal search interface: a form with one keyword field, from which a statement is constructed that returns at most one record. The script performs a two-way search through the contents of the states table. That is, if the user enters a state name, it looks up the corresponding abbreviation. Conversely, if the user enters an abbreviation, it looks up the name. The script, search_state.pl, looks like this:

#!/usr/bin/perl

# search_state.pl - simple "search for state" application

# Present a form with an input field and a submit button. User enters

# a state abbreviation or a state name into the field and submits the

# form. Script finds the abbreviation and displays the full name, or

# finds the name and displays the abbreviation.

use strict;

use warnings;

use CGI qw(:standard escapeHTML);

use Cookbook;

my $title = "State Name or Abbreviation Lookup";

print header (), start_html (-title => $title, -bgcolor => "white");

# Extract keyword parameter. If it's present and nonempty,

# attempt to perform a lookup.

my $keyword = param ("keyword");

if (defined ($keyword) && $keyword !~ /^\s*$/)

{

my $dbh = Cookbook::connect ();

my $found = 0;

my $s;

# first try looking for keyword as a state abbreviation;

# if that fails, try looking for it as a name

$s = $dbh->selectrow_array ("SELECT name FROM states WHERE abbrev = ?",

undef, $keyword);

if ($s)

{

++$found;

print p ("You entered the abbreviation: " . escapeHTML ($keyword));

print p ("The corresponding state name is : " . escapeHTML ($s));

}

$s = $dbh->selectrow_array ("SELECT abbrev FROM states WHERE name = ?",

undef, $keyword);

if ($s)

{

++$found;

print p ("You entered the state name: " . escapeHTML ($keyword));

print p ("The corresponding abbreviation is : " . escapeHTML ($s));

}

if (!$found)

{

print p ("You entered the keyword: " . escapeHTML ($keyword));

print p ("No match was found.");

}

$dbh->disconnect ();

}

print p (qq{

Enter a state name into the form and select Search, and I will show you

the corresponding abbreviation.

Or enter an abbreviation and I will show you the full name.

});

print start_form (-action => url ()),

"State: ",

textfield (-name => "keyword", -size => 20),

br (),

submit (-name => "choice", -value => "Search"),

end_form ();

print end_html ();

The script first checks to see whether a keyword parameter is present. If so, it runs the statements that look for a match to the parameter value in the states table and displays the results. Then it presents the form so that the user can enter a new search.

When you try the script, you’ll notice that the value of the keyword field carries over from one invocation to the next. That’s due to CGI.pm’s behavior of initializing form fields with values from the script environment. If you don’t like this behavior, defeat it and make the field come up blank each time by supplying an empty value explicitly and an override parameter in the textfield() call:

print textfield (-name => "keyword",

-value => "",

-override => 1,

-size => 20);

Alternatively, clear the parameter’s value in the environment before generating the field:

param (-name => "keyword", -value => "");

print textfield (-name => "keyword", -size => 20);

Generating Previous-Page and Next-Page Links

Problem

A statement matches so many rows that displaying them all in a single web page produces an unwieldy result.

Solution

Split the statement output across several pages and include links that enable the user to navigate among pages.

Discussion

If a statement matches a large number of rows, showing them all in a single web page can result in a display that’s difficult to navigate. For such cases, it can be more convenient for the user if you split the result among multiple pages. Such a paged display avoids overwhelming the user with too much information, but is more difficult to implement than a single-page display.

A paged display typically is used in a search context to present rows that match the search parameters supplied by the user. To simplify things, the examples in this recipe don’t have any search interface. Instead, they implement a paged display that presents 10 rows at a time from the result of a fixed statement:

SELECT name, abbrev, statehood, pop FROM states ORDER BY name;

MySQL makes it easy to select just a portion of a result set: add a LIMIT clause that indicates which rows you want. The two-argument form of LIMIT takes values indicating how many rows to skip at the beginning of the result set, and how many to select. The statement to select a section of the states table thus becomes:

SELECT name, abbrev, statehood, pop FROM states ORDER BY name

LIMITskip,select;

One issue, then, is to determine the proper values of skip and select for any given page. Another is to generate the links that point to other pages or the statement result. This second issue presents you with a choice: which paging style should you use for the links?

§ One style of paged display presents only “previous page” and “next page” links. To do this, you need to know whether any rows precede or follow those you’re displaying in the current page.

§ Another paging style displays a link for each available page. This enables the user to jump directly to any page, not just the previous or next page. To present this kind of navigation, you have to know the total number of rows in the result set and the number of rows per page, so that you can determine how many pages there are.

Paged displays with previous-page and next-page links

The following script, state_pager1.pl, presents rows from the states table in a paged display that includes navigation links only to the previous and next pages. For a given page, you can determine which links are needed as follows:

§ A “previous page” link is needed if there are rows in the result set preceding those shown in the current page. If the current page starts at row one, there are no such rows.

§ A “next page” link is needed if there are rows in the result set following those shown in the current page. You can determine this by issuing a SELECT COUNT(*) statement to see how many rows the statement matches in total. Another method is to select one more row than you need. For example, if you’re displaying 10 rows at a time, try to select 11 rows. If you get 11, there is a next page. If you get 10 or less, there isn’t. state_pager1.pl uses the latter approach.

To determine its current position in the result set and how many rows to display, state_pager1.pl looks for start and per_page input parameters. When you first invoke the script, these parameters won’t be present, so they’re initialized to 1 and 10, respectively. Thereafter, the script generates “previous page” and “next page” links to itself that include the proper parameter values in the URLs for selecting the previous or next sections of the result set.

#!/usr/bin/perl

# state_pager1.pl - paged display of states, with prev-page/next-page links

use strict;

use warnings;

use CGI qw(:standard escape escapeHTML);

use Cookbook;

my $title = "Paged U.S. State List";

my $page = header ()

. start_html (-title => $title, -bgcolor => "white")

. h3 ($title);

my $dbh = Cookbook::connect ();

# Collect parameters that determine where we are in the display and

# verify that they are integers.

# Default to beginning of result set, 10 records/page if parameters

# are missing/malformed.

my $start = param ("start");

$start = 1

if !defined ($start) || $start !~ /^\d+$/ || $start < 1;

my $per_page = param ("per_page");

$per_page = 10

if !defined ($per_page) || $per_page !~ /^\d+$/ || $per_page < 1;;

# If start > 1, then we'll need a live "previous page" link.

# To determine whether there is a next page, try to select one more

# record than we need. If we get that many, display only the first

# $per_page records, but add a live "next page" link.

# Select the records in the current page of the result set, and

# attempt to get an extra record. (If we get the extra one, we

# won't display it, but its presence tells us there is a next

# page.)

my $stmt = sprintf ("SELECT name, abbrev, statehood, pop

FROM states

ORDER BY name LIMIT %d,%d",

$start - 1, # number of records to skip

$per_page + 1); # number of records to select

my $tbl_ref = $dbh->selectall_arrayref ($stmt);

$dbh->disconnect ();

# Display results as HTML table

my @rows;

push (@rows, Tr (th (["Name", "Abbreviation", "Statehood", "Population"])));

for (my $i = 0; $i < $per_page && $i < @{$tbl_ref}; $i++)

{

# get data values in row $i

my @cells = @{$tbl_ref->[$i]}; # get data values in row $i

# map values to HTML-encoded values, or to if null/empty

@cells = map {

defined ($_) && $_ ne "" ? escapeHTML ($_) : " "

} @cells;

# add cells to table

push (@rows, Tr (td (\@cells)));

}

$page .= table ({-border => 1}, @rows) . br ();

# If we're not at the beginning of the query result, present a live

# link to the previous page. Otherwise, present static text.

if ($start > 1) # live link

{

my $url = sprintf ("%s?start=%d;per_page=%d",

url (),

$start - $per_page,

$per_page);

$page .= "[" . a ({-href => $url}, "previous page") . "] ";

}

else # static text

{

$page .= "[previous page]";

}

# If we got the extra record, present a live link to the next page.

# Otherwise, present static text.

if (@{$tbl_ref} > $per_page) # live link

{

my $url = sprintf ("%s?start=%d;per_page=%d",

url (),

$start + $per_page,

$per_page);

$page .= "[" . a ({-href => $url}, "next page") . "]";

}

else # static text

{

$page .= "[next page]";

}

$page .= end_html ();

print $page;

Paged displays with links to each page

The next script, state_pager2.pl, is much like state_pager1.pl, but presents a paged display that includes navigation links to each page of the query result. To do this, it’s necessary to know how many rows there are in all. state_pager2.pl determines this by running a SELECT COUNT(*)statement. Because the script then knows the total row count, it need not select an extra row when fetching the section of the result to be displayed.

Omitting the parts of state_pager2.pl that are the same as state_pager1.pl, the middle part that retrieves rows and generates links is implemented as follows:

# Determine total number of records

my $total_recs = $dbh->selectrow_array ("SELECT COUNT(*) FROM states");

# Select the records in the current page of the result set

my $stmt = sprintf ("SELECT name, abbrev, statehood, pop

FROM states

ORDER BY name LIMIT %d,%d",

$start - 1, # number of records to skip

$per_page); # number of records to select

my $tbl_ref = $dbh->selectall_arrayref ($stmt);

$dbh->disconnect ();

# Display results as HTML table

my @rows;

push (@rows, Tr (th (["Name", "Abbreviation", "Statehood", "Population"])));

for (my $i = 0; $i < @{$tbl_ref}; $i++)

{

# get data values in row $i

my @cells = @{$tbl_ref->[$i]}; # get data values in row $i

# map values to HTML-encoded values, or to if null/empty

@cells = map {

defined ($_) && $_ ne "" ? escapeHTML ($_) : " "

} @cells;

# add cells to table

push (@rows, Tr (td (\@cells)));

}

$page .= table ({-border => 1}, @rows) . br ();

# Generate links to all pages of the result set. All links are

# live, except the one to the current page, which is displayed as

# static text. Link label format is "[m to n]" where m and n are

# the numbers of the first and last records displayed on the page.

for (my $first = 1; $first <= $total_recs; $first += $per_page)

{

my $last = $first + $per_page - 1;

$last = $total_recs if $last > $total_recs;

my $label = "$first to $last";

my $link;

if ($first != $start) # live link

{

my $url = sprintf ("%s?start=%d;per_page=%d",

url (),

$first,

$per_page);

$link = a ({-href => $url}, $label);

}

else # static text

{

$link = $label;

}

$page .= "[$link] ";

}

Generating “Click to Sort” Table Headings

Problem

You want to display a query result in a web page as a table that enables the user to select which column to sort the table rows by.

Solution

Make each column heading a hyperlink that redisplays the table, sorted by the corresponding column.

Discussion

When a web script runs, it can determine what action to take by examining its environment to find out what parameters are present and what their values are. In many cases these parameters come from a user, but there’s no reason a script cannot add parameters to URLs itself. This is one way a given invocation of a script can send information to the next invocation. The effect is that the script communicates with itself by means of URLs that it generates to cause specific actions. An application of this technique is for showing the result of a query such that a user can select which column of the result to use for sorting the display. This is done by making the column headers active links that redisplay the table, sorted by the selected column.

The examples here use the mail table, which has the following contents:

mysql>SELECT * FROM mail;

+---------------------+---------+---------+---------+---------+---------+

+---------------------+---------+---------+---------+---------+---------+

+---------------------+---------+---------+---------+---------+---------+

To retrieve the table and display its contents as an HTML table, you can use the techniques discussed in Displaying Query Results as Tables. Here we’ll use those same concepts but modify them to produce “click to sort” table column headings.

A “plain” HTML table includes a row of column headers consisting only of the column names:

<tr>

<th>srcuser</th>

<th>srchost</th>

<th>dstuser</th>

<th>dsthost</th>

</tr>

To make the headings active links that reinvoke the script to produce a display sorted by a given column name, we need to produce a header row that looks like this:

<tr>

<th><a href="script_name?sort=srcuser">srcuser</a></th>

<th><a href="script_name?sort=srchost">srchost</a></th>

<th><a href="script_name?sort=dstuser">dstuser</a></th>

<th><a href="script_name?sort=dsthost">dsthost</a></th>

</tr>

To generate such headings, the script needs to know the names of the columns in the table, as well as its own URL. Recipes and show how to obtain this information using statement metadata and information in the script’s environment. For example, in PHP, a script can generate the header row for the columns in a given statement as follows, where tableInfo() returns an array containing metadata for a query result. $info[ i ] contains information about column i and $info[ i ][ "name" ] contains the column’s name.

$self_path = get_self_path ();

print ("<tr>\n");

$info =& $conn->tableInfo ($result, NULL);

if (PEAR::isError ($info))

die (htmlspecialchars ($result->getMessage ()));

for ($i = 0; $i < $result->numCols (); $i++)

{

$col_name = $info[$i]["name"];

printf ("<th><a href=\"%s?sort=%s\">%s</a></th>\n",

$self_path,

urlencode ($col_name),

htmlspecialchars ($col_name));

}

print ("</tr>\n");

The following script, clicksort.php, implements this kind of table display. It checks its environment for a sort parameter that indicates which column to use for sorting. The script then uses the parameter to construct a statement of the following form:

SELECT * FROM $tbl_name ORDER BY $sort_col LIMIT 50

There is a small bootstrapping problem for this kind of script. The first time you invoke it, there is no sort column name in the environment, so the script doesn’t know which column to sort by initially. What should you do? There are a couple possibilities:

§ You can retrieve the results unsorted.

§ You can hardwire one of the column names into the script as the default.

§ You can look up the column names from INFORMATION_SCHEMA and use one of them (such as the first) as the default. To look up the name on the fly, use this statement:

§ SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS

§ WHERE TABLE_SCHEMA = "cookbook" AND TABLE_NAME = "mail"

AND ORDINAL_POSITION = 1;

The following script looks up the name from INFORMATION_SCHEMA. It also uses a LIMIT clause when retrieving results as a precaution that prevents the script from dumping huge amounts of output if the table is large.

<?php

# clicksort.php - display query result as HTML table with "click to sort"

# column headings

# Rows from the database table are displayed as an HTML table.

# Column headings are presented as hyperlinks that reinvoke the

# script to redisplay the table sorted by the corresponding column.

# The display is limited to 50 rows in case the table is large.

require_once "Cookbook.php";

require_once "Cookbook_Webutils.php";

$title = "Table Display with Click-To-Sort Column Headings";

<html>

<head>

</head>

<?php

# ----------------------------------------------------------------------

# names for database and table and default sort column; change as desired

$db_name = "cookbook";

$tbl_name = "mail";

$conn =& Cookbook::connect ();

if (PEAR::isError ($conn))

die ("Cannot connect to server: "

. htmlspecialchars ($conn->getMessage ()));

print ("<p>" . htmlspecialchars ("Table: $db_name.$tbl_name") . "</p>\n");

print ("<p>Click on a column name to sort by that column.</p>\n");

# Get the name of the column to sort by: If missing, use the first column.

$sort_col = get_param_val ("sort");

if (!isset ($sort_col))

{

$stmt = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?

AND ORDINAL_POSITION = 1";

$result =& $conn->query ($stmt, array ($db_name, $tbl_name));

if (PEAR::isError ($result))

die (htmlspecialchars ($result->getMessage ()));

if (!(list ($sort_col) = $result->fetchRow ()))

die (htmlspecialchars ($result->getMessage ()));

$result->free ();

}

# Construct query to select records from the table, sorting by the

# named column. (The column name comes from the environment, so quote

# it to avoid an SQL injection attack.)

# Limit output to 50 rows to avoid dumping entire contents of large tables.

$stmt = "SELECT * FROM $db_name.$tbl_name";

$stmt .= " ORDER BY " . $conn->quoteIdentifier ($sort_col);

$stmt .= " LIMIT 50";

$result =& $conn->query ($stmt);

if (PEAR::isError ($result))

die (htmlspecialchars ($result->getMessage ()));

# Display query results as HTML table. Use query metadata to get column

# names, and display names in first row of table as hyperlinks that cause

# the table to be redisplayed, sorted by the corresponding table column.

print ("<table border=\"1\">\n");

$self_path = get_self_path ();

print ("<tr>\n");

$info =& $conn->tableInfo ($result, NULL);

if (PEAR::isError ($info))

die (htmlspecialchars ($result->getMessage ()));

for ($i = 0; $i < $result->numCols (); $i++)

{

$col_name = $info[$i]["name"];

printf ("<th><a href=\"%s?sort=%s\">%s</a></th>\n",

$self_path,

urlencode ($col_name),

htmlspecialchars ($col_name));

}

print ("</tr>\n");

while ($row =& $result->fetchRow ())

{

print ("<tr>\n");

for ($i = 0; $i < $result->numCols (); $i++)

{

# encode values, using for empty cells

$val = $row[$i];

if (isset ($val) && $val != "")

$val = htmlspecialchars ($val);

else

$val = " ";

printf ("<td>%s</td>\n", $val);

}

print ("</tr>\n");

}

$result->free ();

print ("</table>\n");

$conn->disconnect ();

</body>

</html>

The $sort_col value comes from the sort parameter of the environment, so it should be considered dangerous: An attacker might submit a URL with a sort parameter designed to attempt an SQL injection attack. To prevent this, $sort_col should be quoted when you construct theSELECT statement that retrieves rows from the displayed table. You cannot use a placeholder to quote the value because that technique applies to data values. ($sort_col is used as an identifier, not a data value.) However, some MySQL APIs, PEAR DB among them, provide an identifier-quoting function. clicksort.php uses quoteIdentifier() to make the $sort_col value safe for inclusion in the SQL statement.

Another approach to validating the column name is to check the COLUMNS table of INFORMATION_SCHEMA. If the sort column is not listed there, it is invalid. The clicksort.php script shown here does not do that. However, the recipes distribution contains a Perl counterpart script,clicksort.pl, that does perform this kind of check. Have a look at it if you want more information.

The cells in the rows following the header row contain the data values from the database table, displayed as static text. Empty cells are displayed using so that they display with the same border as nonempty cells (see Displaying Query Results as Tables).

Web Page Access Counting

Problem

You want to count the number of times a page has been accessed.

Solution

Implement a hit counter, keyed to the page you want to count. This can be used to display a counter in the page. The same technique can be used to record other types of information as well, such as the number of times each of a set of banner ads has been served.

Discussion

This recipe discusses access counting, using hit counters for the examples. Counters that display the number of times a web page has been accessed are not such a big thing as they used to be, presumably because page authors now realize that most visitors don’t really care how popular a page is. Still, the general concept has application in several contexts. For example, if you’re displaying banner ads in your pages (Recipe 18.8), you may be charging vendors by the number of times you serve their ads. To do so, you need to count the number of accesses for each one. You can adapt the technique shown in this section for such purposes.

There are several methods for writing a page that displays a count of the number of times it has been accessed. The most basic is to maintain the count in a file. When the page is requested, open the file, read the count, increment it, and write the new count back to the file and display it in the page. This has the advantage of being easy to implement and the disadvantage that it requires a counter file for each page that includes a hit count. It also doesn’t work properly if two clients access the page at the same time, unless you implement some kind of locking protocol in the file access procedure. It’s possible to reduce counter file litter by keeping multiple counts in a single file, but that makes it more difficult to access particular values within the file, and it doesn’t solve the simultaneous-access problem. In fact, it makes it worse, because a multiple-counter file has a higher likelihood of being accessed by multiple clients simultaneously than does a single-counter file. So you end up implementing storage and retrieval methods for processing the file contents, and locking protocols to keep multiple processes from interfering with each other. Hmm... those sound suspiciously like the problems that a database management system such as MySQL already takes care of! Keeping the counts in the database centralizes them into a single table, SQL provides the storage and retrieval interface, and the locking problem goes away because MySQL serializes access to the table so that clients can’t interfere with each other. Furthermore, depending on how you manage the counters, you might be able to update the counter and retrieve the new sequence value using a single statement.

I’ll assume that you want to log hits for more than one page. To do that, create a table that has one row for each page to be counted. This means that it’s necessary to have a unique identifier for each page, so that counters for different pages don’t get mixed up. You could assign identifiers somehow, but it’s easier just to use the page’s path within your web tree. Web programming languages typically make this path easy to obtain; in fact, we’ve already discussed how to do so in Writing Scripts That Generate Web Forms. On that basis, you can create a hitcount table as follows:

CREATE TABLE hitcount

(

path VARCHAR(255)

CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,

hits BIGINT UNSIGNED NOT NULL,

PRIMARY KEY (path)

);

This table definition involves some assumptions:

§ The path column that stores page pathnames has a character set of latin1 and a case-sensitive collation of latin1_general_cs. Use of a case-sensitive collation is appropriate for a web platform where pathnames are case-sensitive, such as most versions of Unix. For Windows or for HFS+ filesystems under Mac OS X, filenames are not case-sensitive, so you would choose a collation that is not case-sensitive, such as latin1_swedish_ci. If your filesystem is set up to use pathnames in a different character set, you should change the character set and collation.

§ The path column has a maximum length of 255 characters, which limits you to page paths no longer than that.

§ The path column is indexed as a PRIMARY KEY to require unique values. Either a PRIMARY KEY or UNIQUE index is required because we will implement the hit-counting algorithm using an INSERT statement with an ON DUPLICATE KEY UPDATE clause to insert a row if none exists for the page or update the row if it does exist. (Using Sequence Generators as Counters provides background that further explains ON DUPLICATE KEY UPDATE.)

§ The table is set up to count page hits for a single document tree, such as when your web server is used to serve pages for a single domain. If you institute a hit count mechanism on a host that serves multiple virtual domains, you may want to add a column for the domain name. This value is available in the SERVER_NAME value that Apache puts into your script’s environment. In this case, the hitcount table index should include both the hostname and the page path.

The general logic involved in hit counter maintenance is to increment the hits column of the row for a page, and then retrieve the updated counter value:

UPDATE hitcount SET hits = hits + 1 WHERE path = 'page path';

SELECT hits FROM hitcount WHERE path = 'page path';

Unfortunately, if you use that approach, you may often not get the correct value. If several clients request the same page simultaneously, several UPDATE statements may be issued in close temporal proximity. The following SELECT statements then wouldn’t necessarily get the correspondinghits value. This can be avoided by using a transaction or by locking the hitcount table, but that slows down hit counting. MySQL provides a solution that enables each client to retrieve its own count, no matter how many updates happen at the same time:

UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1) WHERE path = 'page path';

SELECT LAST_INSERT_ID();

The basis for updating the count here is LAST_INSERT_ID(expr), which is discussed in Using Sequence Generators as Counters. The UPDATE statement finds the relevant row and increments its counter value. The use of LAST_INSERT_ID(hits+1) rather than just hits+1 tells MySQL to treat the value as though it were an AUTO_INCREMENT value. This allows it to be retrieved in the second statement using LAST_INSERT_ID(). The LAST_INSERT_ID() function returns a connection-specific value, so you always get back the value corresponding to the UPDATE issued on the same connection. In addition, the SELECT statement doesn’t need to query a table, so it’s very fast.

However, there’s still a problem here. What if the page isn’t listed in the hitcount table? In that case, the UPDATE statement finds no row to modify and you get a counter value of zero. You could deal with this problem by requiring that any page that includes a hit counter must be registered in the hitcount table before the page goes online. An easier approach is to use MySQL’s INSERT ... ON DUPLICATE KEY UPDATE syntax, which inserts a row with a count of 1 if it does not exist, and updates its counter if it does exist:

INSERT INTO hitcount (path,hits) VALUES('some path',LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits+1);

The counter value then can be retrieved as the value of the LAST_INSERT_ID() function:

SELECT LAST_INSERT_ID();

The first time you request a count for a page, the statement inserts a row because the page won’t be listed in the table yet. The statement creates a new counter and returns a value of one. For each request thereafter, the statement updates the existing row for the page with the new count. That way, web page designers can include counters in pages with no advance preparation required to initialize the hitcount table with a row for the page.

A further efficiency can be gained by eliminating the SELECT statement altogether, which is possible if your API provides a means for direct retrieval of the most recent sequence number. For example, in Perl, you can update the count and get the new value with only one SQL statement like this:

$dbh->do ("INSERT INTO hitcount (path,hits) VALUES(?,LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits+1)",

undef, $page_path);

$count = $dbh->{mysql_insertid};

To make the counter mechanism easier to use, put the code in a utility function that takes a page path as an argument and returns the count. In Perl, a hit-counting function might look like this, in which the arguments are a database handle and the page path:

sub get_hit_count

{

my ($dbh, $page_path) = @_;

my $count;

$dbh->do ("INSERT INTO hitcount (path,hits) VALUES(?,LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits+1)",

undef, $page_path);

$count = $dbh->{mysql_insertid};

return $count

}

The CGI.pm script_name() function returns the local part of the URL, so you use get_hit_count() like this:

my $count = get_hit_count ($dbh, script_name ());

print p ("This page has been accessed $count times.");

The counter-update mechanism involves a single SQL statement, so it is unnecessary to use transactions or explicit table locking to prevent race conditions that might result if multiple clients simultaneously request a page.

A Ruby version of the hit counter looks like this:

def get_hit_count(dbh, page_path)

dbh.do("INSERT INTO hitcount (path,hits) VALUES(?,LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits+1)",

page_path)

return dbh.func(:insert_id)

end

Use the counter method as follows:

self_path = ENV["SCRIPT_NAME"]

count = get_hit_count(dbh, self_path)

page << cgi.p { "This page has been accessed " + count.to_s + " times." }

In Python, the counting function looks like this:

def get_hit_count (conn, page_path):

cursor = conn.cursor ()

cursor.execute ("""

INSERT INTO hitcount (path,hits) VALUES(%s,LAST_INSERT_ID(1))

ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits+1)

""", (page_path,))

cursor.close ()

return (conn.insert_id ())

Use the function as follows:

self_path = os.environ["SCRIPT_NAME"]

count = get_hit_count (conn, self_path)

print "<p>This page has been accessed %d times.</p>" % count

The recipes distribution includes demonstration hit counter scripts for Perl, Ruby, PHP, and Python under the apache/hits directory. A JSP version is under the tomcat/mcb directory. Install any of these in your web tree, invoke it from your browser a few times, and watch the count increase. First, you’ll need to create the hitcount table, as well as the hitlog table described in Web Page Access Logging. (The hit-counting scripts show a count and also a log of the most recent hits. Web Page Access Logging discusses the logging mechanism.) Both tables can be created using the hits.sql script provided in the tables directory.

Web Page Access Logging

Problem

You want to know more about a page than just the number of times it’s been accessed, such as the time of access and the host from which the request originated.

Solution

Maintain a hit log rather than a simple counter.

Discussion

The hitcount table used in Web Page Access Counting records only the access count for each page registered in it. If you want to record other information about page access, use a different approach. Suppose that you want to track the client host and time of access for each request. In this case, you need to log a row for each page access rather than just a count. But you can still maintain the counts by using a multiple-column index that combines the page path and an AUTO_INCREMENT sequence column:

CREATE TABLE hitlog

(

path VARCHAR(255)

CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,

hits BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,

t TIMESTAMP,

host VARCHAR(255),

INDEX (path,hits)

) ENGINE = MyISAM;

See Web Page Access Counting for notes on choosing the character set and collation for the path column.

To insert new rows into the hitlog table, use this statement:

INSERT INTO hitlog (path, host) VALUES(path_val,host_val);

For example, in a JSP page, hits can be logged like this:

<c:set var="host"><%= request.getRemoteHost () %></c:set>

<c:if test="${empty host}">

<c:set var="host"><%= request.getRemoteAddr () %></c:set>

</c:if>

<c:if test="${empty host}">

<c:set var="host">UNKNOWN</c:set>

</c:if>

<sql:update dataSource="${conn}">

INSERT INTO hitlog (path, host) VALUES(?,?)

<sql:param><%= request.getRequestURI () %></sql:param>

<sql:param value="${host}"/>

</sql:update>

The hitlog table has the following useful properties:

§ Access times are recorded automatically in the TIMESTAMP column t when you insert new rows.

§ By linking the path column to an AUTO_INCREMENT column hits, the counter values for a given page path increment automatically whenever you insert a new row for that path. The counters are maintained separately for each distinct path value. This counting mechanism requires that you use the MyISAM (or BDB) storage engine, which is why the table definition includes an explicit ENGINE = MyISAM clause. (For more information on how multiple-column sequences work, see Using an AUTO_INCREMENT Column to Create Multiple Sequences.)

§ There’s no need to check whether the counter for a page already exists, because you insert a new row each time you record a hit for a page, not just for the first hit.

§ To determine the current counter value for each page, select the row for each distinct path value that has the largest hits value:

SELECT path, MAX(hits) FROM hitlog GROUP BY path;

To determine the counter for a given page, use this statement:

SELECT MAX(hits) FROM hitlog WHERE path = 'path_name';

Using MySQL for Apache Logging

Problem

You don’t want to use MySQL to log accesses for just a few pages, as shown in Web Page Access Logging. You want to log all page accesses, and you don’t want to have to put logging code in each page explicitly.

Solution

Tell Apache to log page accesses to a MySQL table.

Discussion

The uses for MySQL in a web context aren’t limited just to page generation and processing. You can use it to help you run the web server itself. For example, most Apache servers are set up to log a record of page requests to a file. But it’s also possible to send log records to a program instead, from which you can write the records wherever you like—such as to a database. With log records in a database rather than a flat file, the log becomes more highly structured and you can apply SQL analysis techniques to it. Logfile analysis tools may be written to provide some flexibility, but often this is a matter of deciding which summaries to display and which to suppress. It’s more difficult to tell a tool to display information it wasn’t built to provide. With log entries in a table, you gain additional flexibility. Want to see a particular report? Write the SQL statements that produce it. To display the report in a specific format, issue the statements from within an API and take advantage of your language’s output production capabilities.

By handling log entry generation and storage using separate processes, you gain some additional flexibility. Some of the possibilities are to send logs from multiple web servers to the same MySQL server, or to send different logs generated by a given web server to different MySQL servers.

This recipe shows how to set up web request logging from Apache into MySQL and demonstrates some summary queries you may find useful.

Setting up database logging

Apache logging is controlled by directives in the httpd.conf configuration file. For example, a typical logging setup uses LogFormat and CustomLog directives that look like this:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

CustomLog /usr/local/apache/logs/access_log common

The LogFormat line defines a format for log records and gives it the nickname common. The CustomLog directive indicates that lines should be written in that format to the access_log file in Apache’s logs directory. To set up logging to MySQL instead, use the following procedure:^[21^]

1. Decide what values you want to record and set up a table that contains the appropriate columns.

2. Write a program to read log lines from Apache and write them into the database.

3. Set up a LogFormat line that defines how to write log lines in the format the program expects, and a CustomLog directive that tells Apache to write to the program rather than to a file.

Suppose that you want to record the date and time of each request, the host that issued the request, the request method and URL pathname, the status code, the number of bytes transferred, the referring page, and the user agent (typically a browser or spider name). A table that includes columns for these values can be created as follows:

CREATE TABLE httpdlog

(

dt DATETIME NOT NULL, # request date

host VARCHAR(255) NOT NULL, # client host

method VARCHAR(4) NOT NULL, # request method (GET, PUT, etc.)

url VARCHAR(255) # URL path

CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,

status INT NOT NULL, # request status

size INT, # number of bytes transferred

referer VARCHAR(255), # referring page

agent VARCHAR(255) # user agent

);

Most of the string columns use VARCHAR and are not case-sensitive. The exception, url, is declared with a case-sensitive collation as is appropriate for a server running on a system with case-sensitive filenames. See Web Page Access Counting for notes on choosing the character set and collation for the path column.

The httpdlog table definition shown here doesn’t include any indexes. If you plan to run summary queries, you should add appropriate indexes to the table. Otherwise, the summaries will slow down dramatically as the table becomes large. The choice of which columns to index will be based on the types of statements you intend to run to analyze the table contents. For example, statements to analyze the distribution of client host values will benefit from an index on the host column.

Next, you need a program to process log lines produced by Apache and insert them into the httpdlog table. The following script, httpdlog.pl, opens a connection to the MySQL server, and then loops to read input lines. It parses each line into column values and inserts the result into the database. When Apache exits, it closes the pipe to the logging program. That causes httpdlog.pl to see end of file on its input, terminate the loop, disconnect from MySQL, and exit.

#!/usr/bin/perl

# httpdlog.pl - Log Apache requests to httpdlog table

# path to directory containing Cookbook.pm (CHANGE AS NECESSARY)

use lib qw(/usr/local/lib/mcb);

use strict;

use warnings;

use Cookbook;

my $dbh = Cookbook::connect ();

my $sth = $dbh->prepare (qq{

INSERT DELAYED INTO httpdlog

(dt,host,method,url,status,size,referer,agent)

VALUES (?,?,?,?,?,?,?,?)

});

while (<>) # loop reading input

{

chomp;

my ($dt, $host, $method, $url, $status, $size, $refer, $agent)

= split (/\t/, $_);

# map "-" to NULL for some columns

$size = undef if $size eq "-";

$agent = undef if $agent eq "-";

$sth->execute ($dt, $host, $method, $url,

$status, $size, $refer, $agent);

}

$dbh->disconnect ();

Install the httpdlog.pl script where you want Apache to look for it. On my system, the Apache root directory is /usr/local/apache, so /usr/local/apache/bin is a reasonable installation directory. The path to this directory will be needed shortly for constructing the CustomLog directive that instructs Apache to log to the script.

The purpose of including the use lib line is so that Perl can find the Cookbook.pm module. This line will be necessary if the environment of scripts invoked by Apache for logging does not enable Perl to find the module. Change the path as necessary for your system.

The script uses INSERT DELAYED rather than INSERT. The advantage of using DELAYED is that the MySQL server buffers the row in memory and then later inserts a batch of rows at a time, which is more efficient. This also enables the client to continue immediately rather than having to wait if the table is busy. The disadvantage is that if the MySQL server crashes, any rows buffered in memory at the time are lost. I figure that this is not very likely, and that the loss of a few log records is not a serious problem. If you disagree, just remove DELAYED from the statement.

httpdlog.pl assumes that input lines contain httpdlog column values delimited by tabs (to make it easy to break apart input lines), so Apache must write log entries in a matching format. The LogFormat field specifiers to produce the appropriate values are shown in the following table.

Specifier	Meaning
%{%Y-%m-%d %H:%M:%S}t	The date and time of the request, in MySQL’s DATETIME format
%h	The host from which the request originated
%m	The request method (get, post, and so forth)
%U	The URL path
%>s	The status code
%b	The number of bytes transferred
%{Referer}i	The referring page
%{User-Agent}i	The user agent

To define a logging format named mysql that produces these values with tabs in between, add the following LogFormat directive to your httpd.conf file:

LogFormat \

"%{%Y-%m-%d %H:%M:%S}t\t%h\t%m\t%U\t%>s\t%b\t%{Referer}i\t%{User-Agent}i" \

mysql

Most of the pieces are in place now. We have a log table, a program that writes to it, and a mysql format for producing log entries. All that remains is to tell Apache to write the entries to the httpdlog.pl script. However, until you know that the output format really is correct and that the program can process log entries properly, it’s premature to tell Apache to log directly to the program. To make testing and debugging a bit easier, have Apache log mysql-format entries to a file instead. That way, you can look at the file to check the output format, and you can use it as input tohttpdlog.pl to verify that the program works correctly. To instruct Apache to log lines in mysql format to the file test_log in Apache’s log directory, use this CustomLog directive:

CustomLog /usr/local/apache/logs/test_log mysql

Then restart Apache to enable the new logging directives. After your web server receives a few requests, take a look at the test_log file. Verify that the contents are as you expect, and then feed the file to httpdlog.pl:

%/usr/local/apache/bin/httpdlog.pl test_log

After httpdlog.pl finishes, take a look at the httpdlog table to make sure that it looks correct. Once you’re satisfied, tell Apache to send log entries directly to httpdlog.pl by modifying the CustomLog directive as follows:

CustomLog "|/usr/local/apache/bin/httpdlog.pl" mysql

The | character at the beginning of the pathname tells Apache that httpdlog.pl is a program, not a file. Restart Apache and new entries should appear in the httpdlog table as visitors request pages from your site.

Nothing you have done to this point changes any logging you may have been doing originally. For example, if you were logging to an access_log file before, you still are now. Thus, Apache will be sending entries both to the original logfile and to MySQL. If that’s what you want, fine. Apache doesn’t care if you log to multiple destinations. But you’ll use more disk space if you do. To disable file logging, comment out your original CustomLog directive by placing a # character in front of it, and then restart Apache.

Analyzing the logfile

Now that you have Apache logging into the database, what can you do with the information? That depends on what you want to know. Here are some examples that show the kinds of questions you can use MySQL to answer easily:

§ How many rows are in the request log?

SELECT COUNT(*) FROM httpdlog;

§ How many different client hosts have sent requests?

SELECT COUNT(DISTINCT host) FROM httpdlog;

§ How many different pages have clients requested?

SELECT COUNT(DISTINCT url) FROM httpdlog;

§ What are the 10 most popular pages?

§ SELECT url, COUNT(*) AS count FROM httpdlog

GROUP BY url ORDER BY count DESC LIMIT 10;

§ How many requests have been received for those favicon.ico files that certain browsers like to check for?

SELECT COUNT(*) FROM httpdlog WHERE url LIKE '%/favicon.ico%';

§ What is the range of dates spanned by the log?

SELECT MIN(dt), MAX(dt) FROM httpdlog;

§ How many requests have been received each day?

SELECT DATE(dt) AS day, COUNT(*) FROM httpdlog GROUP BY day;

Answering this question requires stripping off the time-of-day part from the dt values so that requests received on a given date can be grouped. The statement does this using the DATE() function to convert DATETIME values to DATE values. However, if you intend to run a lot of statements that use just the date part of the dt values, it would be more efficient to create the httpdlog table with separate DATE and TIME columns, change the LogFormat directive to produce the date and time as separate output values, and modify httpdlog.pl accordingly. Then you can operate on the request dates directly without stripping off the time, and you can index the date column for even better performance.

§ What is the hour-of-the-day request histogram?

SELECT HOUR(dt) AS hour, COUNT(*) FROM httpdlog GROUP BY hour;

§ What is the average number of requests received each day?

§ SELECT COUNT(*)/(DATEDIFF(MAX(dt), MIN(dt)) + 1)

FROM httpdlog;

The numerator is the total number of requests in the table. The denominator is the number of days spanned by the records.

§ What is the longest URL recorded in the table?

SELECT MAX(LENGTH(url)) FROM httpdlog;

If the url column is defined as VARCHAR(255) and this statement produces a value of 255, it’s likely that some URL values were too long to fit in the column and were truncated at the end. To avoid this, change the column definition to allow more characters. For example, to allow up to 5,000 characters, modify the url column as follows:

ALTER TABLE httpdlog

MODIFY url VARCHAR(5000)

CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL;

§ What is the total number of bytes served and the average bytes per request?

§ SELECT

§ COUNT(size) AS requests,

§ SUM(size) AS bytes,

§ AVG(size) AS 'bytes/request'

FROM httpdlog;

The statement uses COUNT(size) rather than COUNT(*) to count only those requests with a non-NULL size value. If a client requests a page twice, the server may respond to the second request by sending a header indicating that the page hasn’t changed rather than by sending content. In this case, the log entry for the request will have NULL in the size column.

§ How much traffic has there been for each kind of file (based on filename extension such as .html, .jpg, or .php)?

§ SELECT

§ SUBSTRING_INDEX(SUBSTRING_INDEX(url,'?',1),'.',-1) AS extension,

§ COUNT(size) AS requests,

§ SUM(size) AS bytes,

§ AVG(size) AS 'bytes/request'

§ FROM httpdlog

§ WHERE url LIKE '%.%'

GROUP BY extension;

The WHERE clause selects only url values that have a period in them, to eliminate pathnames that refer to files that have no extension. To extract the extension values for the output column list, the inner SUBSTRING_INDEX() call strips off any parameter string at the right end of the URL and leaves the rest. (This turns a value like /cgi-bin/script.pl?id=43 into /cgi-bin/script.pl. If the value has no parameter part, SUBSTRING_INDEX() returns the entire string.) The outer SUBSTRING_INDEX() call strips everything up to and including the rightmost period from the result, leaving only the extension.

Other logging issues

The preceding discussion shows a simple method for hooking Apache to MySQL, which involves writing a short script that communicates with MySQL and then telling Apache to write to the script rather than to a file. This works well if you log all requests to a single file, but certainly won’t be appropriate for every possible configuration that Apache is capable of. For example, if you have virtual servers defined in your httpd.conf file, you might have separate CustomLog directives defined for each of them. To log them all to MySQL, you can change each directive to write tohttpdlog.pl, but then you’ll have a separate logging process running for each virtual server. That brings up two issues:

§ How do you associate log records with the proper virtual server? One solution is to create a separate log table for each server and modify httpdlog.pl to take an argument that indicates which table to use. Another is to use a table that has a vhost column, an Apache log format that includes the %v virtual host format specifier, and a logging script that uses the vhost value when it generates INSERT statements. The apache/httpdlog directory of the recipes distribution contains information about doing this.

§ Do you really want a lot of httpdlog.pl processes running? If you have many virtual servers, you may want to consider using a logging module that installs directly into Apache. Some of these can multiplex logging for multiple virtual hosts through a single connection to the database server, reducing resource consumption for logging activity.

Logging to a database rather than to a file enables you to bring the full power of MySQL to bear on log analysis, but it doesn’t eliminate the need to think about space management. Web servers can generate a lot of activity, and log records use space regardless of whether you write them to a file or to a database. One way to save space is to expire records now and then. For example, to remove log records that are more than a year old, run the following statement periodically:

DELETE FROM httpdlog WHERE dt < NOW() - INTERVAL 1 YEAR;

If you have MySQL 5.1 or higher, you can set up an event that runs the DELETE statement on a scheduled basis (Using Events to Schedule Database Actions).

With respect to disk space consumed by web logging activity, be aware that if you have query logging enabled for the MySQL server, each request will be written to the httpdlog table and also to the query logfile. Thus, you may find disk space disappearing more quickly than you expect, so it’s a good idea to have some kind of logfile rotation or expiration set up for the MySQL server.

^[21^]If you’re using logging directives such as TransferLog rather than LogFormat and CustomLog, you’ll need to adapt the instructions in this section.