MySQL Cookbook (2007)

Chapter 17. Introduction to MySQL on the Web

Introduction

This chapter and the next few discuss some of the ways that MySQL can help you build a better web site. One significant benefit is that MySQL enables you to create a more interactive site because it becomes easier to provide dynamic content rather than static content. Static content exists as pages in the web server’s document tree that are served exactly as is. Visitors can access only the documents that you place in the tree, and changes occur only when you add, modify, or delete those documents. By contrast, dynamic content is created on demand. Rather than opening a file and serving its contents directly to the client, the web server executes a script that generates the page and sends the resulting output. For example, a script can process a keyword request and return a page that lists items in a catalog that match the keyword. Each time a keyword is submitted, the script produces a result appropriate for the request. And that’s just for starters; web scripts have access to the power of the programming language in which they’re written, so the actions that they perform to generate pages can be quite extensive. For example, web scripts are important for form processing, and a single script may be responsible for generating a form and sending it to the user, processing the contents of the form when the user submits it later, and storing the contents in a database. By operating this way, web scripts interact with users of your web site and tailor the information provided according to what those users want to see.

This chapter covers the introductory aspects of writing scripts that use MySQL in a web environment. Some of the material is not MySQL-related, but is necessary to establish the initial groundwork for using your database from within the context of web programming. The topics covered here include:

§ How web scripting differs from writing static HTML documents or scripts intended to be executed from the command line.

§ Prerequisites for running web scripts. In particular, you must have a web server installed and it must be set up to recognize your scripts as programs to be executed, rather than as static files to be served literally over the network.

§ How to use each of our API languages to write a short web script that queries the MySQL server for information and displays the results in a web page.

§ How to properly encode output. HTML consists of text to be displayed interspersed with special markup constructs. However, if the text contains special characters, you must encode them to avoid generating malformed web pages. Each API provides a mechanism for doing this.

The following chapters go into more detail on topics such as how to display query results in various formats (paragraphs, lists, tables, and so forth), working with images, form processing, and tracking a user across the course of several page requests as part of a single user session.

This book uses the Apache web server for Perl, Ruby, PHP, and Python scripts. It uses the Tomcat server for Java scripts written using JSP notation. Apache and Tomcat are available from the Apache Group:

http://httpd.apache.org

http://tomcat.apache.org

Because Apache installations are fairly prevalent, I’m going to assume that it is already installed on your system and that you just need to configure it. Using Apache to Run Web Scripts discusses how to configure Apache for Perl, Ruby, PHP, and Python, and how to write a short web script in each language. Using Tomcat to Run Web Scripts discusses JSP script writing using Tomcat. Tomcat is less widely deployed than Apache, so some additional installation information is provided in Appendix C. You can use servers other than Apache and Tomcat if you like, but you’ll need to adapt the instructions given here.

The scripts for the examples in this chapter can be found in the recipes distribution under the directories named for the servers used to run them. For Perl, Ruby, PHP, and Python examples, look under the apache directory. For Java (JSP) examples, look under the tomcat directory.

I assume here that you have some basic familiarity with HTML. For Tomcat, it’s also helpful to know something about XML, because Tomcat’s configuration files are written as XML documents, and JSP pages contain elements written using XML syntax. If you don’t know any XML, see the quick summary in the sidebar “XML and XHTML in a Nutshell.” In general, the web scripts in this book produce output that is valid not only as HTML, but as XHTML, the transitional format between HTML and XML. (This is another reason to become familiar with XML.) For example, XHTML requires closing tags, so paragraphs are written with a closing tag following the paragraph body. The use of this output style will be obvious for scripts written using languages like PHP in which the HTML tags are included literally in the script. For interfaces that generate HTML for you, conformance to XHTML is a matter of whether the module itself produces XHTML. For example, the Perl CGI.pm module does generate XHTML, but the Ruby cgi module does not.

XML AND XHTML IN A NUTSHELL

XML is similar in some ways to HTML, and because more people know HTML, it’s perhaps easiest to characterize XML in terms of how it differs from HTML:

§ In HTML, lettercase of tag and attribute names does not matter. In XML, the names are case-sensitive.

§ In HTML, tag attributes can be specified with a quoted or unquoted value, or sometimes with no value at all. In XML, every tag attribute must have a value, and the value must be quoted.

§ Every opening tag in XML must have a corresponding closing tag. This is true even if there is no body, although a shortcut tag form can be used in that case. For example, in HTML, you can write , but XML requires a closing tag. You could write this as , but the element has no body, so a shortcut form can be used that combines the opening and closing tags. However, when writing XML that will be translated into HTML, it’s safer to write the tag as with a space preceding the slash. The space helps older browsers to not think the tag name is br/ and thus ignore it as an unrecognized element.

XHTML is a transitional format used for the migration of the Web away from HTML and toward XML. It’s less strict than XML, but more strict than HTML. For example, XHTML tag and attribute names must be lowercase and attributes must have a double-quoted value.

In HTML you might write a radio button element like this:

In XHTML, the tag name must be lowercase, the attribute values must be quoted, the checked attribute must be given a value, and there must be a closing tag:

The element has no body in this case, so the single-tag shortcut form can be used:

For additional information about HTML, XHTML, or XML, see Appendix D.

Basic Principles of Web Page Generation

Problem

You want to produce a web page from a script rather than by writing it manually.

Solution

Write a program that generates the page when it executes. This gives you more control over what is sent to the client than when you write a static page, although it may also require that you provide more parts of the response. For example, it may be necessary to write the headers that precede the page body.

Discussion

HTML is a markup language—that’s what the “ML” stands for. HTML consists of a mix of plain text to be displayed and special markup indicators or constructs that control how the plain text is displayed. Here is a very simple HTML page that specifies a title in the page header, and a body with a white background containing a single paragraph:

<html>

<head>

<title>Web Page Title</title>

</head>

Web page body.

</body>

</html>

It’s possible to write a script that produces that same page, but doing so differs in some ways from writing a static page. For one thing, you’re writing in two languages at once. (The script is written in your programming language, and the script itself writes HTML.) Another difference is that you may have to produce more of the response that is sent to the client. When a web server sends a static page to a client, it actually sends a set of one or more header lines first that provide additional information about the page. For example, an HTML document is preceded by a Content-Type: header that lets the client know what kind of information to expect, and a blank line that separates any headers from the page body:

Content-Type: text/html

<html>

<head>

<title>Web Page Title</title>

</head>

Web page body.

</body>

</html>

For static HTML pages, the web server produces header information automatically. When you write a web script, you may need to provide the header information yourself. Some APIs (such as PHP) may send a content-type header automatically, but enable you to override the default type. For example, if your script sends a JPEG image to the client instead of an HTML page, you should have the script change the content type from text/html to image/jpeg.

Writing web scripts also differs from writing command-line scripts, both for input and for output. On the input side, the information given to a web script is provided by the web server rather than by command-line arguments or by input that you type in. This means your scripts do not obtain data using input statements. Instead, the web server puts information into the execution environment of the script, which then extracts that information from its environment and acts on it.

On the output side, command-line scripts typically produce plain-text output, whereas web scripts produce HTML, images, or whatever other type of content you need to send to the client. Output produced in a web environment usually must be highly structured to ensure that it can be understood by the receiving client program.

Any programming language enables you to generate output by means of print statements, but some languages also offer special assistance for producing web pages. This support typically is provided by means of special modules:

§ For Perl scripts, a popular module is CGI.pm. It provides features for generating HTML markup, form processing, and more.

§ In Ruby, the cgi module provides capabilities similar to CGI.pm.

§ PHP scripts are written as a mix of HTML and embedded PHP code. That is, you write HTML literally into the script, and then drop into “program mode” whenever you need to generate output by executing code. PHP replaces the code by its output in the resulting page that is sent to the client.

§ Python includes cgi and urllib modules that help perform web programming tasks.

§ For Java, we’ll write scripts according to the JSP specification, which allows scripting directives and code to be embedded into web pages. This is similar to the way that PHP works.

Other page-generating packages are available besides those used in this book—some of which can have a marked effect on the way you use a language. For example, Mason, embPerl, ePerl, and AxKit enable you to treat Perl as an embedded language, somewhat like the way that PHP works. Similarly, the mod_snake Apache module enables Python code to be embedded into HTML templates.

Another approach to dynamic page generation separates page design and script writing. A web page is designed as a template that contains simple markers to indicate where to substitute values that can vary per request, and a script obtains the data to be displayed. With this approach, the script fetches the data as one phase of page generation. The script then invokes a template engine as another phase of page generation that places the data into the template and generates the final page as output. This decouples page design from script writing. Using a Template System to Generate Web Pages explores this approach to page generation further, using the Ruby PageTemplate and PHP Smarty template packages.

Before you can run any scripts in a web environment, your web server must be set up properly. Information about doing this for Apache and Tomcat is provided in Recipes and , but conceptually, a web server typically runs a script in one of two ways. First, the web server can use an external program to execute the script. For example, it can invoke an instance of the Python interpreter to run a Python script. Second, if the server has been enabled with the appropriate language processing ability, it can execute the script itself. Using an external program to run scripts requires no special capability on the part of the web server, but is slower because it involves starting up a separate process, as well as some additional overhead for writing request information to the script and reading the results from it. If you embed a language processor into the web server, it can execute scripts directly, resulting in much better performance.

Like most web servers, Apache can run external scripts. It also supports the concept of extensions (modules) that become part of the Apache process itself (either by being compiled in or dynamically loaded at runtime). One common use of this feature is to embed language processors into the server to accelerate script execution. Perl, Ruby, PHP, and Python scripts can be executed either way. Like command-line scripts, externally executed web scripts are written as executable files that begin with a #! line specifying the pathname of the appropriate language interpreter. Apache uses the pathname to determine which interpreter runs the script. Alternatively, you can extend Apache using modules such as mod_perl for Perl, mod_ruby for Ruby, mod_php for PHP, and mod_python or mod_snake for Python. This gives Apache the ability to directly execute scripts written in those languages.

For Java JSP scripts, the scripts are compiled into Java servlets and run inside a process known as a servlet container. This is similar to the embedded-interpreter approach in the sense that the scripts are run by a server process that manages them, rather than by starting up an external process for each script. The first time a client requests a given JSP page, the container compiles the page into a servlet in the form of executable Java byte code, and then loads it and runs it. The container caches the byte code so that the script can run directly with no compilation phase for subsequent requests. If you modify the script, the container notices this when the next request arrives, recompiles the script into a new servlet, and reloads it. The JSP approach provides a significant advantage over writing servlets directly, because you don’t have to compile code yourself or handle servlet loading and unloading. Tomcat can handle the responsibilities of both the servlet container and the web server that communicates with the container.

If you run multiple servers on the same host, they must listen for requests on different port numbers. In a typical configuration, Apache listens on the default HTTP port (80) and Tomcat listens on another port such as 8080. The examples here use a server hostname of localhost to represent URLs for scripts processed using Apache and Tomcat. The examples use a different port (8080) for Tomcat scripts. Typical forms for URLs that you’ll see in this book are as follows:

http://localhost/cgi-bin/my_perl_script.pl

http://localhost/cgi-bin/my_ruby_script.rb

http://localhost/cgi-bin/my_python_script.py

http://localhost/mcb/my_php_script.php

http://localhost:8080/mcb/my_jsp_script.jsp

Change the hostname and port number as necessary for pages served by your own servers.

Using Apache to Run Web Scripts

Problem

You want to run Perl, Ruby, PHP, or Python programs in a web environment.

Solution

Execute them using the Apache server.

Discussion

This recipe describes how to configure Apache for running Perl, Ruby, PHP, and Python scripts. It also illustrates how to write web-based scripts in each language.

There are typically several directories under the Apache root directory. Here I’ll assume that directory to be /usr/local/apache, although it may be different on your system. For example, on Windows, you might find Apache under C:\Program Files. The directories under the Apache root include bin (which contains httpd—that is, Apache itself—and other Apache-related executable programs), conf (for configuration files, notably httpd.conf, the primary file used by Apache), htdocs (the root of the document tree), and logs (for logfiles). The layout might differ on your system. For example, you might find the configuration files in /etc/httpd and the logs under /var/log/httpd. Modify the following instructions accordingly.

To configure Apache for script execution, edit the httpd.conf file in the conf directory. Typically, executable scripts are identified either by location or by filename suffix. A location can be either language-neutral or language-specific.

Apache configurations often have a cgi-bin directory under the Apache root directory in which you can install scripts that should run as external programs. It’s configured using a ScriptAlias directive:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

The second argument is the actual location of the script directory in your filesystem, and the first is the pathname in URLs that corresponds to that directory. Thus, the directive just shown associates scripts located in /usr/local/apache/cgi-bin with URLs that have cgi-bin following the hostname. For example, if you install the Ruby script myscript.rb in the directory /usr/local/apache/cgi-bin on the host localhost, you’d request it with this URL:

http://localhost/cgi-bin/myscript.rb

When configured this way, the cgi-bin directory can contain scripts written in any language. Because of this, the directory is language-neutral, so Apache needs to be able to figure out which language processor to use to execute each script that is installed there. To provide this information, the first line of the script should begin with #! followed by the pathname to the program that executes the script. For example, a script that begins with the following line will be run by Perl:

#!/usr/bin/perl

Under Unix, you must also make the script executable (use chmod +x), or it won’t run properly. The #! line just shown is appropriate for a system that has Perl installed as /usr/bin/perl. If your Perl interpreter is installed somewhere else, modify the line accordingly. If you’re on a Windows machine with Perl installed as C:\Perl\bin\perl.exe, the #! line should look like this:

#!C:\Perl\bin\perl

For Windows, another option that is simpler is to set up a filename extension association between script names that end with a .pl suffix and the Perl interpreter. Then the script can begin like this:

#!perl

Directories used only for scripts generally are placed outside of your Apache document tree. As an alternative to using specific directories for scripts, you can identify scripts by filename extension, so that files with a particular suffix become associated with a specific language processor. In this case, you can place them anywhere in the document tree. This is the most common way to use PHP. For example, if you have Apache configured with PHP support built in using the mod_php module, you can tell it that scripts having names ending with .php should be interpreted as PHP scripts. To do so, add this line to httpd.conf:

AddType application/x-httpd-php .php

You may also have to add (or uncomment) a LoadModule directive for php.

With PHP enabled, you can install a PHP script myscript.php under htdocs (the Apache document root directory). The URL for invoking the script becomes:

http://localhost/myscript.php

If PHP runs as an external standalone program, you’ll need to tell Apache where to find it. For example, if you’re running Windows and you have PHP installed as C:\Php\php.exe, put the following lines in httpd.conf (note the use of forward slashes in the pathnames rather than backslashes):

ScriptAlias /php/ "C:/Php/"

AddType application/x-httpd-php .php

Action application/x-httpd-php /php/php.exe

For purposes of showing URLs in examples, I’m going to assume that Perl, Ruby, and Python scripts are in your cgi-bin directory, and that PHP scripts are in the mcb directory of your document tree, identified by the .php extension. That means the URLs for scripts in these languages will look like this:

http://localhost/cgi-bin/myscript.pl

http://localhost/cgi-bin/myscript.rb

http://localhost/cgi-bin/myscript.py

http://localhost/mcb/myscript.php

If you plan to use a similar setup, make sure that you have a cgi-bin directory that Apache knows about, and an mcb directory under your Apache document root. Then, to deploy Perl, Ruby, or Python web scripts, install them in the cgi-bin directory. To deploy PHP scripts, install them in themcb directory.

Some of the scripts use modules or library files that are specific to this book. If you have these files installed in a library directory that your language processors search by default, they should be found automatically. Otherwise, you’ll need to indicate where the files are located. An easy way to do this is by using SetEnv directives in your httpd.conf file to set environment variables that can be seen by your scripts when Apache invokes them. (Use of the SetEnv directive requires that the mod_env Apache module be enabled.) For example, if you install library files in/usr/local/lib/mcb, the following directives enable Perl, Ruby, and Python scripts to find them:

SetEnv PERL5LIB /usr/local/lib/mcb

SetEnv RUBYLIB /usr/local/lib/mcb

SetEnv PYTHONPATH /usr/local/lib/mcb

For PHP, add /usr/local/lib/mcb to the value of include_path in your php.ini configuration file.

For background information on library-related environment variables and the php.ini file, see Writing Library Files.

After Apache has been configured to support script execution, restart it. Then you can begin to write scripts that generate web pages. The remainder of this section describes how to do so for Perl, Ruby, PHP, and Python. The examples for each language connect to the MySQL server, run a simple query, and display the results in a web page. The scripts shown here indicate whether there are any additional modules or libraries that web scripts typically need to include. (Later on, I’ll generally assume that the proper modules have been included and will show only script fragments.)

Before we proceed further, I should mention a couple of debugging tips:

§ If you request a web script and get an error page in response, have a look at the Apache error log, which is a useful source of diagnostic information when you’re trying to figure out why a script doesn’t work. A common name for this log is error_log in the logs directory. If you don’t find any such file, check httpd.conf for an ErrorLog directive to see where Apache logs its errors.

§ Sometimes it’s helpful to directly examine the output that a web script generates. You can do this by invoking the script from the command line. You’ll see the HTML produced by the script, as well as any error messages that it might print. Some web modules expect to see a parameter string, and might even prompt you for one when you invoke the script at the command line. When this is the case, you might be able to supply the parameters as an argument on the command line to avoid the prompt. For example, the Ruby cgi module expects to see parameters, and will prompt you for them if they are missing:

§ %myscript.rb

(offline mode: enter name=value pairs on standard input)

At the prompt, enter the parameter values and then enter Ctrl-D (EOF). To avoid the prompt, supply the parameters on the command line:

%myscript.rb "param1=val1;param2=val2;param3=val3"

To specify “no parameters” explicitly, provide an empty argument:

%myscript.rb ""

WEB SECURITY NOTE

Under Unix, scripts are associated with particular user and group IDs when they execute. Scripts that you execute from the command line run with your own user and group IDs, and have the filesystem privileges associated with your account. Scripts executed by a web server don’t run with your user and group ID, nor will they have your privileges. Instead, they run under the user and group ID of the account the web server has been set to run as, and with that account’s privileges. (To determine what account this is, look for User and Group directives in the httpd.conf configuration file.) This means that if you expect web scripts to read and write files, those files must be accessible to the account used to run the web server. For example, if your server runs under thenobody account and you want a script to be able to store uploaded image files into a directory called uploads in the document tree, you must make that directory readable and writable by the nobody user.

Another implication is that if other people can write scripts to be executed by your web server, those scripts too will run as nobody and they can read and write the same files as your own scripts. That is, files used by your scripts cannot be considered private only to your scripts. A solution to this problem is to use the Apache suEXEC mechanism. (If you are using an ISP for web hosting, you might find that suEXEC is already enabled.)

Perl

The following listing shows our first web-based Perl script, show_tables.pl. It produces an appropriate Content-Type: header, a blank line to separate the header from the page content, and the initial part of the page. Then it retrieves and displays a list of tables in the cookbook database. The table list is followed by the trailing HTML tags that close the page:

#!/usr/bin/perl

# show_tables.pl - Display names of tables in cookbook database

# by generating HTML directly

use strict;

use warnings;

use Cookbook;

# Print header, blank line, and initial part of page

print <<EOF;

Content-Type: text/html

<html>

<head>

<title>Tables in cookbook Database</title>

</head>

Tables in cookbook database:

EOF

# Connect to database, display table list, disconnect

my $dbh = Cookbook::connect ();

my $stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

my $sth = $dbh->prepare ($stmt);

$sth->execute ();

while (my @row = $sth->fetchrow_array ())

{

print "$row[0] \n";

}

$dbh->disconnect ();

# Print page trailer

print <<EOF;

</body>

</html>

EOF

To try the script, install it in your cgi-bin directory and request it from your browser as follows:

http://localhost/cgi-bin/show_tables.pl

show_tables.pl generates the Content-Type: header explicitly and it produces HTML elements by including literal tags in print statements. Another approach to web page generation is to use the CGI.pm module, which makes it easy to write web scripts without writing literal HTML tags. CGI.pm provides an object-oriented interface and a function call interface, so you can use it to write web pages in either of two styles. Here’s a script, show_tables_oo.pl, that uses the CGI.pm object-oriented interface to produce the same report as show_tables.pl:

#!/usr/bin/perl

# show_tables_oo.pl - Display names of tables in cookbook database

# using the CGI.pm object-oriented interface

use strict;

use warnings;

use CGI;

use Cookbook;

# Create CGI object for accessing CGI.pm methods

my $cgi = new CGI;

# Print header, blank line, and initial part of page

print $cgi->header ();

print $cgi->start_html (-title => "Tables in cookbook Database",

-bgcolor => "white");

print $cgi->p ("Tables in cookbook database:");

# Connect to database, display table list, disconnect

my $dbh = Cookbook::connect ();

my $stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

my $sth = $dbh->prepare ($stmt);

$sth->execute ();

while (my @row = $sth->fetchrow_array ())

{

print $row[0], $cgi->br ();

}

$dbh->disconnect ();

# Print page trailer

print $cgi->end_html ();

The script includes the CGI.pm module with a use CGI statement, and then creates a CGI object, $cgi, through which it invokes the various HTML-generation calls. header() generates the Content-Type: header and start_html() produces the initial page tags up through the opening<body> tag. After generating the first part of the page, show_tables_oo.pl retrieves and displays information from the server. Each table name is followed by a tag, produced by invoking the br() method. end_html() produces the closing </body> and </html> tags. When you install the script in your cgi-bin directory and invoke it from a browser, you can see that it generates the same type of page as show_tables.pl.

CGI.pm calls often take multiple parameters, many of which are optional. To enable you to specify just those parameters you need, CGI.pm understands -name => value notation in parameter lists. For example, in the start_html() call, the title parameter sets the page title andbgcolor sets the background color. The -name => value notation also allows parameters to be specified in any order, so these two statements are equivalent:

print $cgi->start_html (-title => "My Page Title", -bgcolor => "white");

print $cgi->start_html (-bgcolor => "white", -title => "My Page Title");

To use the CGI.pm function call interface rather than the object-oriented interface, you must write scripts a little differently. The use line that references CGI.pm should import the method names into your script’s namespace so that you can invoke them directly as functions without having to create a CGI object. For example, to import the most commonly used methods, the script should include this statement:

use CGI qw(:standard);

The following script, show_tables_fc.pl, is the function call equivalent of the show_tables_oo.pl script just shown. It uses the same CGI.pm calls, but invokes them as standalone functions rather than through a $cgi object:

#!/usr/bin/perl

# show_tables_fc.pl - Display names of tables in cookbook database

# using the CGI.pm function-call interface

use strict;

use warnings;

use CGI qw(:standard); # import standard method names into script namespace

use Cookbook;

# Print header, blank line, and initial part of page

print header ();

print start_html (-title => "Tables in cookbook Database",

-bgcolor => "white");

print p ("Tables in cookbook database:");

# Connect to database, display table list, disconnect

my $dbh = Cookbook::connect ();

my $stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

my $sth = $dbh->prepare ($stmt);

$sth->execute ();

while (my @row = $sth->fetchrow_array ())

{

print $row[0], br ();

}

$dbh->disconnect ();

# Print page trailer

print end_html ();

Install the show_tables_fc.pl script in your cgi-bin directory and request it from your browser to verify that it produces the same output as show_tables_oo.pl.

This book uses the CGI.pm function call interface for Perl-based web scripts from this point on. You can get more information about CGI.pm at the command line by using the following commands to read the installed documentation:

%perldoc CGI

% perldoc CGI::Carp

Appendix D lists other sources of information for this module, both online and in print form.

Ruby

The Ruby cgi module provides an interface to HTML-generating methods. To use it, create a CGI object and invoke its methods to produce HTML page elements. Methods are named after the HTML elements to which they correspond. Their invocation syntax follows these principles:

§ If an element should have attributes, pass them as arguments to the method.

§ If the element has body content, specify the content in a code block associated with the method call.

For example, the following method call produces a element that includes an align attribute and content of “This is a sentence”:

cgi.p("align" => "left") { "This is a sentence" }

The output looks like this:

This is a sentence.

To display generated HTML content, pass it in a code block to the cgi.out method. The following Ruby script, show_tables.rb, retrieves a list of tables in the cookbook database and displays them in an HTML document:

#!/usr/bin/ruby -w

# show_tables.rb - Display names of tables in cookbook database

require "cgi"

require "Cookbook"

# Connect to database, display table list, disconnect

dbh = Cookbook.connect

stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME"

rows = dbh.select_all(stmt)

dbh.disconnect

cgi = CGI.new("html4")

cgi.out {

cgi.html {

cgi.head {

cgi.title { "Tables in cookbook Database" }

} +

cgi.body("bgcolor" => "white") {

cgi.p { "Tables in cookbook Database:" } +

rows.collect { |row| row[0] + cgi.br }.join

}

The collect method iterates through the row array containing the table names and produces a new array containing each name with a appended to it. The join method concatenates the strings in the resulting array.

The script includes no explicit code for producing the Content-Type: header because cgi.out generates one.

Install the script in your cgi-bin directory and request it from your browser as follows:

http://localhost/cgi-bin/show_tables.rb

If you invoke Ruby web scripts from the command line so that you examine the generated HTML, you’ll see that the HTML is all on one line and is difficult to read. To make the output easier to understand, process it through the CGI.pretty utility method, which adds line breaks and indentation. Suppose that your page output call looks like this:

cgi.out {page content here

}

To change the call to use CGI.pretty, write it like this:

cgi.out {

CGI.pretty(page content here)

}

PHP

PHP doesn’t provide much in the way of tag shortcuts, which is surprising given that language’s web orientation. On the other hand, because PHP is an embedded language, you can simply write your HTML literally in your script without using print statements. Here’s a show_tables.phpscript that shifts back and forth between HTML mode and PHP mode:

<?php

# show_tables.php - Display names of tables in cookbook database

require_once "Cookbook.php";

<html>

<head>

<title>Tables in cookbook Database</title>

</head>

Tables in cookbook database:

<?php

# Connect to database, display table list, disconnect

$conn =& Cookbook::connect ();

$stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

$result =& $conn->query ($stmt);

while (list ($tbl_name) = $result->fetchRow ())

print ($tbl_name . " \n");

$result->free ();

$conn->disconnect ();

</body>

</html>

To try the script, put it in the mcb directory of your web server’s document tree and request it from your browser as follows:

http://localhost/mcb/show_tables.php

The PHP script includes no code to produce the Content-Type: header because PHP produces one automatically. (If you want to override this behavior and produce your own headers, consult the header() function section in the PHP manual.)

Except for the break tags, show_tables.php includes HTML content by writing it outside of the <?php and ?> tags so that the PHP interpreter simply writes it without interpretation. Here’s a different version of the script that produces all the HTML using print statements:

<?php

# show_tables_print.php - Display names of tables in cookbook database

# using print() to generate all HTML

require_once "Cookbook.php";

print ("<html>\n");

print ("<head>\n");

print ("<title>Tables in cookbook Database</title>\n");

print ("</head>\n");

print ("<body bgcolor=\"white\">\n");

print ("Tables in cookbook database:\n");

# Connect to database, display table list, disconnect

$conn =& Cookbook::connect ();

$stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME";

$result =& $conn->query ($stmt);

while (list ($tbl_name) = $result->fetchRow ())

print ($tbl_name . " \n");

$result->free ();

$conn->disconnect ();

print ("</body>\n");

print ("</html>\n");

Sometimes it makes sense to use one approach, sometimes the other—and sometimes both within the same script. If a section of HTML doesn’t refer to any variable or expression values, it can be clearer to write it in HTML mode. Otherwise it may be clearer to write it using print or echostatements, to avoid switching between HTML and PHP modes frequently.

Python

A standard installation of Python includes cgi and urllib modules that are useful for web programming. However, we don’t actually need them yet, because the only web-related activity of our first Python web script is to generate some simple HTML. Here’s a Python version of the MySQL table-display script:

#!/usr/bin/python

# show_tables.py - Display names of tables in cookbook database

import MySQLdb

import Cookbook

# Print header, blank line, and initial part of page

print """Content-Type: text/html

<html>

<head>

<title>Tables in cookbook Database</title>

</head>

Tables in cookbook database:

"""

# Connect to database, display table list, disconnect

conn = Cookbook.connect ()

cursor = conn.cursor ()

stmt = """

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME

"""

cursor.execute (stmt)

for (tbl_name, ) in cursor.fetchall ():

print tbl_name + " "

cursor.close ()

conn.close ()

# Print page trailer

print """

</body>

</html>

"""

Put the script in Apache’s cgi-bin directory and request it from your browser like this:

http://localhost/cgi-bin/show_tables.py

Using Tomcat to Run Web Scripts

Problem

You want to run Java-based programs in a web environment.

Solution

Write programs using JSP notation and execute them using a servlet container such as Tomcat.

Discussion

As described in Using Apache to Run Web Scripts, Apache can be used to run Perl, Ruby, PHP, and Python scripts. For Java, a different approach is needed because Apache doesn’t serve JSP pages. Instead, we’ll use Tomcat, a server designed for processing Java in a web environment. Apache and Tomcat are very different servers, but there is a familial relationship—like Apache, Tomcat is a development effort of the Apache Software Foundation.

This recipe provides an overview of JSP programming with Tomcat, but makes several assumptions:

§ You have some familiarity with the concepts underlying JavaServer Pages, such as what a servlet container is, what an application context is, and what the basic JSP scripting elements are.

§ The Tomcat server has been installed so that you can execute JSP pages, and you know how to start and stop it.

§ You are familiar with the Tomcat webapps directory and how a Tomcat application is structured. In particular, you understand the purpose of the WEB-INF directory and the web.xml file.

§ You know what a tag library is and how to use one.

I recognize that this is a lot to assume. If you’re unfamiliar with JSP or need instructions for installing Tomcat, Appendix C provides the necessary background information.

Once you have Tomcat in place, you should install the following components so that you can work through the JSP examples in this book:

§ The mcb sample application located in the tomcat directory of the recipes distribution.

§ A MySQL JDBC driver. You might already have one installed for use with the scripts in earlier chapters, but Tomcat needs a copy, too. This book uses MySQL Connector/J.

§ The JSP Standard Tag Library (JSTL), which contains tags for performing database activities, conditional testing, and iterative operations within JSP pages.

This section discusses how to install these components, provides a brief overview of some of the JSTL tags, and describes how to write the JSP equivalent of the MySQL table-display script that was implemented in Using Apache to Run Web Scripts using Perl, Ruby, PHP, and Python.

Installing the mcb application

Web applications for Tomcat typically are packaged as files WAR (web archive) files and installed under its webapps directory, which is roughly analogous to Apache’s htdocs document root directory. The recipes distribution includes a sample application named mcb that you can use for trying the JSP examples described here. Look in the distribution’s tomcat directory, where you will find a file named mcb.war. Copy that file to Tomcat’s webapps directory.

Here’s an example installation procedure for Unix, assuming that the recipes distribution and Tomcat are located at /u/paul/recipes and /usr/local/jakarta-tomcat. Adjust the pathnames as necessary for your own system. The command to install mcb.war looks like this:

%cp /u/paul/recipes/tomcat/mcb.war /usr/local/jakarta-tomcat/webapps

For Windows, if the relevant directories are C:\recipes and C:\jakarta-tomcat, use this command:

C:\>copy C:\recipes\tomcat\mcb.war C:\jakarta-tomcat\webapps

After copying the mcb.war file to the webapps directory, restart Tomcat. As distributed, Tomcat is configured by default to look for WAR files under webapps when it starts up and automatically unpack any that have not already been unpacked. This means that restarting Tomcat after copyingmcb.war to the webapps directory should be enough to unpack the mcb application. When Tomcat finishes its startup sequence, look under webapps and you should see a new mcb directory under which are all the files contained in mcb.war. (If Tomcat doesn’t unpack mcb.war automatically, see the sidebar “Unpacking a WAR File Manually.”) If you like, have a look around in the mcb directory at this point. It should contain several files that clients can request using a browser. There should also be a WEB-INF subdirectory, which is used for information that is private—that is, available for use by scripts in the mcb directory, but not directly accessible by clients.

Next, verify that Tomcat can serve pages from the mcb application context by requesting some of them from your browser. The following URLs request in turn a static HTML page, a servlet, and a simple JSP page:

http://localhost:8080/mcb/test.html

http://localhost:8080/mcb/servlet/SimpleServlet

http://localhost:8080/mcb/simple.jsp

Adjust the hostname and port number in the URLs appropriately for your installation.

UNPACKING A WAR FILE MANUALLY

WAR files are actually ZIP-format archives that can be unpacked using jar, WinZip, or any other tool that understands ZIP files. However, when unpacking a WAR file manually, you’ll need to create its top-level directory first. The following sequence of steps shows one way to do this, using the jar utility to unpack a WAR file named mcb.war that is assumed to be located in Tomcat’swebapps directory. For Unix, change location to the webapps directory, and then issue the following commands:

%mkdir mcb

% cd mcb

% jar xf ../mcb.war

For Windows, the commands are only slightly different:

C:\>mkdir mcb

C:\> cd mcb

C:\> jar xf ..\mcb.war

Unpacking the WAR file in the webapps directory creates a new application context, so you’ll need to restart Tomcat before it notices the new application.

Installing the JDBC driver

The JSP pages in the mcb application need a JDBC driver for connecting to the cookbook database. The following instructions describe how to install the MySQL Connector/J driver; the installation procedure for other drivers should be similar.

To install MySQL Connector/J for use by Tomcat applications, place a copy of it in Tomcat’s directory tree. Assuming that the driver is packaged as a JAR file (as is the case for MySQL Connector/J), there are three likely places under the Tomcat root directory where you can install it, depending on how visible you want the driver to be:

§ To make the driver available only to the mcb application, place it in the mcb/ WEB-INF/lib directory under Tomcat’s webapps directory.

§ To make the driver available to all Tomcat applications but not to Tomcat itself, place it in the shared/lib directory under the Tomcat root.

§ To make the driver available both to applications and to Tomcat, place it in the common/lib directory under the Tomcat root.

I recommend installing a copy of the driver in the common/lib directory. That gives it the most global visibility (it will be accessible both by Tomcat and by applications), and you’ll need to install it only once. If you enable the driver only for the mcb application by placing a copy inmcb/WEB-INF/lib, but then develop other applications that use MySQL, you’ll need to either copy the driver into those applications or move it to a more global location.

Making the driver more globally accessible also is useful if you think it likely that at some point you’ll elect to use JDBC-based session management or realm authentication. Those activities are handled by Tomcat itself above the application level, so Tomcat needs access to the driver to carry them out.

Here’s an example installation procedure for Unix, assuming that the MySQL Connector/J driver and Tomcat are located at /tmp/mysql-connector-java-bin.jar and /usr/local/jakarta-tomcat. Adjust the pathnames as necessary for your own system. The command to install the driver looks like this:

%cp /tmp/mysql-connector-java-bin.jar /usr/local/jakarta-tomcat/common/lib

For Windows, if the components are installed at C:\mysql-connector-java-bin.jar and C:\jakarta-tomcat, use this command:

C:\>copy C:\mysql-connector-java-bin.jar C:\jakarta-tomcat\common\lib

After installing the driver, restart Tomcat and then request the following mcb application page to verify that Tomcat can find the JDBC driver properly:

http://localhost:8080/mcb/jdbc_test.jsp

You might need to edit jdbc_test.jsp first to change the connection parameters.

Installing the JSTL distribution

Most of the scripts that are part of the mcb sample application use JSTL, so it’s necessary to install it or those scripts won’t work. To install a tag library into an application context, copy the library’s files into the proper locations under the application’s WEB-INF directory. Generally, this means installing at least one JAR file and a tag library descriptor (TLD) file, and adding some tag library information to the application’s web.xml file. JSTL actually consists of several tag sets, so there are there are several JAR files and TLD files. The following instructions describe how to install JSTL for use with the mcb application:

1. Make sure that the mcb.war file has been unpacked to create the mcb application directory hierarchy under the Tomcat webapps directory. (Refer back to Installing the mcb application.) This is necessary because the JSTL files must be installed under the mcb/WEB-INF directory, which will not exist until mcb.war has been unpacked.

2. Get the JSTL distribution from the Jakarta Project web site. Go to the Jakarta Taglibs project page, which is accessible at http://jakarta.apache.org/taglibs/.

Follow the Standard Taglib link to get to the JSTL information page; the latter has a Downloads section from which you can get the binary JSTL distribution. Be sure to get version 1.1.2 or higher.

3. Unpack the JSTL distribution into some convenient location, preferably outside of the Tomcat hierarchy. The commands to do this are similar to those used to unpack Tomcat itself (see the section “Installing a Tomcat Distribution” in Appendix C). For example, to unpack a ZIP-format distribution, use the following command, adjusting the filename as necessary:

%jar xf jakarta-taglibs-standard.zip

4. Unpacking the distribution creates a directory containing several files. Copy the JAR files (jstl.jar, standard.jar, and so forth) to the mcb/WEB-INF/lib directory. These files contain the class libraries that implement the JSTL tag actions. Copy the tag library descriptor files (c.tld,sql.tld, and so forth) to the mcb/WEB-INF directory. These files define the interface for the actions implemented by the classes in the JAR files.

5. The mcb/WEB-INF directory contains a file named web.xml that is the web application deployment descriptor file (a fancy name for “configuration file”). Modify web.xml to add <taglib> entries for each of the JSTL TLD files. The entries look something like this:

6. <taglib>

7. <taglib-uri>http://java.sun.com/jsp/jstl/core</taglib-uri>

8. <taglib-location>/WEB-INF/c.tld</taglib-location>

9. </taglib>

10.<taglib>

11. <taglib-uri>http://java.sun.com/jsp/jstl/sql</taglib-uri>

12. <taglib-location>/WEB-INF/sql.tld</taglib-location>

</taglib>

Each <taglib> entry contains a <taglib-uri> element that specifies the symbolic name by which mcb JSP pages refer to the corresponding TLD file, and a <taglib-location> element that indicates the location of the TLD file under the mcb application directory. (You’ll find that web.xml as distributed already contains these entries. However, you should take a look at them to make sure they match the filenames of the TLD files that you just installed in the previous step.)

13.The mcb/WEB-INF directory also contains a file named jstl-mcb-setup.inc. This file is not part of JSTL itself, but it contains a JSTL <sql:setDataSource> tag that is used by many of the mcb JSP pages to set up a data source for connecting to the cookbook database. The file looks like this:

14.<sql:setDataSource

15. var="conn"

16. driver="com.mysql.jdbc.Driver"

17. url="jdbc:mysql://localhost/cookbook"

18. user="cbuser"

19. password="cbpass"

Edit the driver, url, user, and password tag attributes as necessary to change the connection parameters to those that you use for accessing the cookbook database. Do not change the var attribute.

20.The JSTL distribution also includes WAR files containing documentation and examples (standard-doc.war and standard-examples.war). If you want to install these, copy them into Tomcat’s webapps directory. (I recommand that you install the documentation so that you can access it locally from your own server. It’s useful to install the examples as well, because they provide helpful demonstrations showing how to use JSTL tags in JSP pages.)

21.Restart Tomcat so that it notices the changes you’ve just made to the mcb application and so that it unpacks the WAR files containing the JSTL documentation and examples. If Tomcat doesn’t unpack WAR files for you automatically, refer back to the sidebar “Unpacking a WAR File Manually.”

After installing JSTL and restarting Tomcat, request the following mcb application page to verify that Tomcat can find the JSTL tags properly:

http://localhost:8080/mcb/jstl_test.jsp

Writing JSP pages with JSTL

This section discusses the syntax for some of the JSTL tags used most frequently by mcb JSP pages. The descriptions are very brief, and many of these tags have additional attributes that allow them to be used in ways other than those shown here. For more information, consult the JSTL specification (see Appendix D).

A JSP page that uses JSTL must include ain taglib directive for each tag set that the page uses. Examples in this book use the core and database tags, identified by the following taglib directives:

<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>

<%@ taglib uri="http://java.sun.com/jsp/jstl/sql" prefix="sql" %>

The uri values should match the symbolic values that are listed in the web.xml <taglib> entries (see the previous section Installing the JSTL distribution). The prefix values indicate the initial string used in tag names to identify tags as part of a given tag library.

JSTL tags are written in XML format, using a special syntax for tag attributes to include expressions. Within tag attributes, text is interpreted literally unless enclosed within ${...}, in which case it is interpreted as an expression to be evaluated. The following sections summarize some of the commonly used core and database tags.

The JSTL core tag set. The following tags are part of the JSTL core tag set:

<c:out>

This tag evaluates its value attribute and is replaced by the result. One common use for this tag is to provide content for the output page. The following tag produces the value 3:

<c:out value="${1+2}"/>

<c:set>

This tag assigns a value to a variable. For example, to assign a string to a variable named title and then use the variable later in the <title> element of the output page, do this:

<c:set var="title" value="JSTL Example Page"/>

<html>

<head>

</head>

...

This example illustrates a principle that is generally true for JSTL tags: to specify a variable into which a value is to be stored, name it without using ${...} notation. To refer to that variable’s value later, use it within ${...} so that it is interpreted as an expression to be evaluated.

<c:if>

This tag evaluates the conditional test given in its test attribute. If the test result is true, the tag body is evaluated and becomes the tag’s output; if the result is false, the body is ignored:

<c:if test="${1 != 0}">

1 is not equal to 0

</c:if>

The comparison operators are ==, !=, <, >, <=, and >=. The alternative operators eq, ne, lt, gt, le, and ge make it easier to avoid using special HTML characters in expressions. Arithmetic operators are +, -, *, / (or div), and % (or mod). Logical operators are && (or and), || (or or), and ! (or not). The special empty operator is true if a value is empty or null:

<c:set var="x" value=""/>

<c:if test="${empty x}">

x is empty

</c:if>

<c:set var="y" value="hello"/>

<c:if test="${!empty y}">

y is not empty

</c:if>

The <c:if> tag does not provide any “else” clause. To perform if/then/else testing, use the <c:choose> tag.

<c:choose>

This is another conditional tag, but it allows multiple conditions to be tested. Include a <c:when> tag for each condition that you want to test explicitly, and a <c:otherwise> tag if there is a “default” case:

<c:choose>

<c:when test="${count == 0}">

Please choose an item

</c:when>

<c:when test="${count gt 1}">

Please choose only one item

</c:when>

<c:otherwise>

Thank you for choosing exactly one item

</c:otherwise>

</c:choose>

<c:forEach>

This tag acts as an iterator, enabling you to loop over a set of values. The following example uses a <c:forEach> tag to loop through a set of rows in the result set from a query (represented here by the rs variable):

<c:forEach items="${rs.rows}" var="row">

id = <c:out value="${row.id}"/>,

name = <c:out value="${row.name}"/>

</c:forEach>

Each iteration of the loop assigns the current row to the variable row. Assuming that the query result includes columns named id and name, their values are accessible as row.id and row.name.

The JSTL database tag set. The JSTL database tags enable you to issue SQL statements and access their results:

<sql:setDataSource>

This tag sets up connection parameters to be used when JSTL contacts the database server. For example, to specify parameters for using the MySQL Connector/J driver to access the cookbook database, the tag looks like this:

<sql:setDataSource

var="conn"

driver="com.mysql.jdbc.Driver"

url="jdbc:mysql://localhost/cookbook"

user="cbuser"

password="cbpass"

The driver, url, user, and password attributes specify the connection parameters, and the var attribute names the variable to associate with the connection. By convention, mcb JSP pages in this book use the variable conn, so tags occurring later in the page that require a data source can refer to the connection using the expression ${conn}.

To avoid listing connection parameters repeatedly in JSP pages that use MySQL, you can put the <sql:setDataSource> tag in a file and include the file from each page that needs a database connection. For the recipes distribution, this include file is WEB-INF/jstl-mcb-setup.inc. JSP pages can access the file as follows to set up the database connection:

<%@ include file="/WEB-INF/jstl-mcb-setup.inc" %>

To change the connection parameters used by the mcb pages, edit jstl-mcb-setup.inc.

<sql:update>

To issue a statement such as UPDATE, DELETE, or INSERT that doesn’t return rows, use a <sql:update> tag. A dataSource tag attribute indicates the data source, the rows-affected count resulting from the statement is returned in the variable named by the var attribute, and the statement itself should be specified in the tag body:

<sql:update dataSource="${conn}" var="count">

DELETE FROM profile WHERE id > 100

</sql:update>

Number of rows deleted: <c:out value="${count}"/>

<sql:query>

To process statements that return a result set, use <sql:query>. As with <sql:update>, the dataSource attribute indicates the data source, and the text of the statement is given in the tag body. The <sql:query> tag also takes a var attribute that names a variable to associate with the result set so that you can access the rows of the result:

<sql:query dataSource="${conn}" var="rs">

SELECT id, name FROM profile ORDER BY id

</sql:query>

By convention, the mcb JSP pages use rs as the name of the result set variable. Strategies for accessing the contents of a result set are outlined shortly.

<sql:param>

You can write data values literally into a statement string, but JSTL also allows the use of placeholders, which is helpful for values that contain characters that are special in SQL statements. Use a ? character for each placeholder in the statement string, and provide values to be bound to the placeholders using <sql:param> tags in the body of the statement-issuing tag. A data value can be specified either in the body of an <sql:param> tag or in its value attribute:

<sql:update dataSource="${conn}" var="count">

DELETE FROM profile WHERE id > ?

<sql:param>100</sql:param>

</sql:update>

<sql:query dataSource="${conn}" var="rs">

SELECT id, name FROM profile WHERE cats = ? AND color = ?

<sql:param value="1"/>

<sql:param value="green"/>

</sql:query>

The contents of a result set returned by <sql:query> are accessible several ways. Assuming that you have associated a variable named rs with the result set, you can access row i of the result either as rs.rows[ i ] or as rs.rowsByIndex[ i ], where row numbers begin at 0. The first form produces a row with columns that can be accessed by name. The second form produces a row with columns that can be accessed by column number (beginning with 0). For example, if a result set has columns named id and name, you can access the values for the third row by using column names like this:

<c:out value="${rs.rows[2].id}"/>

<c:out value="${rs.rows[2].name}"/>

To use column numbers instead, do this:

<c:out value="${rs.rowsByIndex[2][0]}"/>

<c:out value="${rs.rowsByIndex[2][1]}"/>

You can also use <c:forEach> as an iterator to loop through the rows in a result set. To access column values by name, iterate using rs.rows:

<c:forEach items="${rs.rows}" var="row">

id = <c:out value="${row.id}"/>,

name = <c:out value="${row.name}"/>

</c:forEach>

To access column values by number, iterate using rs.rowsByIndex:

<c:forEach items="${rs.rowsByIndex}" var="row">

id = <c:out value="${row[0]}"/>,

name = <c:out value="${row[1]}"/>

</c:forEach>

The number of rows in the result set is available as rs.rowCount:

Number of rows selected: <c:out value="${rs.rowCount}"/>

Names of the columns in the result set are available using rs.columnNames:

<c:forEach items="${rs.columnNames}" var="name">

<c:out value="${name}"/>

</c:forEach>

Writing a MySQL script using JSP and JSTL

Using Apache to Run Web Scripts shows how to write Perl, Ruby, PHP, and Python versions of a script to display the names of the tables in the cookbook database. With the JSTL tags, we can write a corresponding JSP page that provides that information as follows:

<%-- show_tables.jsp - Display names of tables in cookbook database --%>

<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>

<%@ taglib uri="http://java.sun.com/jsp/jstl/sql" prefix="sql" %>

<%@ include file="/WEB-INF/jstl-mcb-setup.inc" %>

<html>

<head>

<title>Tables in cookbook Database</title>

</head>

Tables in cookbook database:

<sql:query dataSource="${conn}" var="rs">

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

WHERE TABLE_SCHEMA = 'cookbook' ORDER BY TABLE_NAME

</sql:query>

<c:forEach items="${rs.rowsByIndex}" var="row">

<c:out value="${row[0]}"/>

</c:forEach>

</body>

</html>

The taglib directives identify which tag libraries the script needs, and the include directive pulls in the code that sets up a data source for accessing the cookbook database. The rest of the script generates the page content.

Assuming that you’ve installed the mcb.war file in your Tomcat server’s webapps directory as described earlier, you should find the show_tables.jsp script in the mcb subdirectory. Request it from your browser as follows:

http://localhost:8080/mcb/show_tables.jsp

The JSP script does not produce any Content-Type: header explicitly. The JSP engine produces a default header with a content type of text/html automatically.

Encoding Special Characters in Web Output

Problem

Certain characters are special in web pages and must be encoded if you want to display them literally. Because database content often contains instances of these characters, scripts that include query results in web pages should encode those results to prevent browsers from misinterpreting the information.

Solution

Use the methods that are provided by your API for performing HTML-encoding and URL-encoding.

Discussion

HTML is a markup language: it uses certain characters as markers that have a special meaning. To include literal instances of these characters in a page, you must encode them so that they are not interpreted as having their special meanings. For example, < should be encoded as < to keep a browser from interpreting it as the beginning of a tag. Furthermore, there are actually two kinds of encoding, depending on the context in which you use a character. One encoding is appropriate for HTML text, another is used for text that is part of a URL in a hyperlink.

The MySQL table-display scripts shown in Recipes and are simple demonstrations of how to produce web pages using programs. But with one exception, the scripts have a common failing: they take no care to properly encode special characters that occur in the information retrieved from the MySQL server. (The exception is the JSP version of the script. The <c:out> tag used there handles encoding automatically, as we’ll discuss shortly.)

As it happens, I deliberately chose information to display that is unlikely to contain any special characters; the scripts should work properly even in the absence of any encoding. However, in the general case, it’s unsafe to assume that a query result will contain no special characters, so you must be prepared to encode it for display in a web page. Neglecting to do this often results in scripts that generate pages containing malformed HTML that displays incorrectly.

This recipe describes how to handle special characters, beginning with some general principles, and then discusses how each API implements encoding support. The API-specific examples show how to process information drawn from a database table, but they can be adapted to any content you include in a web page, no matter its source.

General encoding principles

One form of encoding applies to characters that are used in writing HTML constructs; another applies to text that is included in URLs. It’s important to understand this distinction so that you don’t encode text inappropriately.

NOTE

Encoding text for inclusion in a web page is an entirely different issue from encoding special characters in data values for inclusion in an SQL statement. Handling Special Characters and NULL Values in Statements discusses the latter issue.

Encoding characters that are special in HTML. HTML markup uses < and > characters to begin and end tags, & to begin special entity names (such as to signify a nonbreaking space), and " to quote attribute values in tags (such as ). Consequently, to display literal instances of these characters, you should encode them as HTML entities so that browsers or other clients understand your intent. To do this, convert the special characters <, >, &, and " to the corresponding HTML entity designators shown in the following table.

Special character	HTML entity
<	<
>	>
&	&
"	"

Suppose that you want to display the following string literally in a web page:

Paragraphs begin and end with & tags.

If you send this text to the client browser exactly as shown, the browser will misinterpret it: the and tags will be taken as paragraph markers and the & may be taken as the beginning of an HTML entity designator. To display the string the way you intend, encode the special characters as the <, >, and & entities:

Paragraphs begin and end with & tags.

The principle of encoding text this way is also useful within tags. For example, HTML tag attribute values usually are enclosed within double quotes, so it’s important to perform HTML-encoding on attribute values. Suppose that you want to include a text-input box in a form, and you want to provide an initial value of Rich "Goose" Gossage to be displayed in the box. You cannot write that value literally in the tag like this:

The problem here is that the double-quoted value attribute includes internal double quotes, which makes the <input> tag malformed. The proper way to write it is to encode the double quotes:

When a browser receives this text, it decodes the " entities back to " characters and interprets the value attribute value properly.

Encoding characters that are special in URLs. URLs for hyperlinks that occur within HTML pages have their own syntax and their own encoding. This encoding applies to attributes within several tags:

Many characters have special meaning within URLs, such as :, /, ?, =, &, and ;. The following URL contains some of these characters:

http://localhost/myscript.php?id=428&name=Gandalf

Here the : and / characters segment the URL into components, the ? character indicates that parameters are present, and the & character separates the parameters, each of which is specified as a name=value pair. (The ; character is not present in the URL just shown, but commonly is used instead of & to separate parameters.) If you want to include any of these characters literally within a URL, you must encode them to prevent the browser from interpreting them with their usual special meaning. Other characters such as spaces require special treatment as well. Spaces are not allowed within a URL, so if you want to reference a page named my home page.html on the local host, the URL in the following hyperlink won’t work:

URL-encoding for special and reserved characters is performed by converting each such character to % followed by two hexadecimal digits representing the character’s ASCII code. For example, the ASCII value of the space character is 32 decimal, or 20 hexadecimal, so you’d write the preceding hyperlink like this:

Sometimes you’ll see spaces encoded as + in URLs. That is legal, too.

Use the appropriate encoding method for the context. Be sure to encode information properly for the context in which you’re using it. Suppose that you want to create a hyperlink to trigger a search for items matching a search term, and you want the term itself to appear as the link label that is displayed in the page. In this case, the term appears as a parameter in the URL, and also as HTML text between the <a> and </a> tags. If the search term is “cats & dogs”, the unencoded hyperlink construct looks like this:

That is incorrect because & is special in both contexts and the spaces are special in the URL. The link should be written like this instead:

Here, & is HTML-encoded as & for the link label, and is URL-encoded as %26 for the URL, which also includes spaces encoded as %20.

Granted, it’s a pain to encode text before writing it to a web page, and sometimes you know enough about a value that you can skip the encoding. (See the sidebar, “Do You Always Need to Encode Web Page Output?”) But encoding is the safe thing to do most of the time. Fortunately, most APIs provide functions to do the work for you. This means you need not know every character that is special in a given context. You just need to know which kind of encoding to perform, so that you can call the appropriate function to produce the intended result.

DO YOU ALWAYS NEED TO ENCODE WEB PAGE OUTPUT?

If you know a value is legal in a particular context within a web page, you need not encode it. For example, if you obtain a value from an integer-valued column in a database table that cannot be NULL, it must necessarily be an integer. No HTML- or URL-encoding is needed to include the value in a web page, because digits are not special in HTML text or within URLs. On the other hand, suppose that you solicit an integer value using a field in a web form. You might be expecting the user to provide an integer, but the user might be confused and enter an illegal value. You could handle this by displaying an error page that shows the value and explains that it’s not an integer. But if the value contains special characters and you don’t encode it, the page won’t display the value properly, possibly further confusing the user.

Encoding special characters using web APIs

The following encoding examples show how to pull values out of MySQL and perform both HTML-encoding and URL-encoding on them to generate hyperlinks. Each example reads a table named phrase that contains short phrases and then uses its contents to construct hyperlinks that point to a (hypothetical) script that searches for instances of the phrases in some other table. The table contains the following rows:

mysql>SELECT phrase_val FROM phrase ORDER BY phrase_val;

+--------------------------+

| phrase_val |

+--------------------------+

| are we "there" yet? |

| cats & dogs |

| rhinoceros |

| the whole > sum of parts |

+--------------------------+

The goal here is to generate a list of hyperlinks using each phrase both as the hyperlink label (which requires HTML-encoding) and in the URL as a parameter to the search script (which requires URL-encoding). The resulting links look something like this:

are we "there" yet?</a>

cats & dogs</a>

rhinoceros</a>

the whole > sum of parts</a>

The initial part of the href attribute value will vary per API. Also, the links produced by some APIs will look slightly different because they encode spaces as + rather than as %20.

Perl. The Perl CGI.pm module provides two methods, escapeHTML() and escape(), that handle HTML-encoding and URL-encoding. There are three ways to use these methods to encode a string $str:

§ Invoke escapeHTML() and escape() as CGI class methods using a CGI:: prefix:

§ use CGI;

printf "%s\n%s\n", CGI::escape ($str), CGI::escapeHTML ($str);

§ Create a CGI object and invoke escapeHTML() and escape() as object methods:

§ use CGI;

§ my $cgi = new CGI;

printf "%s\n%s\n", $cgi->escape ($str), $cgi->escapeHTML ($str);

§ Import the names explicitly into your script’s namespace. In this case, neither a CGI object nor the CGI:: prefix is necessary and you can invoke the methods as standalone functions. The following example imports the two method names in addition to the set of standard names:

§ use CGI qw(:standard escape escapeHTML);

printf "%s\n%s\n", escape ($str), escapeHTML ($str);

I prefer the last alternative because it is consistent with the CGI.pm function call interface that you use for other imported method names. Just remember to include the encoding method names in the use CGI statement for any Perl script that requires them, or you’ll get “undefined subroutine”errors when the script executes.

The following code reads the contents of the phrase table and produces hyperlinks from them using escapeHTML() and escape():

my $stmt = "SELECT phrase_val FROM phrase ORDER BY phrase_val";

my $sth = $dbh->prepare ($stmt);

$sth->execute ();

while (my ($phrase) = $sth->fetchrow_array ())

{

# URL-encode the phrase value for use in the URL

my $url = "/cgi-bin/mysearch.pl?phrase=" . escape ($phrase);

# HTML-encode the phrase value for use in the link label

my $label = escapeHTML ($phrase);

print a ({-href => $url}, $label), br (), "\n";

}

Ruby. The Ruby cgi module contains two methods, CGI.escapeHTML() and CGI.escape(), that perform HTML-encoding and URL-encoding. However, both methods raise an exception unless the argument is a string. One way to deal with this is to apply the to_s method to any argument that might not be a string, to force it to string form and convert nil to the empty string. For example:

stmt = "SELECT phrase_val FROM phrase ORDER BY phrase_val"

dbh.execute(stmt) do |sth|

sth.fetch do |row|

# make sure that the value is a string

phrase = row[0].to_s

# URL-encode the phrase value for use in the URL

url = "/cgi-bin/mysearch.rb?phrase=" + CGI.escape(phrase)

# HTML-encode the phrase value for use in the link label

label = CGI.escapeHTML(phrase)

page << cgi.a("href" => url) { label } + cgi.br + "\n"

end

page is used here as a variable that “accumulates” page content and that eventually you pass to cgi.out to display the page.

PHP. In PHP, thehtmlspecialchars() and urlencode() functions perform HTML-encoding and URL-encoding. Use them as follows:

$stmt = "SELECT phrase_val FROM phrase ORDER BY phrase_val";

$result =& $conn->query ($stmt);

if (!PEAR::isError ($result))

{

while (list ($phrase) = $result->fetchRow ())

{

# URL-encode the phrase value for use in the URL

$url = "/mcb/mysearch.php?phrase=" . urlencode ($phrase);

# HTML-encode the phrase value for use in the link label

$label = htmlspecialchars ($phrase);

printf ("<a href=\"%s\">%s</a> \n", $url, $label);

}

$result->free ();

}

Python. In Python, the cgiand urllib modules contain the relevant encoding methods. cgi.escape() and urllib.quote() perform HTML-encoding and URL-encoding. However, both methods raise an exception unless the argument is a string. One way to deal with this is to apply the str() method to any argument that might not be a string, to force it to string form and convert None to the string "None". (If you want None to convert to the empty string, you need to test for it explicitly.) For example:

import cgi

import urllib

stmt = "SELECT phrase_val FROM phrase ORDER BY phrase_val"

cursor = conn.cursor ()

cursor.execute (stmt)

for (phrase,) in cursor.fetchall ():

# make sure that the value is a string

phrase = str (phrase)

# URL-encode the phrase value for use in the URL

url = "/cgi-bin/mysearch.py?phrase=" + urllib.quote (phrase)

# HTML-encode the phrase value for use in the link label

label = cgi.escape (phrase, 1)

print "<a href=\"%s\">%s</a> " % (url, label)

cursor.close ()

The first argument to cgi.escape() is the string to be HTML-encoded. By default, this function converts <, >, and & characters to their corresponding HTML entities. To tell cgi.escape() to also convert double quotes to the " entity, pass a second argument of 1, as shown in the example. This is especially important if you’re encoding values to be placed into a double-quoted tag attribute.

Java. The <c:out> JSTL tag automatically performs HTML-encoding for JSP pages. (Strictly speaking, it performs XML-encoding, but the set of characters affected is <, >, &, ", and ', which includes all those needed for HTML-encoding.) By using <c:out> to display text in a web page, you need not even think about converting special characters to HTML entities. If for some reason you want to suppress encoding, invoke <c:out> with an encodeXML attribute value of false:

<c:out value="value to display" encodeXML="false"/>

To URL-encode parameters for inclusion in a URL, use the <c:url> tag. Specify the URL string in the tag’s value attribute, and include any parameter values and names in <c:param> tags in the body of the <c:url> tag. A parameter value can be given either in the value attribute of a<c:param> tag or in its body. Here’s an example that shows both uses:

<c:url var="urlStr" value="myscript.jsp">

<c:param name="id" value ="47"/>

<c:param name="color">sky blue</c:param>

</c:url>

This will URL-encode the values of the id and color parameters and add them to the end of the URL. The result is placed in an object named urlStr, which you can display as follows:

<c:out value="${urlStr}"/>

NOTE

The <c:url> tag does not encode special characters such as spaces in the string supplied in its value attribute. You must encode them yourself, so it’s probably best to avoid creating pages with spaces in their names, to avoid the likelihood that you’ll need to refer to them.

To display entries from the phrase table, use the <c:out> and <c:url> tags as follows:

<sql:query dataSource="${conn}" var="rs">

SELECT phrase_val FROM phrase ORDER BY phrase_val

</sql:query>

<c:forEach items="${rs.rows}" var="row">

<%-- URL-encode the phrase value for use in the URL --%>

<c:url var="urlStr" value="/mcb/mysearch.jsp">

<c:param name="phrase" value ="${row.phrase_val}"/>

</c:url>

<%-- HTML-encode the phrase value for use in the link label --%>

<c:out value="${row.phrase_val}"/>

</a>

</c:forEach>