UNIX: The Complete Reference (2007)

Part VI: Enterprise Solutions

Chapter 27: Web Development under UNIX

Overview

In Chapter 16, you learned how a Web server is set up. In this chapter, you will learn how to create a web site. That is, you will learn how to develop the web pages that make up a web site. You may want to create your own web pages for a variety of reasons. For example, you may want to create a personal home page to tell the world about yourself, as well as your family, travels, hobbies, and politics, among other things. You may also want to provide links to your own favorite web sites. You may have your own business and would like to build web pages to advertise your products and/or services and even to take orders. You may want to help build web pages for an educational institution or for a charitable organization. No matter what your reason, you will find building web pages easier and more rewarding than you think.

This chapter tells you how to get started creating Hypertext Markup Language (HTML) documents, which hold the content and formatting elements that are presented as pages on the web. You will learn the syntax and formatting tags of basic HTML. You will also learn about web development standards such as JavaScript, the Document Object Model, and Cascading Style Sheets that you can use to get beyond the limitations of simple HTML. The chapter will also give you an introduction to web programming with the Common Gateway Interface and the PHP language, with which you can develop web applications.

This chapter does not tell you what to name your HTML documents or how to make them available to others on the web. That depends on the software platform you are using, the web server running on your platform, and the local server configuration. Contact your system administrator or a local guru for the specifics. If you know that a web server is installed on your machine and that user directories have been enabled (see the section “User Directories” of Chapter 16), try the following steps to create and test a simple personal home page:

1. Create a directory directly under your home directory with the name public_html with permissions of 0755, that is, with world read and search permission.

2. Following a simple example from the section “Creating an HTML Document” in this chapter, construct your home page in a file named index.html in the public_html directory. Give the index.html file 0644 permissions, that is, world read permission. The $HOME/public_html directory is the default directory for personal home pages for many web servers, including the Apache Web Server, discussed in Chapter 16.

3. Browse your home page with the following URL: http://my.machine.name/~user_name

where user_name is your UNIX user ID on the web server machine.

Though the preceding URL does not explicitly include the public_html directory or index.html file, many web servers look for an index.html file by default in the public_html directory belonging to user_name and serve it to web browsers automatically. If Step 3 succeeds and you get a valid web page in your browser, your web server is set up and ready for use. If Step 3 fails, you should contact your system administrator.
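If you prefer to script Steps 1 and 2, the following Python sketch creates the directory and the file with the permissions described above. The function name publish_home_page is hypothetical. The sketch assumes a UNIX file system and overwrites any existing index.html.

```python
import os
from pathlib import Path

def publish_home_page(home: str, page_html: str) -> Path:
    """Create <home>/public_html (mode 0755) holding index.html (mode 0644).

    Any existing index.html is overwritten.
    """
    web_dir = Path(home) / "public_html"
    web_dir.mkdir(exist_ok=True)
    # Set modes explicitly: modes requested at creation time are reduced by the umask.
    os.chmod(web_dir, 0o755)   # rwxr-xr-x: world read and search permission
    index = web_dir / "index.html"
    index.write_text(page_html)
    os.chmod(index, 0o644)     # rw-r--r--: world read permission
    return index
```

For example, publish_home_page(os.path.expanduser("~"), page) publishes the string page as your home page. On many systems the web server must also be able to search your home directory itself (for example, mode 0711 or 0755). If Step 3 fails, check those permissions as well.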

History of the Web and Web Standards

Before you start constructing web pages, you should know something about the history and background of HTML, especially about the HTML standards that have been promulgated so that the web pages that form the World Wide Web can be reliably accessed by most popular web browsers.

The seeds of the web go back to the work of Ted Nelson in the 1960s. Ted coined the term “hypertext” for “non-sequential writing” or text that is not constrained to be linear. Hypermedia is a term used for hypertext that is not constrained to be text. That is, it can include graphics, video, and sound, all of which are encompassed by the web today.

In the late 1980s, Bill Atkinson, a programmer working for Apple Computer, Inc., developed HyperCard for the Macintosh, which enabled users to construct a series of on-screen “filing cards” that contained textual and graphical information. Users could navigate these cards by pressing on-screen buttons, taking themselves on a tour of the information in the process. HyperCard and its imitators made documentation easier to navigate. However, these packages had the limitation that hypertext jumps could only be made to files on the same computer. Jumps to files stored on other computers on a local network, much less on the other side of the world, were out of the question. A system involving hypertext links on a global scale had not yet been conceived.

The Early Web

Several Internet services existed for information retrieval prior to the advent of the web, including FTP, WAIS, and Gopher. Each of these services had a distinct user interface. Although each interface was satisfactory by itself, the combination of several dissimilar interfaces created complexity for users. The problem was worse for a service used so infrequently that its operational details had to be relearned at each use.

In 1989, Tim Berners-Lee invented a prototype system based on hypertext that would eventually evolve into the web. At the time, he was working in a computing services section of CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. The original idea was to enable particle physics researchers from remote sites around the world to organize and pool information. But Tim wanted to take the repository of information files a step further by employing the hypertext concept of allowing cross-reference links to be created in the text of the files. Scientific and mathematical documentation could be represented as a “web” of information held in electronic form on computers across the world.

To try to make global hypertext links feasible, Berners-Lee saw the need for an approach to implementing these hyperlinks that was simpler and more cross-platform than the then-existing hypertext applications. He demonstrated a basic but attractive way of publishing text using client-server software he developed himself, together with a simple protocol he also developed, HTTP, for jumping to other documents via hypertext links. (For more information on HTTP, see Chapter 10.) The text-markup language that he used to create this demonstration “web” of documents was called HTML.

Berners-Lee’s HTML was based on SGML (Standard Generalized Markup Language), an international standard method for marking up text into structural units such as paragraphs, headings, and list items. SGML could be implemented on any machine. The idea was to make the language independent of the formatter (the browser or other viewing software) that displayed the formatted text on the screen. SGML does not include hypertext links. Support for local as well as remote hypertext links was purely Berners-Lee’s invention, as was the now-familiar “www.name.name” convention for addressing machines on the web.

From the beginning (ca. 1991), Berners-Lee took the important step of openly discussing his ideas online across the Internet, mostly through e-mail lists. In 1992, researchers from the National Center for Supercomputing Applications (NCSA) of the University of Illinois at Urbana-Champaign joined the HTTP-HTML discussion. In 1993, the NCSA released Mosaic, the first web browser that included some of the basic web features we are familiar with.

In May 1994, when the World Wide Web had caught the imagination of academics but not businessmen, the first World Wide Web conference was held in Geneva, Switzerland. At this conference, a draft HTML 2 standard was first introduced and the importance of the fledgling web’s operating with a proper HTML specification was discussed. HTML+ was unveiled, and it was agreed that the work on HTML+ would be carried forward to the development of a proposed HTML 3 standard. Features of HTML+ included text flow around a figure with captions, resizable tables, image backgrounds, math symbols, and other features.

The draft HTML 2 standard was circulated through the Internet community for comment in 1994. The ideas of early HTML enthusiasts and early web browser developers were incorporated into the draft. In July 1994, a Document Type Definition for HTML 2, a precise description of the language, was released.

In September 1994, the Internet Engineering Task Force (IETF), the international standards and development body of the Internet, set up an HTML working group. In November 1995, HTML 2.0 was officially published as an IETF standard.

Also in 1994, a former member of the Mosaic project at NCSA named Marc Andreessen and a tech entrepreneur named James Clark formed what would become the Netscape Communications Corporation. The Netscape Navigator web browser soon became the first truly usable and most widely used web browser of the early web. Netscape began a trend in which the dominant browser vendor “extended” or ignored existing HTML standards, relying on its near monopoly on the end user’s experience of the web. Microsoft’s Internet Explorer has long since surpassed Netscape as the dominant web browser, and this monopolistic tendency to flout HTML standards remains something of an obstacle to the widespread acceptance of web standards.

Out of concern that the fledgling web would fragment as different web server and browser vendors pushed their own proprietary web “standards,” the World Wide Web Consortium (W3C), headed by Tim Berners-Lee, was formed in late 1994. Its goal was to promote open standards by which the web could continue to grow and reach its full potential. Since its inception, the W3C has sought to build industry consensus on web standards, which is often not an easy task.

In December 1995, the IETF HTML working group was dismantled because it was having difficulty reaching consensus quickly enough to keep up with the fast-evolving HTML standard. In February 1996, the World Wide Web Consortium formed the HTML Editorial Review Board (ERB). The ERB included representatives from IBM, Microsoft, Netscape, Novell, SoftQuad, and the W3C. Its aim was to agree on a common standard for HTML at a time when competing web browsers each implemented a different subset of the language. The ERB would later become the HTML Working Group.

The Dynamic Web

Early web publishers consisted mainly of academic and government institutions, and their web pages usually described their work and their organizations. It wasn’t long before businesses realized the opportunities offered by the web, and commercial sites began to appear. (The .com Internet domain had existed since 1985.) In the early 1990s, most commercial web sites offered contact and product information. By 1994, however, a few enterprises had started experimenting with the web as a new medium for commerce. The deployment of commerce on the web was enabled by several emerging web technologies, such as secure transactions (introduced in the Netscape Navigator browser in 1994) and online database access through the Common Gateway Interface (CGI). CGI itself was developed around 1993, largely because the rapidly growing web needed search engines. Such engines had to take user input from web page forms, create online database queries from that input, and then generate and display pages of search results. In 1995, Amazon.com and eBay.com, two of the biggest names in web commerce history, were launched. The dot-com boom of the late 1990s followed. The web had changed significantly and become mainstream.

The web that most users experience today bears little resemblance to the original distributed repository of simple, static HTML files and text from the early 1990s. Much of the web content browsed today is dynamically generated by programs responding to user inputs. Today’s web pages, informational as well as commerce-oriented, can rightly be called web applications. They generate responses to user input, or they generate content customized according to users’ preferences after users have logged into their own personal accounts. Today, web applications are used to implement web-based e-mail clients, online retail sales, online auctions, wikis, discussion boards, web logs, multiplayer online role-playing games, and other functions. Commonly used technologies to create dynamic web pages include JavaScript, CGI, PHP, Java, ASP, and ISAPI. Web pages may have embedded JavaScript programs that the web browser executes to generate page elements in response to certain events. Examples include the user clicking a button or moving the mouse cursor over a page’s navigation menu. JavaScript will be discussed later in this chapter. Web pages may also be entirely generated by CGI programs or contain embedded PHP code. Whereas JavaScript programs are run by the web browser, CGI and PHP programs are run by the web server to generate web pages. CGI and PHP will also be discussed later in this chapter.

HTML Standards

Whether written from scratch or generated by scripts, web pages still basically consist of HTML code, and the HTML code sent to web browsers for display must adhere to current HTML standards. Due to the efforts of the W3C and others over the years, the major web browsers in use today expect the HTML documents sent to them to conform to a common set of standards. HTML that does not comply with these standards can produce some peculiar-looking web pages. The following are the important HTML standards that have been published by the W3C (see also http://www.w3.org/MarkUp/#recommendations):

HTML The HTML standard has grown over the years: the number of HTML markup tags, which web browsers interpret to generate the familiar web page elements, has steadily increased. A significant version of the standard was HTML 3.2, which introduced such elements as tables, text flow around figures, and subscripts and superscripts. HTML 3.2 was published in January 1997 and was widely used to create web sites. However, the widespread use of HTML 3.2 tags such as <font> and the “color” attribute is seen as a negative development. It went against the original intent of HTML, which was to focus on content rather than formatting. Developing large web sites in which font and color information had to be added to every single web page became a long and laborious process. In December 1997, HTML 4.0 was published. Version 4.0 added accessibility features for people with disabilities and support for international languages, as well as style sheet support, extensions to forms, scripting, and more. The unproductive formatting elements, such as the <font> tag and the “color” attribute, were declared “deprecated” in Version 4.0. Version 4.0 was published in three “flavors”: (1) “Strict,” in which the deprecated formatting elements are forbidden; (2) “Transitional,” in which the deprecated elements are allowed; and (3) “Frameset,” which is the Transitional flavor extended to support frames. HTML 4.01, published in December 1999, is the current and final version of the HTML standard. It contains minor revisions to HTML 4.0.

XHTML XHTML (Extensible Hypertext Markup Language) is the W3C’s successor to HTML. XHTML is a reformulation of HTML 4.01 using the Extensible Markup Language (XML). XML (a February 1998 W3C Recommendation) is a simplified subset of SGML, which was the basis of the HTML language. XML was created primarily to facilitate the sharing of data across different systems, especially systems connected across the Internet. XML is a standard for creating markup languages that describe the structure of data. It is not a fixed set of elements like HTML, but rather, it enables authors to define their own descriptive tags. XML has already been used to create file formats for applications such as office suites (http://en.wikipedia.org/wiki/OpenDocument). So XHTML can be thought of as just one of several data formats that XML has been used to create. One of the potential benefits of the move to XHTML is that another file format based on XML (say a spreadsheet or a drawing) can easily be transformed into XHTML for display on the web. Another potential benefit is that a complex web page written in XHTML can be more easily simplified for display on less capable devices such as a personal digital assistant or cell phone display. The familiar markup and formatting elements in HTML are preserved in XHTML, but the syntax is stricter in XHTML; for example, HTML tags can be upper- or lowercase, but XHTML tags must be lowercase, since XML is case sensitive. XHTML 1.0 was published in January 2000 as a W3C recommendation and later revised and republished in August 2002. It contains the same three “flavors” that were introduced in HTML 4.0. XHTML 1.1 was published in May 2001 as a W3C recommendation. It is based on XHTML 1.0 “Strict” with minor changes.

HTML Syntax Basics

An HTML document consists of ordinary text interspersed with HTML tags. The browser uses the tags to help it format the document for display. A tag consists of text (called a directive) enclosed in angle brackets (< and >).

Depending on the function, tags are used singly or in pairs. A pair of tags indicates a region of the document that should be displayed in a particular way, for example as a header or in a distinctive type style. Most tags are used in pairs, enclosing text between a starting and ending tag. The ending tag looks just the same as the starting tag except that a forward slash (/) precedes the directive within the angle brackets. A single tag tells the browser to do something at a particular point in the document, for example, to start a new paragraph or insert a horizontal rule.

<h1>This is the text of a header</h1>

<p>This is text of a paragraph.

HTML is not case sensitive; that is, "<TITLE>", "<title>", and "<TiTlE>" are all equivalent. However, XHTML is case sensitive because XML is case sensitive, and the XHTML standard mandates that tags be lowercase. Accordingly, it is recommended that you use lowercase for all tags.
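You can see this case insensitivity with Python’s standard html.parser module, which reports every tag name in lowercase no matter how it was typed. In the following sketch, TagLogger and tag_events are illustrative names and are not part of any library. The last call also shows a paired tag (h1) followed by a single tag (p).

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record every start and end tag, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already converted the tag name to lowercase.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

def tag_events(markup: str) -> list:
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events

if __name__ == "__main__":
    for markup in ("<TITLE>Hi</TITLE>", "<title>Hi</title>", "<TiTlE>Hi</TiTlE>"):
        print(tag_events(markup))   # [('start', 'title'), ('end', 'title')] each time
    # A paired tag (h1) followed by a single tag (p):
    print(tag_events("<h1>This is the text of a header</h1>\n<p>This is text of a paragraph."))
```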

A Minimum Document

The following HTML document shows the simplicity of the language and how easy it is to get started. Strictly speaking, the document is not legal because a couple of required directives are omitted, but it will work fine with most browsers.

<title>A minimum home page title</title>

Hello, World. This is my first home page.

You can quickly view this page by invoking a browser and passing the filename as a command-line argument, like this:

$ mozilla file.html

The phrase “A minimum home page title” surrounded by the <title></title> tags is the title of the document and is displayed in the top border of the window. The text “Hello, World. This is my first home page.” is the content and is displayed in the content region of the browser.

Every document should have a title. The title is displayed separately from the document and is used for identification in other contexts, for example in a browser bookmark file or personal menu bar. Some web search services archive the titles of all web pages and search for keywords contained in the titles. By choosing a title carefully, you can make it easier for others to find your page.

A Proper Minimum Document

Next, let’s make the preceding document legal by adding a little window dressing. Although browsers will not usually complain if this is omitted, you should include it to comply with the HTML specifications and for compatibility with future browsers that may not be so permissive. The first bit of window dressing that we need to look at is the Document Type Definition.

The W3C recommends that web pages “validate,” that is, that the HTML code in a site’s documents conform to a written W3C standard. One condition for a web page to validate is that it contain the correct Document Type Declaration (also called a “Doctype”). The Doctype names the Document Type Definition (DTD) for the kind of page being presented. In effect, it tells the web browser which rules to follow when rendering the page on screen. Doctypes are considered essential to the proper rendering and functioning of web documents in modern, standards-compliant web browsers. To specify which HTML standard they conform to, all HTML documents should start with a Document Type Declaration. For example,

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"

"http://www.w3.org/TR/html4/strict.dtd">

Doctype Declarations have three parts as shown here:

· Start: <!DOCTYPE HTML PUBLIC

· Public identifier: "-//W3C//DTD HTML 4.01//EN"

· System identifier: "http://www.w3.org/TR/html4/strict.dtd">

The preceding example Doctype declares that this document conforms to the Strict DTD of the HTML 4.01 standard. The W3C’s own list of valid Doctypes is at http://www.w3.org/QA/2002/04/valid-dtd-list.html. The presence or absence of a Doctype in an HTML document may influence how a web browser displays that document.
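The three parts of a Doctype can be separated mechanically. The following Python sketch uses the standard html.parser module, whose handle_decl method receives the text between <! and >. It then applies a regular expression that assumes the PUBLIC form shown above. DoctypeFinder and doctype_parts are hypothetical names.

```python
import re
from html.parser import HTMLParser

class DoctypeFinder(HTMLParser):
    """Remember the first <!DOCTYPE ...> declaration in a document."""

    def __init__(self):
        super().__init__()
        self.decl = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE HTML PUBLIC "..." "..."'.
        if self.decl is None and decl.upper().startswith("DOCTYPE"):
            self.decl = decl

# Assumes the PUBLIC form: root element, quoted public identifier,
# and an optional quoted system identifier, separated by any white space.
DOCTYPE_RE = re.compile(
    r'DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?', re.IGNORECASE)

def doctype_parts(document: str):
    """Return (root element, public identifier, system identifier) or None."""
    finder = DoctypeFinder()
    finder.feed(document)
    finder.close()
    if finder.decl is None:
        return None
    match = DOCTYPE_RE.match(finder.decl)
    return match.groups() if match else None

if __name__ == "__main__":
    page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
            '"http://www.w3.org/TR/html4/strict.dtd">\n<html></html>')
    # ('HTML', '-//W3C//DTD HTML 4.01//EN', 'http://www.w3.org/TR/html4/strict.dtd')
    print(doctype_parts(page))
```

The function returns None when a document has no Doctype or uses a form the regular expression does not recognize. It makes no attempt to judge whether the Doctype is valid.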

Generally, modern web browsers have two rendering modes, “standards” mode and “quirks” mode. A web browser will attempt to render an HTML document in “quirks” mode if the document is missing a Doctype, begins with an invalid Doctype, or begins with an HTML 3.2 Doctype or an HTML 4.0 or 4.01 “Transitional” flavor Doctype. In quirks mode, the browser emulates the parsing, page rendering, and bugs of older browsers from the mid-to-late 1990s. If the HTML document begins with a valid Doctype, a modern browser uses “standards” mode and does its best to render the document according to the W3C recommendations, up to and including XHTML 1.0.

Here is the proper minimum document with the Doctype at the top:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"

"http://www.w3.org/TR/html4/strict.dtd">

<html>

<head>

<title>A minimum home page title</title>

</head>

<body>

<p>Hello, World. This is my first home page.</p>

</body>

</html>

The <html> directive indicates that all text up to </html> is an HTML document. The text between <head> and </head> is header information, and the text between <body> and </body> is the body or content of the document. Figure 27–1 shows how the proper minimal document is rendered in a browser.

Figure 27–1: A proper minimal HTML document
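If you create many pages, you may want to generate this skeleton rather than type it each time. The following Python sketch is one way to do that. It assumes the title and body are plain text. It wraps the body text in a <p> element because the HTML 4.01 Strict DTD allows only block-level elements, not bare text, directly inside <body>. The function name minimal_document is hypothetical.

```python
from html import escape

DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
           '"http://www.w3.org/TR/html4/strict.dtd">')

def minimal_document(title: str, text: str) -> str:
    """Build a minimal HTML 4.01 Strict page from a plain-text title and body text."""
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head>",
        # quote=False: only &, <, and > need escaping in element content.
        f"<title>{escape(title, quote=False)}</title>",
        "</head>",
        "<body>",
        # Strict allows only block-level elements directly inside <body>.
        f"<p>{escape(text, quote=False)}</p>",
        "</body>",
        "</html>",
    ]) + "\n"

if __name__ == "__main__":
    print(minimal_document("A minimum home page title",
                           "Hello, World. This is my first home page."), end="")
```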

Headings

Six levels of headings are supported by HTML, numbered 1 through 6, with 1 being the most prominent. Headings are displayed larger and/or bolder than normal body text. Headings are important in a document to enhance appearance and readability. The syntax for the heading tags follows, and Figure 27–2 shows how the headings are rendered.

Figure 27–2: Six levels of the heading tag

<html>

<head>

<title>Header Examples</title>

</head>

<body>

<h1>Heading level 1</h1>

<h2>Heading level 2</h2>

<h3>Heading level 3</h3>

<h4>Heading level 4</h4>

<h5>Heading level 5</h5>

<h6>Heading level 6</h6>

</body>

</html>

The heading level does not tell the browser how big or how bold to make the heading text on an absolute scale, but only in relation to the other heading levels. This is an important concept that illustrates a basic principle of HTML. For the most part, tags in HTML describe the function that a particular piece of text serves in the document, but they do not indicate exactly how the text should be displayed. That decision is left to the browser, perhaps with consideration for user preferences. In contrast, a typesetting language like the one read by troff describes the appearance of the page down to the last detail, leaving nothing up to the typesetting program.
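Because only the relative level matters, you can build a heading outline by indenting each heading relative to the most prominent level actually used on the page. The following Python sketch collects the headings with html.parser and produces such an outline. HeadingOutline, indent_outline, and outline are hypothetical names, and the sketch assumes every heading is explicitly closed.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingOutline(HTMLParser):
    """Collect (level, text) for every heading element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None

def indent_outline(headings):
    """Indent each heading relative to the most prominent level present."""
    if not headings:
        return []
    top = min(level for level, _ in headings)
    return ["  " * (level - top) + text for level, text in headings]

def outline(markup: str) -> list:
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return indent_outline(parser.headings)

if __name__ == "__main__":
    page = "<h2>Lists</h2><h3>Unordered Lists</h3><p>...</p><h3>Ordered Lists</h3>"
    print("\n".join(outline(page)))
```

A page that starts at <h2> produces the same outline shape as one that starts at <h1>, which mirrors how the browser treats heading levels.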

Paragraphs

Unlike documents in most word processors, HTML documents accord no significance to carriage returns and white space. Word wrapping can occur at any point in the document, and multiple spaces are collapsed into a single space. This means that the formatting you infer from the appearance of the HTML source file is completely ignored by the browser, except for text tagged as preformatted. A nicely formatted source file, with extra space between paragraphs, indents, and line breaks, will be collapsed into a hopelessly unreadable solid block of text. Instead, you have to mark paragraph breaks with the <p> tag. The following sample document is rendered in a browser window in Figure 27–3.

<html>

<head>

<title>Paragraph Break Example</title>

</head>

<body>

peeped into the book her sister was reading, but it had no

conversations in it, 'and what is the use of a book,' thought

Alice 'without pictures or conversation?'

<p>

So she was considering in her own mind (as well as she could,

for the hot day made her feel very sleepy and stupid), whether

the pleasure of making a daisy chain would be worth the trouble.

</body>

</html>

Figure 27–3: The paragraph break tag in action

The <p> tag is one of the few that is not required to be used in open-close pairs.
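The white-space rule can be approximated in a few lines of Python. The sketch below assumes ordinary, non-preformatted text and the ASCII white-space characters HTML collapses (space, tab, line feed, form feed, and carriage return). Real browsers apply further rules around block elements, so treat collapse_whitespace, a hypothetical name, as an approximation only.

```python
import re

# ASCII white space collapsed by HTML: space, tab, LF, FF, CR.
# A non-breaking space (U+00A0) is deliberately left out; browsers preserve it.
HTML_WHITESPACE = re.compile(r"[ \t\n\f\r]+")

def collapse_whitespace(text: str) -> str:
    """Approximate browser rendering of ordinary (non-preformatted) text."""
    return HTML_WHITESPACE.sub(" ", text).strip(" ")

if __name__ == "__main__":
    source = """peeped into the book her sister was reading, but it had no
        conversations in it, 'and what is the use of a book,' thought
        Alice 'without pictures or conversation?'"""
    print(collapse_whitespace(source))
```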

Hypertext Links

The capability to link one document to another, anywhere in the world, is what sets HTML and the web apart from all predecessors. Hypertext links are the single most important factor in the incredible success of the web. (Another major factor is the integration of dissimilar services into one consistent user interface.) Links are described this way:

<a href="target_url">link text</a>

The address of the document that is being linked to is indicated by “target_url”. The phrase “link text” is displayed in a distinctive style, such as in a contrasting color or underlined, indicating that it is a hyperlink.

The browser follows the hyperlink to “target_url” when this link text is clicked with the mouse or otherwise selected. The tag name comes from the notion of an “anchor” for the hyperlink. Here is an example of a hyperlink:

<a href="http://www.foobar.com">Visit the FooBar home page.</a>

You can specify an image for a hyperlink instead of text with the following:

<a href="http://www.foobar.com"><img src="logo.gif"></a>

Here the image described by the file logo.gif will be displayed with a distinctive border that indicates it is a hyperlink. Clicking anywhere in the image will follow the link.
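Because links are ordinary tags, a program can list the target of every link in a page, which is handy when you check a site for broken links. The following Python sketch uses the standard html.parser module. It skips <a> tags that have no href attribute, such as named anchors. LinkCollector and links_in are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:   # named anchors (<a name=...>) have no href
                self.links.append(href)

def links_in(markup: str) -> list:
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links

if __name__ == "__main__":
    page = ('<a href="http://www.foobar.com">Visit the FooBar home page.</a>\n'
            '<a href="http://www.foobar.com"><img src="logo.gif"></a>')
    print(links_in(page))   # ['http://www.foobar.com', 'http://www.foobar.com']
```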

Inline Images

Inline images are indicated in HTML with the <img> tag as follows:

<img src=file_path>

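where file_path is the URL of the image file. A path that begins with a slash (/) is relative to the root of the server’s document hierarchy. A path without a leading slash is relative to the directory of the current document.

If the image reference appears on a page accessed with a user’s URL (i.e., a URL including ~user_name in the document path), a relative file_path therefore refers to a file in that user’s web directory hierarchy.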
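Python’s urllib.parse.urljoin applies the standard rules for resolving relative URLs, so you can use it to see which URL a browser will request for a given src value. The page URL below is an example built from the http://my.machine.name/~user_name address used earlier in this chapter, and resolve_src is a hypothetical helper name.

```python
from urllib.parse import urljoin

def resolve_src(page_url: str, src: str) -> str:
    """Return the absolute URL a browser would request for <img src=...> on page_url."""
    return urljoin(page_url, src)

page = "http://my.machine.name/~user_name/index.html"
# A relative path is resolved against the directory of the current page.
print(resolve_src(page, "logo.gif"))         # http://my.machine.name/~user_name/logo.gif
print(resolve_src(page, "images/logo.gif"))  # http://my.machine.name/~user_name/images/logo.gif
# A path starting with "/" is resolved against the server root.
print(resolve_src(page, "/logo.gif"))        # http://my.machine.name/logo.gif
```

The trailing slash in a directory URL matters. Resolved against http://my.machine.name/~user_name (no slash), logo.gif becomes http://my.machine.name/logo.gif. This is one reason web servers such as Apache usually redirect a directory URL that lacks a trailing slash to the same URL with one.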

By default the bottom of an image is aligned with the adjacent text. Include “align=top” if you want the top of the image aligned with the adjacent text, like this:

<img align=top src=logo.gif>

Several image formats are in common use, for example, .gif and .jpg. However, not all browsers support all formats. Unless you know that your target audience uses only one type of browser, you may be better off using only .gif- or .jpg-format inline images. Like everything else about the web, the image formats supported by specific browsers are likely to change by the time you read this, so look for up-to-the-minute information before committing to a particular format.

Images can add a lot to the visual appeal of a document, but on slow links such as serial modems (yes, they are still used) they can also be frustrating because of the amount of information that has to be sent to describe the image. There are a few things that you can do to improve performance when using images. Modems have the capability to compress the data they transfer. The amount of compression attained depends on the degree of randomness in the data; completely random data cannot be compressed. A simple image with a small number of colors will transfer significantly faster than a complex image with many colors and a lot of detail (such as photos). Of course the size is a factor as well but less so than image detail. Most browsers cache images on the local disk drive. This means that an image only has to be transferred on the first reference; thereafter, the browser obtains it from the local cache. You can take advantage of caching by keeping the number of different images to a minimum. For example, if your documents include navigation icons (e.g., home, next, previous) on each page, use the same ones on all pages. In other words, don’t use different images for the “next” icon on each of your pages.
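The effect of randomness on compression is easy to demonstrate. The following Python sketch uses the zlib module’s DEFLATE compression as a stand-in for a modem’s built-in compression, which uses different algorithms. It compares raw pixel data for a single-color image with random pixel data of the same size. The function names are hypothetical.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length of data after zlib (DEFLATE) compression at the default level."""
    return len(zlib.compress(data))

def compare(n_pixels: int = 10_000) -> tuple:
    """Compressed sizes of a one-color image and a random image, both raw RGB."""
    # Simple image: every pixel is the same blue, so the bytes repeat endlessly.
    simple = bytes([0, 0, 255]) * n_pixels
    # Stand-in for a very detailed image: completely random pixel values.
    noisy = os.urandom(3 * n_pixels)
    return compressed_size(simple), compressed_size(noisy)

if __name__ == "__main__":
    n_pixels = 10_000
    simple_size, noisy_size = compare(n_pixels)
    print(f"raw size of each image: {3 * n_pixels} bytes")
    print(f"one color compresses to {simple_size} bytes")
    print(f"random pixels compress to {noisy_size} bytes")
```

The one-color data shrinks to a tiny fraction of its size, while the random data does not shrink at all and in fact grows by a few bytes of overhead.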

Image Maps

The coordinates of the mouse position within a hyperlink image are sent to the server if the “ismap” directive is included in the <img> tag:

<a href=page1.html><img src=logo.gif ismap></a>

The coordinates are sent along with the hypertext reference when the mouse is clicked. This is a powerful feature that makes it possible for the server to customize the response according to the position in an image where the mouse is clicked. For example, the mouse coordinates in the image of a control panel would indicate which control button was selected. In an image of a geographic map, the mouse coordinates might indicate a region of interest to the user.

Processing “ismap” requests at the server may require system administrator access to the web server’s designated CGI-BIN directory and server configuration files.
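When “ismap” is present, the browser appends a question mark and the click coordinates to the link’s URL. The coordinates are measured in pixels from the image’s upper-left corner, so a click produces a request ending in something like ?153,42. A CGI program receives the part after the question mark in the QUERY_STRING environment variable. The following Python sketch decodes that string and looks up which region of a hypothetical 300 x 100 pixel control-panel image was clicked. The BUTTONS table and both function names are illustrative assumptions.

```python
# Hypothetical layout of a 300 x 100 pixel control-panel image:
# button name -> (left, top, right, bottom), inclusive pixel bounds.
BUTTONS = {
    "play": (0, 0, 99, 99),
    "pause": (100, 0, 199, 99),
    "stop": (200, 0, 299, 99),
}

def parse_ismap_query(query: str) -> tuple:
    """Convert an ismap query string such as "153,42" into integers (x, y)."""
    x_text, y_text = query.split(",")
    return int(x_text), int(y_text)

def button_at(x: int, y: int):
    """Return the name of the button that contains (x, y), or None."""
    for name, (left, top, right, bottom) in BUTTONS.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None

if __name__ == "__main__":
    # A real CGI program would read os.environ["QUERY_STRING"]; "153,42" is a stand-in.
    x, y = parse_ismap_query("153,42")
    print(button_at(x, y))
```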

Named Anchors

A hyperlink ordinarily takes you to the top of the page of the new document. You can also link to a specific section within a document so that the section is displayed when the link is followed. This can be useful when linking from one document to a section within a large document or from a table of contents or index to other sections within the same document. First, define the points within the document that you are linking to, like this:

<a name=anchor_name>Associated Text</a>

“Associated Text” will appear at or near the top of the document when the link is followed to it. However, it is not displayed in a distinctive style because it is the destination of a link, not the origin of a link. Next, create a link to the target document and section as shown here:

<a href=http://www.foobar.com/big_page.html#anchor_name>HyperLink Text</a>

The term “anchor_name” is the binding text and appears in the URL separated from the pathname with a “#” symbol. If the origin and destination of a named anchor hyperlink are within the same document, only the anchor name is needed in the link, as shown here:

<a href="#anchor_name">HyperLink Text</a>
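Python’s urllib.parse module can show how a browser treats the two forms of named-anchor link. The urldefrag function splits a URL into the document address and the anchor name. The urljoin function resolves a same-document reference such as #anchor_name against the current page. The foobar.com URLs are the chapter’s own examples, and split_anchor is a hypothetical helper name.

```python
from urllib.parse import urldefrag, urljoin

def split_anchor(url: str) -> tuple:
    """Split url into (document URL, anchor name); the anchor is "" if absent."""
    document, anchor = urldefrag(url)
    return document, anchor

page = "http://www.foobar.com/big_page.html"
# Link from another document: the browser fetches the document, then goes to the anchor.
print(split_anchor(page + "#anchor_name"))  # ('http://www.foobar.com/big_page.html', 'anchor_name')
# Link within the same document: only the anchor name appears in href.
print(urljoin(page, "#anchor_name"))        # http://www.foobar.com/big_page.html#anchor_name
```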

Lists

Several types of lists are supported by the HTML language. All lists start with an opening tag and end with a closing tag, and all elements in the list are marked with an item tag. Lists can be arbitrarily nested. A list item can contain a list. A single list item can also include a number of paragraphs, each containing additional lists. List presentation varies from browser to browser. Some may provide successive levels of indent for nested lists or vary the bullets used with unnumbered lists.

Unordered Lists

The exact presentation of an unordered list is browser-specific and might include bullets, dashes, or some other distinctive icon. Start the list with <ul>, precede each list item with <li>, and end the list with </ul>. Figure 27–4 shows how the list is rendered.

<html>

<head>

<title>An Unordered List</title>

</head>

<body>

<ul>

<li>Alice

<li>Rabbit

<li>Dinah

</ul>

</body>

</html>

Figure 27–4: An unordered list

Ordered Lists

Items in an ordered list are preceded by a number indicating the position of the item. The browser chooses the numbers, so you never have to maintain them as you modify the list. Numbers start at 1 at the beginning of each list.

Start the list with <ol>, precede each list item with <li>, and end the list with </ol>. The following HTML incorporates an ordered list (with the results shown in Figure 27–5):

<html>

<head>

<title>An Ordered List</title>

</head>

<body>

<ol>

<li>Alice

<li>Rabbit

<li>Dinah

</ol>

</body>

</html>

Figure 27–5: An ordered list

Descriptive Lists

A descriptive list consists of an item name followed by a definition or description. Start the list with <dl>, precede the item name with <dt> and the item definition with <dd>, and end the list with </dl>. The following HTML incorporates a descriptive list (with the results shown in Figure 27–6):

<html>

<head>

<title>A Descriptive List</title>

</head>

<body>

<dl>

<dt>Alice

<dd>Alice is the main character in the book.

<dt>Rabbit

<dd>The Rabbit led Alice down the rabbit hole.

<dt>Dinah

<dd>Dinah was Alice's cat.

</dl>

</body>

</html>

Figure 27–6: A descriptive list

Phrase Markup

In page layout it is common to use a distinctive style of type, border, indent, and other typographic features to convey the logical function of document sections and to provide visual discrimination between sections. HTML includes definitions for many logical styles likely to be found in technical documentation, including source code, sample text, keyboard phrases (i.e., something you type), variable phrases (i.e., a generic prototype for information you supply), citation phrases, and emphasized phrases.

Although HTML includes the definitions for many logical styles, it is up to the browser to display each in a distinctive way. Some do, some don’t, and what they do depends on the browser. Certain browsers display source code, sample text, keyboard phrases, and typewriter text all in the same typeface, and other phrases in different typefaces. So the following HTML

<html>

<head>

<title>Phrase Markup Examples</title>

</head>

<body>

<code>code - Source code phrase</code><br>

<samp>samp - Sample text or characters</samp><br>

<kbd>kbd - Keyboard phrase</kbd><br>

<var>var - Variable phrase</var><br>

<cite>cite - Citation phrase</cite><br>

<em>em - Emphasized Phrase</em><br>

<strong>strong - Strong Emphasis</strong><br>

</body>

</html>

displays as shown in Figure 27–7.

Figure 27–7: Phrase markup

You may also indicate certain typographic features by physical style, such as bold, italic, or typewriter text:

<html>

<head>

<title>Physical Style Examples</title>

</head>

<body>

<b>b - Bold Text</b><br>

<i>i - Italic Text</i><br>

<tt>tt - Typewriter Text</tt><br>

</body>

</html>

as shown in Figure 27–8.

Figure 27–8: Physical style markup

Preformatted Text

Sometimes you may want to prevent the browser from reformatting your text and instead display it just as it appears in your source file. Because the browser normally collapses runs of spaces, tabs, and newlines into single spaces, a section of C code, carefully indented and commented, would ordinarily be rendered unreadable.

The browser will preserve the layout of text enclosed between <pre> and </pre>, including all spaces, tabs, and newlines:

<html>

<head>

<title>Preformatted Text Example</title>

</head>

<body>

<pre>
main()
{
    printf( "Hello, world\n" );
}
</pre>

</body>

</html>

as shown in Figure 27–9.

Figure 27–9: Preformatted text
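Note that <pre> preserves white space but does not turn off markup. A "<" inside it still starts a tag, so a line such as #include <stdio.h> would lose its angle brackets. Those characters must be written as &lt; and &gt;. As a minimal sketch, the Python function below escapes a source file's text with the standard html.escape function before wrapping it in <pre>. Python is used here only as a tool, and wrap_pre is a hypothetical helper name.

```python
import html

def wrap_pre(source_text):
    """Return source_text inside <pre>...</pre>, with &, <, and > escaped
    so the browser shows them literally instead of treating them as markup."""
    return "<pre>\n" + html.escape(source_text, quote=False) + "</pre>\n"

c_code = '#include <stdio.h>\n\nmain()\n{\n    printf("Hello, world\\n");\n}\n'
print(wrap_pre(c_code))
```

The indentation and blank lines pass through unchanged. Only the characters that HTML would otherwise interpret are replaced.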

Comments

Comments are introduced with “<!--” and end with “-->”. They are useful for including nondisplayed annotations in HTML source and for temporarily suppressing the display of a section of source.

<!-- This is an HTML comment. -->

Line Breaks

Because the browser ignores the format or layout of the HTML source file, you must specify line breaks explicitly with the <br> tag. Unlike a paragraph tag (<p>), the line break tag does not add any extra space. The following code,

<html>

<head>

<title>Line Break Example</title>

</head>

<body>

peeped into the book her sister was reading, but it had no

conversations in it, 'and what is the use of a book,' thought

Alice 'without pictures or conversation?'

<br>

So she was considering in her own mind (as well as she could,

for the hot day made her feel very sleepy and stupid), whether

the pleasure of making a daisy-chain would be worth the trouble.

</body>

</html>

produces the page shown in Figure 27–10.

Figure 27–10: A line break

Horizontal Rules

The <hr> tag produces a break in the text and a horizontal rule the width of the browser’s window. Use it to separate document sections.

Forms

A form provides a mechanism to collect data from a user viewing your web page. Using a variety of devices such as text boxes, menus, check boxes, and radio buttons, a user can enter data into the form and click a Submit button to send the data to a server for processing. Here is an example of an HTML form; the resulting page is shown in Figure 27–11:

<html>

<head>

<title>Forms Example</title>

</head>

<body>

<form>

<input name=name10 type=text value="initial value"> text 1

<input name=name11 type=text> text 2

<input name=name12 type=text> text 3

<hr>

<input name=name2 type=checkbox value=1> checkbox 1

<input name=name2 type=checkbox value=2> checkbox 2

<input name=name2 type=checkbox value=3> checkbox 3

<hr>

<input name=name3 type=radio value=1> radio 1

<input name=name3 type=radio value=2> radio 2

<input name=name3 type=radio value=3> radio 3

<hr>

<select name=sel>

<option value=sel1> selection 1

<option value=sel2> selection 2

<option value=sel3> selection 3

</select>

<hr>

<textarea name=txt1 rows=5 cols=40>

This is default textarea input

</textarea>

<hr>

<input name=sub1 type=submit>

<input name=sub2 type=reset>

</form>

</body>

</html>

Figure 27–11: Example of an HTML form
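When the user clicks Submit, the browser encodes each named control as a name=value pair and joins the pairs with "&". This is the application/x-www-form-urlencoded format. Controls without a name, and check boxes or radio buttons that are not selected, are not sent at all. The form above has no action or method attribute, so browsers submit it with GET to the page's own URL, with the pairs appended after a "?". The Python sketch below mimics that encoding for the three text fields, using made-up values. It then decodes the result the way a server-side program would, with the standard urllib.parse functions.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical values a user might type into the three text boxes.
pairs = [
    ("name10", "initial value"),
    ("name11", "Alice & Dinah"),
    ("name12", ""),
]

# Browser side: spaces become "+", and "&" inside a value becomes %26.
query = urlencode(pairs)
print(query)   # name10=initial+value&name11=Alice+%26+Dinah&name12=

# Server side: decode into {name: [values]}; keep the empty field too.
fields = parse_qs(query, keep_blank_values=True)
print(fields)  # {'name10': ['initial value'], 'name11': ['Alice & Dinah'], 'name12': ['']}
```

Each name maps to a list because several controls, such as a group of check boxes, may share one name.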

JavaScript and the Document Object Model

The JavaScript programming language has had an important role in the evolution of the web since it was developed by Brendan Eich of Netscape and included in the Netscape Navigator 2 browser in late 1995. JavaScript’s core syntax closely resembles that of Java, which is one reason it was named JavaScript when it was released, though it is otherwise unrelated to Java. JavaScript made client-side web applications possible; that is, JavaScript code could be embedded in web pages and would be interpreted by the web browser to do things such as process numbers and modify the contents of HTML forms. The way that JavaScript referenced HTML elements such as forms, links, and anchors as “children” of the document “object,” and the way that it handled form inputs as children of their parent form, became known as the Document Object Model (DOM) level 0. The DOM, in short, is the specification for how objects in a web page (text, images, headers, links, etc.) are represented. The DOM defines what attributes are associated with each object, and how the objects and attributes can be exposed to scripts for access and manipulation.

In 1996, Netscape submitted JavaScript to the European Computer Manufacturers Association (ECMA) for standardization. This resulted in the ECMAScript standard, ECMA-262. As of 2006, the latest version of JavaScript is version 1.6, which is a superset of ECMA-262, Edition 3. The JavaScript support in Microsoft’s Internet Explorer web browser is actually Microsoft’s own dialect of the language, called JScript.

One major use of web-based JavaScript is to write functions that are embedded in or included from HTML pages and interact with the Document Object Model (DOM) of a web page to perform tasks not possible in HTML alone. Some common examples of JavaScript usage are the following:

§ Popping up a new web browser window with programmatic control over the size, position, and “look” of the new window. (Usually JavaScript is used to ensure that menus, toolbars, etc., are not visible on popped-up browser windows.) This usage of JavaScript is sometimes viewed as more of an annoyance than something useful, and web browser plug-ins such as “popup blockers” have sprouted to deal with JavaScript-generated pop-up windows.

§ Validating web form input values to make sure that they will be accepted before they are submitted to the web server.

§ Changing images as the mouse cursor moves over them, also known as a “mouse-over” effect. This effect is favored by some web page designers to draw a user’s attention to important links displayed as graphical elements. Much of this type of functionality can now be achieved, usually more easily, using Cascading Style Sheets, which will be discussed later in this chapter.

§ Inserting text and HTML tags dynamically into an HTML page.

§ Reacting to events. A JavaScript program can be set to execute when something happens, such as when a page has finished loading or when a user clicks an HTML form element such as a button.

JavaScript support has been a standard feature of most web browsers since the late 1990s. JavaScript has recently gained in importance as part of the AJAX web application development technique (see http://en.wikipedia.org/wiki/AJAX). AJAX (shorthand for Asynchronous JavaScript and XML) has been used to create high-profile web applications such as Google Maps (http://maps.google.com), whose high level of interactivity blurs the distinction between web applications and desktop applications.

Introduction to JavaScript Usage

JavaScript was designed to add interactivity to HTML pages. It is a lightweight scripting language that consists of lines of executable instructions, which are usually embedded directly into HTML pages; these instructions are executed when the HTML page is loaded by a web browser.

The HTML <script> tag is used to insert JavaScript code into an HTML page, as shown in the following example:

<html>

<body>

<script type="text/javascript">

// JavaScript comments begin with double forward slashes as in C++ and Java.

document.write( "Hello World!" );

</script>

</body>

</html>

An HTML file that consists of the preceding code will print Hello World! in the web browser window when loaded. In the preceding HTML file, the <script type="text/javascript"> and </script> tags determine where the JavaScript starts and ends. The only JavaScript instruction in this file, document.write, is a standard JavaScript method for directing output to a page, in this case the literal string “Hello World!”. The document.write instruction is also a simple example of JavaScript’s use of the web page’s Level 0 DOM: document is the HTML document object, and write is a built-in method of that object that “writes” output into the document. The semicolon at the end of a JavaScript statement such as document.write("Hello World!"); is optional. As in C++ and Java, comments begin with two forward slashes (//).

The use of a variable in JavaScript is demonstrated in the next code fragment. JavaScript variables are declared with var and are not given a fixed type. The three simple data types you will use most often are text (string), numeric, and Boolean (true or false). In the example, the variable str_greeting is initially assigned the literal string “Hello World!”, so str_greeting holds a string value.

<script type="text/javascript">

var str_greeting = "Hello World!";

document.write( str_greeting);

</script>

If we wanted to embed the current date and time in the HTML document, we could use a built-in JavaScript function, Date(), as follows:

<script type="text/javascript">

document.write( "The current date and time is " + Date())

</script>

The “+” operator in JavaScript, used in the preceding code, adds numbers. When either operand is a string, however, it concatenates, so "The current date and time is " + Date() joins the literal text with the date string returned by Date().

The following is another simple HTML file with an embedded JavaScript function. Each time this HTML file is loaded by a web browser, one of three image files will be randomly selected and displayed in the browser window using the HTML <img> tag:

<html>

<head>

<title>JavaScript function example #1</title>

<script type="text/javascript">

function display_image()

{

// Get a random integer between 0 and 2; Math.floor makes all three equally likely

var whichImg = Math.floor(Math.random() * 3);

// Create a 3 element string array

var image = new Array(3);

// Assumes that the image files named below are in the same directory as

// this HTML document.

image[0] = "moose.png";

image[1] = "squirrel.png";

image[2] = "mountie.png";

document.write("<center><img src=" + image[whichImg] + "></center>");

}

</script>

</head>

<body>

<script type="text/javascript">

// Call the function that was defined in the <head> section.

display_image();

</script>

</body>

</html>

As demonstrated in the preceding HTML file, scripts that define functions are usually placed in the <head> section of the HTML document, to ensure that the functions are loaded before they are called. The function display_image(), defined in the <head> section, is called unconditionally in the <body> section of the document each time the document is loaded. The example also demonstrates the use of the Math.floor and Math.random built-in functions to generate a random integer from 0 to 2. Math.random() returns a number from 0 up to, but not including, 1, so multiplying by 3 and rounding down yields 0, 1, or 2 with equal probability. The example also shows the creation and use of an array. Note that the document.write("<center><img src=" + image[whichImg] + "></center>") statement contains inline HTML tags to center the randomly selected image on the rendered web page.
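Why use Math.floor(Math.random() * 3) rather than the tempting Math.round(Math.random() * 2)? Rounding r * 2 maps only a quarter of the range of r to 0 and a quarter to 2, but half of it to 1. The middle image would then appear about twice as often as either of the others. The short simulation below is written in Python purely for convenience; js_round and tally are made-up helper names. It draws many random numbers and counts the results of both formulas.

```python
import math
import random

def js_round(x):
    # JavaScript's Math.round rounds halves up; emulate that for x >= 0.
    return math.floor(x + 0.5)

def tally(pick, trials=100_000, seed=1):
    """Count how often pick(r) returns 0, 1, and 2 for uniform r in [0, 1)."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(trials):
        counts[pick(rng.random())] += 1
    return counts

rounded = tally(lambda r: js_round(r * 2))    # like Math.round(Math.random() * 2)
floored = tally(lambda r: math.floor(r * 3))  # like Math.floor(Math.random() * 3)
print("round:", rounded)  # roughly 25% / 50% / 25% of the trials
print("floor:", floored)  # roughly one third each
```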

An interactive way to call a JavaScript function is demonstrated in the following HTML file. This time, when an HTML form button is pressed by the user, one of three image files will be randomly selected and displayed in the browser window:

<html>

<head>

<title>JavaScript function example #2</title>

<script type="text/javascript">

function display_image()

{

var whichImg = Math.floor(Math.random() * 3);

var image = new Array(3);

image[0] = "moose.png";

image[1] = "squirrel.png";

image[2] = "mountie.png";

document.write("<center><img src=" + image[whichImg] + "></center>");

}

</script>

</head>

<body>

<form>

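<input type="button" onclick="display_image()"

value="Press to display random image">

</form>

</body>

</html>

In the preceding example, the form button displayed in the browser window will have the text “Press to display random image” inside it. The onclick="display_image()" attribute is an example of JavaScript being used to respond to a browser event: the function is called each time the user clicks the button. Because the page has already finished loading by the time the button is clicked, document.write() replaces the entire page, including the button, with the centered image.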

JavaScript has the usual complement of arithmetic operators, comparison operators, logical operators, and program flow control structures that other high-level programming languages have. These operators and control structures are very much like their counterparts in C/C++ and Java, reflecting the syntactic heritage of JavaScript. The control structures include conditional selection structures such as if, if-else, if-else if-else, and switch. The control structures also include repetition structures (loops) such as for, while, and do-while. See http://en.wikipedia.org/wiki/JavaScript_syntax for a full JavaScript syntax reference.

The Document Object Model

As previously mentioned, the DOM exposes parts of an HTML document as objects to scripting languages such as JavaScript, so that scripts can dynamically access and update the content, structure, and style of HTML documents. DOM Level 0 is not a W3C recommendation; the term simply refers to the early JavaScript DOM, which is still in use. DOM Level 1 has been a W3C Recommendation since 1998. It is language-independent and well supported by today’s major browsers. The DOM makes available a number of convenient methods and properties that web programmers can use in their scripts, whether written in JavaScript or other languages. The W3C’s DOM Level 1 reference can be found at http://www.w3.org/TR/REC-DOM-Level-1.
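Because the DOM is language-independent, the Level 1 methods used from JavaScript (getElementsByTagName, createElement, createTextNode, appendChild) are also available in other languages. As a sketch, Python’s standard xml.dom.minidom module, a minimal DOM implementation, can load a small well-formed (XHTML-style) fragment, find an element, and add a new list item to it. In a browser, the same calls made from JavaScript would operate on the live page through the document object.

```python
from xml.dom.minidom import parseString

# A small, well-formed document (minidom is an XML parser, so tags must be closed).
doc = parseString(
    "<html><body><ul id='cast'><li>Alice</li><li>Rabbit</li></ul></body></html>"
)

# Find the first <ul>, much as document.getElementsByTagName("ul")[0] would.
cast = doc.getElementsByTagName("ul")[0]

# Build <li>Dinah</li> and append it as the last child of the list.
item = doc.createElement("li")
item.appendChild(doc.createTextNode("Dinah"))
cast.appendChild(item)

names = [li.firstChild.data for li in cast.getElementsByTagName("li")]
print(names)                       # ['Alice', 'Rabbit', 'Dinah']
print(doc.documentElement.toxml())
```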

Cascading Style Sheets

Style sheets, in the world of HTML, define how HTML elements are displayed by web browsers. The W3C published Cascading Style Sheets, level 1 (CSS1) as a Recommendation in December 1996, and the HTML 4.0 Recommendation of 1997 added full support for style sheets to HTML (http://www.w3.org/Style/CSS). The W3C next published CSS level 2 as a Recommendation in May 1998. CSS2 is a superset of CSS1. All major web browsers now support CSS2, with varying degrees of correctness.

Styles were introduced to solve a common problem. HTML tags, as envisioned by Tim Berners-Lee, were originally designed to define the content of a document, rather than its style or layout. They were supposed to define elements such as headers, paragraphs, and tables using tags such as <h1>, <p>, <table>, and others. The presentation style of the rendered web page was left to the browser.

As the two major web browsers of the late 1990s, Netscape and Internet Explorer, continued to add their own new HTML tags and attributes (such as the <font> tag and the “color” attribute) to the original HTML specification, it became more and more difficult to create web sites in which the content of the HTML documents was clearly separated from their presentation. The proliferation of presentational tags and attributes for micromanaging details such as font size made HTML code messier and HTML documents harder to debug. The W3C introduced style sheets to solve this problem, that is, to separate a document’s content (structure) from its presentation (style).

Style sheets can also save web development time and effort. Style sheets define how HTML elements are to be displayed, just as the <font> tag and the color attribute did in HTML 3.2. Styles are normally saved in external .css files. By editing one external style sheet, you can change the appearance and layout of every page in your web site that uses it. You can define a style for each HTML element you use and then apply it to as many web pages as you wish.

The name Cascading Style Sheets comes from the behavior of multiple styles cascading into one. In addition to external .css files, styles can be specified inside a single HTML element in a document and also inside the <head> section of an HTML page. Multiple external style sheets can be referenced inside a single HTML document. When there is more than one style specified for an HTML element, a set cascading order is followed to choose which style will be used. All the styles will “cascade” into a new “virtual” style sheet according to the following rules, where rule 4 has the highest priority and rule 1 has the lowest priority:

1. Browser default

2. External style sheet

3. Internal style sheet (inside the document <head> section)

4. Inline style (inside an HTML element)
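The four rules above can be modeled, in simplified form, as keeping the value from the highest-priority source that sets each property. The Python sketch below does only that, with a later declaration from the same source winning a tie. It deliberately ignores selector specificity, inheritance, and !important declarations, all of which real browsers also apply. The function and source names are illustrative.

```python
# Priority of each style source, lowest (1) to highest (4), as in the list above.
PRIORITY = {"browser": 1, "external": 2, "internal": 3, "inline": 4}

def cascade(declarations):
    """Resolve the styles for one element.

    declarations: list of (source, property, value) tuples, in document order.
    Returns {property: value}, keeping the value from the highest-priority
    source; between equal priorities, the later declaration wins.
    """
    winners = {}
    for source, prop, value in declarations:
        rank = PRIORITY[source]
        if prop not in winners or rank >= winners[prop][0]:
            winners[prop] = (rank, value)
    return {prop: value for prop, (rank, value) in winners.items()}

# A <p> element whose color is set in three places.
styles = cascade([
    ("browser", "color", "black"),
    ("external", "color", "firebrick"),
    ("inline", "color", "navy"),
    ("external", "font-size", "75%"),
])
print(styles)  # {'color': 'navy', 'font-size': '75%'}
```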

The CSS syntax, a fairly readable syntax, is made up of three fundamental parts: a selector, a property and a value:

selector {property: value}

The selector is normally the HTML element or tag that you wish to define, the property is the attribute you wish to change, and each property can take a value after the colon. The property and value are surrounded by curly braces. To make style definitions more readable, usually one property is defined per line as in the following example in which we set three property values for the HTML paragraph (<p>) tag:

p

{

text-align: left;

color: black;

font-family: helvetica

}

Selectors can be grouped by separating them with commas. In the example that follows, all the HTML header elements are grouped and the text color set to orange:

h1, h2, h3, h4, h5, h6

{

color: orange

}
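To make the selector {property: value} structure concrete, here is a deliberately tiny Python parser. Its name, parse_css, is hypothetical and not a library function. It handles only the forms shown so far: comma-separated selector groups and semicolon-separated declarations. It does not handle comments, at-rules, or quoted strings that contain ";" or "}". A real program would use a proper CSS parser.

```python
import re

def parse_css(text):
    """Return {selector: {property: value}} for simple CSS rule sets."""
    rules = {}
    # Each rule set has the form: selectors { declarations }
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", text):
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                declarations[prop.strip().lower()] = value.strip()
        # A grouped selector applies the same declarations to each member.
        for selector in selectors.split(","):
            rules.setdefault(selector.strip(), {}).update(declarations)
    return rules

sheet = """
p { text-align: left; color: black; font-family: helvetica }
h1, h2, h3 { color: orange }
"""
parsed = parse_css(sheet)
print(parsed["p"]["font-family"])  # helvetica
print(parsed["h2"])                # {'color': 'orange'}
```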

Let’s look at an actual application of CSS. The following is a group of style definition statements in the CSS syntax. Because of the length, this group of statements will be saved in a separate .css file called myStyle.css:

p, li, h1, h2, h3

{

font-family: 'Comic Sans MS', sans-serif;

}

p, h1, h2, h3, li, hr

{

margin-left: 10pt;

}

p, li

{

font-size: 75%;

}

h1, h2, h3, hr

{

color: firebrick;

}

a:link {COLOR: firebrick;}

a:visited {COLOR: firebrick;}

a:active {COLOR: navy;}

a:hover {COLOR: navy;}

a.content:link {COLOR: seashell;}

a.content:visited {COLOR: seashell;}

a.content:hover {COLOR: papayawhip;}

a.content:active {COLOR: seashell;}

We want to apply the style sheet definitions in myStyle.css to the simple HTML document, myPage.html, shown here:

<html>

<head>

<title>CSS Example</title>

</head>

<body>

<h1>This is header 1.</h1>

<hr>

<p>This is a paragraph.

<p>This is a second paragraph.

<hr>

<h2>This is header 2, followed by two lists.</h2>

<ol>

<li> <a href="http://foo.com">Link to home page</a>

<li> <a href="http://bar.com">Link to last page</a>

</ol>

<ul>

<li> Item 1

<li> Item 2

</ul>

</body>

</html>

When the plain myPage.html is loaded into a web browser, it looks like Figure 27–12.

Figure 27–12: myPage.html with CSS not applied

To apply the style sheet definitions in the external file myStyle.css to myPage.html, add a <link> tag to the <head> section of myPage.html, as follows. This assumes that myStyle.css and myPage.html are saved in the same directory:

<head>

<title>CSS Example</title>

<link rel="stylesheet" type="text/css"

href="myStyle.css" />

</head>

When the web browser loads myPage.html, it will read the style definitions from myStyle.css and apply the definitions to myPage.html. With the <link> to myStyle.css included in the <head> section, myPage.html now looks like Figure 27–13 when loaded into a browser.

Figure 27–13: myPage.html with CSS applied

Most noticeably, the style sheet has altered the font family and color of the text in this HTML document. The left margin has also been increased. What is not shown in the screen shot is the effect that the style sheet has on hypertext links created using the anchor tag (<a href=…). A portion of the style sheet that affects the color of hypertext links is shown here:

a:link {COLOR: firebrick;}

a:visited {COLOR: firebrick;}

a:active {COLOR: navy;}

a:hover {COLOR: navy;}

Using these definitions, link text is normally rendered in firebrick, but when the user moves the mouse cursor over a link (a:hover) or clicks it (a:active), it changes to navy. Some consider this interactive change of link color an improvement in usability. More important, by <link>'ing to myStyle.css from multiple HTML files, a web site can achieve a consistent look and feel without much work.

Server-Side Web Applications

Earlier in this chapter, JavaScript was presented as a method for developing client-side web applications, applications that are interpreted and executed by the web browser. Server-side web applications are also accessed using a web browser. However, instead of the web browser executing the program, a server-side web application is executed by the web server that is serving the web pages. Here, we are talking about web servers as defined in Chapter 16, such as the Apache Web Server. This section will discuss two server-side web application technologies: CGI and PHP.

The Common Gateway Interface

Common Gateway Interface (CGI) programs, which have had a large role in making the web the massive data warehouse it is today, are an example of server-side web applications. On the web, CGI is used to interface external programs with a web server. A CGI program that is called from a web page is executed by the web server, and the CGI program’s output, which usually includes dynamically generated HTML code, is sent by the web server back to the requesting web browser. CGI programs are often used for applications that store data in a database, data that is entered or queried by filling out web forms. The CGI programs usually generate the web forms and parse and validate the data before it is sent to a database management system. Simple CGI programs implement page hit counters, which display the number of visits to a page, and web site guest books. More sophisticated CGI programs are used to run applications such as wikis, web-based e-mail clients, and e-commerce shopping carts.

CGI programs can be written in any programming language that understands UNIX-style standard input and output and which can access system environment variables; these languages include compiled high-level languages such as C and C++. But most commonly, CGI programs have been written in interpreted languages such as (k)sh, Perl, and Python. Perl has been particularly popular as a CGI language because of its rich and powerful built-in text parsing capabilities. Because they are interpreted languages, programs written in Perl and Python can be prototyped (written and tested) relatively quickly without an intermediate compile step; this is another reason why these languages have been widely used for CGI work. (See Chapters 22 and 23 for more information on Perl and Python, respectively.) This section will provide some simple examples that give a glimpse of what is possible with CGI programs written in Perl and Python.

We will begin with a simple “Hello World” CGI script written in Perl:

#!/usr/bin/perl -wT

print "Content-type: text/html\n\n";

print "<html><head><title>Hello World</title></head>\n";

print "<body>\n";

print "<h2>Hello, world!</h2>\n";

print "</body></html>\n";

Like most UNIX scripts, this script begins with #!, which tells the system that the file is a script to be run by the interpreter named next. The next part, /usr/bin/perl, is the location (or path) of the Perl interpreter on the web server’s machine. The final part contains optional flags for the Perl interpreter. Warnings are enabled by the -w flag. Taint checking is enabled by the -T flag. It prevents data from outside the program, such as user input and environment variables, from being used in operations that affect the system until that data has been explicitly validated. These flags help to create more secure Perl scripts, and using them in almost every Perl CGI script is a good habit to get into.

This CGI script is going to generate an HTML page, so the print statement on the second line, print "Content-type: text/html\n\n";, must come before anything else is printed. This is a content-type header that tells the receiving web browser what sort of data it is about to receive, in this case, an HTML document. The second \n produces the blank line that marks the end of the headers. The script then prints all of the HTML that you want to display in the visitor’s browser, with a print statement for every line of HTML. Next, save this file to a CGI script directory that your web server recognizes (see Chapter 16 for Apache CGI support) and adjust the file permissions to make it world-readable and world-executable (typically with chmod 755 filename). If you consult your web server or system administrator, they may be able to configure the web server to execute your CGI scripts from your ~/public_html/cgi-bin directory. The usual convention is to give the file a .cgi filename extension. When the CGI script is loaded by your web browser, you should see “Hello, world!” displayed as an H2-size HTML heading.
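The CGI output format itself is simple: header lines, one blank line, then the document body. The sketch below restates the Hello World script in Python 3, the language of this chapter’s later CGI example, so the three parts are easy to see. The helper name hello_response is only illustrative.

```python
#!/usr/bin/env python3
import sys

def hello_response():
    """Build a complete CGI response: headers, a blank line, then the body."""
    headers = "Content-type: text/html\n"
    body = (
        "<html><head><title>Hello World</title></head>\n"
        "<body>\n"
        "<h2>Hello, world!</h2>\n"
        "</body></html>\n"
    )
    # The empty line after the headers tells the web server where the body starts.
    return headers + "\n" + body

if __name__ == "__main__":
    sys.stdout.write(hello_response())
```

Saved in a CGI directory and made executable with chmod 755, this script should behave like the Perl version. That assumes a Python 3 interpreter that /usr/bin/env can find on the server.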

Perl’s CGI.pm Module

One of Perl’s most attractive features is a large library of add-on modules, collections of reusable prewritten code that can save programmers the time and trouble of reinventing the wheel. One of these add-on modules is the CGI.pm module, which has been part of the Perl standard library for some time. CGI.pm has a number of useful functions and features for writing CGI programs, and its use is preferred by the Perl community. Here is the Hello World CGI script again, this time using CGI.pm:

#!/usr/bin/perl -wT

use CGI qw(:standard);

print header;

print start_html("Hello World");

print "<h2>Hello, world!</h2>\n";

print end_html;

The second line of the script, use CGI qw(:standard);, loads the CGI.pm module before any other code runs. The qw(:standard) part of this line indicates that we’re importing the “standard” set of functions from CGI.pm.

CGI.pm provides, among other things, a number of functions that serve as HTML shortcuts. The functions used in this script are header, start_html(), and end_html. The header function prints out the “Content-type” header. The start_html() function prints out the <html>, <head>, <title>, and <body> tags. It can also accept several optional arguments, including the page title, as in print start_html("Hello World");. The end_html function prints out the closing HTML tags: </body></html>. By reducing the number of HTML tags that have to be included in print statements, these CGI.pm functions at least make the script easier to read.

The next simple Perl CGI script may be useful for a system administrator wishing to monitor a server’s status remotely using a web browser. (Such a script would be for the administrator’s use only, since it can make system information available to the wrong people. The administrator would do well to password-protect this script.)

#!/usr/bin/perl -w

use CGI qw(:standard);

$host=$ENV{HTTP_HOST};

$uptime=`uptime`;

$w=`w -s -h`;

print header;

print start_html("$host Status");

print "<h1>What's happening on $host</h1>";

print "$uptime";

print "<hr>";

print "<pre>$w</pre>";

print end_html;

The script uses command substitution (Perl’s backquote operator, which works like the shell’s) to assign the text output of the uptime and w commands to the variables $uptime and $w, respectively. The script later prints the contents of $uptime and $w to produce a web page showing how long the web server machine has been running, the load on the machine, which users are currently logged in, and what those users are doing. In the third line of the script, $host=$ENV{HTTP_HOST}, the environment variable HTTP_HOST is read and its value assigned to the variable $host. HTTP_HOST contains the host name that the browser used to reach the web server, as sent in the request’s Host header. The web server sets a series of environment variables for every CGI program it runs. Your CGI program can read these variables and use the values they contain. In Perl, environment variables are stored in a hash named %ENV.
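In Python, the same CGI environment variables appear in the os.environ mapping, Python’s counterpart to Perl’s %ENV. The Python 3 sketch below uses a hypothetical helper, request_summary. It reads a few standard CGI variables and supplies defaults for any the server did not set. It also HTML-escapes HTTP_HOST, since that value comes from the browser and should not be trusted. The function accepts a dictionary in place of os.environ, so it can be tried outside a web server with simulated values.

```python
import html
import os
from urllib.parse import parse_qs

def request_summary(environ=None):
    """Summarize a CGI request from its environment variables."""
    if environ is None:
        environ = os.environ  # the real CGI environment, like Perl's %ENV
    return {
        "method": environ.get("REQUEST_METHOD", "GET"),
        # HTTP_HOST is copied from the request's Host header; escape it for HTML.
        "host": html.escape(environ.get("HTTP_HOST", "unknown")),
        "script": environ.get("SCRIPT_NAME", ""),
        # QUERY_STRING holds everything after "?" in the requested URL.
        "query": parse_qs(environ.get("QUERY_STRING", "")),
    }

# Simulated environment for a request to /cgi-bin/status.cgi?verbose=1
fake_env = {
    "REQUEST_METHOD": "GET",
    "HTTP_HOST": "www.example.com",
    "SCRIPT_NAME": "/cgi-bin/status.cgi",
    "QUERY_STRING": "verbose=1",
}
print(request_summary(fake_env))
```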

CGI with HTML Forms

A common form of CGI programming handles interaction between user input in an HTML form and a database. HTML forms consist of several input fields, each identified by a name, and a submit button that sends the data, encoded as a query string, to be parsed by a CGI program. The following is a sample CGI program written in Python:

#!/usr/bin/python

import cgi

# Required header that tells the browser how to render the HTML.

print "Content-Type: text/html\n\n"

# Define function to generate HTML form.

def generate_form():

    print "<HTML>\n"

    print "<HEAD>\n"

    print "\t<TITLE>Info Form</TITLE>\n"

    print "</HEAD>\n"

    print "<BODY BGCOLOR=white>\n"

    print "\t<H3>Please enter your name and age.</H3>\n"

    # The <FORM> must enclose the table, the hidden field, and the button.

    print "\t<FORM METHOD=post ACTION=\"python_cgi_demo.cgi\">\n"

    print "\t<TABLE BORDER=0>\n"

    print "\t\t<TR><TH>Name:</TH><TD><INPUT type=text name=\"name\"></TD></TR>\n"

    print "\t\t<TR><TH>Age:</TH><TD><INPUT type=text name=\"age\"></TD></TR>\n"

    print "\t</TABLE>\n"

    print "\t<INPUT TYPE=hidden NAME=\"action\" VALUE=\"display\">\n"

    print "\t<INPUT TYPE=submit VALUE=\"Enter\">\n"

    print "\t</FORM>\n"

    print "</BODY>\n"

    print "</HTML>\n"

# Define function to display data.

def display_data(name, age):

    print "<HTML>\n"

    print "<HEAD>\n"

    print "\t<TITLE>Info Form</TITLE>\n"

    print "</HEAD>\n"

    print "<BODY BGCOLOR=white>\n"

    # Escape the user's input so that any HTML in it is shown, not interpreted.

    print "%s, you are %s years old." % (cgi.escape(name), cgi.escape(age))

    print "</BODY>\n"

    print "</HTML>\n"

# Define main function.

def main():

    form = cgi.FieldStorage()

    if "action" in form and "name" in form and "age" in form:

        if form["action"].value == "display":

            display_data(form["name"].value, form["age"].value)

    else:

        generate_form()

# Call main function.

main()

Python also has a standard CGI module that is imported in the second line of this program. Most of this program’s work is done by two Python functions: generate_form(), which mainly prints HTML tags to generate a web page containing the HTML form, and display_data(), which generates a web page that simply displays the data (name and age) that were entered in the form.

If you save this script as python_cgi_demo.cgi to a valid web server CGI directory, make it executable (with chmod 755 python_cgi_demo.cgi), and access it using a web browser, the web server will execute the script and produce an HTML form page like the one shown in Figure 27–14.

Figure 27–14: Python CGI form

Usually, data that is entered in a CGI-generated form like this will be parsed, validated, and passed, in the form of a query, to a database. In this example, when the form’s Enter button is pressed, the browser sends the form data back to python_cgi_demo.cgi, and main() passes the name and age values to the display_data() function to be displayed in a separate web page.
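A sketch of the validation step mentioned above, written in Python 3, might look like the following. The helper name validate_info and its limits are assumptions made for illustration: a name of 1 to 40 characters and a whole-number age of at most 150. It rejects bad input instead of passing it on. It also HTML-escapes the name, so that markup typed into the form is displayed rather than interpreted by the browser.

```python
import html
import re

def validate_info(name, age):
    """Check the form's name and age fields.

    Returns (escaped_name, age_as_int) on success, or raises ValueError.
    The length and age limits are illustrative assumptions.
    """
    name = name.strip()
    if not 1 <= len(name) <= 40:
        raise ValueError("name must be 1 to 40 characters long")
    age = age.strip()
    # Accept only 1 to 3 ASCII digits, so "-1", "7.5", and "seven" are rejected.
    if not re.fullmatch(r"[0-9]{1,3}", age):
        raise ValueError("age must be a whole number")
    age_value = int(age)
    if age_value > 150:
        raise ValueError("age is out of range")
    # Escape &, <, >, and quotes so the name is safe to embed in HTML.
    return html.escape(name), age_value

print(validate_info("Alice", "7"))           # ('Alice', 7)
print(validate_info("<b>Rabbit</b>", "30"))  # ('&lt;b&gt;Rabbit&lt;/b&gt;', 30)
```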

CGI Overhead and Security

A limitation of CGI, recognized from the beginning of CGI use on the web, is CGI overhead. Because HTTP is a stateless protocol and CGI runs a separate program for each request, the web server must start a new CGI process for every “hit” from a browser. On very popular web sites, hundreds or thousands of CGI script instances consuming CPU time and memory can quickly bog down the web server machine. In addition, when interpreted languages such as Perl and Python are used, the interpreter itself must start up and load the script each time the script is called.

Workarounds for CGI overhead do exist. FastCGI (http://www.fastcgi.com), for example, does not create a new process for every CGI script request; instead, it uses one or more persistent processes to handle many requests. Another approach for scripting languages is to embed the interpreter directly into the web server as a module, so that scripts can be executed without creating a new process. The Apache web server has a number of these interpreter modules, including mod_perl (http://perl.apache.org) and mod_python (http://www.modpython.org). These interpreter modules can allow scripts to run many times faster than under the traditional CGI facility.
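The key difference from classic CGI is that the program is loaded once and then called once per request. The sketch below illustrates that idea with Python’s WSGI interface (PEP 3333). WSGI is a swapped-in example, not FastCGI, mod_perl, or mod_python itself: module-level setup runs once per process, and the application function runs for every request, so state such as the hypothetical REQUEST_COUNT counter survives between hits.

```python
import itertools
from wsgiref.util import setup_testing_defaults

# Work done at import time runs once per process, not once per request.
# Under classic CGI this would be repeated for every hit.
REQUEST_COUNT = itertools.count(1)


def app(environ, start_response):
    """A WSGI application: called once per request by a persistent server."""
    n = next(REQUEST_COUNT)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"This process has served {n} request(s).\n".encode("utf-8")]


def call(application, path="/"):
    """Invoke the application directly, as a server would; return status and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    status = []
    chunks = application(environ, lambda s, headers: status.append(s))
    return status[0], b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # To serve it for real:
    #   from wsgiref.simple_server import make_server
    #   make_server("", 8000, app).serve_forever()
    for _ in range(3):
        print(call(app))
```

Each call reports a higher count because the same process handles every request; a CGI version would print 1 every time.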

Unfortunately, CGI programs have repeatedly been shown to open large security holes in web server hosting systems. If they are carelessly programmed, they may allow malicious users on the web to enter UNIX commands into CGI-processed forms and have those commands executed on your web server machine. Heed the following rules of thumb when writing CGI programs to make them as safe as possible:

1. Avoid giving out too much information about your web site and server host.

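2. If coding in a compiled language like C, avoid making assumptions about the size of user input; check its length before copying it into a fixed-size buffer, or an overly long input can overflow the buffer.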

3. Never pass unchecked remote user input (such as data typed into HTML forms) to a shell command.
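Rule 3 is the one most often broken. Here is a minimal Python 3 sketch, assuming a hypothetical form field named user whose value is handed to the UNIX finger command. The input is first checked against a whitelist pattern (the pattern and function names are illustrative assumptions), and the command is then run from an argument list rather than a shell string, so characters such as ; | and ` are never interpreted by a shell.

```python
import re
import subprocess

# Whitelist: starts with a letter (so it cannot look like a "-option"),
# then letters, digits, underscore, or hyphen; 1 to 32 characters in all.
USERNAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,31}")


def build_finger_command(user):
    """Validate form input and return an argv list (never a shell string)."""
    if not USERNAME_RE.fullmatch(user):
        raise ValueError("invalid user name")
    return ["finger", user]


def run_finger(user):
    # With a list and the default shell=False, no shell ever parses the input.
    # DANGEROUS alternative, never do this with form data:
    #   subprocess.run("finger " + user, shell=True)
    argv = build_finger_command(user)
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout
```

Rejecting bad input outright is safer than trying to strip out dangerous characters, because a whitelist cannot miss a metacharacter you forgot about.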

PHP: Hypertext Preprocessor

Another well-known server-side scripting language is PHP. Chapter 16 discussed the process of compiling and configuring the PHP interpreter module for Apache. This section will look more closely at the PHP language. PHP began life as a set of Perl scripts written by Rasmus Lerdorf, who released it in June 1995 as Personal Home Page; it later grew into a full-fledged, interpreted web scripting language. By 1998, when PHP Version 3 gained attention in the web development community, PHP had become a recursive acronym for PHP: Hypertext Preprocessor. A new development team, led by Zeev Suraski and Andi Gutmans, two Israeli developers at the Technion-Israel Institute of Technology, released PHP 4 in 2000 and PHP 5, which included many feature enhancements, in 2004. PHP is one of the most popular programming languages for implementing web sites, with reportedly over 20 million Internet domains using it. Because of its ease of use compared to Perl and Python, PHP is most often the “P” in LAMP (Linux, Apache, MySQL, Perl/Python/PHP), a prominent group of technologies that are used together to create web applications such as content management systems, wikis, and online stores (see the later subsection “PHP and MySQL”).

PHP was originally designed for server-side applications in conjunction with a web server. This is in contrast with languages such as Perl and Python, which were meant to be general-purpose languages. So, while PHP scripts can be executed by the web server using the CGI mechanism, they are more often executed through a PHP interpreter module in the web server. Moreover, unlike Perl or Python CGI scripts, PHP language instructions and routines are inserted into HTML documents using special delimiting tags, as is done with JavaScript programs.

PHP, in its syntax, resembles C/C++, Perl, Java, and JavaScript, sharing the same basic set of arithmetic, assignment, comparison, and logical operators as these languages, as well as the same basic set of control structures, such as if-else and loops. PHP has associative arrays (hashes) like Perl and Python and has object-oriented programming features. This tends to ease the learning of PHP for programmers with experience in other scripting languages. Like Perl, PHP uses flexibly typed variables, prefixed with a “$” and able to hold any data type you wish. For a full PHP language reference, see http://www.php.net/manual/en/langref.php.

In Chapter 16, a simple PHP “Hello World” example was shown that incorporated the useful phpinfo() function (see Chapter 16, Figure 16–5). Here is another PHP example:

<html>
<head><title>A Simple PHP Script</title></head>
<body>
<h1>A Simple PHP Script</h1>
<p>Welcome, Internet user from IP address
<?php
$remote_ip=$_SERVER['REMOTE_ADDR'];
print $remote_ip;
?>.
<p>Make yourself at home.
</body>
</html>

This PHP “program” is an HTML file with a couple of lines of PHP embedded in it between the <?php and ?> delimiting tags. In the PHP block, the variable $remote_ip is assigned the value of the CGI environment variable REMOTE_ADDR, which is part of the built-in PHP “autoglobal” array, $_SERVER. REMOTE_ADDR holds the numeric IP address of the web browser host. Next, the PHP print command is executed to print the value of $remote_ip to the standard output. The output from the print command is included in-line into the surrounding HTML code. As in Perl, C/C++, and Java, semicolons are required at the end of PHP statements.
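$_SERVER['REMOTE_ADDR'] is PHP’s view of a CGI environment variable that the web server sets for every request. For comparison, here is a minimal Python 3 CGI-style sketch that builds the same page from the environment. The function name welcome_page is hypothetical. Note that a CGI script must print its own Content-Type header followed by a blank line, which PHP sends for you by default.

```python
import os


def welcome_page(environ):
    """Return a CGI response (header plus HTML) greeting the client by IP."""
    # The web server places the client's numeric address in REMOTE_ADDR.
    remote_ip = environ.get("REMOTE_ADDR", "unknown")
    return (
        "Content-Type: text/html\n\n"  # CGI header, then the required blank line
        "<html>\n<head><title>A Simple CGI Script</title></head>\n<body>\n"
        "<h1>A Simple CGI Script</h1>\n"
        f"<p>Welcome, Internet user from IP address {remote_ip}.\n"
        "<p>Make yourself at home.\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(welcome_page(os.environ), end="")
```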

If your web server’s PHP interpreter module is set up as shown in Chapter 16, this HTML file will be scanned for PHP code if it has a .php file extension. Unlike CGI scripts, which must be saved to specific CGI-BIN directories that the web server looks in, .php files can be saved anywhere under the web server’s document root. Also, since .php files are essentially HTML files, they do not need execute permissions as CGI scripts do; .php files do, however, need to be made readable by the web server process owner, which is typically nobody or apache on UNIX web server systems. When loaded by a web browser, the page should look like Figure 27–15.
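The permission difference can be expressed concretely: a CGI script such as python_cgi_demo.cgi typically gets mode 755 (rwxr-xr-x) so the server can execute it, while a .php file only needs to be world-readable, for example mode 644 (rw-r--r--). This Python 3 sketch applies and checks those modes with os.chmod() and the stat module. It assumes the web server runs as a user other than the file’s owner (such as nobody or apache), which is why the “other” permission bits matter; the file name remote_ip.php and the temporary directory are illustrative only.

```python
import os
import stat
import tempfile


def publish(path, executable):
    """Set typical web permissions: 755 for CGI scripts, 644 for .php pages."""
    os.chmod(path, 0o755 if executable else 0o644)


def readable_by_others(path):
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def executable_by_others(path):
    return bool(os.stat(path).st_mode & stat.S_IXOTH)


with tempfile.TemporaryDirectory() as docroot:
    cgi_script = os.path.join(docroot, "python_cgi_demo.cgi")
    php_page = os.path.join(docroot, "remote_ip.php")
    for path in (cgi_script, php_page):
        open(path, "w").close()  # create empty placeholder files
    publish(cgi_script, executable=True)
    publish(php_page, executable=False)
    modes = {os.path.basename(p): stat.filemode(os.stat(p).st_mode)
             for p in (cgi_script, php_page)}
    checks = (readable_by_others(php_page), executable_by_others(php_page),
              executable_by_others(cgi_script))

print(modes)
```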

Figure 27–15: Remote IP detection with PHP

The following code is another simple PHP example. It queries another CGI environment variable, HTTP_USER_AGENT, which contains information about the client web browser being used to view the PHP page (this is a primitive example of what is called “browser detection” in web development):

<html>
<head><title>Another Simple PHP Script</title></head>
<body>
<h2>A Simple PHP Browser Detection Script</h2>
Here is your full Web browser profile:
<p>
<?php
$browser=$_SERVER['HTTP_USER_AGENT'];
print $browser;
?>
<hr>
<?php
if (strpos($browser, 'Firefox') !== FALSE) {
?>
<p>Congratulations on your choice of Firefox.</p>
<?php
} else {
?>
<p>You do not seem to be using <a
href="http://www.mozilla.com/firefox/">Firefox</a>.</p>
<?php
}
?>
</body>
</html>

The interesting thing about this PHP page is what happens in the PHP if-else structure. The line if (strpos($browser, ‘Firefox’) !== FALSE) { uses the strpos() function to search for the substring ‘Firefox’ in the variable $browser, which holds the value of the HTTP_USER_AGENT environment variable. The strpos() function returns the numeric position of the substring, or FALSE if it is not found; the strict !== operator is used because a match at position 0 would otherwise compare equal to FALSE. Also of interest is the way in which raw HTML code can be intermixed with the PHP if-else structure. For instance, <p>Congratulations on your choice of Firefox.</p> is not part of the PHP code (it is not within the <?php…?> delimiting tags), but it will only be displayed on the web page when the if clause evaluates to true. This avoids the need to use print statements to print out HTML tags. When loaded by a web browser, the resulting web page should look like Figure 27–16.
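Search functions that return a position need care, because a match at position 0 is easy to confuse with failure. Python’s str.find() has the same pitfall in a different form: it returns -1 when the substring is absent and 0 when it is at the very start, so a plain truth test on its result gets both cases wrong. A minimal Python 3 sketch; the sample user-agent strings are made up for illustration.

```python
def is_firefox_buggy(user_agent):
    # WRONG: find() returns -1 (truthy) when the text is absent
    # and 0 (falsy) when it appears at the very start.
    return bool(user_agent.find("Firefox"))


def is_firefox(user_agent):
    # Correct: compare explicitly against the "not found" value, -1.
    return user_agent.find("Firefox") != -1


# Sample user-agent strings, made up for illustration.
samples = [
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
    "Firefox/2.0 (made-up string that starts with the token)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]
for ua in samples:
    # The membership test "Firefox" in ua is the idiomatic Python spelling.
    print(is_firefox_buggy(ua), is_firefox(ua), "Firefox" in ua)
```

The buggy version misclassifies the second and third samples, while the explicit comparison and the in operator agree on all three.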

Figure 27–16: Browser detection with PHP

One of PHP’s most attractive features is its handling of HTML forms. When an HTML form is submitted to a PHP script, the values of all its form elements are automatically available to that script. The following is an example of a plain HTML form that calls a PHP script, php_form_demo.php, when the Submit button is pressed. The form that is generated is similar to the name and age form shown in Figure 27–14, which was generated by a Python CGI script:

<html>
<head><title>Simple PHP Form Handling Demo</title></head>
<body>
<form action="php_form_demo.php" method="POST">
Enter your name: <input type="text" name="name" />
Enter your age: <input type="text" name="age" />
<input type="submit" />
</form>
</body>
</html>

The php_form_demo.php file must be in the same directory as the preceding HTML document, because the form’s action attribute gives it as a relative URL. The contents of php_form_demo.php are as follows:

<html>
<body>
Your name is <?php echo htmlspecialchars($_POST["name"]); ?>, and
you are <?php echo htmlspecialchars($_POST["age"]); ?> years old.
</body>
</html>

When the form’s Name and Age fields are filled out and the Submit button is pressed, the web server loads php_form_demo.php, processes the PHP statements and HTML code in it, and then sends an HTML response back to the client web browser. The resulting web page will contain a single line like this:

Your name is Joe, and you are 38 years old.

In php_form_demo.php, $_POST is another built-in PHP “autoglobal” array. This array contains all POST method data from the form that called php_form_demo.php. Thus, the $_POST["name"] and $_POST["age"] variables are automatically set for you by PHP. Each value is passed through htmlspecialchars(), which converts characters such as < and & into HTML entities, so that a visitor cannot inject HTML or script code into the response page.
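Echoing form fields back verbatim lets a visitor inject HTML or JavaScript into the response page (cross-site scripting), so submitted values should be escaped before they are written into HTML. Python 3’s counterpart to PHP’s htmlspecialchars() is html.escape(). Here is a minimal sketch of the php_form_demo.php response built that way; the function name form_response is hypothetical.

```python
from html import escape


def form_response(fields):
    """Build the confirmation page from submitted form fields, escaped for HTML."""
    # escape() turns &, <, >, and both quote characters into HTML entities.
    name = escape(fields.get("name", ""))
    age = escape(fields.get("age", ""))
    return (
        "<html>\n<body>\n"
        f"Your name is {name}, and\nyou are {age} years old.\n"
        "</body>\n</html>\n"
    )


print(form_response({"name": "Joe", "age": "38"}))
# A hostile "name" is rendered as harmless text instead of running as script.
print(form_response({"name": "<script>alert('hi')</script>", "age": "38"}))
```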

PHP and MySQL

PHP and the MySQL database (http://www.mysql.com/) are often used together to create web applications, particularly as part of the previously mentioned LAMP web application platform. One of PHP’s most popular features is its ability to interface with and manipulate many free and commercial database management systems, including MySQL, PostgreSQL, Oracle, Sybase, and others. MySQL, an open-source database management system (DBMS), has become a popular choice for use with PHP because of its reputation as a relatively easy-to-use and fast database. MySQL is cross-platform, compiles easily on most UNIX variants, and is included as the default DBMS on many Linux distributions. Chapter 16 included instructions for building PHP with its built-in support for MySQL. This section will provide a brief introduction to using PHP’s MySQL support.

To verify that your PHP installation includes MySQL support, you can check the output of the phpinfo() function as described in the section “Apache and LAMP” of Chapter 16 (see Figure 16–5). You should look for a “mysql” section in the phpinfo() output.

You will also need access to a MySQL database on the UNIX web server machine, the machine on which your PHP scripts will be running. If you are not root on the UNIX machine, you will need the system or database administrator to create a MySQL database and grant you sufficient access privileges to that database, typically requiring authentication with your userid and a password that the administrator assigns to you. We will assume that a MySQL database called mytest has been created for you on the web server machine for use with PHP.

After the mytest database has been created, you will need to create a database table containing some fields. We’ll create the table, info, in the mytest database using a PHP script, maketable.php, that will demonstrate how PHP is used to access a MySQL database:

<?php
$user="username";
$password="password";
$database="mytest";
mysql_connect("localhost", $user, $password) or die("Unable to connect to MySQL server");
@mysql_select_db($database) or die("Unable to select database");
$query="CREATE TABLE info(
id int(6) NOT NULL auto_increment,
firstname varchar(15) NOT NULL,
lastname varchar(15) NOT NULL,
PRIMARY KEY (id)
)";
mysql_query($query) or die("Unable to create table: " . mysql_error());
mysql_close();
?>

The maketable.php file consists of only PHP statements and no HTML. If all goes well, loading this file in a web browser displays a blank page, but the PHP statements will have been executed silently to create the info table. The $user, $password, and $database variables are used to connect to the mytest MySQL database; you have to substitute your own MySQL userid and password for the values of $user and $password. The $query variable contains the SQL statement that creates the info table and its three fields, id, firstname, and lastname. The PHP functions that access the MySQL database all begin with the prefix mysql_ and are fairly intuitive.
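If you want to experiment with the table definition without a MySQL server, Python’s built-in sqlite3 module offers a convenient stand-in. This is a swapped-in technique, not PHP’s mysql_ interface, and SQLite’s dialect differs from MySQL’s: for example, an INTEGER PRIMARY KEY column is auto-assigned in SQLite without any auto_increment keyword. The sketch below approximates the info table in an in-memory database and lists its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for experimenting
# Approximation of the info table: SQLite assigns INTEGER PRIMARY KEY values
# automatically, much as MySQL does for an auto_increment column.
conn.execute(
    """CREATE TABLE info (
           id        INTEGER PRIMARY KEY,
           firstname VARCHAR(15) NOT NULL,
           lastname  VARCHAR(15) NOT NULL
       )"""
)
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[3], row[5])
           for row in conn.execute("PRAGMA table_info(info)")]
print(columns)  # (name, notnull flag, primary-key position)
conn.close()
```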

You can use another query, shown next, to insert one record into the info table (use this query in the preceding PHP script instead of the CREATE TABLE query and load the PHP script in a web browser):

$query="INSERT INTO info VALUES (NULL,'Bullwinkle','Moose')";

Here, using the INSERT INTO MySQL query, we are inserting one data record into the info table. The firstname value is ‘Bullwinkle’ and the lastname value is ‘Moose’, but the id value is given as NULL. This is because we want values for id to be auto-assigned: when NULL is inserted into an auto_increment column, MySQL substitutes the next sequence value, so each record gets a unique id.
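The effect of passing NULL for the id column is easy to check. Again using Python’s sqlite3 module as a stand-in for MySQL (an assumption of this sketch), inserting NULL into an INTEGER PRIMARY KEY column makes the database assign the next id. The sketch also uses ? placeholders, which pass values separately from the SQL text so that data from a form cannot change the query itself. The second record is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info (id INTEGER PRIMARY KEY,"
    " firstname VARCHAR(15) NOT NULL, lastname VARCHAR(15) NOT NULL)"
)
ids = []
for first, last in [("Bullwinkle", "Moose"), ("Rocky", "Squirrel")]:
    # NULL lets the database pick a unique id; the ? placeholders keep the
    # values out of the SQL text.
    cur = conn.execute("INSERT INTO info VALUES (NULL, ?, ?)", (first, last))
    ids.append(cur.lastrowid)  # the id the database just assigned
conn.commit()
rows = conn.execute("SELECT id, firstname, lastname FROM info ORDER BY id").fetchall()
conn.close()
print(ids, rows)
```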

Finally, you can display the data you just inserted into the database using the following example script, displaydata.php:

<?php
$username="username";
$password="password";
$database="mytest";
mysql_connect("localhost", $username, $password) or die("Unable to connect to MySQL server");
@mysql_select_db($database) or die("Unable to select database");
$query="SELECT * FROM info";
$result=mysql_query($query) or die("Query failed: " . mysql_error());
mysql_close();
print "<b>Database Contents</b><br><br>";
$id=mysql_result($result,0,"id");
$firstname=mysql_result($result,0,"firstname");
$lastname=mysql_result($result,0,"lastname");
print "$id $firstname $lastname";
?>

Here, the MySQL query we are using is “SELECT * FROM info”, meaning that we are selecting all fields and records (only one record in this example) from the info table. The mysql_result() PHP function retrieves a single field from a given row of the query result held in $result; here it reads the id, firstname, and lastname fields of row 0 (the first record) into $id, $firstname, and $lastname. The print command then prints these three values.
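Because mysql_result($result,0,…) reads only row 0, displaydata.php shows just the first record; to list every record you would loop over the result set (in PHP, typically with mysql_fetch_assoc() in a while loop). The sketch below shows that loop using Python’s sqlite3 module as a stand-in for MySQL (an assumption: the loop structure, not the API, is the point). The helper name list_records and the second record are hypothetical.

```python
import sqlite3


def list_records(conn):
    """Return HTML listing every row of the info table, one line per record."""
    lines = ["<b>Database Contents</b><br><br>"]
    # Iterating over the cursor yields one (id, firstname, lastname) tuple per row.
    for id_, firstname, lastname in conn.execute(
        "SELECT id, firstname, lastname FROM info ORDER BY id"
    ):
        lines.append(f"{id_} {firstname} {lastname}<br>")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO info VALUES (NULL, ?, ?)",
                 [("Bullwinkle", "Moose"), ("Rocky", "Squirrel")])
html = list_records(conn)
conn.close()
print(html)
```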

Web Authoring Software

When the web was young and HTML was relatively simple, using vi, emacs, and other UNIX text editors was a viable way to create and edit HTML documents and maintain a web site. Some would still argue that editing raw HTML text files helps you learn HTML and lets you fine-tune the design of web pages in ways that dedicated HTML editors cannot. Also, vim (a popular vi variant) and emacs now come with HTML modes, including color syntax highlighting.

However, with more tags being added as the HTML (and now the XHTML) standard matures, and with the need to pay attention to CSS, Doctypes, the DOM, JavaScript, etc., it may be worthwhile to mention some alternatives to plain text editors for web page authoring and web site maintenance. Among these alternatives are word processors, filters, and dedicated HTML editors.

Office Suites and Filters

A quick way to create a web page is to prepare the document in a WYSIWYG (What You See Is What You Get) word processor application and then save the document as HTML. Word processors for UNIX that can do this include the Sun StarOffice (http://www.sun.com/software/star/staroffice) and OpenOffice.org word processors. Typically, these word processors include a software filter that converts their native file format’s formatting to HTML code. Unfortunately, the HTML code that these filters generate can be a mess and can be very difficult to edit manually if the need arises (the same can be said of the HTML generated by WYSIWYG HTML editors, which are mentioned later). Also, the look and formatting of the document in the word processor often does not translate well to a web page. Still, the word-processor-to-HTML approach is fine for simple documents that need to be generated quickly. The StarOffice and OpenOffice.org suites also include presentation software that usually does a good job of converting electronic slide presentations to HTML; this makes it easy to publish your presentations to the web.

For UNIX users who prefer to use LaTeX to generate all their documents, there is a command-line latex2html filter (http://www.latex2html.org). The latex2html filter generates cleaner HTML than the word processor filters.

The Website META Language (WML, http://thewml.org) is more of a web site programming tool than a filter. It is billed as an “off-line HTML generation toolkit for UNIX.” The idea behind WML is to describe web pages in a higher-level language than HTML and then use filters to build the HTML documents all at once, in a manner akin to building a large C/C++ project using a makefile. This method lends itself to building sites with a very consistent look and layout.
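The underlying idea (describe content separately, then generate every page from one shared layout in a single build step) can be sketched in a few lines. This is not WML syntax or WML’s implementation; it is a hypothetical Python 3 illustration using string.Template, with made-up page names and content.

```python
from string import Template

# One shared layout guarantees every generated page looks the same.
LAYOUT = Template(
    "<html>\n<head><title>$title</title></head>\n<body>\n"
    "<h1>$title</h1>\n$body\n<hr>\n<p>$footer</p>\n</body>\n</html>\n"
)

# Higher-level description of the site: just titles and content per file.
PAGES = {
    "index.html": ("Home", "<p>Welcome to my home page.</p>"),
    "travel.html": ("Travels", "<p>Notes from my trips.</p>"),
}


def build_site(pages, footer="Generated by a simple build script"):
    """Render every page from the shared layout; returns {filename: html}."""
    return {
        name: LAYOUT.substitute(title=title, body=body, footer=footer)
        for name, (title, body) in pages.items()
    }


site = build_site(PAGES)
for name, html in site.items():
    print(name, len(html), "characters")
```

Writing each string to its file name completes the build; changing LAYOUT and rebuilding updates every page at once.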

Dedicated Web Page Editors

Dedicated web page editors, also called HTML editors, usually try to be all-in-one web publishing solutions, including features to upload files and directories to remote web servers and to manage multiple versions of files. HTML editors typically provide some or all of the following features: drop-down menu access to commonly used HTML tags, tools to ease the building of HTML tables and forms, support for the different HTML/XHTML standard Doctypes, support for integrating JavaScript and CSS into pages, DOM support and support for CGI scripting languages, and remote publishing of files to a server via FTP or another network protocol. The sections that follow describe some HTML editors that are available as binary packages or can be compiled from source code on Linux and UNIX platforms.

Non-WYSIWYG HTML Editors

Bluefish (http://bluefish.openoffice.nl), Quanta Plus (http://quanta.kdewebdev.org), and Screem (http://www.screem.org) are non-WYSIWYG HTML editors. They are designed to appeal to users who have some prior knowledge of HTML. The HTML editing is done in the main edit window in these applications, which offers syntax highlighting and automatic tag completion. HTML tags must be typed in directly or inserted using a tag button bar or the drop-down menus provided. Since there is no WYSIWYG mode, the HTML pages being written must be previewed in a web browser using a preview button or preview menu item. Figure 27–17 shows a screenshot of the Quanta Plus editor.

Figure 27–17: The Quanta Plus HTML editor

WYSIWYG HTML Editors

Amaya (http://www.w3.org/Amaya), Mozilla Composer (http://www.mozilla.org/products/mozilla1.x), and Nvu (http://www.nvu.com) are WYSIWYG HTML editors. They may appeal to users who have little to no prior knowledge of HTML but who have used a WYSIWYG word processor.

Amaya is an editor and browser developed by the W3C as a test bed for new web technologies that have not yet made it into major production web browsers. As an editor, it is rather rudimentary and has poor CSS support.

Mozilla Composer is part of the Mozilla browser suite, has many of the same features as the non-WYSIWYG editors mentioned previously, and may be a good choice for a beginning web page author.

Nvu, based on Mozilla Composer, is intended to have a feature set comparable to well-known, “professional-level” HTML editors on the Windows platform, editors such as Microsoft FrontPage and Macromedia Dreamweaver. A screenshot of Nvu in WYSIWYG mode appears as Figure 27–18.

Figure 27–18: The Nvu HTML editor

Summary

In this chapter you were introduced to web development in UNIX. You learned of the history of HTML and the efforts of the World Wide Web Consortium to build consensus for HTML standards that browser vendors and web developers could agree on. You learned the basics of HTML markup. You learned about client-side web scripting with JavaScript and about the Document Object Model, which was proposed to make web pages more scriptable. You learned about the push, through the development of Cascading Style Sheets, to separate the content of web pages from their presentation. You learned about server-side scripting possibilities with Common Gateway Interface programs written in the Perl and Python interpreted languages. You learned about the comparative ease of the PHP approach to server-side scripting and how PHP can be used with the popular MySQL database. Finally, you were introduced to existing web authoring solutions for the Linux and UNIX platforms.

How to Find Out More

Here are a couple of general web design references:

· Cederholm, Dan. Bulletproof Web Design: Improving Flexibility and Protecting Against Worst-Case Scenarios with XHTML and CSS. Berkeley, CA: New Riders Press, 2005.

· Zeldman, Jeffrey. Designing with Web Standards. Corte Madera, CA: Waite Group Press, 2003.

The following books are basic HTML/XHTML and CSS references:

· Meyer, Eric A. Cascading Style Sheets: The Definitive Guide. 2nd ed. Newton, MA: O’Reilly Media, Inc., 2004.

· Musciano, Chuck, and Bill Kennedy. HTML & XHTML: The Definitive Guide. 5th ed. Newton, MA: O’Reilly Media, Inc., 2002.

· Powell, Thomas. HTML & XHTML: The Complete Reference. Berkeley, CA: McGraw-Hill/Osborne Media, 2003.

Some books on web programming are

· Goodman, Danny. JavaScript & DHTML Cookbook. Newton, MA: O’Reilly Media, Inc., 2003.

· Hamilton, Jacqueline D. CGI Programming 101: Programming Perl for the World Wide Web. 2nd ed. Houston, TX: CGI101.com, 2004.

· Marini, Joe. Document Object Model: Processing Structured Documents. Berkeley, CA: McGraw-Hill/Osborne Media, 2002.

· Stein, Lincoln. Official Guide to Programming with CGI.pm. New York: John Wiley and Sons, 1998.

· Tatroe, Kevin, Rasmus Lerdorf, and Peter MacIntyre. Programming PHP. 2nd ed. Newton, MA: O’Reilly Media, Inc., 2006.

A more in-depth treatment of the LAMP platform and techniques can be found in the following:

· Rosebrock, Eric, and Eric Filson. Setting Up LAMP: Getting Linux, Apache, MySQL, and PHP Working Together. Berkeley, CA: Sybex, 2004.

You can also find a well-maintained source of current information about LAMP on the web at http://www.onlamp.com/.