The Apache Web Server - Network Administration - UNIX: The Complete Reference (2007)

Part IV: Network Administration

Chapter 16: The Apache Web Server

An Overview of Web Servers

The World Wide Web (WWW or just web) is based on a client/server model. The client runs browser software that allows you to request information on the web and to browse and navigate through it. The information that you request is stored on a machine called a web server. The function of the web server is to provide (serve) web documents and applications to multiple simultaneous browser clients. The term “web server” is also used to describe the software on the web server machine that actually handles the requests for information from web browsers. In this chapter, the term “web server” refers to the web server software program unless otherwise stated.

The data transfer protocol most commonly used between web servers and web browsers is the Hypertext Transfer Protocol (HTTP), which is described in Chapter 10. In the UNIX world, web server software is therefore commonly known as httpd, the HTTP daemon (see Chapter 11 for a discussion of daemons), which typically runs as a background process (or a group of processes) governing the HTTP network service. In Chapter 10 we noted that web browsers can be used not just to browse web pages but also to transfer and manage files via FTP and to read Netnews via NNTP. These extra client functions of web browsers do not involve a web server but rather FTP and NNTP servers, respectively.

The content that is served by web servers has increased greatly in sophistication over the decade or so that the web has existed. Web content in the beginning consisted mainly of static web pages containing mostly text. Web content now is dynamic rather than static. The content has become sophisticated enough to enable a rich set of Internet applications, including electronic commerce and streaming multimedia. But underneath it all is the web server, which has also grown in sophistication to serve the dynamic content that is demanded on the web today.

The two basic common features of a modern web server are

§ HTTP responses to HTTP requests. Every web server program operates by accepting HTTP requests from the network and providing an HTTP response to each requester. The HTTP response typically consists of an HTML document, but it can also be a text file, an image, or some other type of document. If an error is encountered in a client request, or while trying to serve the request, a web server will serve an HTML document containing an error code and a brief explanation of the error to the requesting user.

§ Logging. Web servers will usually have the capability of logging detailed information about client requests and server responses to log files. This allows a webmaster to analyze the logs or to collect usage statistics manually or by running analyzers on the log files.
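As a small illustration of running an analyzer over a log file, the following shell sketch counts responses by HTTP status code. It assumes the log is in the Common Log Format, in which the status code is the ninth whitespace-separated field; a server configured with a different log format would need a different field number. The function name count_status and the sample entries are invented for this example.

```shell
#!/bin/sh
# count_status: read Common Log Format lines on standard input and print
# "<status> <count>" pairs, sorted by status code.
# Assumption: the status code is field 9, as in the Common Log Format.
count_status() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
}

# Sample entries, invented for illustration only.
sample_log='192.0.2.10 - - [20/Apr/2006:16:30:01 -0400] "GET /index.html HTTP/1.1" 200 1456
192.0.2.11 - - [20/Apr/2006:16:30:05 -0400] "GET /missing.html HTTP/1.1" 404 209
192.0.2.10 - - [20/Apr/2006:16:31:12 -0400] "GET /about.html HTTP/1.1" 200 3120'

printf '%s\n' "$sample_log" | count_status
```

On a real system the function would read the server's access log instead of the sample text, for example count_status < access_log, where the log file name and location depend on how the server was installed.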

Modern web servers also implement these features:

§ Configuration of available features using configuration files or through external interfaces.

§ Authentication (using user name and password) before allowing access to some or all resources.

§ Handling of static content (file content stored on the file system of the web server machine) as well as dynamic content. Dynamic content is handled by supporting a collection of related methods, including SSI, CGI, PHP, ASP, and server APIs, discussed in Chapter 27.

§ Module support to allow the extension of web server capabilities by adding or modifying software modules that are linked to the web server software or that are dynamically loaded (on demand) by the core web server.

§ HTTPS support (through SSL or TLS) to allow secure connections, using encryption, to the server on the standard network port 443 instead of the usual HTTP port 80.

§ Content compression, for example, through GZIP encoding to reduce the size of the responses in order to decrease network bandwidth usage.

§ Creation of virtual hosts that serve many web sites using one IP address.

§ Support for large files, so that files larger than 2GB can be served on 32-bit operating systems.

§ Bandwidth throttling to limit the speed of responses in order not to saturate the network and to be able to serve more clients.

Both proprietary and open-source web servers are available for every major version of UNIX. Notable among the proprietary web servers are Sun’s Java System Web Server (formerly Sun ONE Web Server, iPlanet Web Server, and Netscape Enterprise Server) and the Zeus Web Server. This chapter describes the most widely used web server on the Internet, the Apache web server, which has become almost a de facto standard on UNIX and Linux platforms. What follows is a nonexhaustive description of installing, configuring, and administering Apache to operate a small web site with mostly static content on the Internet. Also included are descriptions of the configuration options needed to serve dynamic content and to ensure that the web server runs securely. Even if you have no interest in web development or in being a “webmaster,” Apache has become such an important platform for many useful and free web applications (discussion boards, blogging, calendars, content management systems, and wikis, for example) that it may behoove you to give Apache a try.

The History and Popularity of Apache

The Apache Project was launched in 1995 when a development team began applying software “patches” to the source code of the NCSA httpd web server. NCSA httpd, developed at the National Center for Supercomputing Applications at the University of Illinois, was one of the first web servers developed. This web server was wedded to the first popular web browser, NCSA Mosaic. However, development of NCSA httpd stalled in 1994, when Rob McCool, the lead developer, left to work for Netscape. At Netscape, McCool worked on the development of the Netscape Enterprise Server, another web server that in the late 1990s played an important role in the development of the modern web, along with the Netscape web browser.

The patches to NCSA httpd led to a fork of the software. Because of the many patches to NCSA httpd that led to Apache, it is often quipped that the name “Apache” came from the fact that it was “a patchy” httpd. Apache has been an open-source project, free of licensing costs, from its start. The source code of Apache was made available on the Internet to encourage others to download it, use it, and contribute improvements. Within a year of its inception, a survey showed that Apache had become the most popular web server. Other surveys showed that by 1999 close to 60 percent of all web sites were running Apache, and in early 2006, close to 70 percent of all web sites were running Apache.

During its swift rise to the status of the most popular web server, Apache ran almost exclusively on UNIX and Linux systems. The older version 1.3 and newer 2.x branches of Apache are now install-time options on every major Linux distribution and on open-source BSD variants such as FreeBSD. The major UNIX variants almost all bundle some version of Apache. Apache is a part of the Solaris “Freeware” installation option. IBM AIX includes the IBM HTTP Server, which is based on Apache. HP-UX includes an “Apache-Based Web Server Suite.” The 2.x branch of Apache, which was written from scratch to be free of any remaining vestiges of NCSA httpd code, has become a popular choice of web server even on Microsoft Windows server operating systems. However, all branches of Apache still reflect the UNIX heritage in their configuration system. For example, Apache is configured through plain text configuration files. As Apache has grown in its feature set to keep pace with the growing demands and innovations of the web, the configuration of Apache has also become increasingly complex. Luckily, as with most successful software projects, a sensible default configuration is included so that web sites using Apache can be built fairly quickly. There are also graphical user interface (GUI) or web front ends designed to tame the complexity of Apache’s many configuration options. This chapter will discuss basic Apache configuration.

Apache Installation

Before we start looking at the installation process, you will need to do several things. The two most important steps are to obtain a valid IP address and a hostname for your web server machine. Without a valid hostname no one will be able to access your web site by name, so make sure that your web server machine is added to the Domain Name System (DNS) before you deploy your web site. You may need to contact your network administrator to have this done. Please see the section on the DNS in Chapter 10.

Another important consideration is to determine the primary type of content that your web server will be serving. If you want to support CGI programs, Java servlets, or file downloads, consider getting a separate machine to use as a web server. A common option used for many intranet web servers is to reuse an older workstation or server as a web server. You should dedicate this machine to web serving, since CGI programs, Java servlets, and such can make heavy demands on your machine’s resources.

If you are installing a UNIX variant or Linux distribution, you will most likely find the Apache web server, or some Apache-based web server, as an install-time option. If you want to install Apache on a machine on which it has not already been installed, you can either install a precompiled binary package or compile the source code yourself and install it. Though the first option is a faster and more assured way of successfully installing Apache, there are advantages and disadvantages associated with both approaches.

Binary Package Installation

When first installing most Linux distributions, the option to install Apache or a “Web Server” is a likely installation step. “Web Server” here is usually Apache. If Apache was not installed with Linux, Apache httpd and associated packages are easily installable in precompiled binary form on most popular distributions. The world of Linux binary packages is now divided into two widely used package formats, the Red Hat package management system format (RPM) and the Debian package management system format (DEB). The Apache packages on Linux also tend to be split up with the intent to make the installation more modular. For instance, there may be separate Apache packages for the core HTTP server, documentation, utilities, Perl support, SSL support, and so on.

Installing Apache Binary Packages in the Red Hat RPM Format

Apache binary packages in the RPM format generally have names that begin with httpd. So to install the core Apache package on RPM-based Linux distributions, you would need to install the httpd package. The httpd RPM packages are usually available for the newer 2.x branch of Apache only. A widely used, automated package management system for RPM-based distributions is YUM. YUM contains commands to search for a package, to download a package and other packages that it depends on, and to install the package and associated dependencies. Usage of the YUM commands to find and install httpd and any associated packages is shown next for a Red Hat-like Linux system. These commands must be run as root:

# yum search httpd | less

This search command is piped to the less command because it is likely to return many pages of information about packages (not only Apache-related) that contain the search string httpd. After the desired httpd packages have been identified, the command to install the core httpd package, man pages, and httpd system configuration utility is as follows:

# yum install httpd httpd-manual system-config-httpd

[Extraneous output not shown.]

Install 3 Package(s)

Update 0 Package(s)

Remove 0 Package(s)

Total download size: 2.5 M

Is this ok [Y/N]:

After you answer Y to the prompt, the Apache packages should be downloaded from a package repository on the Internet or loaded from CD and installed. At the conclusion of the install, the Apache web server program will also be started in the background with a default configuration.

Installing Apache Binary Packages in the Debian DEB Format

Linux distributions that use the DEB format for precompiled packages also have a higher level package management system called APT that simplifies the job of installing DEB package files. APT contains commands to search for a package, to download a package and other packages that it depends on, and to install the package and associated dependencies. Usage of two of the APT system commands is demonstrated next for a Debian-derived Linux system. These commands must be run as root:

# apt-cache search apache | less

This apt-cache command searches the APT DEB package database for package descriptors that contain “apache.” The list of packages returned can be quite large, so we’ve piped the output to less. The available apache httpd packages should be listed first. On a recent version of a Debian-derived Linux system, some of the packages of interest were output as follows:

apache2 - next generation, scalable, extendable web server

apache2-common - next generation, scalable, extendable web server

apache2-doc - documentation for apache2

apache2-utils - utility programs for webservers

The command to install Apache 2 is shown here:

# apt-get install apache2

[Extraneous output not shown.]

The following NEW packages will be installed

apache2 apache2-common apache2-mpm-worker apache2-utils libapr0

[Extraneous output not shown.]

Do you want to continue [Y/n]?

The apt-get package management command has automatically resolved which other packages the apache2 package depends on (apache2-common, apache2-mpm-worker, apache2-utils, and libapr0). After you answer Y to the prompt, this apt-get command will download the five listed packages from an APT DEB package repository on the Internet or from the installation CD, will install them, and finally will start the Apache web server using a default configuration.

This is as painless as an initial installation of Apache can get on a Linux OS, or any OS for that matter. The use of high-level package management tools such as APT or YUM also makes it easy to upgrade Apache and any other package as the need arises, such as when a security fix is issued. When the apt-get or yum install of Apache has finished, you can usually launch a web browser and connect to the newly installed and running instance of Apache. If you run the web browser on the same machine, you can view the URL http://localhost. If the machine that you installed Apache on has a fully qualified domain name on the Internet or an intranet, for example, pryor.acme.com, you would point your web browser to http://pryor.acme.com/. The resulting default home page should look similar to Figure 16–1.

Figure 16–1: A default Apache home page

You may have seen this default Apache start page during your web browsing. There are many such installations on the Internet that have unconfigured Apache installations due to either negligence or unawareness on the part of administrators that they even have Apache installed. If you don’t actually need to use Apache, then please don’t install it. It can become a security risk.
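The YUM and APT commands described above differ only in the tool and the package name, so a script that has to work on both families of distribution can choose between them by checking which tool is present. The following is a minimal sketch of that idea; the function name suggest_install is invented for this example, and the function only prints the command rather than running it, so it is safe to try as a normal user.

```shell
#!/bin/sh
# suggest_install: print (but do not run) an install command for Apache,
# based on which high-level package tool is found on the PATH.
# Package names are the ones used in the text: httpd (RPM) and apache2 (DEB).
suggest_install() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum install httpd"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install apache2"
    else
        echo "no yum or apt-get found; consider a source install" >&2
        return 1
    fi
}

suggest_install || true
```

The printed command would still have to be run as root, as noted for the yum and apt-get examples earlier.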

Directory Structure for Linux Apache Packages

Though Apache binary packages make the installation simple, there will probably be some configuration work to be done afterward; this configuration work is discussed later in this chapter. When installed from Red Hat RPM packages, the Apache configuration files are placed in /etc/httpd. When installed from Debian DEB packages, the Apache configuration directory is /etc/apache or /etc/apache2. Figure 16–2 shows the additional directories and files created by a typical Apache installation on Linux.

Figure 16–2: Directories and files created by typical Apache packages on Linux

The document root directory is the default location of the HTML documents that Apache will serve. An Apache-served web site will be built in the document root and its subdirectories. The modules directory contains dynamic modules that can be loaded by Apache to provide add-on functionality in addition to Apache’s core functions. The logs directory contains log files that track the status of the web server, logging the HTML pages and objects that are requested from the web server and also any errors generated by the web server. The system startup files are the SysV init scripts that can be used to start Apache when the system boots, stop Apache when the system shuts down, and restart Apache when you change its configuration. The status files with the .pid extension contain Apache’s main process ID (PID) when it is running as a daemon. The PID is used by the system startup scripts when stopping or restarting Apache.
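Because the configuration directory differs between package formats, a script that needs to locate it can simply test the candidate paths in turn. The sketch below does this for the three locations named above; the function name first_existing_dir is invented for the example, and other distributions may use locations not listed here.

```shell
#!/bin/sh
# first_existing_dir: print the first argument that names an existing
# directory; return non-zero if none of them exists.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from the text above.
first_existing_dir /etc/httpd /etc/apache2 /etc/apache ||
    echo "no packaged Apache configuration directory found"
```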

Binary Package Installation on UNIX

The Apache Project makes precompiled binary packages available for download for several UNIX platforms from its web site and mirrors (see http://archive.apache.org/dist/httpd/binaries/). These packages contain versions of the Apache source code that have been precompiled with common features enabled. They also contain an installation shell script, install-bindist.sh, that needs to be run with root privileges to install the compiled binaries and configuration files to an installation directory, typically /usr/local/apache-<version number>. This method of installation is actually very close to the “do-it-yourself” method of compiling and installing Apache described in the next section. The drawback is that most of these packages are community-contributed and generally lag the current releases of Apache by several versions. For example, the last available Apache Project-supplied binary packages for HP-UX and AIX were uploaded in 2002 and 2003, respectively. The more assured way of installing the latest version of Apache, which should contain patches for past serious security bugs, is to obtain the source code and compile it yourself.

In the Solaris Freeware project (http://www.sunfreeware.com/), an outgrowth of the SunSITE project, the Solaris user community has a volunteer-maintained source for up-to-date Solaris-format binary packages for most common open-source software, including Apache. The packages downloaded from http://www.sunfreeware.com/ are in the Solaris pkg format and can be installed and removed using the Solaris pkgadd and pkgrm commands, respectively. The latest Apache packages from Solaris Freeware (version 2.2) have been built to install completely into the /usr/local/apache2 file system tree and are a good alternative method of installing Apache on all versions of Solaris, especially if attempts to compile it yourself fail.

Source Installation of Apache on UNIX

The Apache project, being an open-source project, makes the httpd source code downloadable from its web site, http://httpd.apache.org/. Thanks to the GNU autoconf system, Apache is simple to compile on most UNIX platforms, including Linux. Compiling Apache yourself is the surest way to run the latest version of the web server: you can apply any new critical security fixes quickly, without waiting for a vendor or Linux distribution to issue fixed Apache packages. A second advantage is potentially better performance, since the Apache executables are built on, and can be tuned to, your server hardware. A third is finer control over which features are compiled into Apache. Still another advantage, in the opinion of some, is that you can install Apache into its own subdirectory and not have its components scattered across the UNIX file system hierarchy as shown in Figure 16–2. A disadvantage of manually compiling Apache is that you may need to know the various configuration options for building Apache.

The following steps outline how to manually compile and install the latest version of the Apache httpd. Unless otherwise noted, you should be able to perform these steps as a normal (nonroot) user.

Step 1: Obtain the httpd Source Code

A trusted source for the source code is http://httpd.apache.org/, or one of its mirrors. As of mid-2006, the latest bzip2-compressed tar archive for Apache was httpd-2.2.0.tar.bz2. Once you have downloaded and saved httpd-2.2.0.tar.bz2 to a source directory, it needs to be unarchived. A command to uncompress and untar the archive is as follows:

$ bzip2 -dc httpd-2.2.0.tar.bz2 | tar -xvf -

This will extract the contents of the tar.bz2 archive into a new subdirectory called httpd-2.2.0.

Step 2: Configure the Source Code and Build

You should enter the new httpd-2.2.0 subdirectory that was just created and read the files there that begin with “README.” In particular, you should read the README.platforms file for any special instructions that may apply to your UNIX platform. Also, read the INSTALL file. After that, you can begin the build process by first configuring the source code. The Apache build process begins with the included GNU autoconf system’s configure script. The configure script’s options can be viewed as follows:

$ ./configure --help | less

This runs the configure script in the httpd source directory and lists the command line options that can be passed to it. Here is an actual run of the configure script to configure certain important Apache options:

$ ./configure --prefix=/usr/local/apache-2.2.0 --enable-suexec \
--with-suexec-caller=nobody --with-suexec-gidmin=51 \
--with-suexec-umask=022 --enable-so --enable-dav --enable-dav-fs \
--enable-auth-digest

Here is a much simpler invocation of the configure script if this is your first time compiling Apache manually:

$ ./configure --prefix=/usr/local/apache-2.2.0

The --prefix=/usr/local/apache-2.2.0 command line switch specifies the installation root directory for Apache. The conclusion of a successful run of configure should produce output that looks like the following:

creating Makefile

creating modules/Makefile

creating srclib/Makefile

creating os/Makefile

creating server/Makefile

creating support/Makefile

creating srclib/pcre/Makefile

creating test/Makefile

[Extraneous output deleted.]

config.status: executing default commands

A successful run of configure will generate Makefiles to build and install Apache. After this it is a matter of running the make command:

$ make

The make command will process the Makefiles and build Apache and its modules. This may take several minutes depending on the speed of the computer. Next run make install. The make install and all the following commands create files and directories that are only root-accessible, so you will need to become root:

(as root) # make install

The make install command will copy the compiled Apache executables, directory structure, and data files into the installation root directory that you specified in the configure --prefix command and option described previously, for example, /usr/local/apache-2.2.0. The output of the following ls command shows the directories that were created by make install in /usr/local/apache-2.2.0:

# ls -l /usr/local/apache-2.2.0

total 22

drwxr-xr-x 2 root other 512 Apr 20 16:29 bin

drwxr-xr-x 2 root other 512 Apr 20 16:29 build

drwxr-xr-x 2 root other 512 Apr 20 16:27 cgi-bin

drwxr-xr-x 4 root other 512 Apr 20 16:27 conf

drwxr-xr-x 3 root other 1024 Apr 20 16:27 error

drwxr-xr-x 2 root other 512 Nov 29 03:13 htdocs

drwxr-xr-x 3 root other 3584 Apr 20 16:27 icons

drwxr-xr-x 2 root other 2560 Apr 20 16:29 include

drwxr-xr-x 3 root other 512 Apr 20 16:27 lib

drwxr-xr-x 2 root other 512 Apr 20 16:27 logs

drwxr-xr-x 4 root other 512 Apr 20 16:29 man

drwxr-xr-x 14 root other 4608 Nov 29 03:20 manual

drwxr-xr-x 2 root other 512 Apr 20 16:27 modules

As you can see from the directory structure, Apache has been installed into its own “sandbox,” its own directory. The bin subdirectory contains the httpd executables, conf contains the httpd configuration files, htdocs is the default document root where HTML files will be stored, and logs will contain log files generated while Apache is running. At this point you should be able to do a test run of Apache with the just-installed apachectl script in bin:

# /usr/local/apache-2.2.0/bin/apachectl start

This will start Apache httpd on the default HTTP service port 80 using the default home page, /usr/local/apache-2.2.0/htdocs/index.html. You can confirm this by starting a web browser on the same machine and viewing the URL http://localhost. You can also confirm this by searching for running httpd processes with ps -ef | grep httpd or ps -aux | grep httpd.
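The configure, make, make install, and apachectl start sequence from this step can be collected into one script. The following is a minimal sketch rather than a finished build script: the helper names run and build_apache and the DRY_RUN variable are invented for the example. By default it only prints the commands, so it can be tried anywhere; with DRY_RUN=0 it would have to be run from inside the httpd-2.2.0 source directory, and the install and start steps would need root privileges.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it. Set DRY_RUN=0 to really build; that is an assumption of
# this example, not an Apache feature.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_apache: configure, build, install, and start Apache under the
# installation root given as the first argument.
build_apache() {
    prefix=$1
    run ./configure --prefix="$prefix" || return 1
    run make || return 1
    # make install writes into the prefix and usually needs root.
    run make install || return 1
    run "$prefix/bin/apachectl" start
}

build_apache /usr/local/apache-2.2.0
```

Stopping at the first failed step matters here, because running make install after a failed make could leave a partial installation in the prefix directory.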

Step 3: Set Up the Apache System Startup Script

The manually compiled source installation is not quite finished. You need to set up Apache to start when the system boots. A recommended step is to create the symbolic link, /usr/local/apache2, that points to the just-installed apache-2.2.0 directory. To do this, use the following ln command:

# cd /usr/local ; ln -s apache-2.2.0 apache2

When you manually compile and install future versions of Apache, you need merely move the apache2 symbolic link to point to the newly installed Apache directory. This simplifies administration. Assuming that the version of UNIX or Linux that you use boots using SysV-type init scripts, you need an init script for Apache in the /etc/init.d directory.

That init script already exists in the form of the apachectl command that now exists in /usr/local/apache2/bin. Another symbolic link, /etc/init.d/apache2, needs to be created using the following command:

# cd /etc/init.d ; ln -s /usr/local/apache2/bin/apachectl apache2

Next, a symbolic link in the appropriate SysV run level init directory needs to be created. On a system such as Solaris in which the default run level on bootup is run level 3, you would need to issue the following ln command:

# cd /etc/rc3.d ; ln -s ../init.d/apache2 S85apache2

The effect of creating the /etc/rc3.d/S85apache2 symbolic link is that Apache will be started when the system reaches run level 3 (normally during system bootup). Finally, you should create a symbolic link in /etc/rc2.d with the following command:

# cd /etc/rc2.d ; ln -s ../init.d/apache2 K15apache2

The /etc/rc2.d/K15apache2 symbolic link ensures that the Apache processes are terminated properly when the system reaches run level 2 (normally during system reboot or system shutdown). The symbolic links in /etc/init.d, /etc/rc3.d, and /etc/rc2.d need only to be created once if you ensure that the /usr/local/apache2 link points to a newly installed Apache directory when upgrading to newer Apache versions.

Note that on some versions of UNIX, notably HP-UX, you will need to substitute the /sbin/init.d directory for /etc/init.d. Also, you will need to use /sbin/rc3.d and /sbin/rc2.d instead of /etc/rc3.d and /etc/rc2.d.
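The three symbolic links from this step can be created by one small function that takes the directory holding init.d, rc2.d, and rc3.d as a parameter, so that the same code serves /etc on Solaris-like systems and /sbin on HP-UX. This is a sketch under those assumptions; the function names link_once and install_init_links are invented for the example, and the function expects the three directories to exist already.

```shell
#!/bin/sh
# link_once: create symbolic link $2 pointing at $1 unless a symbolic
# link of that name already exists.
link_once() {
    [ -L "$2" ] || ln -s "$1" "$2"
}

# install_init_links: create the three links described in Step 3.
#   $1  directory holding init.d, rc2.d and rc3.d (/etc, or /sbin on HP-UX)
#   $2  path to apachectl (normally /usr/local/apache2/bin/apachectl)
install_init_links() {
    base=$1
    apachectl=$2
    link_once "$apachectl" "$base/init.d/apache2" &&
    link_once ../init.d/apache2 "$base/rc3.d/S85apache2" &&
    link_once ../init.d/apache2 "$base/rc2.d/K15apache2"
}

# Example (as root):
#   install_init_links /etc /usr/local/apache2/bin/apachectl
```

Because existing links are left alone, the function can be run again after an upgrade without error; only the /usr/local/apache2 link needs to be repointed.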

On BSD variants such as NetBSD and FreeBSD, the process of downloading, compiling, and installing Apache from source code is automated through the pkgsrc and the related ports package management systems, respectively.

Apache Modules

Whether installed as Linux packages or compiled from the source code, the Apache web server is a relatively small engine designed to handle web requests for static HTML pages quickly and efficiently. All of the other features in Apache are provided by the add-on components known as modules. The modules provide features such as access control, logging, Common Gateway Interface program execution (more on this later), and directory indexing. The standard Apache distribution comes with over 35 modules. There are also several third-party modules that can be compiled against Apache to enable features such as support for certain programming languages. Modules can either be statically compiled into the Apache httpd or dynamically loaded as needed. To see what modules are statically compiled into Apache, you can use the -l (lowercase letter L) command line option with the httpd executable. On a Red Hat-derived Linux distribution, the typical output of httpd -l is shown here:

# /usr/sbin/httpd -l

Compiled in modules:

core.c

prefork.c

http_core.c

mod_so.c

As you can see from the output, only the core modules have been compiled in. The Apache modules in the same Linux distribution are stored in /usr/lib/httpd/modules/ to be loaded as needed through the httpd configuration file. On a Solaris system on which Apache was manually compiled with default compile-time options, the output of httpd -l is shown here:

# /usr/local/apache2/bin/httpd -l

Compiled in modules:

core.c

mod_authn_file.c

mod_authn_default.c

mod_authz_host.c

mod_authz_groupfile.c

mod_authz_user.c

mod_authz_default.c

mod_auth_basic.c

mod_include.c

mod_filter.c

mod_log_config.c

mod_env.c

mod_setenvif.c

prefork.c

http_core.c

mod_mime.c

mod_status.c

mod_autoindex.c

mod_asis.c

mod_cgi.c

mod_negotiation.c

mod_dir.c

mod_actions.c

mod_userdir.c

mod_alias.c

mod_so.c

These modules are compiled in by default because the features they enable are thought to satisfy the needs of most web sites. Modules not on this list need to be specified when running the configure script to configure the Apache source code before it is compiled. A complete listing of modules that are included in the latest distribution of Apache httpd can be found in the official Apache documentation at http://httpd.apache.org/docs/2.2/mod/.
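A script can check for a particular compiled-in module by filtering listings like the ones above. The sketch below tests whether a given module source file appears in httpd -l style output read from standard input; the function name has_static_module is invented for the example, and the sample listing is copied from the Red Hat-derived output shown earlier.

```shell
#!/bin/sh
# has_static_module: read `httpd -l` style output on standard input and
# succeed if the named module source file (for example mod_so.c) is listed.
has_static_module() {
    # Strip leading and trailing blanks, then match whole lines only (-x)
    # and treat the name as a fixed string rather than a pattern (-F).
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -qxF "$1"
}

# Sample listing copied from the Red Hat-derived output above.
sample='Compiled in modules:
  core.c
  prefork.c
  http_core.c
  mod_so.c'

if printf '%s\n' "$sample" | has_static_module mod_so.c; then
    echo "mod_so.c is compiled in"
fi
```

Against a live installation the same function would be fed by the server itself, for example /usr/sbin/httpd -l | has_static_module mod_so.c, with the path adjusted to wherever httpd was installed.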

Apache Configuration

Once you have Apache successfully installed and serving web pages using the default configuration, you will most likely need to customize the configuration for your particular needs. The general features of Apache that can be configured are the global environment, such as the web document root and the TCP/IP port that Apache will use; dynamic shared object (module) control, such as support modules for programming languages that Apache can use to generate dynamic web pages; reducing the system security risks of the web server and controlling access to specific documents; support for the Common Gateway Interface (CGI), virtual hosts, and user home directories; and the location and format of logs that Apache generates.

The Apache httpd main configuration file is httpd.conf, a plain text file in the UNIX tradition. If installed from Linux packages, the location of httpd.conf is /etc/apache/, /etc/apache2/, or /etc/httpd/. If compiled and installed manually as shown earlier in this chapter, httpd.conf will be located in /usr/local/apache2/conf/.
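Because httpd.conf is plain text, ordinary UNIX tools can be used to inspect it. The following sketch prints the value of the first uncommented occurrence of a directive, skipping lines that begin with the # comment character. The function name directive_value and the sample file contents are invented for the example. The match is case-sensitive and looks only at the one file given, so it is a quick inspection aid rather than a full parser of Apache's configuration syntax.

```shell
#!/bin/sh
# directive_value: print the argument(s) of the first uncommented
# occurrence of a directive in a configuration file.
#   $1  directive name (for example Listen)
#   $2  path to the configuration file
directive_value() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 == name {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")      # drop the directive name
            print
            exit
        }
    ' "$2"
}

# Demonstration on a throwaway file; the contents are sample values only.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Listen 80
Listen 8080
DocumentRoot "/usr/local/apache-2.2.0/htdocs"
EOF
directive_value Listen "$conf"
directive_value DocumentRoot "$conf"
rm -f "$conf"
```

On a manually compiled installation the second argument would be /usr/local/apache2/conf/httpd.conf, the location given above.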

Elements and Syntax of httpd.conf

When you first encounter the default httpd.conf file that is installed for you, you will notice how long a file it is. You’ll notice that most of the lines begin with the # (pound) symbol; all these lines are comments, another common element of UNIX configuration files. The comments explain the various options and directives. These are the commonly changed options and directives in httpd.conf (for the 2.x branch of Apache):

ServerRoot The top of the directory tree under which the server’s configuration, error, and log files are kept.

Example: ServerRoot "/usr/local/apache-2.2.0"

Listen Allows you to bind Apache to specific IP addresses and/or ports, instead of the default. Sometimes it is desirable to run Apache on a port other than the standard port 80, for instance, if another web server is already running on port 80.

Example: Listen 8080

User/Group The name (or number) of the user/group to run httpd as. These are important directives for security purposes. A compromised web server could be used to read and write in privileged areas of the file system, so it is usually recommended to run httpd as a dedicated or nonprivileged user and group. On Linux systems on which Apache has been package installed, the Apache user and group are preset to www-data or apache. If you have manually compiled Apache, you will need to set the user and group to suitable values. Recommended values for user and group are nobody and nogroup, respectively.

Example: User nobody

Example: Group nogroup

DocumentRoot The directory out of which you will serve your HTML documents. The URLs that Apache serves are relative to this document root. For example, if your DocumentRoot is set to /usr/local/apache-2.2.0/htdocs, and your server’s fully qualified domain name is pryor.acme.com, and you saved the file about.html to the /usr/local/apache-2.2.0/htdocs directory, then the URL for about.html would be http://pryor.acme.com/about.html.

Example: DocumentRoot "/usr/local/apache-2.2.0/htdocs"

Directory Each directory to which Apache has access can be configured with respect to which services and features are allowed and/or disabled in that directory (and its subdirectories). Each directory-specific configuration in httpd.conf is enclosed by an opening <Directory directory_name> tag and a closing </Directory> tag. The following example <Directory> entry for the DocumentRoot comes from a default httpd.conf after a manual compile of Apache.

Example:

<Directory "/usr/local/apache-2.2.0/htdocs">

# The Options directive is both complicated and important. Please see

# http://httpd.apache.org/docs/2.2/mod/core.html#options

# for more information.

#

Options Indexes FollowSymLinks

# AllowOverride controls what directives may be placed in .htaccess files.

# It can be "All", "None", or any combination of the keywords:

# Options FileInfo AuthConfig Limit

#

AllowOverride None

#

# Controls who can get stuff from this server.

#

Order allow,deny

Allow from all

</Directory>

DirectoryIndex Sets the file that Apache will serve if a directory is requested. This file is usually called index.html. After you install Apache, you should find the default index.html file already installed in the DocumentRoot. After Apache is installed and started, when the home page URL http://localhost is requested, it is the index.html file in DocumentRoot that is actually served by Apache. In the following example, index.htm and index.php are also made valid directory index files.

Example: DirectoryIndex index.html index.htm index.php

Include Allows you to include external configuration files to add extra features or to modify the default configuration of the httpd server. The locations of the external configuration files are specified relative to the ServerRoot. The following example includes the external configuration file, httpd-userdir.conf, which enables users to serve web pages from their home directories by saving HTML files to the ~/public_html directory. If the ServerRoot is set to /usr/local/apache-2.2.0, the Include directive here expects to find httpd-userdir.conf as ServerRoot/conf/extra/httpd-userdir.conf.

Example: Include conf/extra/httpd-userdir.conf
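Because most lines in the default httpd.conf are comments, it can be easier to review the directives described above by hiding everything else. The following is a minimal sketch; active_directives is a hypothetical helper name, and the path in the usage comment assumes the manual install location used in this chapter.

```shell
#!/bin/sh
# List only the active (non-comment, non-blank) lines of a configuration file.
# Usage: active_directives /usr/local/apache2/conf/httpd.conf
active_directives() {
    # Drop lines that are empty or whose first non-blank character is '#'.
    grep -v -E '^[[:space:]]*(#|$)' "$1"
}
```

Running the function on a default httpd.conf should reduce the file to a short list of directives such as ServerRoot, Listen, User, Group, and DocumentRoot.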

User Directories

An often-used feature of Apache is the aforementioned user directories feature to allow users to serve web pages from their ~/public_html directories. The Apache module needed to enable user directories, mod_userdir, is usually compiled statically into the httpd executable. Whether the external userdir configuration file, httpd-userdir.conf, is “Included” in httpd.conf or whether user directories are enabled directly in httpd.conf, the needed configuration directives for user directories are as follows, assuming that user home directories are under /home:

# UserDir: The name of the directory that is appended onto a user's home

# directory if a ~user request is received.

#

UserDir public_html

#

# Control access to UserDir directories.

#

<Directory /home/*/public_html>

AllowOverride AuthConfig FileInfo Options

</Directory>

With user directories enabled, a user such as jdoe can create a personal home page by creating the file /home/jdoe/public_html/index.html. If jdoe has a user account on a UNIX host called pryor.acme.com that runs Apache with user directories enabled, jdoe’s personal home page would have the URL http://pryor.acme.com/~jdoe.
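The per-user setup just described can be sketched as a small shell function. This is an illustration only: make_home_page is a hypothetical helper, the placeholder page content is invented, and the permission changes assume that home directories are otherwise closed to other users, so that the unprivileged httpd user needs search permission on the home directory and read permission on the published files.

```shell
#!/bin/sh
# Create a minimal personal home page under a user's home directory.
# make_home_page is a hypothetical helper; pass the home directory as $1.
make_home_page() {
    home=$1
    mkdir -p "$home/public_html" || return 1
    # Write a placeholder page only if the user has not created one yet.
    if [ ! -e "$home/public_html/index.html" ]; then
        printf '<html><body><h1>Home page</h1></body></html>\n' \
            > "$home/public_html/index.html"
    fi
    # httpd runs as an unprivileged user, so it needs search (x) permission
    # on the home directory and read access to the published files.
    chmod o+x "$home"
    chmod o+rx "$home/public_html"
    chmod o+r "$home/public_html/index.html"
}
```

For the jdoe example, the call would be make_home_page /home/jdoe, run as jdoe.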

Virtual Hosts

A less well-known, but particularly useful, feature of Apache is its ability to support virtual hosts. With virtual hosts, a single UNIX host running Apache can serve multiple web sites with unique subdomain names. For example, the Products and Research Departments of acme.com can use the pryor.acme.com UNIX host to host both the http://products.acme.com/ and http://research.acme.com/ web sites. This can be done by configuring Apache on pryor.acme.com with the products.acme.com and research.acme.com virtual hosts. Additionally, the two host names, products.acme.com and research.acme.com, must be associated with pryor.acme.com’s IP address on the acme.com domain name server (DNS). Configuration directives in httpd.conf on pryor.acme.com for the products.acme.com and research.acme.com virtual hosts would need to include these lines:

NameVirtualHost 192.168.2.150

# 192.168.2.150 is the hypothetical numeric IP address for pryor.acme.com

<VirtualHost 192.168.2.150>

ServerName products.acme.com

ServerAlias products

DocumentRoot /usr/local/apache2/htdocs/products

</VirtualHost>

<VirtualHost 192.168.2.150>

ServerName research.acme.com

ServerAlias research

DocumentRoot /usr/local/apache2/htdocs/research

</VirtualHost>
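When a host serves more than a couple of virtual hosts, the stanzas follow such a regular pattern that they can be generated. The following sketch prints one stanza in the style shown above; print_vhost is a hypothetical helper, not an Apache tool, and it assumes that the short alias is simply the first label of the host name.

```shell
#!/bin/sh
# Print one name-based <VirtualHost> stanza in the style used above.
# print_vhost is a hypothetical helper: $1 = IP address, $2 = fully
# qualified host name, $3 = directory that holds that site's documents.
print_vhost() {
    ip=$1 fqdn=$2 docroot=$3
    short=${fqdn%%.*}          # "products.acme.com" -> "products"
    printf '<VirtualHost %s>\n' "$ip"
    printf 'ServerName %s\n' "$fqdn"
    printf 'ServerAlias %s\n' "$short"
    printf 'DocumentRoot %s\n' "$docroot"
    printf '</VirtualHost>\n'
}
```

Calling print_vhost 192.168.2.150 research.acme.com /usr/local/apache2/htdocs/research prints the second stanza of the example; the output can be redirected to a file that httpd.conf pulls in with the Include directive.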

CGI Support in Apache

The means for creating dynamic web content for things such as web applications are continually increasing. The Common Gateway Interface (CGI) was one of the first methods used for executing external programs that related to web pages, and it is still a well-used method due to its relative simplicity, as well as the continued popularity of the Perl language, which has traditionally been used to develop CGI programs. (Perl has been called the “duct tape of the Internet” because it is so widely used in web application development, mostly in the form of CGI programs.) CGI is a standard for interfacing external applications with information servers, such as HTTP or web servers. A CGI program is executed in real time so that the output it generates can dynamically become part of the HTML code that is served by a web server such as Apache. Common uses for CGI programs include providing access to a search engine or a database and parsing information that is entered into web forms. (See Chapter 27 for more about CGI scripts.)

There is a standard location for CGI scripts under Apache’s installation root directory. If Apache was installed using Linux packages, the standard location for CGI scripts is typically /var/www/cgi-bin. Otherwise, if Apache was manually compiled and installed from the source code as prescribed in this chapter, the location would be /usr/local/apache2/cgi-bin. There is an httpd.conf directive called ScriptAlias that maps a URL path such as /cgi-bin/ to the cgi-bin directory, so that CGI programs can be requested as if cgi-bin were located under the DocumentRoot. An example for an Apache installation on Linux follows:

ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

The ScriptAlias directory will contain the CGI programs that a web browser can request. The CGI programs run on the same machine as Apache. CGI programs under Apache can be written in any programming language that is capable of reading the values of UNIX environment variables. Common languages used for CGI programs include Perl, Python, and even C. The following shell script code is a quick CGI script that illustrates how an external program can be used to dynamically generate HTML code that Apache can serve on the network:

#!/bin/sh

echo 'Content-type: text/html'

echo

echo "<html><head><title>Hello World</title>"

echo "</head><body><h1>Hello World</h1></body></html>"

The script uses the standard Bourne shell built-in echo command. The first echo serves to inform the calling web browser of the output type (text/html) that will follow. The last two echo commands surround the string “Hello World” with HTML code to display “Hello World” on the web browser title bar and in the web browser main window. As root, try saving this code to a file called hello_world.cgi in the directory that follows the ScriptAlias directive in httpd.conf, say /var/www/cgi-bin. CGI programs are called by the Apache httpd process. So this .cgi file needs to be made readable and executable for the user (apache, www-data, or nobody) that owns the Apache process. The quickest way is to use the chmod command:

# chmod o+rx hello_world.cgi

To actually call this CGI script, use a web browser on the Apache server machine to view the URL, http://localhost/cgi-bin/hello_world.cgi. The resulting browser window should resemble Figure 16–3.

Figure 16–3: Output of hello_world.cgi in browser window

If the test CGI script can be successfully executed, your Apache installation should be ready to support more useful and high-quality CGI programs such as web discussion boards, weblogs, and wikis, many of them written in Perl and open sourced. Note that recent advances, such as FastCGI and mod_perl, have addressed performance issues that have been associated with running CGI programs.
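As noted earlier, a CGI program learns about the request through environment variables. The following sketch extends the Hello World idea by echoing two of the standard CGI variables, REQUEST_METHOD and QUERY_STRING, back to the browser. The function names are hypothetical, and the HTML escaping is a deliberately minimal precaution rather than a complete defense.

```shell
#!/bin/sh
# A CGI sketch that reports two of the environment variables that the
# web server sets for every CGI request.

# Escape the characters that are special in HTML so that request data
# cannot inject markup into the generated page.
html_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

render_page() {
    echo 'Content-type: text/html'
    echo
    echo '<html><head><title>Request details</title></head><body>'
    echo "<p>Method: $(html_escape "${REQUEST_METHOD:-unknown}")</p>"
    echo "<p>Query string: $(html_escape "${QUERY_STRING:-}")</p>"
    echo '</body></html>'
}

render_page
```

Saved in the ScriptAlias directory under a name such as request_info.cgi (a hypothetical name) and requested with ?q=test appended to the URL, the page should show GET as the method and q=test as the query string.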

CGI Security and Suexec

The impact of a web server on system security should always be a concern because an improperly configured web server can give anyone with a web browser undesirable read access to areas of a web server machine’s file system. This is why it is always recommended that Apache processes be owned by unprivileged users such as nobody. The security of CGI programs is of particular concern because of the potential for abusing CGI programs to write to file systems and to gain remote root access on web server machines. A web programmer should employ good programming practices so that CGI cannot be exploited to compromise system security.

With CGI programs there is also the question of access security. As stated before, CGI programs are called from Apache, which is typically owned by a nonprivileged user such as www-data, apache, or nobody. In the preceding example, we made the hello_world.cgi executable (which was owned by root) world-readable so that the Apache process could read and execute it. Making any CGI program world-readable is problematic; some CGI programs need to have user IDs and passwords embedded in them. If the CGI program needs to read files from an Apache subdirectory, that subdirectory and its contents would also need to be made world-readable, and in some cases, world-writable. It is better to change the ownership of the CGI script to www-data, apache, or nobody, that is, to change the ownership to be the same as the Apache process user, and make it readable and executable for that user only. For the hello_world.cgi example, if the Apache process owner is nobody, you would want to run as root:

# chown nobody hello_world.cgi ; chmod 700 hello_world.cgi

You would also want to change the ownership and access modes of any Apache subdirectories and files to make them accessible to only nobody if they need to be accessed by hello_world.cgi.
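Before tightening permissions in this way, it helps to know which files are currently open to everyone. The following sketch lists them; world_accessible is a hypothetical helper name, and the directory passed to it is whatever your ScriptAlias points to.

```shell
#!/bin/sh
# List regular files under a directory that are readable or writable by
# "other" users, i.e. files that the chown/chmod 700 scheme above would
# tighten. world_accessible is a hypothetical helper name.
world_accessible() {
    # -perm -004 matches world-readable files, -perm -002 world-writable ones.
    find "$1" -type f \( -perm -004 -o -perm -002 \) | sort
}
```

For example, world_accessible /var/www/cgi-bin prints one path per line and prints nothing once every CGI file is restricted to its owner.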

Suexec

The suexec feature of Apache, which was introduced in version 1.2, allows for more flexible CGI access control. The use of suexec is particularly suited for private CGI programs that nonroot users are using or testing in their Apache user directories, ~/public_html. Normally, CGI programs run with the same user ID and privileges as Apache httpd. But with suexec enabled, Apache allows CGI programs to run with the user ID of the user who owns the CGI program. For instance, the user jdoe is testing the Perl CGI script myscript.pl that he has saved as ~/public_html/cgi-bin/myscript.pl on the pryor.acme.com UNIX host. Since jdoe is the owner of myscript.pl, when Apache executes myscript.pl through suexec, it will run with user ID jdoe instead of the normal CGI user (nobody, www-data, or apache). Because myscript.pl runs with user ID jdoe, it is able to access files and directories that are owned by jdoe; consequently, there is no need to make these files and directories world-readable or -writable, enhancing security. Suexec also performs several security checks on CGI programs before it runs them. It should be noted that for a normal user such as jdoe to be able to use Apache to serve CGI programs out of the ~/public_html/cgi-bin directory, a <Directory> entry such as the following must be added to httpd.conf:

<Directory "/home/jdoe/public_html/cgi-bin">

Options +ExecCGI

SetHandler cgi-script

</Directory>

After Apache is restarted on pryor.acme.com, jdoe will be able to test his myscript.pl script by using a web browser to request the URL, http://pryor.acme.com/~jdoe/cgi-bin/myscript.pl.

Password-Protected Web Pages with Basic Authentication

Apache provides a way to do simple password protection of selected web pages. This can be done using the Basic HTTP Authentication method. The easiest way to restrict access using one username and password requires you to create two hidden text files. The first file is called .htaccess and is placed in the directory you wish to restrict access to. For example, if the restricted directory is /usr/local/apache2/htdocs/restricted/, you would create the .htaccess file in that directory with the following possible contents:

AuthUserFile /usr/local/apache2/lib/.htpasswd

AuthGroupFile /dev/null

AuthName "Access restricted. Please log in."

AuthType Basic

<LIMIT GET>

require user AcmeRestricted

</LIMIT>

The bottom three lines indicate that only users who log in as AcmeRestricted will be able to access the directory that the .htaccess file is in. The top line that begins with AuthUserFile contains the location of the password file for AcmeRestricted. The AuthGroupFile line is used when you want to grant access to groups of usernames. In this case, there is only one username, so we point this line to /dev/null. The third line is the title of the authentication message box that would pop up in a web browser when the /usr/local/apache2/htdocs/restricted/ directory is requested. The fourth line indicates that this uses Basic Authentication.

The second file to be created is the .htpasswd file that is referred to in the first line of .htaccess. The htpasswd command that is part of the Apache installation can be used to generate the .htpasswd file. To create the .htpasswd file needed for this example, the command would be

# /usr/local/apache2/bin/htpasswd -c /usr/local/apache2/lib/.htpasswd AcmeRestricted

When you run this command, you will be prompted to type in the password, which will be encrypted using the UNIX crypt function and inserted into the .htpasswd file. The restricted directory and also .htaccess and .htpasswd must be made readable for the Apache httpd process, which would typically mean making them readable for the nobody, www-data, or apache user.

Figure 16–4 shows the expected authentication login window that would be popped up by a web browser if Basic Authentication is set up correctly for the restricted directory.

Figure 16–4: Apache’s basic authentication login window

Apache allows the use of more secure authentication methods beyond Basic Authentication. The Apache documentation recommends using at least HTTP Digest Authentication, which is provided by the mod_auth_digest module, though the documentation also states that Digest Authentication is still in an “experimental” state.
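To see why Basic Authentication is considered weak, it helps to look at what the browser actually sends: the username and password joined by a colon and encoded as base64, which is an encoding rather than encryption. The following sketch builds that request header; basic_auth_header is a hypothetical helper, and it assumes that the base64 utility is installed, which is common but not universal on UNIX systems.

```shell
#!/bin/sh
# Build the Authorization header that a browser sends for Basic
# Authentication: the string "user:password" encoded as base64.
# basic_auth_header is a hypothetical helper; it relies on the base64
# utility, which is common but not part of every UNIX system.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}
```

Anyone who can observe the request on the network can decode the value back to the original username and password, which is why Basic Authentication is best combined with an encrypted connection.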

Apache and LAMP

Apache is an integral part of what has become an important web application development platform called LAMP, an acronym whose letters stand for Linux, Apache, MySQL, and Perl/Python/PHP. The acronym is sometimes shortened to AMP since Apache, MySQL, and Perl/Python/PHP can run on all UNIX variants, not just Linux. The widely used MySQL database management system provides the back-end data storage for LAMP applications. In these LAMP applications, Perl/Python/PHP are used to write CGI programs or CGI-like programs that are executed by the Apache web server to interact with users (the web front end) and access data stored in MySQL (the database back end). Popular examples of LAMP applications are news/discussion forums such as Slashdot (http://slashdot.org/), content management systems such as PHP-Nuke (http://www.phpnuke.org/), and wiki engines such as Mediawiki (http://www.mediawiki.org/).

The most widely used language in LAMP applications is PHP (http://www.php.net/). Unlike Perl or Python, PHP was developed with web applications in mind. PHP was originally designed to be used in conjunction with a web server, to act as a filter that takes a file containing text and PHP instructions and converts it to HTML for display on a web browser. The most common way of running PHP programs in Apache is not through CGI, but through an Apache module that interprets PHP language instructions that are embedded in HTML documents. This section will step through the proper installation of the PHP module for Apache and should also give a general idea of how third-party Apache modules are built and integrated using apxs, the Apache Extension Tool mechanism.

On Linux distributions and BSD variants such as FreeBSD, installing PHP support for Apache is usually just a matter of installing the available PHP binary packages. On UNIX platforms on which you have manually compiled and installed Apache yourself, you will need to compile and install PHP with Apache support. The steps required to compile PHP and integrate it with Apache follow. Unless otherwise noted, you should be able to perform these steps as a normal (nonroot) user.

Step 1: Obtain the PHP Source Code

First, obtain the PHP source code from http://www.php.net/. As of mid-2006, the latest bzip2-compressed tar archive for PHP was php-5.1.4.tar.bz2, so the following examples will assume that you have downloaded and saved php-5.1.4.tar.bz2 to a source directory. The PHP tar.bz2 archive needs to be unarchived using the following command:

$ bzip2 -dc php-5.1.4.tar.bz2 | tar -xvf -

This will extract the contents of the tar.bz2 archive into a new subdirectory called php-5.1.4.

Step 2: Configure the Source Code, Build, and Install

You should enter the new php-5.1.4 subdirectory that was just created. The INSTALL file found in the php-5.1.4 subdirectory contains useful information for building PHP to work with various web servers including Apache. The PHP build process begins with the included GNU autoconf system’s configure script. The configure script’s options can be viewed as follows:

$ ./configure --help | less

Assuming you are installing PHP in /usr/local/php-5.1.4, the following is a run of the configure script with the appropriate --prefix command switch and also the --with-apxs2 and --with-mysql command switches to interface with an existing Apache installation and an existing MySQL installation, respectively:

$ ./configure --prefix=/usr/local/php-5.1.4 \

--with-apxs2=/usr/local/apache2/bin/apxs --with-mysql

The --with-apxs2=/usr/local/apache2/bin/apxs command-line switch calls the Apache apxs tool, which is used for building and installing extension modules for Apache. The PHP build process uses apxs to build an Apache dynamic shared object (DSO) for PHP, which can then be loaded into the Apache web server at run time (through a directive in the Apache httpd.conf configuration file) to support the PHP language. The --with-mysql command switch will configure the PHP build to build PHP with MySQL database-specific support.

A successful run of configure will generate Makefiles to build and install PHP. After this you must run the make and make install commands. The build of PHP using make will take considerably longer than the build of Apache. The make install command must be executed as root:

$ make

(after becoming root)

# make install

The make install command will copy the compiled PHP executables, libraries, directory structure, and data files into the installation root directory that you specified with the configure --prefix command and option described previously, for example, /usr/local/php-5.1.4. In addition, the make install command will copy the PHP dynamic shared object or module called libphp5.so to the Apache module directory, for example, /usr/local/apache2/modules.

PHP options belong in a file called php.ini, which should be created in the just-created /usr/local/php-5.1.4/lib directory. The PHP source code directory includes a default php.ini called php.ini-dist that can be copied into the PHP installation directory with the following command:

# cp php.ini-dist /usr/local/php-5.1.4/lib/php.ini

Step 3: Configure Apache Support for PHP

Apache needs to be configured to load the PHP module (libphp5.so) at startup to support the PHP language. This is accomplished by adding a LoadModule directive to Apache’s httpd.conf file as follows:

LoadModule php5_module modules/libphp5.so

If you configured the PHP build with the --with-apxs2=/usr/local/apache2/bin/apxs option, this LoadModule line is automatically added to httpd.conf when you run make install as root in the PHP source directory.

You also need to configure Apache to parse certain filename extensions as PHP. Most commonly, Apache is configured to parse the .php (and sometimes .phtml) extension as PHP by adding the following line to httpd.conf:

AddType application/x-httpd-php .php .phtml

As with other UNIX network services, when you change httpd.conf, you should restart Apache. If the Apache SysV init script was installed as /etc/init.d/apache2, you can restart Apache with the following command:

# /etc/init.d/apache2 restart
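A syntax error in httpd.conf can prevent Apache from coming back up after a restart, so it is prudent to check the file first. The following sketch wraps the configtest and restart subcommands of apachectl; safe_restart is a hypothetical helper name, and the default apachectl path assumes the source install location used in this chapter.

```shell
#!/bin/sh
# Restart Apache only if the edited configuration parses cleanly.
# APACHECTL defaults to the path used by a source install under
# /usr/local/apache2; adjust it for a packaged installation.
APACHECTL=${APACHECTL:-/usr/local/apache2/bin/apachectl}

safe_restart() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" restart
    else
        echo 'Configuration test failed; Apache was not restarted.' >&2
        return 1
    fi
}
```

Run safe_restart as root after each change to httpd.conf; if the configuration test fails, the running server is left untouched.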

With the PHP module loaded, Apache will recognize and execute PHP programs that are embedded in HTML files that have a .php filename extension. The following is a simple PHP example file that will print “Hello World!” in a web browser window, followed by a call to the phpinfo() function to print the PHP configuration:

<html>

<head>

<title>Very simple PHP program</title>

</head>

<body>

<?php

print 'Hello World!';

phpinfo();

?>

</body>

</html>

If you save this HTML/PHP code to a file such as phpinfo.php under your Apache document root and load it using a web browser, you should see output similar to Figure 16–5, which will indicate that PHP has been installed correctly as an Apache module. Your Apache installation should now be ready to use a rich library of freely available PHP-based web applications that use MySQL as a data back end.

Figure 16–5: PHP configuration information from phpinfo()

Apache Configuration Front Ends

The size and complexity of Apache’s configuration file, httpd.conf, can be daunting for beginning administrators. One way to manage the complexity of large UNIX configuration files has been to split the configuration file up into smaller parts and use “Include”-type statements in the main configuration file to bring the parts together into a whole. This makes the configuration system more modular, and the approach is frequently used in mainstream Linux distributions, where the Apache httpd.conf can be just a container that “Includes” several other files.

An additional measure that can be taken to manage the complexity of large configuration files is to use some type of configuration “front end” that consists of a graphical user interface or web browser interface that displays Apache’s configuration options as graphical menus and drop-down items. Comanche (http://www.comanche.org/) is a graphical user interface application that can be used to configure Apache on UNIX platforms. Webmin (http://www.webmin.com/) is one of the better-known web browser-based front ends. Though Webmin is a general-purpose UNIX system administration interface, it has many standard modules to configure and administer common system services, including Apache. The browser window in Figure 16–6 shows a part of the web interface that Webmin provides to configure the core features of Apache as well as Apache’s bundled modules.

Figure 16–6: Configuring Apache through Webmin

Apache Log Files

An Apache web site, particularly one that is exposed to the Internet, will generate extensive logs that you should be aware of and learn to interpret and manage. The Apache logs can reveal any errors that are generated by Apache at run time, possible security problems in the Apache configuration, the network bandwidth used by Apache, and other useful pieces of information.

The location of Apache logs can vary depending on the manner in which Apache was installed. The location can be /var/log/httpd, /var/log/apache, /var/log/apache2, or /usr/local/apache2/logs (if installed from source code as prescribed in this chapter). There are three main log files for recent versions of Apache: access_log, error_log, and suexec.log. The largest of these log files, the access_log file, contains information on all HTML documents and objects that have been requested from the Apache httpd over the network using the HTTP protocol, the types of all HTTP requests, and the HTTP status codes associated with each request. The error_log file contains errors generated by Apache, including HTTP requests for nonexistent or restricted pages or objects. The access_log and error_log files both contain the numeric IP addresses of remote machines that sent HTTP requests to the httpd and the time and date stamps of those requests. An entry in the access_log file looks like this:

216.35.116.91 - - [19/Apr/2006:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This entry shows an HTTP “GET” method request (see Chapter 10) for the Apache document root (“/”) from the remote host at the numeric IP address 216.35.116.91 (probably a search engine) at 2:47 P.M. on April 19, 2006. The HTTP status code “200” (one of many possible codes) signifies a successful transfer. The “654” is the total number of bytes that were transferred. The numeric IP addresses of remote requesting machines, rather than their hostnames, are logged because it can take a significant amount of time to look up and convert each numeric IP address to a hostname, and this would slow Apache’s performance significantly. Apache includes the logresolve command that you can use to convert the IP addresses to hostnames off-line. The following example usage of logresolve creates the file /tmp/access_log.hostnames from access_log:

# /usr/local/apache2/bin/logresolve < /usr/local/apache2/logs/access_log > /tmp/access_log.hostnames
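The fields of an access_log entry described above can also be summarized with standard UNIX tools. The following sketch counts requests and totals the bytes transferred for each HTTP status code. The function name status_summary is hypothetical, and the field positions assume the log format of the sample entry, with no embedded spaces in the requested URL.

```shell
#!/bin/sh
# Summarize an access_log in the format shown above: for each HTTP
# status code, print the code, the number of requests, and the total
# bytes transferred. status_summary is a hypothetical helper name.
status_summary() {
    # With whitespace-separated fields, $9 is the status code and $10 the
    # byte count; Apache logs "-" when no body was sent.
    awk '{
        count[$9]++
        bytes[$9] += (($10 == "-") ? 0 : $10)
    }
    END {
        for (s in count) printf "%s %d %d\n", s, count[s], bytes[s]
    }' "$@" | sort
}
```

For example, status_summary /usr/local/apache2/logs/access_log prints one line per status code, which makes a sudden rise in 404 or 500 responses easy to spot.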

The following is an entry in the error_log file that indicates a request for a nonexistent directory from the remote host at 69.93.197.146:

[Tue May 16 21:28:49 2006] [error] [client 69.93.197.146] File does not exist: /var/www/html/blogs

The suexec.log file contains messages from Apache that are generated by the suexec facility. It is useful for debugging file permission problems with CGI applications that must run through suexec.

The Apache log files can grow very large over time, especially the access_log file, sometimes even filling up whole file systems on busy web sites if left alone. On Linux distributions, the Apache log files are usually archived and compressed as needed when they reach a certain size or age through the logrotate facility, which is typically executed nightly via a cron job. On UNIX systems, the native equivalent of logrotate should be used. On Solaris, the following logadm command limits the size of Apache’s access_log file to 10 MB. When access_log exceeds 10 MB, it will be renamed and compressed, and a new access_log file will be created:

# logadm -w /usr/local/apache2/logs/access_log -s 10m -z 0
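On a system without a rotation facility, the size test that triggers rotation can be approximated with a few lines of shell. The following sketch only reports an oversized log and does not rotate it; over_limit is a hypothetical helper name, and the limit is given in bytes.

```shell
#!/bin/sh
# Report whether a log file has grown past a size limit given in bytes.
# over_limit is a hypothetical helper; it prints the file name and size
# and returns 0 when the file is larger than the limit, 1 when it is
# not, and 2 when the file cannot be read.
over_limit() {
    file=$1 limit=$2
    [ -r "$file" ] || return 2
    # Some wc implementations pad the count with spaces; strip them.
    size=$(wc -c < "$file" | tr -d '[:space:]')
    if [ "$size" -gt "$limit" ]; then
        echo "$file: $size bytes"
        return 0
    fi
    return 1
}
```

For the 10 MB limit used above, the call would be over_limit /usr/local/apache2/logs/access_log 10485760, which could be run from cron to warn before a file system fills up.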

Summary

In this chapter, you learned about the Apache web server, its history and widespread usage today, how to install and configure Apache, and how to manage and interpret Apache’s log files. The chapter discussed ways in which Apache can be installed on Linux and UNIX systems, stepped through a manual compile and install process of Apache from its source code, looked at commonly used options in the Apache httpd.conf configuration file, and went through how a new installation of Apache can be tested using static HTML pages as well as Common Gateway Interface (CGI) scripts. The chapter discussed CGI security issues and how Apache’s suexec facility addresses these issues. The chapter also discussed password protection of web pages through Apache basic authentication. The chapter also stepped through the compile, install, and test process of the popular PHP web application language, which is commonly integrated with Apache in the widely used LAMP web application framework. The chapter concluded with a description of Apache’s log files, how to interpret the information found in them, and how to manage the disk space requirements of these potentially very large files.

How to Find Out More

You may find the following books useful as Apache references:

· Coar, Ken, and Rich Bowen. Apache Cookbook. Newton, MA: O’Reilly Media, Inc., 2003.

· Laurie, Ben, and Peter Laurie. Apache: The Definitive Guide. 3rd ed. Newton, MA: O’Reilly Media, Inc., 2002.

· Wainwright, Peter. Pro Apache. 3rd ed. Berkeley, CA: Apress, 2004.

A CGI reference is

· Hamilton, Jacqueline D. CGI Programming 101: Programming Perl for the World Wide Web. 2nd ed. Houston, TX: CGI101.com, 2004.

A more in-depth treatment of the LAMP framework can be found in the following:

· Glass, Michael K., Yann Le Scouarnec, Elizabeth Naramore, Gary Mailer, Jeremy Stolz, and Jason Gerner. Beginning PHP, Apache, MySQL Web Development. Hoboken, NJ: Wrox, 2004.

· Rosebrock, Eric, and Eric Filson. Setting Up LAMP: Getting Linux, Apache, MySQL, and PHP Working Together. Berkeley, CA: Sybex, 2004.

The Apache web site at http://httpd.apache.org/ contains much up-to-date Apache documentation. O’Reilly Media, Inc.’s http://www.onlamp.com/ is a well-maintained source of current information and tutorials on LAMP.