Running Linux, 5th Edition (2009)
Part III. Programming
Chapter 22. Running a Web Server
Chapter 13 of this book put you on a network. It may have been hard work, but the result was quite an accomplishment: your system is now part of a community. If you are connected to the Internet, the next step is to get access to all the riches this medium offers.
On local area networks both for self-contained organizations and the wider Internet, people generally agree that one of the most useful applications is the World Wide Web. We covered browsers in Chapters 3 and 5. One of the exciting things about Linux is that it facilitates setting up your own web server, the topic of this chapter.
The benefits of having a web server on your system are extensive. Not only can you put up documents and serve up information from databases in a manner that people on any system connected to you can view, but you can also run a range of other tools (for system administration, for instance) that allow remote administration of your system.
With any server, however, you must pay close attention to security, because small errors in configuration can let malicious crackers gain access to documents you don't want, deface web pages, or destroy data. Ponder Chapter 26 before allowing other systems access to your web server.
Configuring Your Own Web Server
Setting up your own web server consists of two tasks: configuring the httpd daemon and writing documents to provide on the server. We don't cover the basics of HTML in this book, because knowledge of it is widespread and many people use GUI tools to help them. But we do discuss the basics of dynamic content (web pages created on the fly from databases) in Chapter 25.
httpd is the daemon that services HTTP requests on your machine. Any document accessed with an HTTPURL is retrieved using httpd. Likewise, FTP URLs are accessed using ftpd, Gopher URLs using gopherd, and so on. There is no single web daemon; each URL type uses a separate daemon to request information from the server.
Many HTTP servers are available. The one discussed here is the Apache server, which is easy to configure and very flexible. There are two major versions of Apache HTTP: the 1.3 family is the older and more widely used, whereas 2.x brings a range of features useful to higher-end sites. The instructions here are valid for either version.
All Linux versions should carry Apache today as their default httpd server. However, if you have selected a "minimal" or "desktop" install, it might not have been installed during the installation procedure, and you might have to install it manually afterward. Or you may want to have a newer version than the one that your distribution carries; for example, you might want the latest version in order to be more secure. In that case, you can download both sources and binaries from http://httpd.apache.org and build it yourself. The http://httpd.apache.org web site contains complete documentation for the software.
Apache: The Definitive Guide, by Ben Laurie and Peter Laurie (O'Reilly), covers everything about Apache, including sophisticated configuration issues.
Where the various files of an Apache installation go depends on your distribution or the package you installed, but the following is a common setup. You should locate the various pieces in your system before continuing.
The binary executable, which is the server itself. On Debian, this is /usr/ sbin/apache instead.
Contains the configuration files for httpd, most notably httpd.conf. We discuss how to modify these files later. On Debian systems, this is /etc/apache instead of /etc/httpd.
Contains the HTML scripts to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
Holds logfiles stored by the server.
Our task now is to modify the configuration files in the configuration subdirectory. You should notice at least the following four files in this directory: access.conf-dist, httpd.conf-dist, mime.types, and srm.conf-dist. (Newer versions of Apache 1.3.x have abandoned the -dist suffix in favor of the .default suffix, and Apache 2.x places a -std fragment before the extension.) Copy the files with names ending in -dist and modify them for your own system. For example, httpd.conf -dist is copied to httpd.conf and edited.
The latest version of Apache pretty much configures itself, but in case things go wrong, we tell you here how to do it manually so that you can fix things yourself.
At http://httpd.apache.org, you will find complete documentation on how to configure httpd. Here, we present sample configuration files that correspond to an actual running httpd.
The file httpd.conf is the main server-configuration file. First, copy httpd.conf-dist to httpd.conf and edit it. We only cover some of the more important options here; the file httpd.conf-dist is vastly commented.
The ServerType directive is used to specify how the server will run—either as a standalone daemon (as seen here) or from inetd. For various reasons, it's usually best to run httpd in standalone mode. Otherwise, inetd must spawn a new instance of httpd for each incoming connection.
One tricky item here is the port number specification. You may wish to run httpd as a user other than root (that is, you may not have root access on the machine in question and wish to run httpd as yourself). In this case, you must use a port numbered 1024 or above. For example, if we specify:
we may run httpd as a regular user. In this case, HTTP URLs to this machine must be specified as in the following example:
If no port number is given in the URL (as is the usual case), port 80 is assumed.
we specify the DocumentRoot directive, where documents to be provided via HTTP are stored. These documents are written in HTML.
For example, if someone were to access the URL:
the actual file accessed would be /usr/local/httpd/htdocs/fruits.html.
The UserDir directive specifies a directory each user may create in his home directory for storing public HTML files. For example, if we were to use the URL:
the actual file accessed would be ~mdw/public_html/linux-info.html.
The following lines enable the indexing features of httpd :
# If a URL is received with a directory but no filename, retrieve this
# file as the index (if it exists).
# Turn on 'fancy' directory indexes
In this case, if a browser attempts to access a directory URL, the file index.html in that directory is returned, if it exists. Otherwise, httpd generates a "fancy" index with icons representing various file types. Figure 5-2 shows an example of such an index.
Icons are assigned using the AddIcon directive, as seen here:
# Set up various icons for use with fancy indexes, by filename
# E.g., we use DocumentRoot/icons/movie.xbm for files ending
# in .mpg and .qt
AddIcon /icons/movie.xbm .mpg
AddIcon /icons/back.xbm ..
AddIcon /icons/menu.xbm ^^DIRECTORY^^
AddIcon /icons/blank.xbm ^^BLANKICON^^
The icon filenames (such as /icons/movie.xbm) are relative to DocumentRoot by default. (There are other ways to specify pathnames to documents and icons—for example, by using aliases. This is discussed later.) There is also an AddIconByType directive, which lets you specify an icon for a document based on the document's MIME type, and an AddIconByEncoding directive, which lets you specify an icon for a document based on the document's encoding (i.e., whether and how it is compressed).
You can also specify an icon to be used when none of the above matches. This is done with the DefaultIcon directive.
The optional ReadmeName and HeaderName directives specify the names of files to be included in the index generated by httpd:
Here, if the file README.html exists in the current directory, it will be appended to the index. The file README will be appended if README.html does not exist. Likewise, HEADER.html or HEADER will be included at the top of the index generated by httpd. You can use these files to describe the contents of a particular directory when an index is requested by the browser:
# Local access filename.
# Default MIME type for documents.
The AccessFileName directive specifies the name of the local configuration file for each directory. (This is described later in this chapter.) The DefaultType directive specifies the MIME type for documents not listed in mime.types.
The following lines specify directories for useful files:
# Set location of icons.
# Set location of CGI binaries.
The Alias directive specifies an alias for any of the files that would normally not be visible through the web server. Earlier, we used the AddIcon directive to set icon names using pathnames such as /icons/movie.xbm. Here, we specify that the pathname /icons/ should be translated to/usr/local/html/icons/. Therefore, the various icon files should be stored in the latter directory. You can use Alias to set aliases for other pathnames as well.
The ScriptAlias directive is similar, but it sets the actual location of CGI scripts on the system. Here, we wish to store scripts in the directory /usr/local/httpd/cgi-bin/. Any time a URL is used with a leading directory component of /cgi-bin/, it is translated into the actual directory name. More information on CGI and scripts is included in the book CGI Programming with Perl, by Scott Guelich, Shishir Gundavaram, and Gunther Birznieks (O'Reilly).
<Directory> entries specify the options and attributes for a particular directory, as in the following:
# Set options for the cgi-bin script directory.
Options Indexes FollowSymLinks
Here, we specify that the CGI script directory should have the access options Indexes and FollowSymLinks. A number of access options are available. These include the following:
Symbolic links in this directory should be followed to retrieve the documents to which they point. This option is not entirely safe to use on multiuser systems because it allows any user to create a link to some other file or directory (e.g., /etc/passwd). Use SymLinksIfOwnerMatch as a safer (but slightly slower) alternative.
Symbolic links in this directory should be followed only if the target file or directory is owned by the same user ID as the link.
Allow the execution of CGI scripts from this directory.
Allow indexes to be generated from this directory.
Disable all options for this directory.
Enable all options for this directory.
There are other options as well; see the httpd documentation for details.
Next, we configure a very strict default configuration for the complete filesystem.
# Default configuration
# Turn all features off
# Do not allow local files to override configuration.
# In fact, do not allow access
to any of the files.
Deny from all
We have started by denying access to the complete filesystem. Now we proceed to explicitly allow access to the files we want Apache to serve. At the very least we need to enable several options and other attributes for /usr/local/httpd/htdocs, the directory containing our HTML documents. This configuration applies to the base directory and the subdirectories below it.
# Configuration for the web server files.
# Allow automatic indexes and controlled symbolic links.
Options Indexes SymLinksIfOwnerMatch
# Allow the local access file, .htaccess, to override
# any attributes listed here.
# Allow unrestricted access to files in this directory.
Allow from all
Here, we turn on the Indexes and SymLinksIfOwnerMatch options for this directory. The AllowOverride option allows the local access file (named .htaccess) in each directory that contains documents to override any of the attributes given here. The .htaccess file has essentially the same format as the global configuration but applies only to the directory in which it is located. This way, we can specify attributes for particular directories by including a .htaccess file in those directories instead of listing the attributes in the global file.
The primary use for local access files is to allow individual users to set the access permissions for personal HTML directories (such as ~/public_html) without having to ask the system administrator to modify the global access file. Security issues are associated with this, however. For example, a user might enable access permissions in her own directory such that any browser can run expensive server-side CGI scripts. If you disable the AllowOverride feature, users cannot get around the access attributes specified in the global configuration. This can be done by using:
which effectively disables local .htaccess files.
The <Limit GET> field is used to specify access rules for browsers attempting to retrieve documents from this server. In this case, we specify Order allow,deny, which means that allow rules should be evaluated before deny rules. We then instate the rule Allow from all, which simply means any host may retrieve documents from the server. If you wish to deny access from a particular machine or domain, you could add the line:
Deny from ..nuts.com biffnet.biffs-house.us
The first entry denies access from all sites in the nuts.com domain. The second denies access from the site biffnet.biffs-house.us.
srm.conf and access.conf
The srm.conf and access.conf files should be kept empty. In earlier Apache versions, srm.conf stood for Server Resource Map and listed facilities provided by the server, and access.conf controlled access to Apache files. All the resources originally placed in those files are now listed in the main httpd .conf file.
Now you're ready to run httpd, allowing your machine to service HTTP URLs. As mentioned previously, you can run httpd from inetd or as a standalone server. Here, we describe how to run httpd in standalone mode.
All that's required to start httpd is to run the command:
httpd -f configuration-file
where configuration-file is the pathname of httpd.conf. For example:
/usr/sbin/httpd -f /etc/httpd/httpd.conf
starts up httpd, with configuration files found in /etc/httpd.
Watch the httpd error logs (the location of which is given in httpd.conf) for any errors that might occur when trying to start up the server or when accessing documents. Remember you must run httpd as root if it is to use a port numbered 1023 or less. Once you have httpd working to your satisfaction, you can start it automatically at boot time by including the appropriate httpd command line in one of your system rc files, such as /etc/init.d/boot.local.
Apache also provides a utility called apachectl that is more convienent for starting, stopping, and reloading the httpd process. In particular, calling:
is a good way of checking whether the configuration file is actually correct before starting the server. Finally, we should mention that you can also start, restart, and stop Apache by using /etc/init.d/apache plus one of the parameters start, restart, or stop.
Of course, in order to request documents via HTTP from your browser, you'll need to write them, something that we cannot cover in this book. Two good sources for HTML information are the O'Reilly books HTML & XML: The Definitive Guide and HTML Pocket Reference by Jennifer Niederst. To set up a back-end database to your web server, start with Chapter 25.