Ubuntu Unleashed 2017 Edition (2017)

Part IV: Ubuntu as a Server

Chapter 24. Apache Web Server Management


In This Chapter

About the Apache Web Server

Installing the Apache Server

Starting and Stopping Apache

Runtime Server Configuration Settings

File System Authentication and Access Control

Apache Modules

Virtual Hosting

Logging

HTTPS

References


This chapter covers the configuration and management of the Apache web server and includes an overview of some of the major components of the server and discussions of text-based and graphical server configuration. In this chapter, you learn how to start, stop, and restart Apache using the command line. The chapter begins with some introductory information and then shows you how to install, configure, and use Apache.

About the Apache Web Server

Apache is the most widely used web server on the Internet today.

The name Apache appeared during the early development of the software because it was “a patchy” server, made up of patches for the freely available source code of the NCSA HTTPd web server. For a while after the NCSA HTTPd project was discontinued, a number of people wrote a variety of patches for the code, to either fix bugs or add features they wanted. A lot of this code was floating around and people were freely sharing it, but it was completely unmanaged.

After a while, Brian Behlendorf and Cliff Skolnick set up a centralized repository of these patches, and the Apache project was born. The project is still composed of a small core group of programmers, but anyone is welcome to submit patches to the group for possible inclusion in the code.

There has been a surge of interest in the Apache project over the past several years, partially buoyed by a new interest in open source on the part of enterprise-level information services. It’s also due in part to crippling security flaws found in Microsoft’s Internet Information Services (IIS); the existence of malicious web task exploits; and operating system and networking vulnerabilities to the now-infamous Code Red, Blaster, and Nimda worms. IBM made an early commitment to support and use Apache as the basis for its web offerings and has dedicated substantial resources to the project because it makes more sense to use an established, proven web server.

In mid-1999, the Apache Software Foundation was incorporated as a nonprofit company. A board of directors, who are elected on an annual basis by the ASF members, oversees the company. This company provides a foundation for several open-source software development projects, including the Apache Web Server project.


Tip

You can find an overview of Apache in its FAQs at https://wiki.apache.org/httpd/FAQ. In addition to extensive online documentation, you’ll find the complete documentation for Apache in the HTML directory of your Apache server. If you have Apache running on your system, you can access this documentation by looking at http://localhost/manual/index.html.


To determine the precise version of Apache included with your system, use:

matthew@seymour:~$ apache2 -v

Installing the Apache Server

Install the apache2 package from the Ubuntu software repositories, for example with sudo apt install apache2. Updated packages usually contain important bug and security fixes, so when an updated version is released, install it as quickly as possible to keep your system secure.


Note

Check the Apache site for security reports. Browse to http://httpd.apache.org/security_report.html for links to security vulnerabilities for Apache 2.0, 2.2, and 2.4. Subscribe to a support list or browse through up-to-date archives of all Apache mailing lists at http://httpd.apache.org/mail/ (for various articles) or http://httpd.apache.org/lists.html (for comprehensive and organized archives).



Caution

You should be wary of installing experimental packages, and never install them on production servers (that is, servers used in “real life”). Very carefully test the packages beforehand on a host that is not connected to a network!


For more information about installing software from the Ubuntu repositories, see Chapter 9, “Managing Software.”


Note

If you are upgrading to a newer version of Apache, APT does not write over your current configuration files.


Starting and Stopping Apache

At this point, you have installed your Apache server with its default configuration. Ubuntu provides a default home page at /var/www/html/index.html as a test.

You can start Apache from the command line of a text-based console or X terminal window, and you must have root permission to do so. How you will do so depends on the release version of Ubuntu that you are running. For Ubuntu 16.04 and later, we use systemd commands. For earlier Ubuntu releases like 12.04 and 14.04 that used Upstart, we use Upstart commands. Some prefer to use apache2ctl commands, which work across most distributions.
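
As a minimal sketch of the commands just described, assuming the service is named apache2 (as it is in the Ubuntu package), the following helper uses whichever service manager is present. The function name apache_service is hypothetical and is not part of Ubuntu or Apache.

```shell
# Hypothetical helper (not part of Ubuntu or Apache): run a service action
# with whichever service manager this release provides.
apache_service() {
    action="$1"    # start, stop, restart, or reload
    if command -v systemctl >/dev/null 2>&1; then
        # systemd releases (Ubuntu 16.04 and later)
        sudo systemctl "$action" apache2
    else
        # Upstart-era releases such as 12.04 and 14.04
        sudo service apache2 "$action"
    fi
}
```

For example, apache_service restart stops and then starts the server. With apache2ctl, the equivalents are sudo apache2ctl start, sudo apache2ctl stop, sudo apache2ctl restart, and sudo apache2ctl graceful; the last of these restarts the server without dropping open connections.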

The server daemon, apache2, recognizes several command-line options you can use to set some defaults, such as specifying where apache2 reads its configuration directives. The Apache apache2 executable also understands other options that enable you to selectively use parts of its configuration file, specify a different location of the actual server and supporting files, use a different configuration file (perhaps for testing), and save startup errors to a specific log. The -v option causes Apache to print its development version and quit. The -V option shows all the settings that were in effect when the server was compiled.

The -h option prints the following usage information for the server:

matthew@seymour:~$ apache2 -h
Usage: apache2 [-D name] [-d directory] [-f file]
[-C "directive"] [-c "directive"]
[-k start|restart|graceful|stop]
[-v] [-V] [-h] [-l] [-L] [-t]
Options:
-D name : define a name for use in <IfDefine name> directives
-d directory : specify an alternate initial ServerRoot
-f file : specify an alternate ServerConfigFile
-C "directive" : process directive before reading config files
-c "directive" : process directive after reading config files
-e level : show startup errors of level (see LogLevel)
-E file : log startup errors to file
-v : show version number
-V : show compile settings
-h : list available command line options (this page)
-l : list compiled in modules
-L : list available configuration directives
-t -D DUMP_VHOSTS : show parsed settings (currently only vhost settings)
-t : run syntax check for config files

Other options include listing Apache’s static modules, or special, built-in independent parts of the server, along with options that can be used with the modules. These options are called configuration directives and are commands that control how a static module works. Note that Apache also includes a large number of dynamic modules, or software portions of the server that can be optionally loaded and used while the server is running.

The -t option checks your configuration files. It’s a good idea to run this check before restarting your server, especially if you’ve made changes to your configuration files, because a configuration file error can keep the server from coming back up when you try to restart it. On Ubuntu, running the apache2 binary directly can fail with the following error, because the environment variables that the configuration files reference (defined in /etc/apache2/envvars) have not been loaded:

matthew@seymour:~$ sudo apache2 -t
apache2: bad user name ${APACHE_RUN_USER}

If this happens to you, supply the expected user and group settings on the command line, and you will get the proper output:

matthew@seymour:~$ sudo APACHE_RUN_USER=www-data APACHE_RUN_GROUP=www-data apache2 -t
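
Another way to avoid the error is to run the check through apache2ctl, which loads /etc/apache2/envvars before calling the server binary. The following is a minimal sketch of a check-then-restart routine; the function name apache_safe_restart is hypothetical.

```shell
# Hypothetical helper: restart only when the configuration parses cleanly.
# apache2ctl reads /etc/apache2/envvars, so APACHE_RUN_USER is already set.
apache_safe_restart() {
    if sudo apache2ctl configtest; then
        sudo apache2ctl graceful
    else
        echo "Configuration error; not restarting Apache." >&2
        return 1
    fi
}
```

Because the restart runs only after configtest succeeds, a typing mistake in a configuration file leaves the running server untouched.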

Runtime Server Configuration Settings

At this point, the Apache server will run, but perhaps you want to change a behavior, such as the default location of your website’s files. This section covers the basics of configuring the server to work the way you want it to work.

The main runtime configuration file is apache2.conf, which is under the /etc/apache2 directory and pulls in additional files (such as ports.conf) through Include directives. You can use these configuration files to control the default behavior of Apache, such as the web server’s base configuration directory (/etc/apache2), the name of the server’s PID file (/var/run/apache2.pid), or its response timeout (300 seconds). Apache reads the configuration when it is started (or restarted).

Runtime Configuration Directives

You perform runtime configuration of your server with configuration directives, which are commands that set options for the apache2 daemon. The directives are used to tell the server about various options you want to enable, such as the location of files important to the server configuration and operation. Apache supports nearly 300 configuration directives using the following syntax:

directive option option...

Each directive is specified on a single line. See the following sections for some directive examples and how to use them. Some directives only set a value such as a filename, whereas others enable you to specify various options. Some special directives, called sections, look like HTML tags. Section directives are surrounded by angle brackets, such as <Directory>. Sections usually enclose a group of directives that apply only to the directory specified in the section:

<Directory somedir/in/your/tree>
directive option option
directive option option
</Directory>

All sections are closed with a matching section tag that looks like this: </Directory>. Note that section tags, like any other directives, are specified one per line.


Tip

Apache is configured with an alias that lets you view the documentation installed in /usr/share/doc using your web browser at localhost/manual. After installing and starting Apache, you can find an index of directives at http://localhost/manual/mod/directives.html.


Editing apache2.conf

Most of the default settings in the config file are okay to keep, particularly if you’ve installed the server in a default location and aren’t doing anything unusual on your server. The file includes clear comments describing most of the settings. In general, if you do not understand what a particular directive is for, leave it set to the default value.

The following sections describe some of the configuration file settings you might want to change concerning operation of your server.

ServerRoot

The ServerRoot directive sets the absolute path to your server directory. This directive tells the server where to find all the resources and configuration files. Many of these resources are specified in the configuration files relative to the ServerRoot directory.

Your ServerRoot directive should be set to /etc/apache2 if you installed the Ubuntu package or /usr/local/apache (or whatever directory you chose when you compiled Apache) if you installed from the source. This is commented out in the file, but apache2 -V shows that this default has been compiled into the package.

Listen

The Listen directive is actually in a file called ports.conf that is included from apache2.conf and indicates on which port you want your server to run. By default, this is set to 80, which is the standard HTTP port number. You might want to run your server on another port—for example, when running a test server that you don’t want people to find by accident. Do not confuse this with real security! See the “File System Authentication and Access Control” section for more information about how to secure parts of your web server.
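
As a brief illustration, a test server could be moved off the standard port with an entry like the following in ports.conf. Port 8080 is an arbitrary choice made for this sketch.

```apache
# ports.conf: listen on port 8080 on every interface
Listen 8080

# Alternatively, accept connections only from this machine
# Listen 127.0.0.1:8080
```

Clients then have to include the port in the URL, as in http://servername:8080/.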

User and Group

The User and Group directives should be set to the UID and GID the server will use to process requests.

In Ubuntu, set these configurations to a user with few or no privileges. By default, they’re set to user www-data and group www-data, an account defined specifically to run Apache. If you want to use a different UID or GID, be aware that the server will run with the permissions of the user and group set here. That means in the event of a security breach, whether on the server or (more likely) in your own CGI programs, those programs will run with the assigned UID. If the server runs as root or some other privileged user, someone can exploit the security holes and do nasty things to your site. Always think in terms of the specified user running a command such as rm -rf / because that would wipe all files from your system. That should convince you that running Apache as a user with no privileges is probably a good thing.

Instead of specifying the User and Group directives using names, you can specify them using the UID and GID numbers. If you use numbers, be sure that the numbers you specify correspond to the user and group you want and that they’re preceded by the pound (#) symbol.

Here’s how these directives look if specified by name:

User www-data
Group www-data

Here’s the same specification by UID and GID (www-data is 33 on a standard Ubuntu installation):

User #33
Group #33


Tip

If you find a user on your system (other than root) with a UID and GID of 0, your system has been compromised by a malicious user.


ServerAdmin

The ServerAdmin directive should be set to the address of the webmaster managing the server. This address should be a valid email address or alias, such as webmaster@matthewhelmke.com, because this address is returned to a visitor when a problem occurs on the server.

ServerName

The ServerName directive sets the hostname the server will return. Set it to a fully qualified domain name (FQDN). For example, set it to www.your.domain rather than simply www. This is particularly important if this machine will be accessible from the Internet rather than just on your local network.

You do not need to set this unless you want a name other than the machine’s canonical name returned. If this value isn’t set, the server will figure out the name by itself and set it to its canonical name. However, you might want the server to return a friendlier address, such as www.your.domain. Whatever you do, ServerName should be a real Domain Name System (DNS) name for your network. If you’re administering your own DNS, remember to add an alias for your host. If someone else manages the DNS for you, ask that person to set this name for you.
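
Taken together, the two directives might look like this in a configuration file. The values are the placeholder address and hostname used in the text above.

```apache
# Contact address reported to visitors when a server error occurs
ServerAdmin webmaster@matthewhelmke.com

# Fully qualified name the server returns for itself
ServerName www.your.domain
```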

DocumentRoot

Set this directive to the absolute path of your document tree, which is the top directory from which Apache will serve files. In current Ubuntu releases, it’s set to /var/www/html in the default site configuration (/etc/apache2/sites-available/000-default.conf). If you built the source code yourself, DocumentRoot is set to /usr/local/apache/htdocs (if you did not choose another directory when you compiled Apache).
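
The following sketch moves the document tree to another location. The path /srv/www/example is hypothetical. Because the default configuration denies access to the file system root, a directory outside the locations Ubuntu already opens up also needs a matching Directory section that grants access.

```apache
# Serve files from a hypothetical alternate location
DocumentRoot /srv/www/example

# Grant access to that tree; without this section, requests are refused
<Directory /srv/www/example>
    Options FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>
```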

UserDir

The UserDir directive disables or enables and defines the directory (relative to a local user’s /home directory) where that user can put public HTML documents. It is relative because each user has her own HTML directory. This setting is disabled by default but can be enabled to store user web content under any directory.

The default setting for this directive, if enabled, is public_html. Each user can create a directory called public_html under her /home directory, and HTML documents placed in that directory are available as

http://servername/~username

where username is the username of the particular user.
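
As a sketch of what enabling this looks like, the directives below turn on per-user directories with the default name and keep root excluded. On Ubuntu, the module itself is enabled with sudo a2enmod userdir.

```apache
# Map http://servername/~username to public_html in that user's home directory
UserDir public_html

# Never serve content from root's home directory
UserDir disabled root
```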

DirectoryIndex

The DirectoryIndex directive indicates which file should be served as the index for a directory; that is, which file is served if a URL such as

http://servername/SomeDirectory/

is requested.

It is often useful to put a list of files here so that if index.html (the default value) isn’t found, another file can be served instead. The most useful application of this is to have a CGI program run as the default action in a directory. If you have users who make their web pages on Windows, you might want to add index.htm as well. In that case, the directive looks like DirectoryIndex index.html index.cgi index.htm.

Apache Multiprocessing Modules

Apache 2.0 and later use an internal architecture that supports multiprocessing modules (MPMs). These modules are used by the server for a variety of tasks, such as network and process management, and are compiled into Apache. MPMs enable Apache to work much better on a wider variety of computer platforms, and they can help improve server stability, compatibility, and scalability.

Apache can use only one MPM at any time. These modules are different from the base set included with Apache (see the “Apache Modules” section later in this chapter) but are used to implement settings, limits, or other server actions. Each module in turn supports numerous additional settings, called directives, which further refine server operation.

The internal MPM modules relevant for Linux include the following:

mpm_common: A set of 20 directives common to all MPM modules

prefork: A nonthreaded, preforking web server that works similarly to earlier (1.3) versions of Apache

worker: Provides a hybrid multiprocess multithreaded server

MPM enables Apache to be used on equipment with fewer resources yet still handle massive numbers of hits and provide stable service. The worker module provides directives to control how many simultaneous connections your server can handle.
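
To make the worker directives concrete, here is a sketch of a tuning section using Apache 2.4 directive names. The numbers are illustrative starting values, not recommendations for any particular workload.

```apache
<IfModule mpm_worker_module>
    # Child processes created at startup
    StartServers             2
    # Idle threads the server tries to keep available
    MinSpareThreads         25
    MaxSpareThreads         75
    # Threads per child process, and the upper bound on that setting
    ThreadLimit             64
    ThreadsPerChild         25
    # Maximum simultaneous connections (here, 6 processes x 25 threads)
    MaxRequestWorkers      150
    # 0 means a child process is never recycled after a set number of connections
    MaxConnectionsPerChild   0
</IfModule>
```

MaxRequestWorkers is the setting that caps simultaneous connections; requests beyond that limit wait in a queue until a thread becomes free.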


Note

Other MPMs are available for Apache related to other platforms, such as mpm_netware for NetWare hosts and mpm_winnt for NT platforms. An MPM named perchild, which provides user ID assignment to selected daemon processes, is under development. For more information, browse to the Apache Software Foundation’s home page at www.apache.org.


Using .htaccess Configuration Files

Apache also supports special configuration files, known as .htaccess files. Almost any directive that appears in apache2.conf can appear in an .htaccess file. This file, whose name is specified by the AccessFileName directive in apache2.conf, sets configurations on a per-directory basis (usually in a user directory). As the system administrator, you can specify both the name of this file and which of the server configurations can be overridden by the contents of this file. This is especially useful for sites in which there are multiple content providers and you want to control what these people can do with their space.

To limit which server configurations the .htaccess files can override, use the AllowOverride directive. AllowOverride can be set globally or per directory. For example, in your apache2.conf file, you could use the following:

# Each directory to which Apache has access can be configured with respect
# to which services and features are allowed and/or disabled in that
# directory (and its subdirectories).
#
# First, we configure the "default" to be a very restrictive set of
# permissions.
#
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>

Options Directives

To configure which server features are available in a directory by default, use the Options directive. Options can be None; All; or any combination of Indexes, Includes, FollowSymLinks, ExecCGI, and MultiViews. MultiViews is not included in All and must be specified explicitly. These options are explained in Table 24.1.

TABLE 24.1 Switches Used by the Options Directive


Note

These directives also affect all subdirectories of the specified directory.


AllowOverride Directives

The AllowOverride directive specifies which configuration options .htaccess files can override. You can set this directive individually for each directory. For example, you can have different standards about what can be overridden in the main document root and in UserDir directories.

This capability is particularly useful for user directories, where the user does not have access to the main server configuration files.

AllowOverride can be set to None, All, or any combination of Options, FileInfo, AuthConfig, and Limit. These options are explained in Table 24.2.

TABLE 24.2 Switches Used by the AllowOverride Directive
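
As a sketch of how the two pieces fit together, the first part below belongs in apache2.conf and the second part in an .htaccess file. The directory, realm, and password file names are hypothetical.

```apache
# In apache2.conf: .htaccess files under this tree may change
# authentication settings and nothing else.
<Directory /var/www/html/members>
    AllowOverride AuthConfig
</Directory>

# In /var/www/html/members/.htaccess: require a login for this directory.
AuthType Basic
AuthName "Members only"
AuthUserFile "/etc/apache2/members.htpasswd"
Require valid-user
```

If AllowOverride were left at None for that directory, Apache would not read the .htaccess file at all.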

File System Authentication and Access Control

You’re likely to include material on your website that isn’t supposed to be available to the public. You must be able to lock out this material from public access and provide designated users with the means to unlock the material. Apache provides two methods for accomplishing this type of access: authentication and authorization. You can use different criteria to control access to sections of your website, including checking the client’s IP address or hostname, or requiring a username and password. This section briefly covers some of these methods.


Caution

Allowing individual users to put web content on your server poses several important security risks. If you’re operating a web server on the Internet rather than on a private network, read the WWW Security FAQ at www.w3.org/Security/Faq/www-security-faq.html.


Restricting Access with Require

One of the simplest ways to limit access to website material is to restrict access to a specific group of users, based on IP addresses or hostnames. Apache uses the Require directive to accomplish this. Here are some examples, with comments, that could be placed within the apache2.conf file.

<RequireAll>
# permit all to access
Require all granted
# except from this ip address
Require not ip 10.252.46.163
# and also not from this domain
Require not host horriblepeople.com
# and finally, not from any .gov
Require not host gov
</RequireAll>

Note that each comment sits on its own line; Apache does not allow a comment on the same line as a directive.

There are many options beyond RequireAll, including RequireAny and RequireNone, along with a detailed set of options for each. For more, see https://httpd.apache.org/docs/2.4/howto/access.html.
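
As a contrasting sketch, RequireAny grants access when at least one of its conditions matches. The network range below is an arbitrary private range chosen for illustration.

```apache
<RequireAny>
    # allow any client on this private network
    Require ip 192.168.1.0/24
    # or any client whose hostname is in this domain
    Require host gnulix.org
</RequireAny>
```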

Authentication

Authentication is the process of ensuring that visitors really are who they claim to be. You can configure Apache to allow access to specific areas of web content only to clients who can authenticate their identity. There are several methods of authentication in Apache; Basic Authentication is the most common (and the method discussed in this chapter).

Under Basic Authentication, Apache requires a user to supply a username and a password to access the protected resources. Apache then verifies that the user is allowed to access the resource in question. If the username is acceptable, Apache verifies the password. If the password also checks out, the user is authorized and Apache serves the request.

HTTP is a stateless protocol; each request sent to the server and each response is handled individually, with no memory of earlier requests. Therefore, the authentication information must be included with each request. That means each request to a password-protected area is larger and therefore somewhat slower. To avoid unnecessary system use and delays, protect only those areas of your website that absolutely need protection.

To use Basic Authentication, you need a file that lists which users are allowed to access the resources. This file is composed of a plain text list containing name and password pairs. It looks very much like the /etc/passwd user file of your Linux system.


Caution

Do not use /etc/passwd as a user list for authentication. When you’re using Basic Authentication, passwords and usernames are sent as base64-encoded text from the client to the server (which is just as readable as plain text). The username and password are included in each request that is sent to the server. So, anyone who might be snooping on Net traffic would be able to get this information!
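
To see why base64 offers no protection, the following sketch encodes a credential pair the way a browser does and then reverses it. The password secret is made up for the example.

```shell
# What a browser sends for user "wsb" with the made-up password "secret".
encoded=$(printf '%s' 'wsb:secret' | base64)
echo "Authorization: Basic $encoded"

# Anyone who captures the header can reverse it without any key.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```

No secret is involved in the decoding step, which is why Basic Authentication should be combined with HTTPS whenever the passwords matter.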


To create a user file for Apache, use the htpasswd command. This is included with the Apache package. Running htpasswd without any options produces the following output:

Usage:
htpasswd [-cmdps] passwordfile username
htpasswd -b[cmdps] passwordfile username password

htpasswd -n[mdps] username
htpasswd -nb[mdps] username password
-c Create a new file.
-n Don't update file; display results on stdout.
-m Force MD5 encryption of the password.
-d Force CRYPT encryption of the password (default).
-p Do not encrypt the password (plaintext).
-s Force SHA encryption of the password.
-b Use the password from the command line rather than prompting for it.
-D Delete the specified user.
On Windows, TPF and NetWare systems the '-m' flag is used by default.
On all other systems, the '-p' flag will probably not work.

As you can see, it is not a difficult command to use. For example, to create a new user file named gnulixusers with a user named wsb, you need to do something like this:

matthew@seymour:~$ sudo htpasswd -c gnulixusers wsb

You are then prompted for a password for the user. To add more users, repeat the same procedure, only omitting the -c flag.

You can also create user group files. The format of these files is similar to that of /etc/group. On each line, enter the group name, followed by a colon (:), and then list all users, with each user separated by spaces. For example, an entry in a user group file might look like this:

gnulixusers: wsb pgj jp ajje nadia rkr hak

Now that you know how to create a user file, it’s time to look at how Apache might use this to protect web resources.

To point Apache to the user file, use the AuthUserFile directive. AuthUserFile takes the file path to the user file as its parameter. If the file path is not absolute—that is, beginning with a /—it is assumed that the path is relative to the ServerRoot. Using the AuthGroupFile directive, you can specify a group file in the same manner.

Next, use the AuthType directive to set the type of authentication to be used for this resource. Here, the type is set to Basic.

Now you need to decide to which realm the resource will belong. Realms are used to group different resources that will share the same users for authorization. A realm can consist of just about any string. The realm is shown in the Authentication dialog box on the user’s web browser. Therefore, you should set the realm string to something informative. The realm is defined with the AuthName directive.

Finally, state which type of user is authorized to use the resource. You do this with the Require directive. The three ways to use this directive are as follows:

If you specify valid-user as an option, any user in the user file is allowed to access the resource (that is, provided she also enters the correct password).

You can specify a list of users who are allowed access with the user option.

You can specify a list of groups with the group option. Entries in the group list, as well as the user list, are separated by a space.

As an example, consider the server-status resource provided by mod_status. Instead of letting users access the server-status resource based on hostname, you can require the users to be authenticated to access the resource. You can do so with the following entry in the configuration file:

<Location /server-status>
SetHandler server-status
AuthType Basic
AuthName "Server status"
AuthUserFile "gnulixusers"
Require valid-user
</Location>
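
A variant of the preceding entry limits access to members of one group rather than to every user in the user file. This is a sketch only: the group file name gnulixgroups is hypothetical, and the group option assumes that mod_authz_groupfile is enabled.

```apache
<Location /server-status>
    SetHandler server-status
    AuthType Basic
    AuthName "Server status"
    AuthUserFile "gnulixusers"
    # file containing lines such as "gnulixusers: wsb pgj jp"
    AuthGroupFile "gnulixgroups"
    # only members of this group may log in
    Require group gnulixusers
</Location>
```

To name individual accounts instead, replace the last directive with a line such as Require user wsb pgj.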

Final Words on Access Control

If you have host-based as well as user-based access protection on a resource, the default behavior of Apache is to require the requester to satisfy both controls. But assume that you want to mix host-based and user-based protection and allow access to a resource if either method succeeds. You can do so using the Satisfy directive. You can set the Satisfy directive to All (this is the default) or Any. When set to All, all access control methods must be satisfied before the resource is served. If Satisfy is set to Any, the resource is served if any access condition is met.

Here’s another access control example, again using the previous server-status example. This time, you combine access methods so all users from the Gnulix domain are allowed access and those from outside the domain must identify themselves before gaining access. You can do so with the following:

<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from gnulix.org
AuthType Basic
AuthName "Server status"
AuthUserFile "gnulixusers"
Require valid-user
Satisfy Any
</Location>
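
In Apache 2.4, the Order, Deny, Allow, and Satisfy directives are kept only for compatibility (through mod_access_compat), and the Require syntax shown earlier in this section is preferred. A sketch of the same policy in that style might look like this:

```apache
<Location /server-status>
    SetHandler server-status
    AuthType Basic
    AuthName "Server status"
    AuthUserFile "gnulixusers"
    <RequireAny>
        # clients from the Gnulix domain get in without a password
        Require host gnulix.org
        # everyone else must log in
        Require valid-user
    </RequireAny>
</Location>
```

RequireAny plays the role of Satisfy Any here: the request is served as soon as one of the enclosed conditions is met.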

There are more ways to protect material on your web server, but the methods discussed here should get you started and will probably be more than adequate for most circumstances. Look to Apache’s online documentation for more examples of how to secure areas of your site.

Apache Modules

The Apache core does relatively little; Apache gains its functionality from modules. Each module solves a well-defined problem by adding necessary features. By adding or removing modules to supply the functionality you want Apache to have, you can tailor the Apache server to suit your exact needs.

A number of core modules are included with the basic Apache server. Many more are available from other developers. The Apache Module Registry is a repository for add-on modules for Apache; you can find it at http://modules.apache.org/. The modules are stored in the /usr/lib/apache2/modules directory.

Each module adds new directives that you can use in your configuration files. As you might guess, there are far too many extra commands, switches, and options to describe them all in this chapter. The following sections briefly describe a subset of those modules available with Ubuntu’s Apache installation.

To enable a module, use this command:

matthew@seymour:~$ sudo a2enmod module_name

To disable a module, use this:

matthew@seymour:~$ sudo a2dismod module_name

Note that these want the actual name of the module, not the filename; for example, mod_version.so is the filename, but version is the name of the module. You have to know the name of the module to use either command, but in most cases it is as simple as the difference in this example. Also, after you run either command, you need to restart apache2 to activate the new configuration.
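
The filename-to-name rule can be sketched in a few lines of shell. This assumes the common mod_NAME.so pattern, which holds in most cases but not necessarily for every module.

```shell
# Most module filenames follow the pattern mod_NAME.so; strip both parts.
module_file="mod_version.so"
module_name=$(basename "$module_file" .so)   # mod_version
module_name=${module_name#mod_}              # version
echo "$module_name"

# The result is what a2enmod and a2dismod expect, for example:
#   sudo a2enmod "$module_name" && sudo systemctl restart apache2
```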

mod_access

mod_access controls access to areas on your web server based on IP addresses, hostnames, or environment variables. (In Apache 2.2 and later, this functionality is provided by mod_authz_host.) For example, you might want to allow anyone from within your own domain to access certain areas of your website. See the “File System Authentication and Access Control” section for more information.

mod_alias

mod_alias manipulates the URLs of incoming HTTP requests, such as redirecting a client request to another URL. It also can map a part of the file system into your web hierarchy. For example, the following fetches contents from the /home/wsb/graphics directory for any URL that starts with /images/:

Alias /images/ /home/wsb/graphics/

This is done without the client knowing anything about it. If you use a redirection, the client is instructed to go to another URL to find the requested content. You can accomplish more advanced URL manipulation with mod_rewrite.
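
For comparison, a redirection with mod_alias might look like the following sketch. The /oldstuff/ path is hypothetical; the target reuses the example URL from elsewhere in this chapter.

```apache
# Tell clients that everything under /oldstuff/ has moved permanently (HTTP 301)
Redirect permanent /oldstuff/ http://gnulix.org/newstuff/
```

Unlike Alias, this is visible to the client: the browser receives the new URL and makes a second request.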

mod_asis

mod_asis is used to specify, in fine detail, all the information to be included in a response. This completely bypasses any headers Apache might have otherwise added to the response. All files with an .asis extension are sent straight to the client without any changes.

As a short example of the use of mod_asis, assume you’ve moved content from one location to another on your site. Now you must inform people who try to access this resource that it has moved, as well as automatically redirect them to the new location. To provide this information and redirection, you can add the following code to a file with a .asis extension:

Status: 301 No more old stuff!
Location: http://gnulix.org/newstuff/
Content-type: text/html

<HTML>
<HEAD>
<TITLE>We've moved...</TITLE>
</HEAD>
<BODY>
<P>We've moved the old stuff and now you'll find it at:</P>
<A HREF="http://gnulix.org/newstuff/">New stuff</A>!.
</BODY>
</HTML>
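
For Apache to treat such files specially, the .asis extension must be mapped to the module’s handler. Assuming mod_asis is enabled, one way to do that is the following:

```apache
# Send files ending in .asis exactly as written, headers included
AddHandler send-as-is asis
```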

mod_auth

mod_auth uses a simple user authentication scheme, referred to as Basic Authentication, which is based on storing usernames and encrypted passwords in a text file. (In Apache 2.2 and later, this functionality is provided by mod_auth_basic together with mod_authn_file.) This file looks very much like UNIX’s /etc/passwd file and is created with the htpasswd command. See the “File System Authentication and Access Control” section earlier in this chapter for more information about this subject.

mod_auth_anon

The mod_auth_anon module provides anonymous authentication similar to that of anonymous FTP. The module enables you to define user IDs of those who are to be handled as guest users. When such a user tries to log on, he is prompted to enter his email address as his password. You can have Apache check the password to ensure that it’s a (more or less) proper email address. Basically, it ensures that the password contains an @ character and at least one . character.

mod_auth_dbm

mod_auth_dbm uses Berkeley DB files instead of text for user authentication files.

mod_auth_digest

mod_auth_digest is an extension of the basic mod_auth module. Instead of sending the user information in plain text, it sends a message digest 5 (MD5) hash of the credentials. This authentication scheme is defined in RFC 2617, “HTTP Authentication: Basic and Digest Access Authentication.” Compared to using Basic Authentication, this is a much more secure way of sending user data over the Internet. Unfortunately, not all web browsers support this authentication scheme.

To create password files for use with mod_auth_digest, you must use the htdigest utility. It has more or less the same functionality as the htpasswd utility. See the man page of htdigest for further information.
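A digest-protected directory could look something like the following sketch. The directory, realm name, and file path are assumptions made for illustration; the realm given in AuthName has to match the realm used when the file was created with htdigest.

```apache
# Hypothetical paths and realm; the file is one created with htdigest.
<Directory "/var/www/html/private">
    AuthType Digest
    AuthName "private"
    AuthDigestProvider file
    AuthUserFile "/etc/apache2/.htdigest"
    Require valid-user
</Directory>
```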

mod_autoindex

The mod_autoindex module dynamically creates a file list for directory indexing. The list is rendered in a user-friendly manner similar to those lists provided by FTP’s built-in ls command.

mod_cgi

mod_cgi allows execution of CGI programs on your server. CGI programs are executable files residing in a directory designated for scripts (on Ubuntu, /usr/lib/cgi-bin by default) and are used to dynamically generate data (usually HTML) for the remote browser when requested.

mod_dir and mod_env

The mod_dir module is used to determine which files are returned automatically when a user tries to access a directory. The default is index.html. If you have users who create web pages on Windows systems, you should also include index.htm, like this:

DirectoryIndex index.html index.htm

mod_env controls how environment variables are passed to CGI and SSI scripts.

mod_expires

mod_expires is used to add an expiration date to content on your site by adding an Expires header to the HTTP response. Web browsers or cache servers won’t cache expired content.
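A short sketch of how expiration times might be assigned follows. The lifetimes chosen here are arbitrary examples, not recommendations:

```apache
# Turn on generation of Expires headers.
ExpiresActive On
# Fallback lifetime for content with no more specific rule.
ExpiresDefault "access plus 1 hour"
# Per-MIME-type lifetimes, counted from the time of the client's access.
ExpiresByType image/png "access plus 1 month"
ExpiresByType text/css "access plus 1 week"
```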

mod_headers

mod_headers is used to manipulate the HTTP headers of your server’s responses. You can replace, add, merge, or delete headers as you see fit. The module supplies a Header directive for this. Ordering of the Header directive is important. A set followed by an unset for the same HTTP header removes the header altogether. You can place Header directives almost anywhere within your configuration files. These directives are processed in the following order:

1. Core server

2. Virtual host

3. <Directory> and .htaccess files

4. <Files>

5. <Location>
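The effect of ordering can be sketched as follows, assuming a hypothetical downloads directory. The header is set at the server level and then removed again by a directive that is processed later:

```apache
# Server level: add (or replace) a header on every response.
Header set X-Content-Type-Options "nosniff"

<Directory "/var/www/html/downloads">
    # Processed after the server-level directive above, so responses
    # from this (hypothetical) directory do not carry the header.
    Header unset X-Content-Type-Options
</Directory>
```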

mod_include

mod_include enables the use of server-side includes on your server, which were quite popular before PHP took over this part of the market.
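One common way to enable server-side includes is sketched below, assuming that pages containing include directives use an .shtml extension and live under a hypothetical document root:

```apache
<Directory "/var/www/html">
    # Permit server-side includes in this directory tree.
    Options +Includes
    # Serve .shtml files as HTML and run them through the SSI filter.
    AddType text/html .shtml
    AddOutputFilter INCLUDES .shtml
</Directory>
```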

mod_info and mod_log_config

mod_info provides comprehensive information about your server’s configuration. For example, it displays all the installed modules, as well as all the directives used in its configuration files.

mod_log_config defines how your log files should look. See the “Logging” section for further information about this subject.

mod_mime and mod_mime_magic

The mod_mime module tries to determine the MIME type of files from their extensions.

The mod_mime_magic module tries to determine the MIME type of files by examining portions of their content.

mod_negotiation

Using the mod_negotiation module, you can select one of several document versions that best suits the client’s capabilities. There are several options to select which criteria to use in the negotiation process. You can, for example, choose among different languages, graphics file formats, and compression methods.

mod_proxy

mod_proxy implements proxy and caching capabilities for an Apache server. It can proxy and cache FTP, CONNECT, HTTP/0.9, and HTTP/1.0 requests. This is not an ideal solution for sites that have a large number of users and therefore have high proxy and cache requirements. However, it is more than adequate for a small number of users.

mod_rewrite

mod_rewrite is the Swiss army knife of URL manipulation. It enables you to perform any imaginable manipulation of URLs using powerful regular expressions. It provides rewrites, redirection, proxying, and so on. There is little that you cannot accomplish using this module.
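As a small taste of the syntax, the following sketch redirects everything under a hypothetical /oldstuff/ path to the same name under /newstuff/. It is written for server or virtual host context, where the pattern is matched against the URL path including its leading slash:

```apache
RewriteEngine On
# Capture whatever follows /oldstuff/ and reuse it as $1.
# R=301 sends a permanent redirect; L stops processing further rules.
RewriteRule "^/oldstuff/(.*)$" "/newstuff/$1" [R=301,L]
```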


Tip

If you have Apache installed and running, see http://localhost/manual/misc/rewriteguide.html for a cookbook that gives you an in-depth explanation of the mod_rewrite module’s capabilities.


mod_setenvif

mod_setenvif allows manipulation of environment variables. Using small snippets of text-matching code known as regular expressions, you can conditionally change the content of environment variables. The order in which SetEnvIf directives appear in the configuration files is important. Each SetEnvIf directive can reset an earlier SetEnvIf directive when used on the same environment variable. Be sure to keep that in mind when using the directives from this module.
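The following sketch illustrates why order matters. The variable name is_image and the /reports/ path are made up for the example, and the common log format nickname is assumed to be defined elsewhere in the configuration:

```apache
# Flag requests for image files.
SetEnvIf Request_URI "\.(gif|jpe?g|png)$" is_image
# A later directive can undo the earlier one: requests under
# /reports/ lose the flag again.
SetEnvIf Request_URI "^/reports/" !is_image
# Log only the requests that do not carry the flag.
CustomLog ${APACHE_LOG_DIR}/pages.log common env=!is_image
```

If the two SetEnvIf lines were swapped, images under /reports/ would keep the flag and be left out of the log.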

mod_speling

mod_speling is used to enable correction of minor typos in URLs. If no file matches the requested URL, this module builds a list of the files in the requested directory and extracts those files that are the closest matches. It tries to correct only one spelling mistake.
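Once the module is loaded, turning the correction on is a one-line affair. This sketch applies it to a hypothetical document root:

```apache
<Directory "/var/www/html">
    # Try to correct a single misspelling or wrong capitalization.
    CheckSpelling On
</Directory>
```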

mod_status

You can use mod_status to create a web page containing a plethora of information about a running Apache server. The page contains information about the internal status as well as statistics about the running Apache processes. This can be a great aid when you’re trying to configure your server for maximum performance. It’s also a good indicator of when something’s amiss with your Apache server.
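A typical way to expose the status page is sketched here in Apache 2.4 syntax. The /server-status URL is the conventional choice, not a requirement, and access is limited to the local machine because the page reveals details about your server:

```apache
<Location "/server-status">
    # Hand requests for this URL to mod_status.
    SetHandler server-status
    # Only allow requests that originate from the server itself.
    Require local
</Location>
```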

mod_ssl

mod_ssl provides Secure Sockets Layer (versions 2 and 3) and Transport Layer Security (version 1) support for Apache. At least 30 directives exist that deal with options for encryption and client authorization and that can be used with this module. This module requires that you also install openssl and generate or buy a certificate. This is covered later in the chapter in the “HTTPS” section.

mod_unique_id

mod_unique_id generates a unique request identifier for every incoming request. This ID is put into the UNIQUE_ID environment variable.

mod_userdir

The mod_userdir module enables mapping of a subdirectory in each user’s home directory into your web tree. The module provides several ways to accomplish this.
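The most common of these is sketched below, assuming that users keep their pages in a public_html subdirectory and that home directories live under /home. With this in place, a request for /~wsb/ would be served from /home/wsb/public_html/:

```apache
# Map /~username/ to the public_html directory in that user's home.
UserDir public_html

<Directory "/home/*/public_html">
    AllowOverride None
    Require all granted
</Directory>
```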

mod_usertrack

mod_usertrack is used to generate a cookie for each user session. This can be used to track the user’s click stream within your web tree. You must enable a custom log that logs this cookie into a log file.
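A minimal sketch of such a setup follows. The log file name is an arbitrary choice; the %{cookie}n field is the note in which mod_usertrack records the cookie value:

```apache
# Issue a tracking cookie to each new client.
CookieTracking on
# Record the cookie, the request line, and the time of each request.
CustomLog ${APACHE_LOG_DIR}/clickstream.log "%{cookie}n %r %t"
```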

mod_vhost_alias

mod_vhost_alias supports dynamically configured mass virtual hosting, which is useful for Internet service providers (ISPs) with many virtual hosts. However, for the average user, Apache’s ordinary virtual hosting support should be more than sufficient.

There are two ways to host virtual hosts on an Apache server. You can have one IP address with multiple CNAMEs, or you can have multiple IP addresses with one name per address. Apache has different sets of directives to handle each of these options. (You learn more about virtual hosting in Apache in the next section of this chapter.)

Again, the available options and features for Apache modules are too numerous to describe completely in this chapter. You can find complete information about the Apache modules in the online documentation for the server included with Ubuntu or at the Apache Project’s website.

Virtual Hosting

One of the more popular services to provide with a web server is to host a virtual domain. Also known as a virtual host, a virtual domain is a complete website with its own domain name, as if it were a standalone machine, but it’s hosted on the same machine as other websites. Apache implements this capability in a simple way with directives in the apache2.conf configuration file.

Apache now can dynamically host virtual servers by using the mod_vhost_alias module you read about in the preceding section of the chapter. The module is primarily intended for ISPs and similar large sites that host a large number of virtual sites. This module is for more advanced users and, as such, it is outside the scope of this introductory chapter. Instead, this section concentrates on the traditional ways of hosting virtual servers.

Address-Based Virtual Hosts

After you’ve configured your Linux machine with multiple IP addresses, setting up Apache to serve them as different websites is simple. You need only put a VirtualHost directive in your apache2.conf file for each of the addresses you want to make an independent website:

<VirtualHost 212.85.67.67>
ServerName gnulix.org
DocumentRoot /home/virtual/gnulix/public_html
TransferLog /home/virtual/gnulix/logs/access_log
ErrorLog /home/virtual/gnulix/logs/error_log
</VirtualHost>

Use the IP address, rather than the hostname, in the VirtualHost tag.

You can specify any configuration directives within the <VirtualHost> tags. For example, you might want to set AllowOverride directives differently for virtual hosts than you do for your main server. Any directives that aren’t specified default to the settings for the main server.

Name-Based Virtual Hosts

Name-based virtual hosts enable you to run more than one host on the same IP address. You must add the names to your DNS as CNAMEs of the machine in question. When an HTTP client (web browser) requests a document from your server, it sends a Host header with the request, indicating the server name from which it’s requesting the document. Based on this header, the server determines from which of the virtual hosts it should serve content.

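Name-based virtual hosts have traditionally required just one step more than IP address-based virtual hosts: You must first indicate which IP address has the multiple DNS names on it. (Apache 2.4 enables name-based hosting automatically whenever more than one VirtualHost section shares the same address and port, and it treats the directive shown here as deprecated, so you can omit it on current Ubuntu releases.) In earlier versions, this is done with the NameVirtualHost directive: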

NameVirtualHost 212.85.67.67

You must then have a section for each name on that address, setting the configuration for that name. As with IP-based virtual hosts, you need to set only those configurations that must be different for the host. You must set the ServerName directive because it is the only thing that distinguishes one host from another:

<VirtualHost 212.85.67.67>
ServerName bugserver.gnulix.org
ServerAlias bugserver
DocumentRoot /home/bugserver/htdocs
ScriptAlias /cgi-bin/ /home/bugserver/cgi-bin/
TransferLog /home/bugserver/logs/access_log
</VirtualHost>

<VirtualHost 212.85.67.67>
ServerName pts.gnulix.org
ServerAlias pts
DocumentRoot /home/pts/htdocs
ScriptAlias /cgi-bin/ /home/pts/cgi-bin/
TransferLog /home/pts/logs/access_log
ErrorLog /home/pts/logs/error_log
</VirtualHost>


Tip

If you are hosting websites on an intranet or internal network, users will likely use the shortened name of the machine rather than the FQDN. For example, users might type http://bugserver/index.html in their browser location field rather than http://bugserver.gnulix.org/index.html. In that case, Apache would not recognize that those two addresses should go to the same virtual host. You could get around this by setting up VirtualHost directives for both bugserver and bugserver.gnulix.org, but the easy way around it is to use the ServerAlias directive, which lists all valid aliases for the machine:

ServerAlias bugserver

For more information about VirtualHost, refer to the help system on http://localhost/manual.


Logging

Apache provides logging for just about any web access information you might be interested in. Logging can help with the following:

• System resource management, by tracking usage

• Intrusion detection, by documenting bad HTTP requests

• Diagnostics, by recording errors in processing requests

Two standard log files are generated when you run your Apache server: an access log and an error log. On Ubuntu, they are named access.log and error.log and are found under the /var/log/apache2 directory. (Other configurations might add SSL logs such as ssl_access_log, ssl_error_log, and ssl_request_log.) All logs except for the error log (by default, this is just the access log) are generated in a format specified by the CustomLog and LogFormat directives. These directives appear in your apache2.conf file.

A new log format can be defined with the LogFormat directive:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

The common log format is a good starting place for creating your own custom log formats. Note that most of the available log analysis tools assume you’re using the common log format or the combined log format, both of which are defined in the default configuration files.
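For comparison, the combined log format is conventionally defined as the common format plus the Referer and User-Agent request headers. The definition below is a sketch of that convention; check your own configuration files for the exact definition in use:

```apache
# Common log format fields, followed by two request headers.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
```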

The following variables are available for LogFormat statements:

• %a: Remote IP address.

• %A: Local IP address.

• %b: Bytes sent, excluding HTTP headers, in Common Log Format (CLF). That is, for a request without any data content, a - is shown instead of 0.

• %B: Bytes sent, excluding HTTP headers.

• %{VARIABLE}e: The contents of the environment variable VARIABLE.

• %f: The filename of the file requested.

• %h: Remote host.

• %H: Request protocol.

• %{HEADER}i: The contents of HEADER; header lines in the request sent to the server.

• %l: Remote log name (from identd, if supplied).

• %m: Request method.

• %{NOTE}n: The contents of note NOTE from another module.

• %{HEADER}o: The contents of HEADER; header lines in the reply.

• %p: The canonical port of the server serving the request.

• %P: The PID of the child that serviced the request.

• %q: The contents of the query string, prepended with a ? character. If there’s no query string, this evaluates to an empty string.

• %r: The first line of request.

• %s: Status. For requests that were internally redirected, this is the status of the original request (%>s for the last).

• %t: The time, in common log time format.

• %{format}t: The time, in the form given by format.

• %T: The seconds taken to serve the request.

• %u: Remote user from auth; this might be bogus if the return status (%s) is 401.

• %U: The URL path requested.

• %V: The server name according to the UseCanonicalName directive.

• %v: The canonical ServerName of the server serving the request.

You can put a conditional in front of each variable to determine whether the variable is displayed. If the variable isn’t displayed, - is displayed instead. These conditionals are in the form of a list of numeric return values. For example, %!401u displays the value of REMOTE_USER unless the return code is 401.

You can then specify the location and format of a log file using the CustomLog directive:

CustomLog logs/access_log common

If it is not specified as an absolute path, the location of the log file is assumed to be relative to the ServerRoot.
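Putting the pieces together, a custom format and the log that uses it might look like the following sketch. The nickname timed and the log file name are invented for the example. The format is the common log format with a conditional on the user field and the time taken to serve the request added at the end:

```apache
# Like the common format, but the user is logged as - when the
# status is 401, and %T appends the seconds taken to serve the request.
LogFormat "%h %l %!401u %t \"%r\" %>s %b %T" timed
# Write a log in that format; the absolute path avoids any
# dependence on ServerRoot.
CustomLog ${APACHE_LOG_DIR}/timed_access.log timed
```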

HTTPS

The mod_ssl module listed above gives Apache2 the ability to encrypt communications using openssl. What this means is that your website can be accessed using https:// instead of just http:// and all communications to and from the site will be encrypted. The module is included in the main apache2-common package, so if you installed that from the Ubuntu repositories when you installed Apache, you don’t have to install additional apache packages.

Enter this to enable the module:

matthew@seymour:~$ sudo a2enmod ssl

The package also includes a default HTTPS configuration file, found in /etc/apache2/sites-available/default-ssl.conf. For HTTPS to work, a certificate and a key are required. The default configuration includes a certificate and key generated by the ssl-cert package, and they are adequate for testing. However, for real use you should either generate a self-signed certificate and key (adequate for internal use or for personal sites) or buy a certificate from a recognized certificate authority, or CA (necessary if you want anyone to trust your site for commercial ventures).

To configure Apache2 for HTTPS using the default configuration for testing, use this command:

matthew@seymour:~$ sudo a2ensite default-ssl

Then restart Apache2.

You can now access web pages on your server using https://. This is adequate for testing, but not for anything else.

Next we will create a self-signed certificate and key, which is a step in the right direction.

To generate a key for the certificate, use this command:

matthew@seymour:~$ openssl genrsa -des3 -out server.key 2048

This generates a 2048-bit RSA private key and encrypts it with Triple-DES; you are prompted for a passphrase to protect the key. See the man page for openssl for more information about possible settings.

To generate a Certificate Signing Request (CSR), use this command:

matthew@seymour:~$ openssl req -new -key server.key -out server.csr

You will be asked for some information to complete the request.

To generate a self-signed certificate, use this command:

matthew@seymour:~$ openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

This will create a certificate that is valid for 365 days. Certificates, even from vendors, have expiration dates. Certificates should be renewed regularly to make your site visitors feel comfortable that they are dealing with who they think they are dealing with.

To copy the certificate to its proper location, use this command:

matthew@seymour:~$ sudo cp server.crt /etc/ssl/certs/

To copy the key to its proper location, use this command:

matthew@seymour:~$ sudo cp server.key /etc/ssl/private/

Next, edit the default-ssl file, /etc/apache2/sites-available/default-ssl.conf, so that the following lines have the values shown here. This tells Apache2 to use SSL and where to find the proper certificate and key files.

SSLEngine on
SSLCertificateFile /etc/ssl/certs/server.crt
SSLCertificateKeyFile /etc/ssl/private/server.key
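To show where these lines sit, here is a stripped-down sketch of an HTTPS virtual host. It is not the contents of Ubuntu's default-ssl file, which contains many more settings; the ServerName and DocumentRoot values are assumptions for illustration.

```apache
# HTTPS conventionally uses port 443.
<VirtualHost *:443>
    ServerName gnulix.org
    DocumentRoot /var/www/html

    # Enable SSL/TLS for this virtual host and point Apache at the
    # certificate and key copied into place earlier.
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/server.crt
    SSLCertificateKeyFile /etc/ssl/private/server.key
</VirtualHost>
```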

To configure Apache2 for HTTPS using the edited default configuration with the self-signed certificate and key file, use this command:

matthew@seymour:~$ sudo a2ensite default-ssl

Then restart Apache2.

During the restart you will be asked to input the certificate’s key password. Enter it when requested. You now have a server that is secure and good for internal use, but not for a customer-facing production environment.

The best thing to do, if you are going to host a professional site, is to use a certificate signed by a recognized certificate authority (CA). Every CA has its preferred method, so you should read its requirements before you start. The basic process is usually like this:

1. Create a private and public encryption key pair.

2. Create a certificate based on the public key.

3. Create a certificate request with information about your server and the company hosting it.

4. Send your certificate request and public key along with proof of your company’s identity and payment to the CA.

5. They verify the request and your identity and send back a certificate like the self-signed one we created, but signed by the CA.

6. You install that certificate on your server and configure Apache2 to use it.

The CA-signed certificate provides advantages. First, browsers are built with data about most CAs and will automatically recognize a signature from one of them on your certificate most of the time. A self-signed certificate will cause the browser to display a rather scary-looking (to a non-technical person) warning and require the user to bypass it before viewing your site. In addition, when the CA issues the signed certificate, it is guaranteeing the identity of the organization providing the web pages.

To learn more about certificates and keys, including installation of keys and certificates you pay for, see http://tldp.org/HOWTO/SSL-Certificates-HOWTO/index.html.

References

• http://news.netcraft.com/archives/web_server_survey.html: A statistical graph of web server usage by millions of servers. The research points out that Apache is by far the most widely used server for Internet sites.

• https://httpd.apache.org/: The Apache HTTP Server Project website where you can find extensive documentation and information about Apache.