Ubuntu 15.04 Server with systemd: Administration and Reference (2015)

Part IV. Network Support

Chapter 14. Proxy Servers: Squid

A proxy server operates as an intermediary between a local network and services available on a larger one, such as the Internet. Requests from local clients for web services can be handled by the proxy server, speeding transactions as well as controlling access. Proxy servers maintain current copies of commonly accessed web pages, speeding web access times by eliminating the need to access the original site constantly. They also perform security functions, protecting servers from unauthorized access.

Protocol  Description and Port
--------  ----------------------------------------------
HTTP      Web pages, port 3128
FTP       FTP transfers through websites, port 3128
ICP       Internet Cache Protocol, port 3130
HTCP      Hypertext Caching Protocol, port 4827
CARP      Cache Array Routing Protocol
SNMP      Simple Network Management Protocol, port 3401
SSL       Secure Sockets Layer

Table 14-1: Protocols Supported by Squid

Squid is a free, open source, proxy-caching server for web clients, designed to speed Internet access and provide security controls for web servers. It implements a proxy-caching service for web clients that caches web pages as users make requests. Copies of web pages accessed by users are kept in the Squid cache, and as requests are made, Squid checks to see if it has a current copy. If Squid does have a current copy, it returns the copy from its cache instead of querying the original site. If it does not have a current copy, it will retrieve one from the original site. Replacement algorithms periodically replace old objects in the cache. In this way, web browsers can use the local Squid cache as a proxy HTTP server. Squid currently handles web pages supporting the HTTP, FTP, and SSL protocols (Squid cannot be used with FTP clients), each with an associated default port (see Table 14-1). It also supports ICP (Internet Cache Protocol) and HTCP (Hypertext Caching Protocol) for web caching, and SNMP (Simple Network Management Protocol) for providing status information.
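The lookup described above can be sketched in a few lines of Python. This is only a toy model for illustration, not how Squid is implemented: the TinyCache class, the fixed ttl_seconds freshness window, and the fetch callback are all assumptions made for the example.

```python
import time


class TinyCache:
    """Toy proxy cache: serve a stored copy while it is still fresh."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # callable that retrieves a URL from the origin
        self._ttl = ttl_seconds    # how long a stored copy counts as current
        self._clock = clock        # injectable clock, handy for testing
        self._store = {}           # url -> (body, time the copy was stored)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0], "HIT"     # current copy: answer from the cache
        body = self._fetch(url)        # no current copy: go to the origin
        self._store[url] = (body, now)
        return body, "MISS"
```

The first request for a URL calls fetch and stores the result; a repeat request within the freshness window is answered from the stored copy without contacting the origin.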

You can find out more about Squid at http://squid-cache.org. For detailed information, check the Squid FAQ and the user manual on the Squid website. The FAQ is also installed under /usr/share/doc, in the squid directory.

As a proxy, Squid does more than just cache web objects. It operates as an intermediary between the web browsers (clients) and the servers they access. Instead of connections being made directly to the server, a client connects to the proxy server. The proxy then relays requests to the web server. This is useful for situations where a web server is placed behind a firewall server, protecting it from outside access. The proxy is accessible on the firewall, which can then transfer requests and responses back and forth between the client and the web server. The design is often used to allow web servers to operate on protected local networks and still be accessible on the Internet. You can also use a Squid proxy to provide web access to the Internet by local hosts. Instead of using a gateway providing complete access to the Internet, local hosts can use a proxy to allow them just web access. You can also combine the two, allowing gateway access, but using the proxy server to provide more control for web access. In addition, the caching capabilities of Squid can provide local hosts with faster web access.

Technically, you could use a proxy server to simply manage traffic between a web server and the clients who want to communicate with it, without doing caching at all. Squid combines both capabilities as a proxy-caching server.

Squid also provides security capabilities that let you exercise control over hosts accessing your web server. You can deny access by certain hosts and allow access by others. Squid also supports the use of encrypted protocols such as SSL. Encrypted communications are tunneled (passed through without reading) through the Squid server directly to the web server.

Squid is supported and distributed under the GNU General Public License by the National Laboratory for Applied Network Research (NLANR) at the University of California, San Diego. The work is based on the Harvest Project to create a web indexing system that includes a high-performance cache daemon called cached. You can obtain current source code versions and online documentation from the Squid home page at http://squid-cache.org. The Squid software package (squid) consists of the Squid server and several support scripts for services like LDAP and HTTP. You can also install the cache manager script, cachemgr.cgi, which is provided by the squid-cgi package. The cachemgr.cgi script lets you view statistics for the Squid server as it runs. Squid version 2.7 is available on the main Ubuntu repository. You can also install the Squid 3 version (Universe repository), but updates are not supported by Canonical.

sudo apt-get install squid3

Check the Ubuntu Server Guide | Web Servers | Squid - Proxy Server for basic configuration.

https://help.ubuntu.com/stable/serverguide/squid.html

Also check the Ubuntu Community Documentation on Squid at:

https://help.ubuntu.com/community/Squid

The Squid server is managed by systemd using the squid3.service file, shown here. The unit is generated automatically by systemd-sysv-generator from the /etc/init.d/squid3 script (SourcePath) and is ordered before the runlevel 2, 3, 4, and 5 targets (Before). The /etc/init.d/squid3 script is used to start, stop, and reload the server (ExecStart, ExecStop, ExecReload).

squid3.service

# Automatically generated by systemd-sysv-generator

[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/squid3
Description=LSB: Squid HTTP Proxy version 3.x
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
After=network-online.target remote-fs.target systemd-journald-dev-log.socket nss-lookup.target
Wants=network-online.target
Conflicts=shutdown.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/init.d/squid3 start
ExecStop=/etc/init.d/squid3 stop
ExecReload=/etc/init.d/squid3 reload

You can use the service command to manually stop, start, and restart the server.

sudo service squid3 stop

Configuring Client Browsers

Squid supports both standard proxy caches and transparent caches. With a standard proxy cache, users will need to configure their browsers to specifically access the Squid server. A transparent cache, on the other hand, requires no browser configuration by users. The cache is transparent, allowing access as if it were a normal website. Transparent caches are implemented with iptables, using Netfilter packet filtering to intercept requests and redirect them to the proxy cache.

With a standard proxy cache, users need to specify their proxy server in their web browser configuration. For this, they will need the IP address of the host running the Squid proxy server as well as the port it is using. Proxies usually make use of port 3128. To configure use of a proxy server running on the private network, you enter the following. The proxy server is running on turtle.mytrek.com (192.168.0.1) and using port 3128.

192.168.0.1 3128

On Firefox, Mozilla, and Netscape, the user on the sample local network first selects the Proxy panel located in Preferences under the Edit menu. Then, in the Manual proxy configuration’s View panel, the user enters the previous information. There are entries for FTP, HTTP, and security proxies. For standard web access, enter the IP address in the FTP and HTTP boxes and 3128 in their port boxes.

For GNOME, select the Network Proxy tab in the System Settings Network dialog, and for Konqueror on the KDE Desktop, select the Proxies panel on the Preferences | Web Browsing menu window. Here, you can enter the proxy server address and port numbers.

On Linux and UNIX systems, local hosts can set the http_proxy and ftp_proxy shell variables to configure access by Linux-supported web browsers such as Lynx. You can place these definitions in your .profile or /etc/profile file to have them automatically defined whenever you log in.

http_proxy=192.168.0.1:3128
ftp_proxy=192.168.0.1:3128
export http_proxy ftp_proxy

Alternatively, you can use the proxy’s URL.

http_proxy=http://turtle.mytrek.com:3128
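To see how a client program picks these variables up, here is a small Python sketch that uses the standard library's urllib.request.getproxies() function. Setting the variables inside the script is only a stand-in for exporting them from .profile, and using the same proxy URL for FTP is an assumption made for the example.

```python
import os
import urllib.request

# Stand-in for "export http_proxy ftp_proxy" in a login shell; the values
# reuse the sample proxy host and port from this chapter.
os.environ["http_proxy"] = "http://turtle.mytrek.com:3128"
os.environ["ftp_proxy"] = "http://turtle.mytrek.com:3128"

# getproxies() scans the environment for <scheme>_proxy variables and
# returns a dictionary that maps each scheme to its proxy URL.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
print(proxies.get("ftp"))
```

Programs built on urllib consult the same mapping by default, so no per-application proxy setting is needed once the variables are exported.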

For the Elinks browser, you can specify a proxy in its configuration file, /etc/elinks.conf. Set both FTP and web proxy host options, as in:

set protocol.http.proxy.host = "turtle.mytrek.com:3128"
set protocol.ftp.proxy.host = "turtle.mytrek.com:3128"

Before a client on a local host can use the proxy server, access permission has to be given to it in the server’s squid.conf file, described in the later section “Proxy Security.” Access can easily be provided to an entire network. For the sample network used here, you would have to place the following entries in the squid.conf file. These are explained in detail in the following sections.

acl mylan src 192.168.0.0/255.255.255.0
http_access allow mylan

Tip: Web clients that need to access your Squid server as a standard proxy cache will need to know the server’s address and the port for Squid’s HTTP services, by default 3128.

The squid.conf File

The Squid configuration file is squid.conf, located in the /etc/squid3 directory. In the /etc/squid3/squid.conf file, you set general options such as ports used, security options controlling access to the server, and cache options for configuring caching operations. The default version of squid.conf provided with Squid software includes detailed explanations of all standard entries, along with commented default entries. Entries consist of tags that specify different attributes. For example, maximum_object_size sets limits on objects transferred.

maximum_object_size 4 MB

As a proxy, Squid will use certain ports for specific services, such as port 3128 for HTTP services like web browsers. Default port numbers are already set for Squid. Should you need to use other ports, you can set them in the /etc/squid3/squid.conf file. The following entry shows how you set the web browser port:

http_port 3128

Note: Squid uses the Simple Network Management Protocol (SNMP) to provide status information and statistics to SNMP agents managing your network. You can control SNMP with the snmp_access and snmp_port directives in the squid.conf file.

Proxy Security

Squid can use its role as an intermediary between web clients and a web server to implement access controls, determining who can access the web server and how. Squid does this by checking access control lists (ACLs) of hosts and domains that have had controls placed on them. When it finds a web client from one of those hosts attempting to connect to the web server, it executes the control. Squid supports a number of controls with which it can deny or allow access to the web server by the remote host’s web client (see Table 14-2). In effect, Squid sets up a firewall just for the web server.

The first step in configuring Squid security is to create ACLs. These are lists of hosts and domains for which you want to set up controls. You define ACLs using the acl command, creating a label for the systems on which you are setting controls. You then use commands such as http_access to define these controls. You can define a system, or a group of systems, by use of several acl options, such as the source IP address, the domain name, or even the time and date. For example, the src option is used to define a system or group of systems with a certain source address. To define a mylan acl entry for systems in a local network with the addresses 192.168.0.0 through 192.168.0.255, use the following ACL definition:

acl mylan src 192.168.0.0/255.255.255.0
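To check which client addresses an address/netmask pair like this one covers, you can use Python's standard ipaddress module. This is just a verification aid, not part of Squid; the in_mylan helper is a name made up for the example.

```python
import ipaddress

# The same network, written the way the acl line writes it (address/netmask).
mylan = ipaddress.ip_network("192.168.0.0/255.255.255.0")


def in_mylan(client_ip):
    """Return True if client_ip falls inside the mylan network."""
    return ipaddress.ip_address(client_ip) in mylan


print(mylan)                     # 192.168.0.0/24
print(mylan.num_addresses)       # 256
print(in_mylan("192.168.0.42"))  # True
print(in_mylan("192.168.1.42"))  # False
```

The netmask 255.255.255.0 is equivalent to the /24 prefix, which covers 192.168.0.0 through 192.168.0.255.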

Options                           Description
--------------------------------  ------------------------------------------------
src ip-address/netmask            Client’s IP address
src addr1-addr2/netmask           Range of addresses
dst ip-address/netmask            Destination IP address
myip ip-address/netmask           Local socket IP address
srcdomain domain                  Reverse lookup, client IP
dstdomain domain                  Destination server from URL; for dstdomain and dstdom_regex, a reverse lookup is tried if an IP-based URL is used
srcdom_regex [-i] expression      Regular expression matching client name
dstdom_regex [-i] expression      Regular expression matching destination
time [day-abbrevs] [h1:m1-h2:m2]  Time as specified by day, hour, and minutes. Day abbreviations: S = Sunday, M = Monday, T = Tuesday, W = Wednesday, H = Thursday, F = Friday, A = Saturday
url_regex [-i] expression         Regular expression matching on whole URL
urlpath_regex [-i] expression     Regular expression matching on URL path
port ports                        A specific port or range of ports
proto protocol                    A specific protocol, such as HTTP or FTP
method method                     Specific methods, such as GET and POST
browser [-i] regexp               Pattern match on user-agent header
ident username                    String match on ident output
src_as number                     Used for routing of requests to specific caches
dst_as number                     Used for routing of requests to specific caches
proxy_auth username               List of valid usernames
snmp_community string             A community string to limit access to your SNMP agent

Table 14-2: Squid ACL Options
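As an illustration of the time option's day abbreviations, the following Python sketch tests whether a moment falls inside a time window such as MTWHF 9:00-17:00. It is a toy model only: the time_acl_matches function is a hypothetical name, and treating both ends of the window as inclusive is an assumption, not a statement about Squid's behavior.

```python
from datetime import datetime, time

# Day abbreviations from Table 14-2, ordered so that the index matches
# datetime.weekday() (Monday = 0 ... Sunday = 6).
DAY_LETTERS = "MTWHFAS"


def time_acl_matches(days, start, end, when):
    """Return True if `when` is on one of `days` and between start and end."""
    day_ok = DAY_LETTERS[when.weekday()] in days
    return day_ok and start <= when.time() <= end


# 23 April 2015 was a Thursday (H); 25 April 2015 was a Saturday (A).
print(time_acl_matches("MTWHF", time(9, 0), time(17, 0), datetime(2015, 4, 23, 10, 30)))
print(time_acl_matches("MTWHF", time(9, 0), time(17, 0), datetime(2015, 4, 25, 10, 30)))
```

The first call matches because Thursday is in the day list and 10:30 is inside the window; the second does not, because Saturday is not listed.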

Once it is defined, you can use an ACL definition in a Squid option to specify a control you want to place on those systems. For example, to allow access by the mylan group of local systems to the web through the proxy, use an http_access option with the allow action specifying mylan as the acl definition to use, as shown here:

http_access allow mylan

The default squid.conf file provides entries for a recommended minimum configuration, beginning with entries for controlling access to your local net and server ports. Local net entries are listed for different local addresses (see Chapter 18).

acl localnet src 192.168.0.0/16 # RFC1918 possible internal network

Access is supported on the SSL ports (443, 591, 873), and server ports such as 80 for the web server and 21 for the FTP server are designated as safe.

acl SSL_ports port 443 # https
acl SSL_ports port 591 # filemaker
acl SSL_ports port 873 # rsync
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp

Default http_access entries deny access to outside users, and allow access by hosts on the local network and the local host (Squid server host). Access is also denied on ports not deemed safe or without SSL security. The http_access entries already defined in the squid.conf file are shown here.

http_access allow localhost manager
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports

http_access allow localhost

http_access deny all

By defining ACLs and using them in Squid options, you can tailor your website with the kind of security you want. You should add your own ACLs after the comment label located near the middle of the file after the http_access entries for safe ports, and before the http_access entries for the localnet and local host.

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

The following example allows access to the web through the proxy by only the mylan group of local systems, denying access to all others. Two acl entries are set up: one for the systems on the local network and one for all others. The http_access options first allow access by the local systems and then deny access to all others.

acl mylan src 192.168.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow mylan
http_access deny all

Basic default entries that you will find in your squid.conf file, along with an entry for the mylan sample network, are shown here.

acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl mylan src 192.168.0.0/255.255.255.0
acl SSL_ports port 443 563 873

The order of the http_access options is important. Squid starts from the first and works its way down, stopping at the first http_access option with an ACL entry that matches. In the earlier mylan example, local systems that match the first http_access option are allowed, whereas others fall through to the second http_access option and are denied.
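The first-match rule can be modeled with a short Python sketch. It mirrors the mylan example, handles only IPv4 source ACLs, and is not Squid's implementation; the ACLS and RULES tables, the http_access function, and the deny fallback when nothing matches are all simplifying assumptions.

```python
import ipaddress

# Source ACLs from the mylan example, keyed by their labels.
ACLS = {
    "mylan": ipaddress.ip_network("192.168.0.0/255.255.255.0"),
    "all": ipaddress.ip_network("0.0.0.0/0.0.0.0"),
}

# http_access entries in the order they appear in the file.
RULES = [("allow", "mylan"), ("deny", "all")]


def http_access(client_ip, rules=RULES):
    """Return the action of the first rule whose ACL matches client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for action, acl_name in rules:
        if addr in ACLS[acl_name]:
            return action      # stop at the first match
    return "deny"              # simplification for this sketch only


print(http_access("192.168.0.7"))   # allow
print(http_access("10.1.2.3"))      # deny
# Reversing the order changes the outcome: "deny all" now matches first.
print(http_access("192.168.0.7", [("deny", "all"), ("allow", "mylan")]))  # deny
```

The last call shows why order matters: with deny all listed first, even local systems never reach the allow entry.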

For systems using the proxy, you can also control what sites they can access. For a destination address, you create an acl entry with the dst qualifier. The dst qualifier takes as its argument the site address. Then you can create an http_access option to control access to that address. The following example denies access by anyone using the proxy to the destination site rabbit.mytrek.com. If you have a local network accessing the web through the proxy, you can use such commands to restrict access to certain sites.

acl myrabbit dst rabbit.mytrek.com
http_access deny myrabbit

Proxy Caches

Squid primarily uses the Internet Cache Protocol (ICP) to communicate with other web caches. It also provides support for the more experimental Hypertext Caching Protocol (HTCP) and the Cache Array Routing Protocol (CARP).

Using ICP, your Squid cache can connect to other Squid caches or other cache servers, such as Microsoft proxy server, Netscape proxy server, and Novell BorderManager. This way, if your network’s Squid cache does not have a copy of a requested web page, it can contact another cache to see if it is there instead of accessing the original site. You can configure Squid to connect to other Squid caches by connecting it to a cache hierarchy. Squid supports a hierarchy of caches denoted by the terms child, sibling, and parent. Sibling and child caches are accessible on the same level and are automatically queried whenever a request cannot be located in your own Squid’s cache. If these queries fail, a parent cache is queried, which then searches its own child and sibling caches (or its own parent cache, if needed), and so on.
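The query order in a hierarchy can be pictured with a small Python sketch. Plain dictionaries stand in for the caches and a callback stands in for the parent; the lookup function and its return labels are hypothetical, and real ICP exchanges are not modeled.

```python
def lookup(url, local_cache, sibling_caches, fetch_from_parent):
    """Resolve url using the local cache, then siblings, then the parent."""
    if url in local_cache:
        return local_cache[url], "local"
    # Same-level caches are asked next.
    for name, cache in sibling_caches.items():
        if url in cache:
            return cache[url], "sibling " + name
    # Nobody at this level has it: hand the request to the parent, which
    # may in turn consult its own hierarchy or the original site.
    body = fetch_from_parent(url)
    local_cache[url] = body    # keep a copy for later requests
    return body, "parent"


local = {}
siblings = {"cache2": {"http://example.com/a": "page A"}}
print(lookup("http://example.com/a", local, siblings, lambda u: "from parent"))
print(lookup("http://example.com/b", local, siblings, lambda u: "from parent"))
print(lookup("http://example.com/b", local, siblings, lambda u: "from parent"))
```

The third call is answered locally because the copy fetched through the parent on the second call was stored in this sketch's local cache.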

You can set up a cache hierarchy to connect to the main NLANR server by registering your cache using the following entries in your squid.conf file:

announce_period 1 day
announce_host tracker.ircache.net
announce_port 3131

Use cache_peer to set up parent, sibling, and child connections to other caches. This option has five fields. The first two consist of the hostname or IP address of the queried cache and the cache type (parent, child, or sibling). The third and fourth are the HTTP and the ICP ports of that cache, usually 3128 and 3130. The last is used for cache_peer options such as proxy-only to not save fetched objects locally, no-query for those caches that do not support ICP, and weight, which assigns priority to a parent cache. The following example sets up a connection to a parent cache:

cache_peer sd.cache.nlanr.net parent 3128 3130
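To make the five fields concrete, the following Python sketch splits a cache_peer entry into named parts. It is only an illustration of the field layout described above, not a parser for the full squid.conf syntax; the CachePeer type and parse_cache_peer function are names invented for the example.

```python
from typing import NamedTuple, Tuple


class CachePeer(NamedTuple):
    host: str                 # hostname or IP address of the queried cache
    peer_type: str            # cache type, for example parent or sibling
    http_port: int            # HTTP port of that cache, usually 3128
    icp_port: int             # ICP port of that cache, usually 3130
    options: Tuple[str, ...]  # trailing options such as proxy-only


def parse_cache_peer(line):
    """Split one cache_peer entry into its fields."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "cache_peer":
        raise ValueError("not a cache_peer entry: " + line)
    host, peer_type, http_port, icp_port, *options = parts[1:]
    return CachePeer(host, peer_type, int(http_port), int(icp_port), tuple(options))


peer = parse_cache_peer("cache_peer sd.cache.nlanr.net parent 3128 3130 proxy-only")
print(peer.host, peer.peer_type, peer.http_port, peer.icp_port, peer.options)
```

The options field is empty for the entry shown above, because that entry stops after the two port numbers.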

Squid provides several options for configuring cache memory. The cache_mem option sets the memory allocated primarily for objects currently in use (objects in transit). If available, the space can also be used for frequently accessed objects (hot objects) and failed requests (negative-cache objects). The default is 8MB. The following example sets it to 256MB:

cache_mem 256 MB

You can use the cache manager (cachemgr.cgi) to manage the cache and view statistics on the Squid server as it runs. To run the cache manager, use your browser to execute the cachemgr.cgi script (this script should be placed in your web server’s cgi-bin directory).

Logs

Squid keeps several logs detailing access, cache performance, and error messages. The log files are located in the /var/log/squid3 directory.

access.log holds requests sent to your proxy.

cache.log holds Squid server messages such as errors and startup messages.

store.log holds information about the Squid cache such as objects added or removed.