Ubuntu 15.04 Server with systemd: Administration and Reference (2015)
Part IV. Network Support
Chapter 14. Proxy Servers: Squid
A proxy server operates as an intermediary between a local network and services available on a larger one, such as the Internet. Requests from local clients for web services can be handled by the proxy server, speeding transactions as well as controlling access. Proxy servers maintain current copies of commonly accessed web pages, speeding web access times by eliminating the need to access the original site constantly. They also perform security functions, protecting servers from unauthorized access.
Protocol | Description and Port
HTTP | Web pages, port 3128
FTP | FTP transfers through websites, port 3128
ICP | Internet Cache Protocol, port 3130
HTCP | Hypertext Caching Protocol, port 4827
CARP | Cache Array Routing Protocol
SNMP | Simple Network Management Protocol, port 3401
SSL | Secure Sockets Layer
Table 14-1: Protocols Supported by Squid
Squid is a free, open source proxy-caching server for web clients, designed to speed Internet access and provide security controls for web servers. It caches web pages as users request them. Copies of the pages users access are kept in the Squid cache, and each time a request is made, Squid checks whether it has a current copy. If it does, it returns the copy from its cache instead of querying the original site. If it does not, it retrieves one from the original site. Replacement algorithms periodically replace old objects in the cache. In this way, web browsers can use the local Squid cache as a proxy HTTP server. Squid currently handles web pages supporting the HTTP, FTP, and SSL protocols (Squid cannot be used with FTP clients), each with an associated default port (see Table 14-1). It also supports ICP (Internet Cache Protocol) and HTCP (Hypertext Caching Protocol) for web caching, and SNMP (Simple Network Management Protocol) for providing status information.
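The lookup sequence described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Squid's implementation: the TinyCache class, the fixed lifetime used to decide whether a copy is current, the object-count limit, and the least-recently-used replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict

class TinyCache:
    """Toy proxy cache: serve current copies, fetch otherwise, evict the oldest."""

    def __init__(self, fetch, max_objects=2, lifetime=60):
        self.fetch = fetch              # callable that retrieves a URL from the origin site
        self.max_objects = max_objects  # hypothetical size limit, counted in objects
        self.lifetime = lifetime        # seconds a copy counts as current (assumption)
        self.store = OrderedDict()      # url -> (time_stored, body), least recently used first

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.lifetime:
            self.store.move_to_end(url)     # mark as recently used
            return "HIT", entry[1]
        body = self.fetch(url)              # no current copy: go to the original site
        self.store[url] = (now, body)
        self.store.move_to_end(url)
        while len(self.store) > self.max_objects:
            self.store.popitem(last=False)  # replace the least recently used object
        return "MISS", body

origin_requests = []

def fake_origin(url):
    # Stand-in for the real web server; records each request it receives.
    origin_requests.append(url)
    return "page for " + url

cache = TinyCache(fake_origin)
print(cache.get("http://example.com/a", now=0))    # first request: fetched from the origin
print(cache.get("http://example.com/a", now=10))   # copy still current: served from the cache
print(cache.get("http://example.com/a", now=100))  # copy too old: fetched again
```

Only the first and third requests reach the stand-in origin server; the second is answered from the cache.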
You can find out more about Squid at http://squid-cache.org. For detailed information, check the Squid FAQ and the user manual on the Squid website. The FAQ is also installed in your /usr/share/doc under the squid directory.
As a proxy, Squid does more than just cache web objects. It operates as an intermediary between the web browsers (clients) and the servers they access. Instead of connections being made directly to the server, a client connects to the proxy server. The proxy then relays requests to the web server. This is useful for situations where a web server is placed behind a firewall server, protecting it from outside access. The proxy is accessible on the firewall, which can then transfer requests and responses back and forth between the client and the web server. The design is often used to allow web servers to operate on protected local networks and still be accessible on the Internet. You can also use a Squid proxy to provide web access to the Internet by local hosts. Instead of using a gateway providing complete access to the Internet, local hosts can use a proxy to allow them just web access. You can also combine the two, allowing gateway access, but using the proxy server to provide more control for web access. In addition, the caching capabilities of Squid can provide local hosts with faster web access.
Technically, you could use a proxy server to simply manage traffic between a web server and the clients who want to communicate with it, without doing caching at all. Squid combines both capabilities as a proxy-caching server.
Squid also provides security capabilities that let you exercise control over hosts accessing your web server. You can deny access by certain hosts and allow access by others. Squid also supports the use of encrypted protocols such as SSL. Encrypted communications are tunneled (passed through without reading) through the Squid server directly to the web server.
Squid is supported and distributed under the GNU General Public License by the National Laboratory for Applied Network Research (NLANR) at the University of California, San Diego. The work is based on the Harvest Project to create a web indexing system that includes a high-performance cache daemon called cached. You can obtain current source code versions and online documentation from the Squid home page at http://squid-cache.org. The Squid software package (squid) consists of the Squid server and several support scripts for services like LDAP and HTTP. You can also install the cache manager script, cachemgr.cgi, from the squid-cgi package. The cachemgr.cgi script lets you view statistics for the Squid server as it runs. Squid version 2.7 is available on the main Ubuntu repository. You can also install the Squid 3 version (Universe repository), but updates are not supported by Canonical.
sudo apt-get install squid3
Check the Ubuntu Server Guide | Web Servers | Squid - Proxy Server for basic configuration.
https://help.ubuntu.com/stable/serverguide/squid.html
Also check the Ubuntu Community Documentation on Squid at:
https://help.ubuntu.com/community/Squid
The Squid server is managed by systemd using the squid3.service file, shown here. The file is generated automatically from the /etc/init.d/squid3 script by systemd-sysv-generator. The server is started for runlevels 2, 3, 4, and 5 (the runlevel targets listed in Before), and the /etc/init.d/squid3 script is used to start, stop, and reload it (ExecStart, ExecStop, ExecReload).
squid3.service
# Automatically generated by systemd-sysv-generator
[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/squid3
Description=LSB: Squid HTTP Proxy version 3.x
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
After=network-online.target remote-fs.target systemd-journald-dev-log.socket nss-lookup.target
Wants=network-online.target
Conflicts=shutdown.target
[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/init.d/squid3 start
ExecStop=/etc/init.d/squid3 stop
ExecReload=/etc/init.d/squid3 reload
You can use the service command to manually stop, start, and restart the server.
sudo service squid3 stop
Configuring Client Browsers
Squid supports both standard proxy caches and transparent caches. With a standard proxy cache, users will need to configure their browsers to specifically access the Squid server. A transparent cache, on the other hand, requires no browser configuration by users. The cache is transparent, allowing access as if it were a normal website. Transparent caches are implemented with iptables, using Netfilter packet filtering to intercept requests and redirect them to the proxy cache.
With a standard proxy cache, users need to specify their proxy server in their web browser configuration. For this, they will need the IP address of the host running the Squid proxy server as well as the port it is using. Proxies usually make use of port 3128. To configure use of a proxy server running on the private network, you enter the following. The proxy server is running on turtle.mytrek.com (192.168.0.1) and using port 3128.
192.168.0.1 3128
In Firefox, Mozilla, and Netscape, a user on the sample local network first selects the Proxy panel located in Preferences under the Edit menu and then, in the Manual proxy configuration’s View panel, enters the previous information. The panel has entries for FTP, HTTP, and security proxies. For standard web access, enter the IP address in the FTP and web boxes, and enter 3128 in their port boxes.
For GNOME, select the Network Proxy tab in the System Settings Network dialog. For Konqueror on the KDE desktop, select the Proxies panel on the Preferences | Web Browsing menu window. Here, you can enter the proxy server address and port numbers.
On Linux and UNIX systems, local hosts can set the http_proxy and ftp_proxy shell variables to configure access by Linux-supported web browsers such as Lynx. You can place these definitions in your .profile or /etc/profile file to have them automatically defined whenever you log in.
http_proxy=192.168.0.1:3128
ftp_proxy=192.168.0.1:3128
export http_proxy ftp_proxy
Alternatively, you can use the proxy’s URL.
http_proxy=http://turtle.mytrek.com:3128
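Programs other than browsers can pick up the same variable. As a small illustration, the following Python sketch sets http_proxy inside the script (normally it would be inherited from the shell) and shows that the standard urllib module reads it. The host name and port are the chapter's sample values, and the request itself is left commented out because it needs a running proxy.

```python
import os
import urllib.request

# Same definition as the shell example; normally inherited from the login shell.
os.environ["http_proxy"] = "http://turtle.mytrek.com:3128"

# getproxies() returns the proxy settings urllib will use, keyed by URL scheme.
proxies = urllib.request.getproxies()
print(proxies.get("http"))

# An opener built from these settings sends HTTP requests through the proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
# opener.open("http://example.com/")  # would connect by way of turtle.mytrek.com:3128
```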
For the Elinks browser, you can specify a proxy in its configuration file, /etc/elinks.conf. Set both FTP and web proxy host options, as in:
set protocol.http.proxy.host = "turtle.mytrek.com:3128"
set protocol.ftp.proxy.host = "turtle.mytrek.com:3128"
Before a client on a local host can use the proxy server, access permission has to be given to it in the server’s squid.conf file, described in the later section “Proxy Security.” Access can easily be provided to an entire network. For the sample network used here, you would have to place the following entries in the squid.conf file. These are explained in detail in the following sections.
acl mylan src 192.168.0.0/255.255.255.0
http_access allow mylan
Tip: Web clients that need to access your Squid server as a standard proxy cache will need to know the server’s address and the port for Squid’s HTTP services, by default 3128.
The squid.conf File
The Squid configuration file is squid.conf, located in the /etc/squid3 directory. In the /etc/squid3/squid.conf file, you set general options such as ports used, security options controlling access to the server, and cache options for configuring caching operations. The default version of squid.conf provided with Squid software includes detailed explanations of all standard entries, along with commented default entries. Entries consist of tags that specify different attributes. For example, maximum_object_size sets limits on objects transferred.
maximum_object_size 4 MB
As a proxy, Squid will use certain ports for specific services, such as port 3128 for HTTP services like web browsers. Default port numbers are already set for Squid. Should you need to use other ports, you can set them in the /etc/squid3/squid.conf file. The following entry shows how you set the web browser port:
http_port 3128
Note: Squid uses the Simple Network Management Protocol (SNMP) to provide status information and statistics to SNMP agents managing your network. You can control SNMP with the snmp_access and snmp_port options in the squid.conf file.
Proxy Security
Squid can use its role as an intermediary between web clients and a web server to implement access controls, determining who can access the web server and how. Squid does this by checking access control lists (ACLs) of hosts and domains that have had controls placed on them. When it finds a web client from one of those hosts attempting to connect to the web server, it executes the control. Squid supports a number of controls with which it can deny or allow access to the web server by the remote host’s web client (see Table 14-2). In effect, Squid sets up a firewall just for the web server.
The first step in configuring Squid security is to create ACLs. These are lists of hosts and domains for which you want to set up controls. You define ACLs using the acl command, creating a label for the systems on which you are setting controls. You then use commands such as http_access to define these controls. You can define a system, or a group of systems, by use of several acl options, such as the source IP address, the domain name, or even the time and date. For example, the src option is used to define a system or group of systems with a certain source address. To define a mylan acl entry for systems in a local network with the addresses 192.168.0.0 through 192.168.0.255, use the following ACL definition:
acl mylan src 192.168.0.0/255.255.255.0
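To make the address/netmask notation concrete, the following Python sketch uses the standard ipaddress module to expand the same network definition and test whether a client address falls inside it. It only illustrates the arithmetic behind a src match. The matches_src function is a name invented for this example and is not part of Squid.

```python
import ipaddress

# The network/netmask pair from the acl line above.
mylan = ipaddress.ip_network("192.168.0.0/255.255.255.0")

def matches_src(client_ip, network):
    """Return True if the client's source address falls inside the network."""
    return ipaddress.ip_address(client_ip) in network

print(mylan)                                # 192.168.0.0/24, the equivalent CIDR form
print(mylan[0], mylan[-1])                  # first and last address covered
print(matches_src("192.168.0.42", mylan))   # a host on the sample network
print(matches_src("192.168.1.42", mylan))   # a host outside it
```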
Options | Description
src ip-address/netmask | Client’s IP address
src addr1-addr2/netmask | Range of addresses
dst ip-address/netmask | Destination IP address
myip ip-address/netmask | Local socket IP address
srcdomain domain | Reverse lookup, client IP
dstdomain domain | Destination server from URL; for dstdomain and dstdom_regex, a reverse lookup is tried if an IP-based URL is used
srcdom_regex [-i] expression | Regular expression matching client name
dstdom_regex [-i] expression | Regular expression matching destination
time [day-abbrevs] [h1:m1-h2:m2] | Time as specified by day, hour, and minutes. Day abbreviations: S = Sunday, M = Monday, T = Tuesday, W = Wednesday, H = Thursday, F = Friday, A = Saturday
url_regex [-i] expression | Regular expression matching on whole URL
urlpath_regex [-i] expression | Regular expression matching on URL path
port ports | A specific port or range of ports
proto protocol | A specific protocol, such as HTTP or FTP
method method | Specific methods, such as GET and POST
browser [-i] regexp | Pattern match on user-agent header
ident username | String match on ident output
src_as number | Used for routing of requests to specific caches
dst_as number | Used for routing of requests to specific caches
proxy_auth username | List of valid usernames
snmp_community string | A community string to limit access to your SNMP agent
Table 14-2: Squid ACL Options
Once it is defined, you can use an ACL definition in a Squid option to specify a control you want to place on those systems. For example, to allow access by the mylan group of local systems to the web through the proxy, use an http_access option with the allow action specifying mylan as the acl definition to use, as shown here:
http_access allow mylan
The default squid.conf file provides entries for a recommended minimum configuration, beginning with entries for controlling access to your local net and server ports. Local net entries are listed for different local addresses (see Chapter 18).
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
Access is supported on the SSL ports (443, 591, 873), and server ports such as 80 for the web server and 21 for the FTP server are designated as safe.
acl SSL_ports port 443 # https
acl SSL_ports port 591 # filemaker
acl SSL_ports port 873 # rsync
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
Default http_access entries deny access to outside users, and allow access by hosts on the local network and the local host (Squid server host). Access is also denied on ports not deemed safe or without SSL security. The http_access entries already defined in the squid.conf file are shown here.
http_access allow localhost manager
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access deny all
By defining ACLs and using them in Squid options, you can tailor your website with the kind of security you want. You should add your own ACLs after the comment label located near the middle of the file after the http_access entries for safe ports, and before the http_access entries for the localnet and local host.
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
The following example allows access to the web through the proxy by only the mylan group of local systems, denying access to all others. Two acl entries are set up: one for the local systems and one for all others. The http_access options first allow access to the local systems and then deny access to all others.
acl mylan src 192.168.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow mylan
http_access deny all
Basic default entries that you will find in your squid.conf file, along with an entry for the mylan sample network, are shown here.
acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl mylan src 192.168.0.0/255.255.255.0
acl SSL_ports port 443 563 873
The order of the http_access options is important. Squid starts from the first and works its way down, stopping at the first http_access option with an ACL entry that matches. In the preceding example, local systems that match the first http_access command are allowed, whereas others fall through to the second http_access command and are denied.
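The first-match behavior can be modeled with a short Python sketch. It is a simplified illustration, not Squid's code: it handles only src-style ACLs, the check_access function and the data structures are invented for the example, and the case where no rule matches is deliberately left open.

```python
import ipaddress

# ACL names mapped to source networks, as in the mylan example (illustrative only).
acls = {
    "mylan": ipaddress.ip_network("192.168.0.0/24"),
    "all": ipaddress.ip_network("0.0.0.0/0"),
}

# http_access entries in file order: (action, acl name).
rules = [("allow", "mylan"), ("deny", "all")]

def check_access(client_ip, rules, acls):
    """Return the action of the first rule whose ACL matches the client."""
    address = ipaddress.ip_address(client_ip)
    for action, name in rules:
        if address in acls[name]:
            return action       # first match wins; later rules are not consulted
    return None                 # no rule matched (not handled in this sketch)

print(check_access("192.168.0.7", rules, acls))                  # local host
print(check_access("10.1.2.3", rules, acls))                     # outside host
print(check_access("192.168.0.7", list(reversed(rules)), acls))  # same host, rules reversed
```

With the rules reversed, the local host matches the deny entry for all first and is refused, which is why the more specific allow entry has to come before the general deny.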
For systems using the proxy, you can also control what sites they can access. For a destination address, you create an acl entry with the dst qualifier. The dst qualifier takes as its argument the site address. Then you can create an http_access option to control access to that address. The following example denies access by anyone using the proxy to the destination site rabbit.mytrek.com. If you have a local network accessing the web through the proxy, you can use such commands to restrict access to certain sites.
acl myrabbit dst rabbit.mytrek.com
http_access deny myrabbit
Proxy Caches
Squid primarily uses the Internet Cache Protocol (ICP) to communicate with other web caches. It also provides support for the more experimental Hypertext Cache Protocol (HTCP) and the Cache Array Routing Protocol (CARP).
Using ICP, your Squid cache can connect to other Squid caches or other cache servers, such as Microsoft proxy server, Netscape proxy server, and Novell BorderManager. This way, if your network’s Squid cache does not have a copy of a requested web page, it can contact another cache to see if it is there instead of accessing the original site. You can configure Squid to connect to other Squid caches by connecting it to a cache hierarchy. Squid supports a hierarchy of caches denoted by the terms child, sibling, and parent. Sibling and child caches are accessible on the same level and are automatically queried whenever a request cannot be located in your own Squid’s cache. If these queries fail, a parent cache is queried, which then searches its own child and sibling caches (or its own parent cache, if needed), and so on.
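The query order in a hierarchy can be pictured with the following Python sketch. It is a deliberately simplified model in which each cache is a plain dictionary and a parent that lacks the page is simply skipped. In the hierarchy described above, the parent would continue the search itself. The resolve function and the sample cache contents are assumptions made for the illustration.

```python
def resolve(url, local, siblings, parent, origin):
    """Return (source, body), trying caches in the order described above."""
    if url in local:
        return "local", local[url]
    for name, cache in siblings.items():        # caches on the same level are asked first
        if url in cache:
            return name, cache[url]
    if parent is not None and url in parent:    # then the parent cache
        return "parent", parent[url]
    return "origin", origin(url)                # last resort: the original site

# Hypothetical cache contents for the example.
local = {"http://example.com/a": "A from local"}
siblings = {"sibling1": {"http://example.com/b": "B from sibling"}}
parent = {"http://example.com/c": "C from parent"}

def origin(url):
    # Stand-in for retrieving the page from the original web server.
    return "fetched " + url

for page in ("a", "b", "c", "d"):
    print(resolve("http://example.com/" + page, local, siblings, parent, origin))
```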
You can set up a cache hierarchy to connect to the main NLANR server by registering your cache using the following entries in your squid.conf file:
announce_period 1 day
announce_host tracker.ircache.net
announce_port 3131
Use cache_peer to set up parent, sibling, and child connections to other caches. This option has five fields. The first two consist of the hostname or IP address of the queried cache and the cache type (parent, child, or sibling). The third and fourth are the HTTP and the ICP ports of that cache, usually 3128 and 3130. The last is used for cache_peer options such as proxy-only to not save fetched objects locally, no-query for those caches that do not support ICP, and weight, which assigns priority to a parent cache. The following example sets up a connection to a parent cache:
cache_peer sd.cache.nlanr.net parent 3128 3130
Squid provides several options for configuring cache memory. The cache_mem option sets the memory allocated primarily for objects currently in use (objects in transit). If available, the space can also be used for frequently accessed objects (hot objects) and failed requests (negative-cache objects). The default is 8MB. The following example sets it to 256MB:
cache_mem 256 MB
You can use the cache manager (cachemgr.cgi) to manage the cache and view statistics on the Squid server as it runs. To run the cache manager, use your browser to execute the cachemgr.cgi script (this script should be placed in your web server’s cgi-bin directory).
Logs
Squid keeps several logs detailing access, cache performance, and error messages. The log files are located in the /var/log/squid3 directory.
access.log holds requests sent to your proxy.
cache.log holds Squid server messages such as errors and startup messages.
store.log holds information about the Squid cache such as objects added or removed.
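Entries in access.log are easy to process with a script. The following Python sketch splits one line into named fields, assuming Squid's default native log layout (timestamp, elapsed time, client, result code and status, size, method, URL, ident, hierarchy, content type). The sample line is made up for the illustration, and parse_access_line is a name chosen for this example. A site that defines a custom log format would need a different field list.

```python
from datetime import datetime, timezone

# Hypothetical sample line in the default (native) access.log layout.
line = ("1430000000.123     45 192.168.0.7 TCP_MISS/200 5120 GET "
        "http://example.com/index.html - DIRECT/93.184.216.34 text/html")

FIELDS = ("timestamp", "elapsed_ms", "client", "result", "bytes",
          "method", "url", "ident", "hierarchy", "content_type")

def parse_access_line(text):
    """Split one native-format access.log line into named fields."""
    entry = dict(zip(FIELDS, text.split()))
    # The first field is seconds since the epoch, with milliseconds.
    entry["time"] = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    # The result field combines the cache outcome and the HTTP status code.
    entry["cache_result"], entry["http_status"] = entry["result"].split("/")
    entry["bytes"] = int(entry["bytes"])
    return entry

entry = parse_access_line(line)
print(entry["client"], entry["cache_result"], entry["http_status"], entry["url"])
```

To process the real log, you could read /var/log/squid3/access.log line by line and pass each line to the same function.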