Computer Security Basics, 2nd Edition (2011)
Part II. Computer Security
Chapter 6. Web Attacks and Internet Vulnerabilities
If you are at all interested in computer and network security, you’ll need to know something about the Internet, and its subset, the World Wide Web (WWW). This chapter deals with the basics of the Internet and the Web, as well as several important Internet protocols that keep the Internet humming from behind the scenes. The chapter then discusses vulnerabilities of several of these services, as well as exploits that can be used to attack them. Finally, this chapter gives several suggestions of how users can defend against those who misuse the Internet to steal or annoy.
About the Internet
The Internet may have been invented by former Vice President Al Gore, but it has since taken on a life of its own. When many people think of the Internet, the first thing that comes to mind is often the World Wide Web; but the Web is only part of the story, if a highly visible one. This works to the advantage of the attacker, who crafts exploits based on less familiar parts of the Internet in order to shut down the parts more readily seen.
In truth, the Internet is composed of many different connection schemes called protocols, all of which transmit over a common system of packetized communication called Transmission Control Protocol/Internet Protocol (TCP/IP). Among these are the following:
File Transfer Protocol (FTP)
The File Transfer Protocol allows rapid, reliable transfer of data files between repositories, called FTP servers, and computers with FTP client software installed, called FTP clients.
Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol allows users to access pages of text that are marked up using a special format called the Hypertext Markup Language (HTML). HTML tags are inserted into a web document to indicate the desired font, color, and position of text, and to link to different web sites, files, or pages. This allows an author to create the displayed page one time and to have it display more or less the same on any platform.
Simple Mail Transfer Protocol (SMTP)
The SMTP service allows a standardized method of electronic mail transmission.
Domain Name Service (DNS)
The Domain Name Service resolves the easy-to-read names familiar to Internet users, such as http://www.oreilly.com, to the Internet Protocol addresses that actually guide information around the network, such as 172.16.32.15.
Dynamic Host Configuration Protocol (DHCP)
When requested, DHCP automatically provides an Internet Protocol (IP) address, such as 172.16.32.15, to a computer on a local area network. An IP address is required to communicate with other network devices that exist beyond the immediate proximity of the computer requesting the address.
Each of these is very useful to the Internet as we know it. But each is subject to attacks that can cause no end of problems. This chapter describes several of these protocols, what they do, and how they can be subverted, and provides some insight into what can be done about it.
History of Data and Voice Communications
When you call a friend on the phone, you imagine that the telephone company forms a direct connection between you and your friend. You don’t care if the connection is always there, or if the phone company just whips it up to order, provided it is there when you need it. This method of connection is called circuit-switched. Circuit switching solved a ton of problems in the early days of the phone network because it saved the telcos from having to run a wire from each subscriber to every other subscriber. Instead, a single wire could go from a subscriber to a central office (CO), from which the call could be switched down a trunk to the CO serving the recipient, and from there to the desired recipient.
When you use a modem to connect to school, office, or to a dial-up Internet provider, you are using circuit switching. Provided you are willing to wait out the time required to initiate the call, and are willing to take the risk that sometimes there will be more callers on the network than there are circuits (as happens when you hear the rapid busy signal), you pay for the connection only when you use it.
The opposite of circuit switching is to use a dedicated circuit. In a dedicated circuit, the wire you use is yours alone. You use it anytime you want, with no competition, and you pay for it whether you use it or not.
Dedicated and switched circuits were the basis of all wired and wireless communications from the beginning of electronic communications (about 1876) until a time just after World War II, when network supervisors decided to do something about a lingering problem with both systems. The problem was one of continuous underutilization, and the solution was a system of forced sharing between many users, called multiplexing. In a voice conversation, there are pauses (at least for the majority of people). During these intervals of silence, the two callers need not be connected. The line can do something else while two people are saying nothing to each other, as long as the parties are hooked up again before either of them has the next thing to say.
Over the years, multiplexing has taken many forms. Sometimes the contents of the line were shifted in frequency, so that one conversation theoretically could not be heard by the other. This system was called frequency division multiplexing (FDM). Other systems seized the line and rotated it between a number of conversations, returning it to you before you noticed it was gone. This was called time division multiplexing (TDM). A few systems capitalized on the nature of the phone systems themselves to create extra, phantom circuits by borrowing one of the two lines of your circuit and one of the two lines of a neighbor’s, and allowing other users to conduct conversations over them.
All of these systems increased the capability and capacity of systems, but they did little to overcome another key drawback: how to handle retransmission of missed or scrambled message segments, or missed or garbled speech? If a message was mangled in transmission, it was necessary to determine that fact, then to notify the sender, and await retransmission. All of this took time and resources. And while it was all getting sorted out and resent, current messages often had to wait.
Packets, Addresses, and Ports
To increase the reliability of communications, a new paradigm developed called packet switching. In a packet-switched network, messages are chopped up into chunks of limited length, called packets. A packet-switched network gives each packet an individual address label and then shoots it out onto the network, trusting that each packet will eventually make it to its destination, although the packets of a single message may travel by different routes. In the simplest form, packets are sent with no guarantee that they arrive, or that they arrive in order. (Broadly speaking, this is called User Datagram Protocol, or UDP.) Other mechanisms ensure that all packets are accounted for and assembled in the proper order at the other end of the link, recognize which packets have been corrupted or delayed, and facilitate retransmission of replacements. This is generally called Transmission Control Protocol (TCP). TCP and UDP together work with the IP addressing system to facilitate all the services that the Internet provides to users. Additional protocols and services are also at work behind the scenes to keep the network smoothly running.
So how do packets help make networks and the Internet reliable? First, a packet travels over the circuit quickly. If it goes missing, its replacement can be retransmitted without taking a long time. When dealing with larger messages, the entire message has to traverse the circuit before it can be checked for errors in transmission. If an error is detected, the entire message must be transmitted again instead of just the errant packet or packets. Second, because it is understood that packets may take one of several possible routes to their destination, there is a possibility that packets may actually spend part of their journey traveling in parallel, rather than waiting constantly for the packets in front of them to move along. This parallel movement may increase overall transmission speed.
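The idea of chopping a message into labeled packets, letting them arrive in any order, and asking again only for what is missing can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real protocol stack is written: the function names `packetize`, `reassemble`, and `missing` are hypothetical, and the "network" is just a shuffled list.

```python
import random

def packetize(message, size):
    """Split a message into (sequence_number, chunk) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Sort packets by sequence number and join the chunks."""
    return b"".join(chunk for _, chunk in sorted(packets))

def missing(packets, total):
    """List the sequence numbers that never arrived."""
    arrived = {seq for seq, _ in packets}
    return [seq for seq in range(total) if seq not in arrived]

message = b"Packets may travel by different routes."
sent = packetize(message, 8)

received = sent[:]
random.shuffle(received)      # packets arrive in arbitrary order
lost = received.pop()         # and one of them goes missing

# Only the missing packet needs to be sent again, not the whole message.
for seq in missing(received, len(sent)):
    received.append(sent[seq])

print(reassemble(received) == message)
```

The point of the sketch is the last loop: the retransmission cost is one small packet rather than the entire message.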
Packet transmission revolutionized communications, first remaking data transmission networks, then reforming the methods used for voice traffic, and finally, video. Packets have been with us since the early ARPANET, the precursor to today’s Internet. They are likely to be with us through the foreseeable future. In fact, the successor to packet communications, asynchronous transfer mode (ATM), the system on which most of today’s heavy communications backbones are built, refines packets further. Instead of using packets of roughly even length, ATM uses cells that are precisely 53 bytes long.
There is much more that can be said about digital communications and packets, but we’ll go into only two more concepts: addresses and ports.
An Internet Protocol address describes a location on the network. Technically speaking, a network address, or an IP address in the case of the Internet, is a logical address. Users don’t often need to concern themselves with this logical, network address because the network has services that correlate such addresses with more friendly names, such as those that end in .com or .org. Beneath this layer of friendly conversion, however, logical addresses hold sway. The use of logical addresses allows the network to route packets to the correct part of the network, without concerning itself where in the world that location is. In fact, it is somewhat difficult to know by inspection where a packet actually goes based on its IP address alone, a concept that lends the Internet its “no-borders” characteristic.
IP addresses generally take the form of four numbers, separated by periods, in which each number is between 0 and 255. For instance, the IP address 192.168.32.12 leads you to a destination on your network that’s administrator-designated. Between the action of the conversion layer and the protocols that work with the IP address, the packets that compose the messages get delivered.
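Because each of the four numbers fits in a single byte (0 through 255), a dotted address is really just a readable spelling of one 32-bit number. The following Python sketch makes that concrete; `parse_ipv4` is a hypothetical helper written for this illustration, and Python's standard `ipaddress` module does the same job more thoroughly.

```python
def parse_ipv4(text):
    """Convert a dotted address such as '192.168.32.12' to a 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by periods")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each number must fit in one byte (0-255)")
        value = (value << 8) | octet   # shift earlier bytes left, append this one
    return value

print(hex(parse_ipv4("192.168.32.12")))   # 0xc0a8200c
```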
Just as there is a conversion layer between the common name and the IP address, there is a conversion system that connects the IP address to a specific network device. This system is called the Address Resolution Protocol (ARP), and the address of each device is called the physical address, or in network speak, the media access control (MAC) address. Think of the MAC address as you would the vehicle identification number (VIN) that uniquely identifies every automobile that is manufactured, and the IP or logical address as the license plate, or the identification that is required to operate the vehicle on public roadways.
There is one more component to addresses, and in some ways it is the most important of all, at least from a security perspective. It is important that packets are identified by function, that is, that their addresses contain some clue as to what they are intended to do. This allows them to be switched to the correct location by inspection (like reading the license plate) without having to open them up and examine their contents. For this reason, it is not uncommon for packets to have the same IP address, and to use the port number to state the packet function, so that the packet will be routed properly once it makes it to the device. This is similar to the various destinations used for things that come into a house. It is traditional, for instance, for mail to be delivered to the mail box or slot, for the newspaper to be placed or thrown onto the porch, and for water and gas to come into the house through the appropriate pipes. (To awaken and find natural gas coming out of the kitchen faucet, water pouring out of the oven, the mail in the flower bed, and the paper through the front room window is generally a sign that it will be a bad day.) In the physical world, getting these delivery functions right is the job of the respective couriers and plumbers. In the network world, it is the job of the port number.
Each IP address comes with 65,536 port numbers (0 through 65535), which can be thought of as cubbyholes, or individual message slots in a hotel lobby. Different types of network traffic use different ports. This is how the network keeps things straight, or knows to deliver mail in one place and newspapers in another.
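The cubbyhole idea can be sketched as a simple lookup table. The four port numbers below are the standard well-known assignments for the protocols introduced earlier in this chapter; the table name and the `deliver` function are hypothetical, and a real host keeps a far larger table.

```python
# Well-known port assignments for a few of the protocols in this chapter.
WELL_KNOWN_PORTS = {
    21: "FTP (control connection)",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
}

def deliver(port):
    """Say which service's cubbyhole a packet with this port number belongs in."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16 bits: 0 through 65535")
    return WELL_KNOWN_PORTS.get(port, "no service listed in this sketch")

print(deliver(25))     # SMTP
print(deliver(8080))   # no service listed in this sketch
```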
Why do you need to understand this multitier system of addressing? Simple: most network attacks in some way involve falsely manipulating or replacing the IP address, MAC address, or port. In fact, one of the most important tools used today for network safety, the firewall, is based almost entirely on recognizing suspicious or invalid combinations of addresses and ports.
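To give a feel for what "recognizing combinations of addresses and ports" means, here is a minimal sketch of a rule table in Python. Everything in it is an assumption made for illustration: the rule format, the `decide` function, and the specific rules are hypothetical, and real firewalls examine many more fields than these two.

```python
import ipaddress

# Hypothetical rules: (action, source network, destination port).
# The first matching rule wins; anything unmatched is denied.
RULES = [
    ("allow", ipaddress.ip_network("192.168.32.0/24"), 25),  # local hosts may send mail
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),        # anyone may reach the web server
]

def decide(source, destination_port):
    """Return 'allow' or 'deny' for a packet, judged by address and port alone."""
    address = ipaddress.ip_address(source)
    for action, network, port in RULES:
        if address in network and destination_port == port:
            return action
    return "deny"

print(decide("192.168.32.12", 25))   # allow
print(decide("10.0.0.1", 25))        # deny
```

Note that the decision never looks inside the packet. That is also why forged addresses and ports are such attractive tools for an attacker.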
 A flurry of press clippings to the contrary, Gore appears never to have made the claim that he invented the Internet, except perhaps to point out that he promoted technical progress during his terms in the House of Representatives and as the Vice President. It is all a ruse, and a well-executed one. There was even circulated a bogus RFC, RFC 3000, in which Gore purportedly detailed his involvement. This was published before the RFC counter got that high. The real RFC 3000 is way less entertaining.
What Are the Network Protocols?
A protocol is a defined procedure for interconnecting and interacting. Within a protocol, defined behaviors allow the participants to move towards an anticipated condition or result. The protocols that determine how data are transported over the Internet, or over a LAN that uses TCP/IP, provide a variety of services. Some are directly visible to the end user, others operate in the wings or serve as aids in troubleshooting. Some protocols move web pages, some move email, some move files, and some move streaming media. Others exist to provide watchdog functions over the system or to allow various network components to communicate. Many of the most important network protocols, which also happen to be most commonly attacked, are the protocols needed to make communication over a network possible.
Data Navigation Protocols
The fundamental network protocol is likely Internet Protocol, which describes how packets will navigate from network to network. It does this by defining the structure of IP addresses. IP also provides a fragmentation and reassembly function, which means that if a message, or datagram, is too long, it can be split into smaller chunks for transmission through the network and then put back together when it gets to its final destination.
What IP does not do is keep track of whether messages actually make it to where they are going, or pace the transmission of messages so that a link does not overfill. IP treats each piece of a message, or Internet datagram, as an independent entity unrelated to any other Internet datagram. IP must link up with several other protocols to ensure reliable end-to-end delivery and retransmission of missing messages.
For instance, the Transmission Control Protocol travels inside the IP packet and provides the information needed to determine whether all packets made the trip to their destination. TCP can figure out which packets were lost and order up replacements. UDP, on the other hand, is the stripped-down alternative to TCP, moving packets with speed but sacrificing end-to-end reliability.
The File Transfer Protocol mentioned previously operates using TCP. All the data travels reliably over the network, and the transmission is not finished until the packets have all made the trip and been reassembled in order at the destination. FTP’s UDP cousin, however, Trivial File Transfer Protocol (TFTP), transmits the packets as a firehose transmits water. It streams, but it has no way to determine by itself if the water is hitting the target.
TCP can detect errors because each packet carries a checksum, which serves the same purpose as a parity bit or a cyclic redundancy check (CRC). A checksum is a mathematical mechanism that detects errors in transmission, usually by adding up the numeric value of all the characters transmitted and seeing if the total is the same at both ends of the link.
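A checksum of the add-everything-up kind is easy to sketch. The version below is the 16-bit one's-complement sum described in RFC 1071 for Internet headers; the function name `internet_checksum` is our own, and the eight sample bytes are the worked example from that RFC.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add up 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))                         # 0x220d

# The receiver repeats the sum over data plus checksum; zero means "looks intact".
print(internet_checksum(payload + checksum.to_bytes(2, "big")))   # 0

# Flip a single bit in transit and the check no longer comes out to zero.
damaged = bytes([payload[0] ^ 0x01]) + payload[1:]
print(internet_checksum(damaged + checksum.to_bytes(2, "big")) != 0)
```

A simple sum like this catches accidental damage. It offers no protection against an attacker, who can alter the data and recompute the checksum to match.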
If IP needs to report errors to the sender, it uses helping protocols from a suite called Internet Control Message Protocol (ICMP). IP includes a facility for limiting the transmission of misdirected messages, a self-destruct mechanism called time-to-live (TTL). Every time a packet passes through a network node that retransmits it, the TTL counter gets decremented, that is, reduced by one. If the TTL reaches zero before the Internet datagram reaches its destination, the Internet datagram is considered lost, and is destroyed.
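The TTL countdown can be modeled in a few lines. This is a deliberately simplified sketch: the `send` function and its return values are hypothetical, and the path is reduced to a count of retransmitting nodes.

```python
def send(ttl, routers_on_path):
    """Return ('delivered', remaining_ttl) or ('discarded', router_number)."""
    for router in range(1, routers_on_path + 1):
        ttl -= 1                      # each retransmitting node reduces TTL by one
        if ttl <= 0:
            # A real node would also report the loss to the sender using ICMP.
            return ("discarded", router)
    return ("delivered", ttl)

print(send(ttl=64, routers_on_path=5))   # ('delivered', 59)
print(send(ttl=3, routers_on_path=5))    # ('discarded', 3)
```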
Data Navigation Protocol Attacks
These four protocols (IP, TCP, UDP, and ICMP) are the basis for Internet communications. They are also the basis of many attacks that use the Internet or of attacks against the Internet itself.
For instance, setting the TTL counter to its maximum value lets bad packets circulate far longer than they should. This takes up bandwidth and clogs the pipes so other traffic can’t get through. False ICMP messages can put the network on alert against threats that aren’t there, slowing or stopping communications, or causing unneeded reroutes. This can be particularly vexing when heavy traffic gets shipped down a narrow street. Network administrators will have a hard time limiting the flow without cutting off the desirable traffic.
TCP assures reliability by introducing sequence counters and acknowledgments to IP. Also, TCP transmissions proceed after the communicating systems give each other a handshake, that is, after both ends go through a short three-part exchange to confirm which other system they are talking with, similar to exchanging business cards before sitting down to make a deal. A favorite hacker trick is to open up a session (begin a communication) with a system under attack, receive an acknowledgment, and then leave the connection half-completed, tying up resources and memory on the attacked device. Do this enough times, and unprotected systems will buckle under the load, similar to meeting too many interesting people at one time at a party. Affected systems can hang up or cease functioning, denying services to legitimate users, or they can crash, possibly allowing attackers to modify the operating software with illicit changes that can create secret entrances to open the device to attackers.
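Why half-completed handshakes hurt can be seen in a small model of the bookkeeping a server has to do. The sketch below never touches a network; the `Listener` class, its method names, and the backlog of three are all assumptions chosen to keep the example short. Real systems hold far more slots and also time out stale entries.

```python
class Listener:
    """Toy model of a server's table of connections awaiting the final handshake step."""

    def __init__(self, backlog):
        self.backlog = backlog        # how many half-open connections we can remember
        self.half_open = set()
        self.established = set()

    def syn(self, client):
        """Step 1 arrives; the server answers (step 2) and reserves a slot."""
        if len(self.half_open) >= self.backlog:
            return False              # table full: the request is turned away
        self.half_open.add(client)
        return True

    def ack(self, client):
        """Step 3 arrives; the handshake completes and the slot is released."""
        if client not in self.half_open:
            return False
        self.half_open.remove(client)
        self.established.add(client)
        return True

server = Listener(backlog=3)
for n in range(3):
    server.syn(f"never-answers-{n}")   # step 3 never comes for these

print(server.syn("legitimate-user"))   # False: no room left
```

This is also why the usual defenses work on the table itself, for example by expiring half-open entries quickly or by not reserving a slot until the final step arrives.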
Why is IP such a pushover? Because it’s not being used for the purpose for which it was built. The military wanted a network protocol that would survive a worst-case scenario, something along the lines of global thermonuclear war. The network needed to pass traffic to every location smoothly and efficiently, and to be able to reconfigure itself around bad routes and sudden outages. Were the balloon to go up, and everyone except people in deep bunkers had to spend two weeks hiding in basements and under doors in slit trenches, while waiting for fallout to decay, the network was supposed to reconfigure itself and be ready to go once humanity reemerged, serving every place that was still a place.
Instead, the Internet became an “information superhighway” that led to economic growth, prosperity, and jobs. It became a tool of enhanced communications, helping to bring the entire human family closer together. True enough, there are robbers in the bushes around that highway, and attacks for money, both of the travelers and of the destinations, are increasingly common. There is also increasing concern about pedophiles who use the Internet to form associations with unsupervised innocents. These are unintended consequences against which the Internet was never fortified.
Other Internet Protocols
FTP, SMTP, and lots of the behind-the-scenes protocols use datagrams to communicate. These protocols can be subjected to attack. The easiest way to attack these datagrams is by monitoring the network using a packet sniffer. A packet sniffer surreptitiously monitors and decodes packets, allowing the attacker to gather information about the network and the devices and persons attached to it. A more sophisticated attack would be to change the contents of a datagram (data modification) or to make it appear as if it came from a different party (spoofing). Packet sniffers are, however, very useful tools for network administrators because they allow you to see what protocols are on the network.
File Transfer Protocol
The File Transfer Protocol was designed to promote sharing files, such as computer programs or data, by connecting machines reliably and efficiently, without getting tangled up in whether or not the host machine was the same brand or used the same operating system as the client. As a result, remote access of computers became more commonplace. In fact, even though simple FTP terminal programs are available, web browsers can often perform such transfers simply and transparently. This appears to be in line with the original intent of the standard.
However, the FTP protocol is subject to abuse. In the first place, it transmits in the clear without encryption shielding. Just sit and listen to a network connection, and in time, files by the boatload will come streaming by for the copying. FTP is also very subject to anonymous access. This is highly desirable in many environments, where to regulate access requires issuing passwords to every applicant, creating a tremendous administrative burden. It also means, however, that while you are in the process of gathering usage information from your FTP server, you will not learn anything more about me than I choose to reveal. Like an honor system coffee club, FTP is vulnerable to those lacking honor.
Simple Mail Transfer Protocol
SMTP is designed to transfer email messages reliably and efficiently, again without regard to the particular computers or operating systems encountered along the way. It does this by setting up a channel between the initial sender and a receiver, which can be either the ultimate destination or some waypoint. Once the transmission channel is established, the mail sender issues a MAIL command, which identifies the sender and states that there is traffic to send. If the mail receiver can accept mail, it responds with an OK reply. The mail sender then sends an RCPT command identifying the mail recipient. If the mail receiver can accept mail for that recipient, it responds with an OK reply. If not, it responds with a reply rejecting that recipient (but not the whole mail transaction).
The mail sender and mail receiver may negotiate with several recipients. When the recipients have been negotiated, the sender sends the mail data. If the SMTP receiver successfully processes the mail data, it responds with an OK reply.
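The MAIL, RCPT, and data exchange described above can be sketched as a toy receiver. This is a model of the conversation only, not a mail server: the class `ToyMailReceiver` and the example addresses are hypothetical, and the real protocol has additional steps (for instance, an intermediate reply before the message body is sent). The reply codes 250, 550, and 554 are genuine SMTP codes.

```python
class ToyMailReceiver:
    """Toy model of the receiving side of one SMTP mail transaction."""

    def __init__(self, local_users):
        self.local_users = set(local_users)
        self.sender = None
        self.recipients = []
        self.mailboxes = []           # accepted (sender, recipients, body) tuples

    def mail(self, sender):
        self.sender = sender          # start a new transaction
        self.recipients = []
        return "250 OK"

    def rcpt(self, recipient):
        if self.sender is None:
            return "503 Bad sequence of commands"
        if recipient not in self.local_users:
            return "550 No such user here"   # rejects this recipient only
        self.recipients.append(recipient)
        return "250 OK"

    def data(self, body):
        if not self.recipients:
            return "554 No valid recipients"
        self.mailboxes.append((self.sender, list(self.recipients), body))
        self.sender, self.recipients = None, []
        return "250 OK"

receiver = ToyMailReceiver(local_users={"alice@example.com"})
print(receiver.mail("bob@example.org"))      # 250 OK
print(receiver.rcpt("nobody@example.com"))   # 550 No such user here
print(receiver.rcpt("alice@example.com"))    # 250 OK
print(receiver.data("Lunch on Friday?"))     # 250 OK
```

Notice that `mail` accepts whatever sender it is told. Nothing in this exchange verifies that the stated sender is the true one.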
In the case that mail is sent to an intermediary stop, or waypoint, the process is repeated. If the mail receiver is the intended destination, the message is forwarded to a mailbox for storage until the recipient calls for it with her mail client.
Mail that can’t be delivered because of incorrect or invalid addresses is returned with a note from whichever mail server determined the problem, stating that delivery was impossible.
The SMTP system works so well that email has become an important means of doing business. This same reliability, however, is its undoing. Email is normally transmitted in the clear, which means that a host that pretends to be an email relay can access all email that passes through it; mail could then be copied or modified. When an attacker suspects that a user or administrator is getting suspicious, it is relatively easy to disconnect the relay and lie low. The flow of message receipts and returns may be delayed but will likely not be disrupted because of the self-healing nature of the robust SMTP protocol.
Further, it is very easy to create an email message that looks as if it was sent from someone other than the true sender. This can create problems in its own right (for example, a university student notifies everyone in a class that a certain test has been cancelled, and the message appears to emanate from the professor’s computer). This also makes it easy to formulate an attack that sends tens of thousands of emails out to various addresses on the Internet, valid or not, using the spoofed return address of someone you wish to annoy or attack. As the emails bounce off the bad recipient addresses, your target will get a flood of annoying messages saying that the address is no longer valid. A few of the addresses will be valid, so your victim may get a couple of irate responses from legitimate but uninterested recipients as well.
SMTP and spam
The ability to spoof a return address and easily mail the same message to multiple recipients has led to the uncontrolled outbreak of junk email, or spam. Spam, by some accounts, represents up to 50% of email traffic and is popular for one reason: email is dirt cheap. Junk physical mail is annoying, but it has a certain cost of delivery, and each piece must be handled and addressed, even if the name used is “occupant.” This inherent cost limits the total amount of junk mail sent out because someone has to pay the bill. Email, on the other hand, has few costs: scraping up a few million email addresses off newsgroups and chain letters is not really that hard, especially if software is used to scan newsgroups and other places where addresses are densely displayed. Plus, launching and sending such messages is largely automatic; just turn on the spambot machine, and forget it.
Some email recipients resent the intrusion, and send responses asking that they no longer be disturbed. This proves the user’s email is valid and increases the value of the address when compiled into lists sold to other spammers. Other addresses are nonexistent or already abandoned. Email systems will send back notices to the sender to this effect. As the bad addresses bounce back and fill up the sender’s inbox, there are no worries: it probably wasn’t a valid inbox. Most spammers are careful to avoid using their real Internet addresses.
Tracking spammers down requires a lot of detective work; for example, you have to request that ISPs and the owners of various intermediate systems check their logs. This makes detection difficult. Because most spam today leads respondents to a web page anyway, dead-letter information is not really important to the sender. In fact, as stated, the most malicious spammers mimic a return address of someone they wish to annoy, letting them deal with the crush of dead-letter notifications, which often come in such volume as to shut down the unlucky victim’s email account in order to protect the server.
The cost-per-thousand of spam is so low that any spammer with technical savvy and a little software can send millions of junk emails a day from a very humble facility. The potential returns make it worthwhile grubbing about for addresses.
In an experiment I completed recently, a valid but unadvertised email account was left active but unopened for one year’s time. At the end of that period, several hundred megabytes of email had accumulated—over 17,000 pieces in all. All of it spam.
Recent rule changes have made it illegal to send spam. However, just because a law is in effect in one country does not mean it’s valid in others. Many spammers operate remotely, from countries that don’t get too negative about earning revenue from a nonpolluting, high-tech industry such as bulk email. Other spammers use hacking techniques to turn ordinary computers, perhaps yours, into machines that will either generate or relay their spam for them. The best way to cope is likely the same way you cope with unsolicited physical mail. Use the antispam features of your email client software to filter undesired email into the recycle bin before you even see it.
Domain Name Service
Of the many other protocols inherent in the Internet, most are subject to attack or subversion. The Domain Name Service (DNS) has its own vulnerabilities.
DNS is used to resolve a friendly name, such as http://www.oreilly.com, to an IP address, such as 192.168.32.10. DNS is needed because while the Internet runs with IP addresses, people tend to think in words. The DNS service keeps a distributed directory handy, which allows you, the user, to type a uniform resource locator (URL) or Internet address, something that in most cases is fairly straightforward and easy to remember, into the address block on your web browser, and the computer will sweat the numbers.
DNS is not usually the first step in address resolution. To save time and prevent wasted bandwidth, a table of addresses and their names is usually stored or cached on the local machine. Your computer starts at this table when you make a web request, looking to see if it already has the IP address of the site you desire. When your local machine cannot find where to send a web request, it contacts the nearest DNS server, which tells the computer everything it knows about the desired IP address. If the address is unknown at the DNS server, that DNS server consults the next DNS server up the chain, until your address is found or you hit the top, come up empty, and are sent an error message.
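The lookup order described above (local cache first, then the nearest server, then up the chain) can be modeled as follows. This is a simplified sketch and not the real DNS wire protocol: the `NameServer` and `Host` classes are hypothetical, and the address is the illustrative one used in this chapter rather than a real record.

```python
class NameServer:
    """Toy name server that asks the next server up the chain when it has no answer."""

    def __init__(self, records, parent=None):
        self.records = dict(records)
        self.parent = parent

    def resolve(self, name):
        if name in self.records:
            return self.records[name]
        if self.parent is not None:
            return self.parent.resolve(name)      # consult the next server up
        raise LookupError(f"{name}: no such name")  # hit the top and came up empty

class Host:
    """Toy client that checks its own cache before asking any server."""

    def __init__(self, server):
        self.cache = {}
        self.server = server
        self.queries_sent = 0

    def lookup(self, name):
        if name in self.cache:
            return self.cache[name]               # answered locally, no traffic
        self.queries_sent += 1
        address = self.server.resolve(name)
        self.cache[name] = address
        return address

top = NameServer({"www.oreilly.com": "192.168.32.10"})   # illustrative address only
nearest = NameServer({}, parent=top)
pc = Host(nearest)

print(pc.lookup("www.oreilly.com"), pc.queries_sent)   # 192.168.32.10 1
print(pc.lookup("www.oreilly.com"), pc.queries_sent)   # 192.168.32.10 1
```

In this model `lookup` never second-guesses a cached entry, so whatever sits in the cache wins.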
This suggests three very convenient DNS attacks. First, if you seed the local machine’s cache with incorrect data, it sends the user’s communications to the wrong place, including possibly a decoy site of the attacker’s own design. Second, if you pollute the database of one of the nation’s big DNS servers, you may shut down a major portion of the Internet, which is always good for achieving status in the cracker underworld. Fortunately, the distributed nature of the DNS system makes this a little far-fetched because backup systems will likely kick in. Third, if you deny access to the DNS server that provides address resolution to a population of users, say the LAN that serves your company, users of that LAN are not going to be able to contact web sites for which they do not already have IP addresses. Attackers do this in at least two ways: take out the server with some kind of attack or change the place that your desktop computer looks for DNS resolution. It may be easier to force a DNS error by changing the place computers look for DNS by modifying the local information in cache than it would be to take down the server.
Poisoning the DNS system doesn’t only slow down or prevent the access of web pages and services. Mail may not work, remote filesystems may be rendered inaccessible, and network printing may go down. Essentially everything that involves an external communication is at risk when DNS fails.
Dynamic Host Configuration Protocol
To access the Internet, users need an IP address. But there are situations in which there may not be enough IP addresses to serve all users. (This used to happen frequently when networks expanded beyond their original projected size, and administrators discovered that they had not reserved enough address numbers.)
To overcome this shortage, a system to share IP addresses was created, called the Dynamic Host Configuration Protocol. DHCP provided an IP address to those users who were actually logged on at the moment, drawing them from a pool of all available IP addresses. The pool was usually much smaller than the number of users, but that was okay, because all the users were rarely in the office or using the Internet at once. Shortly after the user logged off, the IP address could be reassigned to another user that was just logging on. Oversubscribing, that is, operating with fewer addresses than computers, is a tactic also used by phone companies to serve all their subscribers from a shared bank of equipment.
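The pool-sharing idea can be sketched as a small class. This models only the bookkeeping, not the DHCP message exchange; `AddressPool` and its methods are hypothetical names, and real servers also attach an expiry time to each lease.

```python
class AddressPool:
    """Toy model of handing out addresses from a pool smaller than the user population."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.leases = {}              # client name -> address currently held

    def request(self, client):
        if client in self.leases:
            return self.leases[client]    # client already holds an address
        if not self.free:
            return None                   # pool exhausted: this client must wait
        address = self.free.pop(0)
        self.leases[client] = address
        return address

    def release(self, client):
        address = self.leases.pop(client, None)
        if address is not None:
            self.free.append(address)     # available for the next user who logs on

pool = AddressPool(["172.16.32.15", "172.16.32.16"])   # two addresses, three users
print(pool.request("ann"))    # 172.16.32.15
print(pool.request("bob"))    # 172.16.32.16
print(pool.request("cy"))     # None
pool.release("ann")
print(pool.request("cy"))     # 172.16.32.15
```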
Network Address Translation
Network Address Translation (NAT) has also become a popular way to share addresses among many users. With NAT, the IP addressing system inside the network is known only to those in the network. Outsiders see only a small number of external addresses, which are rotated among all the users who may need one. A table in the NAT server keeps track of which internal addresses map to which external addresses. The number of internal addresses can be almost unlimited (after all, they stay inside the closed network). Because these addresses never appear to the outside world, NAT network administrators usually adopt the Private Address space allocations set aside in the IP address system.
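The table that maps internal addresses to external ones can be sketched like this. It is a toy model of the one-external-address-per-active-user scheme described above; the `NatTable` class is hypothetical, the 203.0.113.x addresses come from a range reserved for documentation, and many real NAT devices go further by sharing a single external address using port numbers.

```python
class NatTable:
    """Toy NAT table: lends external addresses to internal hosts on demand."""

    def __init__(self, external_addresses):
        self.available = list(external_addresses)
        self.inside_to_outside = {}
        self.outside_to_inside = {}

    def outbound(self, internal):
        """Return the external address this internal host appears as, if one is free."""
        if internal not in self.inside_to_outside:
            if not self.available:
                return None
            external = self.available.pop(0)
            self.inside_to_outside[internal] = external
            self.outside_to_inside[external] = internal
        return self.inside_to_outside[internal]

    def inbound(self, external):
        """Return the internal host a reply should be delivered to, or None."""
        return self.outside_to_inside.get(external)

nat = NatTable(["203.0.113.1", "203.0.113.2"])
print(nat.outbound("10.0.0.7"))      # 203.0.113.1
print(nat.inbound("203.0.113.1"))    # 10.0.0.7
print(nat.inbound("203.0.113.2"))    # None
```

An outsider sees only the external address. Unsolicited traffic to an address with no table entry has nowhere to go.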
RFC 1918 ADDRESS ALLOCATION FOR PRIVATE INTERNETS
The Internet Assigned Numbers Authority (IANA), the organization that coordinates the allocation of IP addresses, has reserved the following three blocks of the IP address space for private internets.
§ Class A 10.0.0.0 to 10.255.255.255 (10/8 prefix)
§ Class B 172.16.0.0 to 172.31.255.255 (172.16/12 prefix)
§ Class C 192.168.0.0 to 192.168.255.255 (192.168/16 prefix)
Home networks and Wi-Fi wireless systems likely use addresses that fall into this assignment.
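As a minimal sketch, the three private blocks can be checked with Python's standard ipaddress module. The function name is_rfc1918 and the sample addresses are assumptions chosen for this example.

```python
import ipaddress

# The three private blocks set aside by RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the IPv4 address falls in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


home_router = is_rfc1918("192.168.0.1")  # a typical home network address
public_host = is_rfc1918("8.8.8.8")      # a publicly routed address
```

Note that 172.32.0.1 lies just outside the second block, because the /12 prefix covers only 172.16.0.0 through 172.31.255.255.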
In some Microsoft networks, the Automatic Private IP Addressing (APIPA) scheme may be used in cases in which a DHCP system is nonfunctional. When the DHCP server is again able to service requests, clients update their addresses automatically.
Before NAT, DHCP was used to ration IP addresses. With NAT, DHCP is used to tell a PC which IP address it has been issued. This is in line with the root of DHCP, which is called the Bootstrap Protocol (BOOTP). BOOTP and DHCP tell a recently awakened PC many things it needs to know about its configuration, such as what address it can use and where it can locate various network resources. This is needed in case these things have changed while the computer was turned off. It is also necessary to receive this configuration information in case a computer is new to the network.
Attacks against DHCP usually involve interrupting these processes. For instance, one item that is frequently shared by DHCP is the location of DNS servers. If the DHCP server is compromised and starts issuing wrong information, the ability of the computers on the net to access the Internet will be severely limited.
This is not the only DHCP attack, however. Another popular attack is to change the pool assignments so that DHCP starts to issue IP addresses that are either invalid, or which are in use elsewhere. When this occurs, the routers and switches learn these new addresses and share them, and soon much of the traffic on the network can be going to the wrong place. Further, it may not be long before duplicate IP addresses begin to appear on the network. Many pieces of network equipment will blacklist devices that are using illicit, duplicate addresses. Finally, the routers and switches themselves will begin to labor under the strain of having to update so much information, and soon the network will be severely degraded. And this is not all. Bad address data must be purged, and good data must repopulate all the cache tables that need it. This takes time, is a burden to the network equipment, and it consumes bandwidth as well.
Port Address Translation
It is possible to overload a single external NAT address so that it can be used by several internal users. You can use port numbers in addition to network addresses to keep all ongoing exchanges organized; the resulting system is called Port Address Translation (PAT). (Ports were compared previously to the various services such as gas and water that entered a house separately, even though they were at the same physical address.) In a sense, PAT fills part of the role of DHCP because it shares a small number of public IP addresses, possibly only one, among a larger number of users. Unlike DHCP, which may open some security holes, NAT and PAT can actually increase security because they obscure the true addresses used by users.
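The bookkeeping behind PAT can be illustrated with a small translation table. The sketch below is a simplification under stated assumptions: the PortAddressTranslator class, the starting port 50000, and the documentation address 203.0.113.5 are invented for the example, and a real device would also expire idle entries and track the protocol in use.

```python
class PortAddressTranslator:
    """Toy PAT table: many internal endpoints share one public address."""

    def __init__(self, public_address, first_port=50000):
        self.public_address = public_address
        self.next_port = first_port
        self.outbound = {}  # (internal address, internal port) -> public port
        self.inbound = {}   # public port -> (internal address, internal port)

    def translate_out(self, internal_address, internal_port):
        """Map an internal endpoint to the shared address and a unique port."""
        key = (internal_address, internal_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_address, self.outbound[key])

    def translate_in(self, public_port):
        """Find the internal endpoint that a reply on this port belongs to."""
        return self.inbound.get(public_port)


pat = PortAddressTranslator("203.0.113.5")
a = pat.translate_out("192.168.0.10", 3345)
b = pat.translate_out("192.168.0.11", 3345)  # same internal port, other host
reply_target = pat.translate_in(b[1])
```

Both internal hosts appear to the outside world as 203.0.113.5; only the port number tells their conversations apart.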
The Fragile Web
Of all the languages in the world, it is somehow fitting that the one that unifies most people in a common experience is a product of the Electronic Age. It’s not English, Chinese, or even Japanese; rather it is Hypertext Markup Language (HTML). One of the most important standards on the Internet, HTML is used to format the text and images people place in their web pages.
The actual heavy lifter for web work on the Internet is Hypertext Transfer Protocol, a special protocol that carries HTML codes and other information over the Internet efficiently.
It would follow that anything so useful as HTML and HTTP would be subject to special attacks, and this is precisely the case. A whole family of attacks can be perpetrated against these formats, and when web pages become corrupted, they can actually become instruments of attack and destruction themselves. Attacks can take place on the user’s end, in which case they are called client-side attacks, or on the servers that send out the web pages that clients view, in which case they are called server-side attacks. This section describes the most important client- and server-side attacks.
How HTML Formats the Web
HTTP is a sparse protocol with a few terse commands such as “GET.” What HTTP “gets” is typically a small file of HTML code.
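To show how terse a GET really is, here is a minimal sketch that assembles the text of an HTTP/1.1 request. The function name build_get_request and the host www.example.com are assumptions for illustration; nothing is sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # the command, the resource, the version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",
        "",                      # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


request = build_get_request("www.example.com", "/index.html")
```

The whole request is three short lines of text followed by a blank line.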
HTML is blocks of text surrounded by commands. The commands are generally enclosed within less-than and greater-than signs and come in pairs called start tags (<command>) and end tags (</command>). The start tag instructs the user’s web browser to do something, say turn on the bold text function, and the end tag tells the browser to stop doing whatever the start tag kicked off. Thus the command that makes text bold applies to all the text between <b> and </b>. Given the many commands available in the HTML language, lots of effects are possible.
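The pairing of start and end tags can be observed with Python's standard html.parser module. In this sketch the TagLogger class and the sample markup are assumptions; the sample uses the b element, which is how HTML spells the bold command.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag, end tag, and run of text in the order seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


parser = TagLogger()
parser.feed("Plain <b>bold</b> plain")
parser.close()  # flush any text still held by the parser
```

After parsing, parser.events holds the start tag, the text it governs, and the end tag, in the same order a browser would encounter them.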
More advanced commands deal with inserting images on the page. Image commands are actually a call back to a server to download a picture, which the HTML code positions on a user’s screen.
Hyperlinks are also important. A user activates a hyperlink by clicking on it. This calls a new file from the server, which may be a whole new web page. Hyperlinks greatly increase the interactivity of a web site, while making it easy to organize a lot of information with a point-and-click interface.
You can view HTML code simply by calling up a web site and choosing the browser’s View Source command. It is usually located in one of the menus at the top of the browser window. If you cannot find it, use the browser’s Help function to look it up.
Advanced Web Services
Merely displaying text, graphics, and photos might make an attractive web site, but it is unlikely to make much money. With increased advertising, the Web has become commercialized, and innovation, not all of it strictly beneficial, has become commonplace. In the first place, ads, instead of images, can be downloaded to users. Banner ads, for instance, rotate among a variety of sponsors, who pay for page views, knowing that some percentage of those who view a page will be exposed to the ad. In order to make the ads change from time to time, the embedded commands needed to be a little bit more sophisticated. Various scripting languages have been adapted to this purpose.
A web page script is a little bit of computer code that executes either as the page loads into the computer from the server or on the user’s computer as the page is displayed. The scripts that execute on the server are called server-side scripts; scripts that run on the user’s machine are called client-side scripts. Using a script, complex functions and displays can be generated programmatically, greatly increasing the functionality and interactivity of web-based applications.
With the power of scripting comes also the opportunity for abuse. It is possible, for instance, to use scripting languages to force disallowed states on either the client or server machines, which can result in unpredictable behavior and crashes. Some scripts can even exploit vulnerabilities in the underlying operating systems. In this case, the scripts become exploits, and can cause serious trouble or damage. A good deal of work has now gone into trying to ensure that scripting languages on the Web are secure, and for the most part they are. However, every now and again, somebody somewhere discovers a new code sequence that can get around the controls and checks. Most browsers enable users to disable scripts and applets, or to run them in a restrictive fashion. As long as users stick to mainstream, reputable web sites, malicious scripts should rarely be an issue.
What is a script?
The term script in a computing context dates back to the early 1970s and comes from the Unix operating system term shell script. A shell script is a sequence of commands that are read by the computer from a file. Using scripts, multistep commands can be embedded in a web page. Scripting languages have increased in flexibility and robustness; today, scripts can be executed either at the server as the page is called from memory or at the user’s computer as the material is called up for display.
The advantage to client-side scripting is that it allows the local computer to catch data entry errors, such as an improper abbreviation of a state name or a badly formatted phone number, before the data crosses the network and ties up the server, or worse, creates an entry in the server which has to be corrected. The advantage to scripting on the server side is that it helps preserve confidentiality. Processes that involve company secrets are executed in the safety of the server room rather than distributed out to where they can be captured and analyzed. In addition, some processes are so intensive that it might bog down the user’s machine to execute them on the client side. Anything that involves lots of tables or uses a database is usually best kept server side.
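The kind of data entry check described here might look like the following sketch. It is written in Python for consistency with the other examples, although a browser would run such a check in a client-side scripting language. The short list of state abbreviations, the phone format 555-123-4567, and the function name check_entry are all assumptions.

```python
import re

# A deliberately short list for illustration; a real form would carry them all.
STATE_ABBREVIATIONS = {"CA", "NY", "TX", "WA"}

# Assumed phone format for this sketch: 555-123-4567.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def check_entry(state, phone):
    """Return a list of problems found in the form fields (empty if none)."""
    problems = []
    if state.strip().upper() not in STATE_ABBREVIATIONS:
        problems.append("unknown state abbreviation")
    if not PHONE_PATTERN.fullmatch(phone.strip()):
        problems.append("phone number must look like 555-123-4567")
    return problems


good = check_entry("ca", "555-123-4567")
bad = check_entry("Calif", "5551234567")
```

A form that comes back with an empty list is ready to cross the network; anything else can be corrected on the spot.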
Scripting allows a lot of interaction between the client and the server. Responses from the field may be organized and checked by a client-side script, which transmits them to the server. On the server side, a script may use the information contained in a client’s response to modify on the fly what the server sends back to the user. This way the application shows the user exactly what is required or what has changed as a result of the user’s entry. A raft of conventional business applications can be operated in real time on the Web using judicious scripting. Focusing demands for computational power on the server also allows for the use of simpler clients. This means that cell phones, video games, and personal digital assistants (PDAs) can be viable terminals in complex interactions.
Client-side scripting languages
There are several different scripting languages. While a few are used both client side and server side, most languages are used on one or the other. Language choice is generally based on the specific features or characteristics of the language, adherence to company- or project-wide standards, or simply personal preference.
Occasionally the native browser does not have the program code required to execute everything that is asked of it. Certain media files, for instance, may require special code to play back properly. A plug-in is a software component that loads automatically to extend the range of file types that a web browser can understand and execute.
These are the most popular scripting languages for client-side applications:
§ Java (requires plug-ins on the receiving machine)
§ ActiveX Controls
§ Macromedia Flash (requires a plug-in on the receiving machine)
Multiple scripts can run on the same web page, but languages cannot be mixed within the same script.
Java may require the use of a browser plug-in, but it is a popular language because it can operate on any computer that can host a Java Runtime Environment (JRE). The programmer must write the Java code and compile it, but thereafter it will run on any type of computer or device that has a JRE, with little extra programming effort. Java programs for the client side are usually called applets. When they are built to run on the server side, they are called servlets.
Java has lots of power, but it has certain built-in security features that are designed to keep it from doing anything damaging to the machine on which it is running. These features create a virtual environment for the Java code to run in, formally called a sandbox. The Java security manager keeps the Java code from crawling out of the sandbox. A Java applet that tries to escape the sandbox and do harm to the client is called a malicious applet.
ActiveX is a powerhouse technology that features a lot of capabilities and a fair amount of danger. ActiveX controls can be integrated into web pages and provide numerous functions such as buttons, drop-down list boxes, text entry, and display fields. ActiveX controls can take nearly complete control of the host machine, unlike Java, which is at least partially constrained by the security manager. ActiveX can cause real trouble if malicious code is included. On the other hand, ActiveX is powerful and flexible, making it highly attractive to programmers and developers.
Server-side scripting languages
When a server-side script runs, it is often because a user has submitted answers to an application or form. A general property of all server-side processing is that when a user submits a form, the server processes it and creates a new HTML page. The user then downloads this page to observe the effect of what they have requested: in effect, the completed form.
The use of scripts on the server is divided between two main paths. Some setups communicate with the server directly; others communicate via the Common Gateway Interface (CGI).
The Common Gateway Interface (CGI) is a middleman between the web server and the clients. CGI establishes the rules under which various programs will talk to each other. The purpose of the CGI is to translate between unlike languages and systems. CGI programs can be written in many languages. C, C++, Java, Perl, Visual Basic, or any other language that can accept user input, process it, and output a response, can be used in CGI scripts.
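The accept, process, and respond cycle of a CGI program can be sketched as follows. In CGI, the web server passes the data from a GET request to the program in the QUERY_STRING environment variable, and the program writes headers, a blank line, and then the document. The function name handle_request and the form field called name are assumptions for this example.

```python
import html
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI-style response from the environment the server supplies."""
    # For a GET request, CGI delivers the form data in QUERY_STRING.
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["stranger"])[0]
    # Escape user input before placing it in the page.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    # A CGI program emits headers, a blank line, then the document itself.
    return "Content-Type: text/html\r\n\r\n" + body


response = handle_request({"QUERY_STRING": "name=Ada"})
```

A real CGI script would read os.environ and print the response; passing a dictionary here simply keeps the sketch easy to try out.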
If the scripting language is portable between platforms, such as Java, the script can play in multiple environments, and CGI might not be employed.
Scripting languages for server-side applications include:
§ Active Server Pages (ASP)
§ Java Server Pages (JSP)
Which programming language to use is a matter of preference and performance. Perl, PHP, and Ruby tend to look like code written in C, C++, or Java. They bristle with curly braces. ColdFusion code, on the other hand, is stuffed with HTML-like tags. This bothers some programmers because the resulting code seems to lack precision. Metrics that matter include reliability, flexibility, ease of making changes, speedy performance, and an ability to wring the most performance out of the computer it runs on.
Support for the application is also important. A firm that specializes in Windows applications but also runs some Unix or Linux merely because one employee is a whiz at it may find itself in a difficult position if that employee leaves.
Perl is a language tightly focused on the Web and web applications. It has a long history of service. Veteran programmers swear by it. Perl is also an open source language, which means it is produced and supported by a community that allows members to access the internal workings of the language. This may increase its inherent security because any changes will be seen by many eyes, rather than a few employees working under the cover of confidentiality agreements. Combining Perl with an open source web server, such as Apache, creates a powerful combination for web servers.
PHP is open source. Because PHP resembles C, Java, and Perl, this can shorten the time it takes to learn it if you are already an experienced programmer. PHP features easy interface with numerous databases and allows you to program on both Windows and Unix machines.
ColdFusion is a popular web development tool because it uses a system of tags called ColdFusion Markup Language (CFML), which resembles HTML. The ColdFusion development environment tries to automate the coding process as much as possible. This may limit coding errors, since less is done by hand.
Java Server Pages (JSP), or servlets, are a set of Java classes optimized for client-server interactions. JSP operates with true Java, which brings portability, multithreading, extensive class libraries, strong safety features, robust security measures, and several other of Java’s advantages. Java servlets are efficient because they are persistent. Once created, usually at the time of the first request, the servlets stay resident in memory as compact Java objects. When a subsequent request comes by, the servlet can quickly build the HTML page the client requires. With ASP, the code may need to be reinterpreted for every client request, slowing down the process of page generation.
Python is an interpreted, interactive, object-oriented programming language that resembles Perl and Java. The syntax is clear and avoids nested curly braces. Python is copyrighted but freely usable and distributable, even for commercial use. Python can run on many brands of Unix, Windows, Macintosh, and others.
Each of these programming languages can make the web experience better for the user and more efficient for the provider. However, each of these environments can be a source of trouble. Every language or technology has its own set of bugs and exploits that server operators must track and patch against. It is important to monitor language-specific newsgroups and bulletins to learn about the problems and solutions.
Web Attacks and Preventions
The following sections describe both client-side and server-side web attacks, and what you can do about them.
Client-side web attacks
Client-side web attacks include the following:
Malicious HTML tags in web requests
Malicious code in a form window can cause the server to generate pages that are unpredictable or dangerous if run on the server. Malformed pages sent back to the client for execution may cause further problems.
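Curative: Webmasters must not allow nonvalidated input. Client-side scripting can clean up form data before it is transmitted, but an attacker can bypass the browser entirely, so the server must validate the data again on arrival.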
Malicious code from other clients
A web site with a discussion group may be open to attacks of the form:
Hello Group- Here is my message!
<SCRIPT>malicious code</SCRIPT>
That is all!
If a victim client has scripting enabled, their browser may run this code unexpectedly.
Curative: Users should turn off script functions, and web servers should screen for embedded tags that show a script may be present.
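Screening on the server might be sketched as follows, using the html.escape function from Python's standard library. The function names render_posting and looks_scripted are assumptions, and the crude substring test is only illustrative; escaping everything on output is the more dependable of the two measures.

```python
import html


def render_posting(message):
    """Return the posting as HTML with any embedded markup neutralized."""
    # html.escape turns <, >, &, and quotes into character references,
    # so the browser displays tags as text instead of interpreting them.
    return "<p>" + html.escape(message) + "</p>"


def looks_scripted(message):
    """Crude screen for an embedded script tag; not a complete filter."""
    return "<script" in message.lower()


posting = "Hi <script>alert(1)</script>"
safe_html = render_posting(posting)
flagged = looks_scripted(posting)
```

Once escaped, the posting shows up in other clients' browsers as harmless text, whether or not they have scripting enabled.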
Clients sending malicious code to themselves
An attacker can slip a client a message or file and encourage them to post it to the server. When the server echoes or displays the posting, the client’s machine may execute it.
Curative: Webmasters should screen data, even if the intended recipient is the client that sent it.
Abuse of tags
Tags such as <FORM>, normally harmless enough, can cause trouble if they’re embedded at the wrong place. An intruder can trick users into revealing sensitive information by modifying the behavior of an existing form or can display information that may have been held in the form of a previous user.
Other HTML tags can alter the appearance of a page, insert unwanted or offensive images or sounds, break things, and otherwise disturb the peace by interfering with the page’s intended appearance and behavior.
Curative: Set browser security to high and lower it only for those sites you are sure will not violate that trust.
While visiting a web site, a simple text file called a cookie is often placed in the user’s computer. At the next visit, the browser returns the cookie to the web server, which can use the cookie data to recall the previous conversation. A poisoned cookie is one that has been altered to trigger the download of malicious code.
Curative: Keep security settings high until trust is earned. Scan all incoming files (cookies included) for viruses to prevent the injection of malicious code.
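A complementary defense that the text does not describe is for the server to sign each cookie it issues, so that an altered cookie can be detected and discarded. The sketch below uses HMAC-SHA256 from Python's standard library; the secret key, the "|" separator, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from
# protected storage rather than embedding it in the code.
SECRET_KEY = b"example-secret-key"


def sign_cookie(value):
    """Append an HMAC-SHA256 tag so the server can detect later changes."""
    tag = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}|{tag}"


def verify_cookie(cookie):
    """Return the original value if the tag matches, otherwise None."""
    value, separator, tag = cookie.rpartition("|")
    if not separator:
        return None
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(tag, expected) else None


cookie = sign_cookie("user=alice")
tampered = cookie.replace("alice", "admin")
```

Signing detects a cookie that has been changed; it does nothing to stop a cookie from being copied and replayed unaltered.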
Using the wrong character set
Browsers interpret the information they receive according to a character set. If the page returned by the web server does not specify one, the browser falls back on the character set chosen by the user or on a default setting, which can result in garbled displays or unintended meanings.
Curative: Web servers should declare the character set of every page they return, and users should set a sensible default character set when configuring their browsers.
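The effect of interpreting the same bytes under two different character sets is easy to demonstrate. In this sketch the sample word and the choice of UTF-8 and Latin-1 are assumptions picked for illustration.

```python
text = "café"
raw_bytes = text.encode("utf-8")          # the bytes the server actually sends

as_intended = raw_bytes.decode("utf-8")   # the right character set
as_garbled = raw_bytes.decode("latin-1")  # the wrong one: "cafÃ©"
```

Nothing in the bytes changed between the two lines; only the interpretation did.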
General client-side attack preventatives
General measures that can be taken to help prevent client-side attacks include the following:
§ There are a host of security options built into most browsers, but it is up to the user to tailor them to her specific situation and needs. Web page developers should filter their page output to eliminate these types of problems.
§ Users should set security high and set scripting off. This may disable some web functionality, so users should know how to make changes to browser configurations as required.
§ Remember the client may not be the intended victim. A carefully constructed attack may execute script on the client machine that is designed not to hurt the client, but rather some other computer or network to which the client is connected. For instance, the client may have cached security information when it last connected to a particular server, and this authorization information may be co-opted by the attacker in order to attack the server via the client.
Server-side web attacks
Servers can be attacked just as easily as clients, or perhaps more readily. Servers have the dual disadvantage of having to be exposed to many users, and possibly also to the Internet.
One of the most serious attacks against a server involves causing an intentional buffer overflow. Although the arrangement varies slightly from computer to computer and from operating system to operating system, in most computers, RAM is organized by roping off a piece for the operating system, then roping off a section to be used for temporary variable storage called the stack. Above the stack is cordoned off yet another section of memory, this one called the heap, after which is the memory storage spot for code waiting for execution.
If one of these areas, often the stack, suddenly grows too large, it may overwrite the area above it. This is called smashing the stack. When this happens the values that were stored in those regions are changed to whatever was being written into memory at the time the overwriting occurred. This may cause the computer to behave erratically or to crash.
If the values written during the overflow are chosen with extreme care, they may actually end up being stored as if they were instructions. They may execute the next time the computer reads those memory locations. This is one way to inject arbitrary code into the server; such code could be instructions that allow an attacker to take over the computer.
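Python itself checks array bounds, so it cannot suffer a real overflow, but the mechanism can be simulated. The sketch below models a block of memory as a bytearray in which an 8-byte input buffer sits directly in front of a 4-byte return address. The layout, the sizes, and every function name are assumptions made for illustration.

```python
BUFFER_SIZE = 8


def make_memory():
    """Lay out an 8-byte input buffer followed by a 4-byte return address."""
    return bytearray(b"\x00" * BUFFER_SIZE + b"\x10\x20\x30\x40")


def unchecked_copy(memory, data):
    """Copy input into the buffer with no length check, as vulnerable code does."""
    memory[0:len(data)] = data


def checked_copy(memory, data):
    """Refuse input that does not fit in the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input longer than buffer")
    memory[0:len(data)] = data


def return_address(memory):
    """Read back the 4 bytes stored just past the buffer."""
    return bytes(memory[BUFFER_SIZE:BUFFER_SIZE + 4])


memory = make_memory()
unchecked_copy(memory, b"AAAA")                      # fits: address untouched
before = return_address(memory)
unchecked_copy(memory, b"A" * 8 + b"\xde\xad\xbe\xef")  # 12 bytes into 8
after = return_address(memory)                       # attacker-chosen value
```

The last four bytes of the oversized input land exactly where the return address lived, which is how carefully chosen values end up steering the program. The checked version rejects the same input before any memory is touched.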
One popular approach is to install a back door, which is code that allows an intruder to enter the machine without going through security first. Another approach is to install a rootkit, which is code that promotes the privileges of the attacker to those of the system administrator. Either would be a home run. It would even be enough to install a code stub that facilitates installing more code later on. This can be as simple as telling the computer to go to a previously prepared web address or URL, where the malicious code is waiting.
Curative: The defense against buffer overflows is good programming practice. No user input should ever be permitted without first verifying that it is of the correct length and that it contains no characters that may be invalid or that may be misinterpreted.
Because most Internet transactions last only as long as you are logged on, all the information about that session usually dissolves as soon as the link is broken. Cookies remind the server who you are and what you talked about last time. If someone copies your cookies, the potential exists for them to trick the server into revealing what it knows about you. This may not be critical, but if the server has a credit card number on file, or personal information, this can expose too much information about you.
This becomes a more powerful attack if a really complex business process is involved. For instance, if one server hosts your credit card information and address information, and a second holds your academic records, while a third holds your current schedules, it may be possible to use minimal access to one server to trick the others into revealing a great deal. This exploits the server-to-server trust. These attacks are called cross-site scripting, usually abbreviated XSS (older texts use CSS, which is easily confused with Cascading Style Sheets).
Curative: Cross-site scripting is a complicated exploit. Once the attacker is communicating from server-to-server as a peer, many unpredictable things can happen. The strongest defense is for each server to be properly protected in the first place.
Malicious ActiveX controls
Built on Microsoft’s Component Object Model (COM), ActiveX allows powerful interactions between computers. ActiveX operates in a way that resembles a plug-in for Java, but ActiveX does not use an isolating area of memory as Java does. Today’s fine-grained access controls prevent a lot of Java problems because they confine the code to allowable portions of the machine it is running on. Malicious code using ActiveX, on the other hand, can have much more power. Such code can facilitate many computers working together, or it can be exploited, with devastating consequences.
TIME (XNTP) exploits
Many network processes require that servers and clients keep track of time, and that there be no variance between elements in the network. Time is kept the same all over using the Network Time Protocol (NTP), for which XNTP is a widely used implementation. If an attacker can assign legitimate times to fake messages, the messages could masquerade as real ones. Alternatively, a legitimate message can be rendered suspect by shifting its time component to an illegitimate value.
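Why time matters can be seen in the kind of freshness check that many systems apply to incoming messages. This is a minimal sketch, assuming timestamps expressed in seconds and a tolerance of 300 seconds; the function name is_fresh and the sample values are invented for the example.

```python
MAX_SKEW_SECONDS = 300  # assumed tolerance for this sketch


def is_fresh(message_time, receiver_time, max_skew=MAX_SKEW_SECONDS):
    """Accept a message only if its timestamp is close to the receiver's clock."""
    return abs(receiver_time - message_time) <= max_skew


accepted = is_fresh(message_time=1000, receiver_time=1100)
rejected = is_fresh(message_time=1000, receiver_time=2000)
```

A check like this is only as trustworthy as the clocks behind it: an attacker who can shift the receiver's clock can cause genuine messages to be rejected, and one who can stamp a forged message with a plausible time will pass the test.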
Internet protocols, while they have allowed the internetworking of computers and the distribution of applications, have opened up a host of security issues. The protocols were meant to be robust, not secure. The enemy was never thought to be within. The trust by which computers are able to interact and complete complicated business processes turns out to be the undoing of much business, because sensitive information can be spilled and confidence lost. Further, the attacks that can be accomplished via the Web are crippling, because they can use the Web’s flexibility to invade and disable the computers that are attached to it.
In a way, society might be fortunate that the Internet was built first to be durable. This appearance of dependability allowed it to gain a following, then a foothold, then a share of mind, and finally enough dependence upon it that we now demand its weaknesses be fixed. Had the Internet been created with an isolationist mind, many of its applications and potential applications that we now struggle to protect may never have been developed.