Linux Administration (2016)
Network Troubleshooting
Network troubleshooting is a large and complex topic. How you approach a situation will largely depend on the circumstances and environment in which you are doing the network troubleshooting. However, in this chapter, you will learn some of the most common tools you can use to perform network diagnostics.
First you'll learn about the ping command and how to test network connectivity with it. Next you'll learn how to examine network routes using the traceroute and tracepath commands. You'll also learn how to see various network statistics with the netstat command. We'll cover how to analyze raw network traffic using tcpdump. Finally, you'll learn how to test if a port is actually open by connecting to it with the telnet command.
Ping
If you are having trouble connecting to a host over a network, one of the first things you can do is ping the host. The ping command sends one or more ICMP packets to a host that you specify and waits for a reply.
To use the ping command, simply run ping and provide a hostname or IP address. By default, ping will keep sending packets until you stop the program with Control-C. If you want to specify the number of packets to send, use the -c option. For example, to send three packets to google.com run ping -c 3 google.com.
Here is the output of a ping command.
$ ping -c 3 google.com
PING google.com (2.5.2.7) 56 bytes of data.
64 bytes from 2.5.2.7: icmp_seq=1 ttl=53 time=20.1 ms
64 bytes from 2.5.2.7: icmp_seq=2 ttl=53 time=20.2 ms
64 bytes from 2.5.2.7: icmp_seq=3 ttl=53 time=23.9 ms
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 21.489/22.924/24.154/1.111 ms
You should notice that the hostname was translated into an IP address. In this case, google.com resolved to 2.5.2.7. If the name doesn't resolve you'll get an "unknown host" error. In that case you should use the IP address of the system you are trying to connect to. Also, if you can ping by IP address but not by name, there is a problem with the resolution of the DNS name.
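The name-versus-address check can be scripted. The following is a minimal sketch, not something from the original chapter. The function name dns_or_network is made up for this example, and it assumes Linux ping (where -c sets the packet count and -W the reply timeout in seconds) plus the getent command, which resolves names through the system's configured name services.

```shell
# dns_or_network NAME IP
# Hypothetical helper: NAME and IP should refer to the same system.
# Assumes Linux ping (-c count, -W timeout in seconds) and getent.
dns_or_network() {
    if ! ping -c 1 -W 2 "$2" >/dev/null 2>&1; then
        echo "network: $2 does not answer ping"
    elif ! getent hosts "$1" >/dev/null 2>&1; then
        echo "dns: $2 answers ping but $1 does not resolve"
    else
        echo "ok: $1 resolves and $2 answers ping"
    fi
}
```

You would call it with a name and the address you expect that name to have, for example dns_or_network linuxsvr 10.0.2.15. A "dns:" result points at name resolution rather than at the network itself.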
That ping command sent three packets. The statistics section reported that 3 replies were received and thus no packet loss was encountered. This means we have network connectivity to google.com. You'll also notice that each reply has a time associated with it. In this example the first and second replies were each received in about 20 milliseconds, and the third took 23.9 milliseconds. You'll see a summary of this activity on the last line of the output, where RTT stands for "Round Trip Time".
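The statistics section lends itself to scripting. Here is a small sketch of a shell function that pulls the packet loss percentage out of ping output. The name ping_loss is invented for this example, and the function assumes the "N packets transmitted, ..., X% packet loss" summary line shown above.

```shell
# ping_loss: read ping output on stdin and print the packet loss
# percentage without the % sign. Hypothetical helper; assumes the
# summary line format shown in this chapter.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, ping -c 3 google.com | ping_loss would print 0 for the run shown above. Any value greater than 0 means that at least one reply did not come back.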
Here is an example where no replies were received. You'll see that 100% packet loss is reported. This means there is no network connectivity between this host and google.com.
$ ping -c 3 google.com
PING google.com (2.5.2.7) 56 bytes of data.
From 2.5.2.7 icmp_seq=1 Destination Host Unreachable
From 2.5.2.7 icmp_seq=2 Destination Host Unreachable
From 2.5.2.7 icmp_seq=3 Destination Host Unreachable
--- google.com ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2002ms
pipe 3
At this point, the only thing we know is that we can't ping google.com. It doesn't necessarily mean that google.com is down. The next step is to try to ping something on my local network. If I cannot ping anything on my local network, then I have a problem with my host. Maybe my network cable was accidentally disconnected. Maybe I performed an upgrade on my server and the network drivers didn't update properly. Maybe I forgot to start the networking services on my server after I performed some maintenance. The point is I at least know where to start looking.
If I could successfully ping another host on my local network, then the problem lies outside of my system. Maybe the router on the edge of my company's network is down and I cannot reach any hosts on the public internet. We could test that scenario by pinging other hosts like facebook.com or youtube.com. If we can ping Facebook and YouTube, then the problem is specific to reaching Google. Perhaps Google installed a firewall that simply discards ICMP packets and thus pings will never work. If that turns out to be the case, then we'll need to use other tools to test network connectivity, which we'll be covering soon.
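The reasoning above can be written down as a short script. This is only a sketch under stated assumptions: the function names can_ping and where_is_the_problem are made up for this example, and the ping options assume Linux ping, where -c sets the packet count and -W the reply timeout in seconds.

```shell
# can_ping HOST: succeed if HOST answers a single ping.
# Assumes Linux ping (-c count, -W timeout in seconds).
can_ping() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# where_is_the_problem LOCAL_HOST REMOTE_HOST OTHER_REMOTE_HOST
# Hypothetical helper that walks the checks described above.
where_is_the_problem() {
    local_host=$1 remote=$2 other_remote=$3
    if can_ping "$remote"; then
        echo "ok: $remote answers ping"
    elif ! can_ping "$local_host"; then
        echo "local: check this host's cable, drivers, and network services"
    elif ! can_ping "$other_remote"; then
        echo "upstream: the local network works but nothing outside answers"
    else
        echo "target: only $remote is unreachable; it may be dropping ICMP"
    fi
}
```

A call such as where_is_the_problem 10.0.2.2 google.com facebook.com checks the target first, then the local network, then a second outside host. The result is a starting point for investigation, not a verdict.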
This example demonstrates pinging an IP address. This IP address is on the same local network as the host I'm running the command from, and the response times are very fast. In fact, they are less than 1 millisecond.
$ ping -c 3 10.0.2.2
PING 10.0.2.2 (10.0.2.2) 56(84) bytes of data.
64 bytes from 10.0.2.2: icmp_seq=1 ttl=63 time=0.272 ms
64 bytes from 10.0.2.2: icmp_seq=2 ttl=63 time=0.103 ms
64 bytes from 10.0.2.2: icmp_seq=3 ttl=63 time=0.202 ms
--- 10.0.2.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.103/0.192/0.272/0.070 ms
Trace Route
Ping tests an endpoint, but it doesn't tell you anything about the path or route the network packets take. To examine the route, use the traceroute command. Note that, depending on your distribution and the options you use, traceroute may require root privileges to function properly.
By default, traceroute will attempt to translate IP addresses into DNS names. If you want to skip that step and just work with IP addresses, use the -n option. This will speed things up a bit and can be helpful if you are experiencing DNS issues. This output is easier to read than DNS names, in my opinion.
# traceroute -n google.com
traceroute to google.com (2.5.2.7), 30 hops max, 60 byte packets
1 10.0.2.2 0.296 ms 0.178 ms 0.220 ms
2 192.168.1.1 2.529 ms 2.713 ms 2.630 ms
3 72.14.237.231 23.750 ms 22.087 ms 12.12.132.137 22.701 ms
4 216.58.216.78 20.549 ms 12.250.16.30 22.904 ms 216.58.216.78 20.724 ms
The traceroute command sends 3 packets to each hop along the way. You can see the response times for each hop along the route. The first hop is very quick while the last hop is slower. This is expected behavior. However, if one of the hops along the path takes a very long time to respond, that's an indication of where an issue may exist. Maybe there is network congestion on that particular router, for instance.
If you see an asterisk where you normally see times, that means a reply wasn't received. Some routers are actually configured to block traceroute data. In these cases the traceroute command may be of little use to you. If network connectivity is otherwise working and you see asterisks in the traceroute output, that probably means a router is blocking traceroute data and not that there is an actual problem.
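When a route has many hops, it can help to reduce each line to its slowest reply and its number of missing replies. The following sketch does that with awk. The function name hop_summary is invented here, and the parsing assumes the traceroute -n layout shown above, where each time is followed by the word ms and a missing reply is shown as an asterisk.

```shell
# hop_summary: read `traceroute -n` output on stdin and print, for each
# hop, the hop number, the slowest reply in ms, and how many probes got
# no reply (shown as * by traceroute). Hypothetical helper.
hop_summary() {
    awk '$1 ~ /^[0-9]+$/ {
        max = 0; lost = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "*") lost++
            else if ($(i + 1) == "ms" && $i + 0 > max) max = $i + 0
        }
        printf "hop %d max_ms %.3f lost %d\n", $1, max, lost
    }'
}
```

You could run it as traceroute -n google.com | hop_summary. A hop whose maximum is far higher than its neighbors is worth a closer look, while a hop with lost replies but healthy hops after it is probably just a router that does not answer traceroute probes.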
Network troubleshooting consists of looking at the same situation from multiple angles, using multiple tools, and drawing conclusions from the overall picture. It also helps to know how your particular network is configured. Situational awareness is the key to network troubleshooting. You cannot simply rely on one tool like ping or traceroute and be guaranteed you know what is happening on a network.
An alternative to traceroute is tracepath. The tracepath command does not require root privileges. You can use the -n option to use IP addresses instead of DNS names just like you can with traceroute.
The tracepath command will produce one line of output for each response it receives, unlike traceroute, which produces one line of output per hop. In the following example, you'll see that two responses were received from 10.0.2.2.
$ tracepath -n google.com
1?: [LOCALHOST] pmtu 1500
1: 10.0.2.2 0.470ms
1: 10.0.2.2 0.649ms
2: 192.168.1.1 2.147ms asymm 64
...
For simple checks, tracepath can do the trick. For advanced options, you'll probably end up using traceroute.
Network Statistics
The netstat command can be used to collect a wide variety of network information. Here are some of my favorite and most used netstat options.
-n Display numerical addresses and ports.
-i Display a list of network interfaces.
-r Display the route table. (netstat -rn)
-p Display the PID and program used.
-l Display listening sockets. (netstat -nlp)
-t Limit the output to TCP. (netstat -ntlp)
-u Limit the output to UDP. (netstat -nulp)
The -n option is used to display numerical IP addresses and ports as opposed to hostnames and service names. You can use this option in conjunction with most other netstat options.
Get a list of network interfaces on your system by using the -i option.
To display routing information, use -r. I often use netstat -rn to display the routes using IP addresses.
The -p option displays the PID and program that is using a given socket. For example, if you are connected via SSH to a server and you run netstat -p, you will see the PID of the specific SSH process you are connected to. Note that you'll need root privileges for the -p option to show information about processes owned by other users.
The -l option displays listening sockets. Use this option in conjunction with the -p option to see what processes are listening on what ports. On a web server, for example, it will show that a process such as nginx or apache is listening on port 80. If you cannot connect to a given port on a system, run this command to make sure that a process is actually listening on that port.
You can limit the output of netstat to a specific protocol. To limit output to the TCP protocol, use netstat -t. For UDP, use the -u option. If you want a list of all programs that are listening on TCP ports, you can use netstat -ntlp.
Here is some sample output from the netstat command. The first bit of output is a list of network interfaces from the netstat -i command. Next, the routing information is displayed with netstat -rn. Finally, a list of programs that are listening on TCP ports is displayed. In this example, SSH is listening on port 22, and a program called master, which is the Postfix master process, is listening on the SMTP port, port 25. Remember to use root privileges with the -p option. I accomplished that by using the sudo command.
[jason@linuxsvr ~]$ netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 3975 0 0 0 2627 0 0 0 BMRU
lo 65536 8 0 0 0 8 0 0 0 LRU
[jason@linuxsvr ~]$ netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 0 0 0 eth0
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
[jason@linuxsvr ~]$ sudo netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 943/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1313/master
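The check described earlier, making sure a process is actually listening on a given port, can be scripted against this output. What follows is a sketch rather than a standard tool. The function name is_listening is made up, and it assumes the column layout shown above, where the fourth field is the local address and the sixth is the state.

```shell
# is_listening PORT: read `netstat -ntl` output on stdin and succeed if
# some socket is in the LISTEN state on PORT. Hypothetical helper;
# assumes field 4 is the local address and field 6 is the state.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")   # the port follows the last colon
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}
```

For example, netstat -ntl | is_listening 80 || echo "nothing is listening on port 80" prints the message only when no listening socket is found on port 80.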
Packet Sniffing
Sometimes it's not enough to know that network connectivity is in place. Sometimes you need to examine the contents of the network traffic to ensure payloads are actually being delivered. Perhaps one host is claiming to send data to another; to be sure that data is reaching its destination, you can look at the traffic it is receiving. To do this, you'll want to use some sort of packet sniffing tool, such as tcpdump.
Even though there are several other tools that perform this same task, tcpdump is one of the oldest and most commonly installed. It requires root privileges to run. If you run it without arguments, it prints out a description of the contents of network packets being received.
It will display information such as a timestamp, the source system address and port, the destination system address and port, and packet-specific information. The tcpdump utility will continue to examine packets until you stop it with Control-C.
Like other networking commands we've covered, tcpdump uses the -n option to both suppress DNS queries and display numerical addresses and ports.
To display packet contents in ASCII (human-readable) format, use the -A option. This will allow you to see human-readable text, if that type of data is being received on the host. For example, if you are using tcpdump to examine incoming traffic on a web server, you can see the URL paths that are being requested if you use the -A option.
If you want even more output and information, use the -v option. To increase the verbosity, use -vv; for the most verbosity, use -vvv.
The following is some sample output from tcpdump. On the far left hand side of the output is the time stamp. Next is the source information followed by the destination. Finally, information about the network packet is displayed at the end of the line.
$ sudo tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:25:49.639495 IP linuxsvr.ssh > 10.0.2.2.64440: Flags [P.], seq 3312803324:3312803408, ack 2443835, win 40880, length 84
19:25:49.639586 IP linuxsvr.ssh > 10.0.2.2.64440: Flags [P.], seq 84:120, ack 1, win 40880, length 36
19:25:49.639750 IP 10.0.2.2.64440 > linuxsvr.ssh: Flags [.], ack 84, win 65535, length 0
19:25:49.639763 IP 10.0.2.2.64440 > linuxsvr.ssh: Flags [.], ack 120, win 65535, length 0
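Because each line follows the same layout, the timestamp, source, and destination can be picked out with a few lines of awk. This is a sketch only. The function name tcpdump_endpoints is invented for this example, and it assumes the default "TIME IP SOURCE > DESTINATION: ..." line format shown above.

```shell
# tcpdump_endpoints: read default tcpdump output on stdin and print
# "timestamp source destination" for each IP packet line.
# Hypothetical helper; assumes the "TIME IP SRC > DST: ..." layout.
tcpdump_endpoints() {
    awk '$2 == "IP" && $4 == ">" {
        dst = $5
        sub(/:$/, "", dst)   # drop the colon that ends the destination
        print $1, $3, dst
    }'
}
```

It is meant to be used at the end of a pipeline, for example sudo tcpdump | tcpdump_endpoints. Lines that do not match the layout, such as the two header lines, are skipped.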
The following output shows an example of verbose ASCII output. You'll notice that a client requested the /about page from the web server on this host. Remember to use root privileges when executing tcpdump.
$ sudo tcpdump -Anvvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:44:27.067530 IP (tos 0x10, ttl 64, id 5120, offset 0, flags [DF], proto TCP (6), length 64)
10.0.2.44.37534 > 10.0.2.15.80: Flags [P.], cksum 0xfe34 (incorrect -> 0xce40), seq 1:13, ack 1, win 683, options [nop,nop,TS val 1585227 ecr 1584441], length 12
E..@..@.@.(............P..>.:.......0K..-9GET /about
Telnet
The telnet command is practically obsolete. It was originally used to log into remote systems. Today, SSH has taken its place, but telnet can still be used in network troubleshooting. Since telnet has fallen out of favor for interactive logins, it may not be installed by default on some Linux distributions.
You can use telnet to initiate a TCP connection to a host on a specific port. Let's go back to a previous hypothetical situation. Let's say we cannot ping google.com from our host. We know that in and of itself doesn't necessarily mean that google.com is down. To see if google.com is accepting web traffic, we can connect to the HTTP port, which is port 80. To do this, we type telnet google.com 80.
If the port is open, we'll get a message like "Connected to google.com." If you want to, you can send data directly to the port by typing in some data. The HTTP protocol does accept human readable commands. For example, to request a web page, use "GET" followed by the path. To get the home page, type "GET /". Once you are ready to close the connection, hold down the Ctrl key and press the right bracket key (^]). This will bring you to a telnet prompt. To exit telnet, type quit and press Enter.
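When you connect, you may instead get a message like "operation timed out" or "connection refused". "Operation timed out" means a connection could not be established in time. This could be because traffic is silently getting dropped before it reaches the port, for example by a firewall, or because the host itself is unreachable. If you get a "connection refused" message, the host was reached but it actively rejected the connection. That usually means no process is listening on that port, although a firewall configured to reject traffic rather than drop it produces the same message.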
$ telnet google.com 80
Trying 216.58.2.7...
Connected to google.com.
Escape character is '^]'.
GET /
HTTP/1.0 200 OK
^]
telnet> quit
closed.
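If you save the output of a connection attempt, the outcome can be classified with a small filter. The following is a sketch, not a standard utility. The function name port_status is made up, and the message texts it matches are the ones quoted in this chapter, which may differ between telnet implementations.

```shell
# port_status: read the output of a telnet attempt on stdin and print
# open, refused, timeout, or unknown. Hypothetical helper; the matched
# message texts may differ between telnet implementations.
port_status() {
    awk '
        /Connected to/                     { s = "open" }
        tolower($0) ~ /connection refused/ { s = "refused" }
        tolower($0) ~ /timed out/          { s = "timeout" }
        END { print (s == "" ? "unknown" : s) }'
}
```

When capturing the output, remember that telnet writes its error messages to standard error, so redirect it with 2>&1 before piping or saving.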
Summary
In this chapter, you learned how the ping command can be used to determine if network connectivity exists between two hosts. You also learned that even if ping fails it does not necessarily mean the host you are pinging is down.
Next you learned how to trace the path network traffic takes on the way to a host. You also learned how to list network interfaces, show the route table, and display the applications that are listening on ports by using the netstat command.
We also covered how to sniff network packets using tcpdump. Finally, you learned how to test for port connectivity with the telnet command.
Quiz
1. Which commands can be used for network troubleshooting?
A. ping
B. traceroute
C. netstat
D. tcpdump
E. All of the above.
2. If you can't ping a host, you can always be assured that the host you are attempting to ping is down.
A. True
B. False
3. If you can ping by IP address but not by name, there is a problem with the resolution of the DNS name.
A. True
B. False
Quiz Answers
1. E
2. B
3. A