CompTIA Linux+ / LPIC-1 Cert Guide (Exams LX0-103 & LX0-104/101-400 & 102-400) (2016)

Chapter 6. Text Processing/Advanced Command Line

This chapter covers the following topics:

■ Working with Input/Output Streams

■ Pipes

■ Executing Multiple Commands

■ Splitting and Processing Streams

■ Filters

■ Formatting Commands

■ Using Regular Expressions and grep

This chapter covers the following objectives:

■ Use streams, pipes, and redirects: 103.4

■ Process text streams using filters: 103.2

■ Search text files using regular expressions: 103.7

This chapter focuses on the concepts and practice of getting a lot done on the shell command line. Chief among the skills you gain from this chapter is the ability to choose commands and chain them together properly, and sometimes interestingly, to get your work done.

Unix and Linux have a toolset mentality that needs to be cultivated; it must be learned if you are to be successful in your sysadmin/programming work and on the exams.

Every tool in the Linux command arena should do three things:

■ It should do one thing very well.

■ It should accept standard input.

■ It should produce standard output.

With commands and the shell’s constructs that help connect things together (pipes, redirection, and so on), it becomes possible to accomplish many things with just one command string of connected commands.

“Do I Know This Already?” Quiz

The “Do I Know This Already?” quiz enables you to assess whether you should read this entire chapter or simply jump to the “Exam Preparation Tasks” section for review. If you are in doubt, read the entire chapter. Table 6-1 outlines the major headings in this chapter and the corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Review Questions.”

Table 6-1 “Do I Know This Already?” Foundation Topics Section-to-Question Mapping

1. Which of the following is the file descriptor that matches stdout?

a. /proc/self/fd/0

b. /proc/self/fd/1

c. /proc/self/fd/2

d. /proc/self/fd/3

2. What is the result of the following command?

$ find / -iname "*.txt" | file > sort

a. A file named “file” in the current directory

b. An error message that the files could not be found

c. A file named “sort” in the current directory

d. An endless loop on the system

e. None of the above

3. While creating a script to perform backups from one server to another, you want to ensure that if the first command isn’t successful a second command will run. Which of the following when inserted between the two commands accomplishes this?

a. &

b. >>

c. ||

d. ;

e. @

4. As the sysadmin of a financial organization, you receive a real-time feed of information that your web team uses to display financial data to customers. You must keep a log of the data you receive as well as send that data on for further processing.

Which of the following commands would you use to accomplish this?

a. tac

b. split

c. tee

d. branch

e. culvert

5. You are trying to display a formatted text file in a uniform manner as output on the console, but some of the columns don’t line up properly. What command could you use to filter this file so that any tabs are replaced with spaces when it’s output to the console?

a. convert

b. expand

c. detab

d. spacer

6. You want to replace the words “Linux Tarballs” with “Linus Torvalds” in a given text file. Which of the following commands would you most likely use to execute a search and replace on these words?

a. ex

b. ed

c. edlin

d. tr

e. sed

7. You are using a series of command line tools to display output that is useful, but what you would rather see as output is the exact opposite of what is being shown. Which of the following commands and options shows the reverse or inverse of the normal output?

a. grep -v

b. sort –r

c. find -x

d. indent -u

e. None of the above

Foundation Topics

Working with Input/Output Streams

Linux supports separate streams to handle data on the shell command line. These are called file descriptors and are used primarily to send data to and from programs and files and to handle errors.

Table 6-2 lists the three file descriptors and their associated files.

Table 6-2 Linux File Descriptors

Standard In

Standard in, or stdin, is the stream of input that all programs accept, or are assumed to accept. Most programs accept stdin either redirected from a file or with the file given as an argument to the program:

program < file

or

program file

For most programs, these examples are identical. However, we cover several commands, such as tr, that accept input only from stdin and not as a file argument.


Note

Often, the stdin of a program is the stdout of another program, connected by a pipe symbol.


Standard Out

Standard out, or stdout, is the text or data that’s produced by a command and shows up on the screen or console. By default, all text-mode commands produce stdout and send it to the console unless it’s redirected. To understand this, run the following command:

cat /etc/fstab

The text shown onscreen is a perfect example of the stdout stream. It’s considered elegant to run commands first to see what they produce before redirecting the output to a file, particularly because you might get errors.

Standard Error

Standard error, or stderr, is a parallel stream to the stdout, and by default it shows up mixed into the stdout stream as the errors occur.


Note

To visualize this, think of two pitchers of water. One is stdout and the other stderr. If you pour them out and mix the streams of water together, they may be different, but they both go to the same place.


It’s hard to produce errors on purpose; however, we can always use the find command to produce some access denied or permission errors to experiment with.

As a normal user, run the following command:

find / -iname "*.txt"

Right away you see errors that indicate the user can’t access certain directory trees to find certain items. Notice that useful output (stdout) is mixed directly with error messages (stderr), making it potentially hard to separate the streams and use the good data for anything constructive.


Note

The life of a sysadmin is defined by the search for producing good data and properly discarding or storing the errors produced. Good data can be sent on to another program, while errors are usually dumped at the earliest possible moment to use the processor and resources to produce good data more efficiently.


To clean up the good data and get rid of the bad data, we need to use redirection operators. To see this work, use the up arrow and rerun the previous command, as shown here:

find / -iname "*.txt" 2> /dev/null | more

This produces output similar to

./.kde/share/apps/kdeprint/printerdb_cups.txt
./1.txt
./2.txt
./3.txt

Notice that you get only good data (stdout) after using a 2> redirection symbol to dump the bad data (stderr) to the system’s black hole, or garbage disposal—in other words, a pseudo-device designed as a place to discard data.

You learn more about this in the “Redirecting Standard Error” section.

Redirection of Streams

In the quest for good data, being able to redirect or change the destination of stdout and stderr, and to some degree stdin, is essential to your tasks.

Redirection symbols include

■ <—Redirects a file’s contents into a command’s stdin stream. The file descriptor for the < input redirection symbol is 0, so it’s possible to see <0 used.

■ >—Redirects the stdout stream to the file target to the right of the symbol. The file descriptor for the > output redirection character is 1, which is implied, except in certain instances.

■ >>—Redirects stdout to a file, appending the current stream to the end of the file, rather than overwriting the file contents. This is a modifier of the > output redirection descriptor.

■ 2>—Redirects stderr to a file, overwriting the file contents; use 2>> to append to the file instead. This is the > output redirection descriptor applied to the stderr file descriptor, 2.


Note

If using the > redirection symbol to write to a file, that file is overwritten unless the noclobber bash shell option is set. With that option set, you cannot overwrite the file; the attempt produces an error and the file is left untouched. The only way to get data into that file is to use the >> redirection append symbols. The option is enabled by running the set -o noclobber command.
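
A minimal interactive sketch of noclobber behavior (the filename demo.txt is arbitrary):

set -o noclobber
echo first > demo.txt     # creates demo.txt
echo second > demo.txt    # error: cannot overwrite existing file
echo second >> demo.txt   # appending is still allowed
set +o noclobber          # turn the option back off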


Redirecting Standard Input

Redirecting stdin consists of sending a file’s contents to a program’s stdin stream. An example of this is sort < file1.

Although it might seem odd to have a couple of ways to do the same thing, the previous command is essentially the same as the cat file1 | sort command.

Redirecting Standard Output

Redirecting stdout consists of either a single redirection symbol (>) or two (>>) for appending. The main difference is that the use of a single redirection descriptor overwrites a file, whereas using double redirection descriptors appends to the end of a file, like so:

cat file1 > file2

This overwrites the contents of file2 or, if it doesn’t exist, creates it.

The following command appends the data from file1 to file2 or, if it doesn’t exist, creates file2:

cat file1 >> file2

As an alternative example, say you run the find command and it sends both errors and the names of found files to the screen. You can capture the good data to a file and let the errors show on the console with the command shown here:

find / -iname "*.txt" > foundfiles

When you are redirecting stdout, the numeral or file descriptor 1 doesn’t need to be used in most cases; you can just use > rather than 1> because the 1 is implied. Redirection of stdout is so common that the single > symbol suffices.

Redirecting Standard Error

Redirecting stderr consists of understanding that, by default, stderr shows up on the same target as the stdout, mixed right in but separable.

To continue the previous example but capture the stderr and let the good data show on the default target (console), you would change the command to

find / -iname "*.txt" 2> errors

The 2> errors section of the command redirects the stderr and puts it into the file errors, leaving the stdout stream free to show on the default target (console) or even get written to another file.

The key to understanding what happens when using stdout and stderr is to visualize them as shown in Figure 6-1.

Figure 6-1 Path of data streams

As you can see, the > character grabs the stdout stream and puts that data into the file gooddata, whereas the stderr stream is unaffected and is sent to the console for display.

Grabbing both streams and putting them into different files is as simple as adding a redirection symbol preceded with the stderr numeral:

find / -iname "*.txt" > gooddata 2> baddata

This grabs both streams and puts them into their proper files, with nothing displayed to the console.

Redirection Redux

Sometimes all the possible output from a particular command must be trapped because it will cause problems, such as when a command is run as a background job and you’re using vi or some other console-based program. Having stderr show up onscreen during an editing session is disconcerting at the least, and if you’re configuring important files, it’s downright dangerous.

To trap all output from a command and send it to the /dev/null or black hole of the system, you use the following:

find / -iname "*.txt" > /dev/null 2>&1

You will see items like the previous command as exam topics, and it’s important that you’ve done the task yourself, multiple times if possible. Take a few minutes to experiment with the examples shown in this text. Getting this set of symbols right in a fill-in-the-blank question is difficult if you’ve not typed it a number of times.
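
Note also that the order of the redirections matters. A short sketch, using the same command as above:

find / -iname "*.txt" > /dev/null 2>&1   # both streams are discarded
find / -iname "*.txt" 2>&1 > /dev/null   # errors still reach the console

In the second form, 2>&1 points stderr at the place stdout was aimed at that moment (the console), and only then is stdout redirected to /dev/null, so the errors still appear onscreen.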


Note

Don’t be confused by the use of the /dev/null device; its sole purpose is to be a catch-all fake device that you can use to discard output of any kind.


Pipes

A pipe (|) is used for chaining two or more programs’ output together, typically filtering and changing the output with each successive program the data is sent through.

Quite possibly the simplest usage of a pipe is to take the output of a particular command and use a pipe to send it to one of the pagers, such as more or less. Pagers are called that because, especially with the more command, you are shown the output not in a running stream down the console but as if it had been cut up into pages that fit your screen. For example, in Figure 6-2 the ls command’s standard output is sent to the less command as that command’s standard input.

ls -l | less

Figure 6-2 The output of a command being piped to less

The less command offers a lot of great functionality for viewing output. You can search forward for something by entering the / character followed by the string you want to find, for example:

/somestring

Also while in less, you can use most of the typical navigation commands used in other programs such as vi or vim (covered in later chapters), such as 1G to go to the first character of the first line of the file or G to go to the end of the file. The page up and page down keys work as well as the traditional cursor movement (hjkl) and arrow keys to navigate around the file.

Further or more complex use of pipes includes chaining together several commands that each add to the output of the first command to produce something useful.


Note

It is important to remember that commands pipe output to other commands until the output is finally sent to the screen/console or it’s committed to a file with an output redirect, such as the > or >> characters. Standard error is not being redirected or trapped unless the 2> designator sends it to a file location or the /dev/null device.


For example, to print a code sample with numbered lines (nl) and printer formatting (pr), you use the following command string:

cat codesamp.c | nl | pr | lpr

It’s essential that you know the difference between a redirection symbol and a pipe. Say you are shown a similar command such as

cat file1 | nl > pr

This command produces a file in the current directory named pr, not output that has been filtered through both nl and pr.

Ross’s Rule: Redirection always comes from or goes to a file, whereas piping always comes from or goes to a program.

Good examples of using pipes to produce usable data include

sort < names | nl


Note

Remember that you usually don’t have to include the < or input redirect when specifying a file argument to a program such as sort, but we include this here because you may see this on an exam at some point.


This sorts the contents of the file names and then numbers the sorted lines.

Another example is

who | wc -l

This counts the users attached to the system and shows just the total number.

Here’s one more example:

lsof /mnt/cdrom | mail -s "CD-ROM Users" root

The previous command is designed to show you who is currently accessing or has opened files on the CD-ROM of the server, so you know who to tell to log off when it’s needed.

Executing Multiple Commands

There are several methods for executing multiple commands with a single press of the Enter key. You can use special characters to simply execute several commands in sequence or get fancy with if/then styles of multiple-command execution.

Multiple Command Operators

When compiling software, scheduling backup jobs, or doing any other task that requires a particular program’s exit status to be a particular value, you need to use these operators:


Note

It’s important to remember that each of the commands has its own set of stdin, stdout, and stderr descriptors. They flow into and out of each of the commands in between the operators.


■ ;—The semicolon causes all listed commands to be executed independently of each other. The following example echoes back when a long compile is done:

make modules ; echo DO MAKE MODULES_INSTALL NEXT

The commands are independently executed and neither command fails nor succeeds based on the other’s exit status.

■ &&—The double ampersand causes the second command to be executed if the first command has an exit status of 0 (success). If a nonzero exit status (failure) is returned, the second command is not attempted. If you’re a sysadmin and want to have a second program do something if the first succeeds, use the double ampersand like this:

longcompile && mail -s "compile complete" root

This set of commands starts a long compile; if it succeeds, you get an email stating compile complete in the subject line.

■ ||—The double pipe causes the second command to be attempted only if the first command has a nonzero exit status (failure). If the first command succeeds with an exit status of 0, the second command is not attempted. What if you want a second command to let you know whether a particular process failed, without having to dig through the log files every morning? You could use the following:

tar -czvf /dev/st0 / || mail -s "doh, backup failed" root

As you can probably guess, this command set attempts a full system backup to a SCSI tape device. Only if it fails does the root user get an email with the subject line indicating it failed.
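
The two operators can also be combined on one line. A hedged sketch, reusing the hypothetical longcompile command from above (note that this is not a strict if/then/else: if the first mail command itself failed, the second one would also run):

longcompile && mail -s "compile OK" root || mail -s "compile FAILED" root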

Command Substitution

In some instances, you need to take the output of a command and place it into a variable, usually for scripting purposes. Substituting the output of a command for the command itself is accomplished by bracketing the command with the backtick (`), aka the unshifted tilde (~) key, like so:

`somecmd`

An example of this is inserting the output of the date command into a variable, possibly for use in a script, such as in this example:

export DATETIME=`date`
echo $DATETIME
Tue Jan 13 17:18:35 PST 2004

The export command is used to create a variable named DATETIME that is being populated by the `date` command. When this is executed, the backticks around the date command cause the output for that command to be inserted into the DATETIME variable as a value.

Another facet of substituting commands is to enclose the command itself between parentheses and declare it as a variable, as in this example:

file $(grep -irl crud /usr/src/linux-2.4)

The main reason to use command substitution like this is that it allows you to nest commands within commands. Rather than having to use wildcards, you just use the right substitution.
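
A small illustrative sketch of that nesting, using the $( ) form (which, unlike backticks, nests without any escaping):

echo "The ls binary lives in $(dirname $(which ls))"

Here the inner $(which ls) expands to the full path of ls, and the outer $(dirname ...) strips the filename, leaving just the directory.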

Another fun example of using command substitution is looking at a given binary and seeing what libraries it requires without knowing where that binary is actually located.

ldd `which ls`

linux-gate.so.1 => (0xb778c000)
libselinux.so.1 => /lib/i386-linux-gnu/libselinux.so.1 (0xb774e000)
libacl.so.1 => /lib/i386-linux-gnu/libacl.so.1 (0xb7745000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7595000)
libpcre.so.3 => /lib/i386-linux-gnu/libpcre.so.3 (0xb7557000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xb7552000)
/lib/ld-linux.so.2 (0xb778d000)
libattr.so.1 => /lib/i386-linux-gnu/libattr.so.1 (0xb754c000)

Splitting and Processing Streams

Two commands that work well with and complement the use of pipes are the tee and xargs commands.

Splitting Streams with the tee Command

The tee command is designed to accept a single stdin stream and simultaneously send one identical copy of the output to a specified file and the other out on stdout, where it can be used as stdin to another program via a pipe.

You might use tee when running a program that must produce output to a file and you want to monitor its progress onscreen at the same time, such as the find command. To redirect one stream of output to a single file and also see the same output on the screen, use

find / -iname "*.txt" | tee findit.out

This command is designed to log the standard output of a stream to a file and pass another complete stream out to the console. Financial institutions that have to log and simultaneously process data find tee useful.
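
By default tee overwrites its output file; the -a option appends instead. A quick sketch with hypothetical filenames:

find /var/log -iname "*.log" 2>/dev/null | tee -a findlog.out | wc -l

This appends the list of found files to findlog.out while simultaneously passing the same list on to wc -l for a count.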

Processing Output with the xargs Command

The xargs command is another useful tool. It takes a list of returned results from another program (such as the output of the locate or find commands, essentially a series of path/filenames) and parses them one by one for use by another, simpler or less-capable command.

A good example of this is wanting to have all the readme files on the entire system in one large file called mongofile.txt in your home directory. This would enable you to search the documentation with a single less mongofile.txt command.

To do this, we use the find command to find the full path and filename of the readme files on your system; then we use the cat command to take the contents and redirect the results to our target file:

find / -iname readme | cat > mongofile.txt

It would appear, from the apparent lack of critical errors, that we got the results we wanted: all lines from each file output to mongofile.txt, one after the other. This turns out not to be the case.

To see what went wrong, issue the less mongofile.txt command, which reveals that we didn’t get the output we wanted. The cat command isn’t smart enough to determine that the output from the find command is actually a set of discrete lines that could be used individually as arguments; it just echoes the output as it was given, one big listing of path/filenames, rather than treating each line as a file whose contents should be displayed.


Note

Perhaps an easier way to see what happened is to use the wc -l command against mongofile.txt. It comes back with a discrete number of lines in the file for your initial reference. (For reference, I got 679 lines of output from my command.)


Run the command again with the xargs command acting as a buffer for the cat command. It reads all the output and individually feeds cat a single line as an argument until there are no more lines to feed to cat, like so:

find / -iname readme | xargs cat > mongofile.txt

Now use the less mongofile.txt command to verify that it worked as we originally intended. We now see that all the files have been enumerated and appended to each other to make one large file.
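
One caution, offered as an aside: path/filenames that contain spaces or other whitespace break this simple form, because xargs splits its input on whitespace. On GNU systems, the usual fix is to have find emit NUL-separated names and tell xargs to expect them:

find / -iname readme -print0 2>/dev/null | xargs -0 cat > mongofile.txt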


Note

Again, the wc -l command shows the resulting number of output lines, which, after we use xargs, is significantly larger because the final mongofile.txt represents the catenation of the contents of all the files, not just the file listing as in the first command. (For reference, I got 76,097 lines of output from my command.)


Filters

A filter is a command that accepts stdin as input and performs an action, alteration, or other process on the input, producing stdout and (if necessary) stderr from it.

Sorting

The sort command is a typical filter. It takes a file as an argument or can have stdin sent to it; then it either performs a default sort or applies any of a large number of options. A good example of how sort can help is to take a file (file1) that has the following lines in it:

Ross Brunson
Peabody McGillicuddy
Ursula Login
Snuffy Jones

Sorting this file with the default sort options produces the following output:

$ sort file1
Peabody McGillicuddy
Ross Brunson
Snuffy Jones
Ursula Login

The sort command sorts by fields, and by default it begins comparing at the first column of each line. The typical field separator or delimiter is a space character or sometimes a tab; however, the delimiter can be set to any nonblank character with the -t option. This particular file was sorted by the first name of the listed people.

To sort by the last names, use the -k option to choose the second field as the sort key (older documentation may show the obsolete +1 syntax for the same thing):

sort -k2 file1
Ross Brunson
Snuffy Jones
Ursula Login
Peabody McGillicuddy

Another useful option with sort is the -n (numeric sort) option. If a file contains numbered lines that start with 1 and go to 30, the standard sort would sort them as

sort numbers
1
11
12
13


Note

The above output is truncated for readability.


To tell sort to compare the lines numerically (conceptually, reading the longest number and padding the shorter numbers with leading zeroes internally during the sorting process), you use the command shown here:

sort -n numbers
1
2
3
4


Note

This illustrates the difference between a “human-friendly” sort and a “machine” sort. When humans sort, we typically want to see things progress numerically from 1 to whatever the eventual number is, but this wouldn’t make sense if you are doing a literal or machine sort.
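
The delimiter, key, and numeric options combine naturally. A short sketch that sorts the system’s accounts by UID, lowest first:

sort -t: -k3 -n /etc/passwd | head -n 4

Here -t: declares the colon as the field separator, -k3 picks the third field (the UID), and -n makes the comparison numeric.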


Numbering Lines

The nl command is useful for numbering either every line in a file or stream of input, or just the lines with data on them. This is helpful when trying to troubleshoot source code or producing a list of numbered items automatically.

To number only the lines in a file that contain data, use this command:

nl file1

To number every line in a file, regardless of the line having data in it, use this:

nl -ba file1

Expect to see the nl command used with commands such as pr and tac chained or piped together to produce a particular output. Order matters, so be sure to notice whether the question wants to number all lines or just nonempty lines.


Note

An actual example of using the nl command to number lines of a file occurred early on in the author’s career. There was a large programming staff and we used to do printouts of the code base for team review. Until we discovered how to properly put line numbers on all lines of the file, as well as number the pages in an automated fashion, preparing for the meeting took a very long time.


In many situations, being able to determine the number of lines or words in a particular file or output is useful. The wc command shows items in any of three counts:

■ -l—Lines of output

■ -w—Words of output

■ -c—Characters (or bytes) of output

For example, a great way to see how many users are on the system at a given time is to use wc to count the lines of output from the who or w commands:

who | wc -l
34
w | wc -l
36

Both of these commands were done on the same system, one right after the other, with the same number of users.


Note

The w command has two header lines that are counted by the wc program, whereas the who command simply outputs a line for each user. Be careful when using wc to count items without first viewing the raw output; otherwise, you’ll report inaccurate information.


Tabs

When output is formatted for the screen, a tab-delimited file can display with oddly placed columns due to the length of the data in each field. The expand command helps change tabs to a set number of spaces.

Consider a file that contains the following lines (the line numbers are just for reference):

1: steve johnson
2: guillermo villalobos
3: bo regard
4: lawrence aribacus
5: marge innovera

If tabs were used to separate the data fields, lines 1, 3, and 5 would have two tabs between fields 1 and 2, so the columns would line up. Lines 2 and 4 would have only one tab, due to the length of the first field. expand converts the tabs to a set number of spaces, making the data display right in most cases.


Note

Watch out on the exams. You’ll be presented with a number of plausible-sounding distractors for the expand command, possibly including the convert command, which is used to convert graphics file formats between each other.


Cutting Columns

There will be plenty of times when you need to take a file that contains columns of data, regardless of the delimiter or separator, and either extract information on a column-by-column basis or perhaps even reorder the columns to make it more usable. Although the objectives of the Level 1 of LPI’s exams don’t include the awk command (the king of columnar data), they do test you on the cut command, which is more than adequate for the task:

cut -c 20-40 /etc/passwd | tail -n 5
ar/spool/postfix:/bin
hare/pvm3:/bin/bash
ross brunson:/home/rb
home/snuffy:/bin/bash
:/home/quotaboy:/bin/

This displays only columns 20 through 40 of the /etc/passwd file, excluding all other data on each line.

It can also grab certain fields from a file, such as the /etc/passwd file. To grab the username, description, and home directory fields for each user, use the following command:

cut -d: -f 1,5,6 /etc/passwd | tail -n 5
postfix::/var/spool/postfix
pvm::/usr/share/pvm3
rbrunson:ross brunson:/home/rbrunson
snuffy::/home/snuffy
quotaboy::/home/quotaboy

The -d option sets the delimiter, which in this case is the : character. By default, cut uses tabs for a delimiter.
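
A related sketch, using GNU cut’s --output-delimiter option (an assumption: it is present in GNU coreutils but not in every cut implementation) to relabel the extracted fields:

cut -d: -f 1,7 --output-delimiter=" uses " /etc/passwd | tail -n 3

This prints lines such as "quotaboy uses /bin/bash", pairing each username with its login shell.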

Pasting and Joining

Two commands that are similar in function are paste and join. paste doesn’t remove any data from the output, but join removes redundant key fields from the data.

For example, say you have the following files:

file1:
Line one of file1
Line two of file1
file2:
Line one of file2
Line two of file2

Using paste on these two files produces the output:

Line one of file1 Line one of file2
Line two of file1 Line two of file2

Notice that nothing is lost from the files. All the data is there, but this can be redundant in the extreme if you want to produce a joint file from two or more files.

The join command is more of a database-style join than a catenation style (paste simply places one file’s contents next to the other’s on the same line). It takes a file as the first argument and by default treats the first field of that file as a key field. The second and subsequent files are treated in the same fashion. The output is each matching line of the files in order, minus the redundant key fields from any but the first file. For the lines to match, both files must present the key field in exactly the same form.

For example, say you have the following files, users and location:

users:
rbrunson 500
snuffy 501
quotaboy 502

location:
rbrunson 123 anystreet anytown ID 83858
snuffy 123 circle loop chicago IL 88888
quotaboy 123 some lane anyburg MT 59023

As you can see, the output includes each matched line only once, with the key field taken from the first file and the redundant copy of the key dropped from the location file:

join users location
rbrunson 500 123 anystreet anytown ID 83858
snuffy 501 123 circle loop chicago IL 88888
quotaboy 502 123 some lane anyburg MT 59023

Unique Data

There will be times when you need to take a number of disparate files with similar data and produce a “master list” from them, such as when you consolidate servers. One of the tasks to accomplish may be merging the /etc/passwd files. As long as all the users have the same UID/GID settings in each file, merging the files still makes an output file that contains multiple entries for various users.


Note

You should always sort the contents of a file you are about to use the uniq command on, as it will group the nonunique lines together and uniq will then remove all but one unique instance of a line.


For example, if you copied all the /etc/passwd files from three servers into a single file, running the following command outputs only the lines that occur exactly once in the entire file:

uniq -u /etc/bigpasswd
rbrunson:x:500:500::/home/rbrunson:/bin/bash
snuffy:x:501:501::/home/snuffy:/bin/bash
quotaboy:x:502:502::/home/quotaboy:/bin/bash

The -u option causes only the unique lines from the file to be output, so the command shown here could be used to redirect the output to a new /etc/passwd file by just adding a redirection symbol and the new filename:

uniq -u /etc/bigpasswd > /etc/newpasswd

To print a single example of each line that is a duplicate in a file, use the following command:

uniq -d bigpasswd

To print every instance of each repeated line, use this command:

uniq -D bigpasswd
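
The -c option is also handy here: it prefixes each line with the number of times it occurs. A quick sketch that shows the three most-duplicated entries in the merged file:

sort bigpasswd | uniq -c | sort -rn | head -n 3

The first sort groups identical lines together, uniq -c counts them, and the final sort -rn puts the highest counts on top.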

Heads or Tails?

The head command is used primarily to see the first lines of a given text file; by default it shows the first 10. head can be made to show a particular number of lines, starting at the top of the file, with the -n parameter followed by the number of lines to be shown. This parameter is used in the following manner:

head -n 5 /etc/fstab
LABEL=/     /          ext3    defaults         1 1
none        /dev/pts   devpts  gid=5,mode=620   0 0
none        /proc      proc    defaults         0 0
none        /dev/shm   tmpfs   defaults         0 0
/dev/hda6   swap       swap    defaults         0 0

The head command can’t display ranges of lines, only from the beginning of the file.

The tail command is the exact opposite of the head command: It displays the last 10 lines of a given file by default and can be configured to show less or more lines, but only from the end of the file. It can’t show ranges. Here’s an example:

tail -n 5 /etc/passwd
netdump:x:34:34:Network Crash Dump user:/var/crash:/bin/bash
quagga:x:92:92:Quagga routing suite:/var/run/quagga:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
rbrunson:x:500:500::/home/rbrunson:/bin/bash
snuffy:x:501:501::/home/snuffy:/bin/bash

The tail command is also useful for following log files, such as the /var/log/messages file to see the latest attempts to log on to the system:

tail -f /var/log/messages

This returns output similar to what’s shown here:

Feb 23 21:00:01 localhost sshd(pam_unix)[29358]:
session closed for user root
Feb 23 21:00:04 localhost sshd(pam_unix)[29501]:
session opened for user root by (uid=0)
Feb 23 21:00:13 localhost sshd(pam_unix)[29501]:
session closed for user root
Feb 23 21:00:16 localhost sshd(pam_unix)[29549]:
session opened for user root by (uid=0)

When you combine the two commands, truly interesting things become possible. For example, say you want to view lines 31 to 40 of a 50-line file. Remember that neither command can display a range by itself, but by putting them together, you can display a range of lines from the file 50linefile with the following command:

head -n 40 50linefile | tail
31
32 Both software and hardware watchdog drivers are available in the standard
33 kernel. If you are using the software watchdog, you probably also want
34 to use "panic=60" as a boot argument as well.
35
36 The wdt card cannot be safely probed for. Instead you need to pass
37 wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11".
38
39 The SA1100 watchdog module can be configured with the "sa1100_margin"
40 commandline argument which specifies timeout value in seconds.
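
For comparison only: the sed command, covered later in this chapter, can print such a range directly with its -n option and a line-address prefix:

sed -n '31,40p' 50linefile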

Watch for the head and tail commands on the exam—particularly the -f option for following a log file’s latest entries.

Splitting Files

The split command is useful for taking a large number of records and splitting them into multiple individual files that contain a certain amount of data.

The split command’s options include (the # character represents a number):

■ -a #—Uses a suffix the specified number of characters long (the default is 2, producing names such as xaa).

■ -b #—Output files contain the specified number of bytes of data.

■ -C #—Output files contain at most the specified number of bytes of complete lines.

■ -l #—Output files contain the specified number of lines of data.

The # value can take a multiplier suffix: b (which multiplies by 512 bytes), k (1024 bytes), or m (1024 kilobytes). For example, if you need to split an 8.8MB text file into 1.44MB chunks, you can use this command:

split -b1440000 bigtextfile.txt
ls -l x??
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xaa
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xab
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xac
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xad
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xae
-rw-r--r-- 1 root root 1440000 Feb 23 09:25 xaf
-rw-r--r-- 1 root root 249587 Feb 23 09:25 xag
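
Reassembling the chunks is just a matter of catenating them back in order; a brief sketch using a hypothetical rebuilt.txt name:

cat x?? > rebuilt.txt
cmp bigtextfile.txt rebuilt.txt   # no output means the two files are identical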


Note

When taking the exams, you are expected to be able to split a file with a known number of lines into multiple component files, determine the default names for the files, and know how many of them would be produced given the defaults of the split command.


When cat Goes Backward

In some cases you want to display a file backward or in reverse line order, which is where the tac command comes in. To see the difference, first display the file normally:

cat file1

This produces output similar to the following:

1 Watchdog Timer Interfaces For The Linux Operating
2
3 Alan Cox <alan@lxorguk.ukuu.org.uk>
4
5 Custom Linux Driver And Program Development

Now run tac on the same file:

tac file1

This produces output similar to

5 Custom Linux Driver And Program Development
4
3 Alan Cox <alan@lxorguk.ukuu.org.uk>
2
1 Watchdog Timer Interfaces For The Linux Operating

Viewing Binary Files Safely

Many times a sysadmin has accidentally used cat to send the contents of a file to the screen, only to have it apparently contain machine code or the Klingon language. Usually you can type clear and have the screen come back, and then you have to use something other than cat to view the contents of that file.

You would typically use the od command to safely view binary or non-ASCII files; otherwise, the display will likely become garbled and the system will beep plaintively as the console attempts to interpret the control codes in the binary file.

od is capable of displaying files in different formats, including

■ -a—Named characters

■ -o—Octal

■ -d—Decimal

■ -x—Hexadecimal

■ -f—Floating point

Most of these formats aren’t for daily use, with only the hexadecimal and octal formats displaying output of much interest.
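
A safe first look at a binary, piped through head so only a few lines reach the screen:

od -x /bin/ls | head -n 3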


Note

Watch out for questions about how to view binary files. The od command is just about the only possibility for such viewing. If you do have a problem with a garbled console after accidentally viewing a binary file, use the reset command. It reinitializes the terminal and makes it readable again. Another fun option is the strings command, which shows you the text contained in a binary file, instead of garbling up the screen.


Formatting Commands

The pr and fmt commands are used to do line wrapping and formatting for the printing of text streams or files and are often used with each other to format a set of text properly for a particular purpose, be it peer review or managers who like to think they can read code.


Note

We used the following 50linefile earlier and it’s been numbered by the nl command, so if you are trying this and either don’t have a file named 50linefile or yours is not numbered, go back and complete the previous examples where this is done.


The pr command is useful in formatting source code and other text files for printing. It adds a date and time block, the filename (if it exists), and page numbers to the top of each formatted 66-line page, like so:

pr 50linefile

This produces output similar to what’s shown here:

2112-02-23 21:19 50linefile
Page 1

1 Watchdog Timer Interfaces For The Linux Operating System
2
3 Alan Cox <alan@lxorguk.ukuu.org.uk>
4
5 Custom Linux Driver And Program Development

The pr command can display columns of data, cutting the columns to fit the number per page, like so:

pr --columns=2 50linefile
2004-02-23 21:02 50linefile Page 1

1 Watchdog Timer Inte 26 and some Berkshire cards. T
2 27 internal temperature in deg
3 Alan Cox <a 28 giving the temperature.
4 29
5 Custom Linux D 30 The third interface logs ke

The fmt command is useful for formatting text files too, but it’s limited to wrapping long lines to fit on smaller pages or within columns that pr has set.

The previous example of pr columns chops the data off at the columns, losing data on the page. This can be fixed by mixing the commands, as shown here:

[root@localhost root]# fmt -35 50linefile | pr --columns=2
2004-02-23 21:49 50linefile
Page 1

1 Watchdog temperature. 29 30 The third
Timer Interfaces For The interface logs kernel messages
Linux Operating System 2 on additional alert events.
3 Alan Cox 31 32 Both software and

Translating Files

The tr command is for changing characters in files or streams, but not whole words or phrases—that’s for sed to do.

For example, if you have a file that contains a lot of commands from a sample in a book, but some of the commands are dysfunctional because the editor capitalized the first characters of the lines, you can translate the file’s uppercase letters to lowercase with the following command:

tr 'A-Z' 'a-z' < commands.txt

The tr command isn’t capable of feeding itself or of accepting a file as an argument. It’s unfair, but we often say that the tr command is less intelligent than cat, since cat can at least feed itself. Because of this, the < operator is mandatory when using tr on a file; otherwise, the command won’t work.

The following command can be used to accomplish the same results:

tr '[:upper:]' '[:lower:]' < commands.txt
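
tr can also delete characters outright with its -d option. A common sketch, assuming a hypothetical DOS-format text file whose carriage returns need stripping:

tr -d '\r' < dosfile.txt > unixfile.txt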


Note

Remember that tr is incapable of feeding itself or accepting a file as an argument, so the < redirection symbol is needed to send the input file to the command. Anything else is a broken command and produces a syntax error.


He sed, She sed

The sed, or stream editor, command is used to process and perform actions on streams of text, such as the lines found in a text file. sed is amazingly powerful, which is a way of saying it can be difficult to use.

A good analogy of the way sed works is to imagine that your text file is a long string that stretches from one side of the room to the other. On that string, you can put special transforming beads and slide them down the string, having it perform that particular transformation as it slides along the string or the lines of the file. The neat thing about sed is that you can stack the beads or send them one behind the other and make what can appear to be an almost magical transformation of a text file occur with a single command or set of commands.

One of sed’s most-used operations is searching and replacing text, including words and complete phrases. Whereas tr works only on characters/numerals as individuals, sed is capable of complex functions, including multiple operations per line.

sed uses the following syntax for commands:

sed -option action/regexp/replacement/flag filename

Rather than struggle through an explanation of what happens when certain options are entered, let’s see what sed does when we use those options. Using sed properly includes being able to, ahem, “reuse” sed commands from other sysadmins.

To replace the first instance of bob with BOB on each line in a given file, use this command:

sed s/bob/BOB/ file1

To replace all instances of bob with BOB on each line in a given file, use this command:

sed s/bob/BOB/g file1

sed allows for multiple operations on a given stream’s lines, all of which are performed before going on to the next line.

To search for and replace bob with BOB and then search for BOB and replace it with snuffy for every line and every instance for a given file, use this:

sed 's/bob/BOB/g ; s/BOB/snuffy/g' file1

The use of a semicolon character is similar to bash’s capability to run several commands independently of each other. However, this whole operation, from the first single quotation mark (') to the last, is performed inside sed, not as part of bash.

When sed is used for multiple commands, you can either use a semicolon to separate the commands or use multiple instances of -e to execute the multiple commands:

sed -e s/bob/BOB/g -e s/BOB/snuffy/g file1

On the exam, and whenever you might use sed with spaces in your patterns, bracket the whole pattern/procedure in single quotation marks, such as

sed 's/is not/is too/g' file1

This keeps you from getting syntax errors due to the spaces in the strings.

Sooner or later, you’ll get tired of typing the same operations for sed and want to use a script or some method of automating a recurring task. sed has the capability to use a simple script file that contains a set of procedures. An example of the previous set of procedures in a sed script file is shown here:

s/bob/BOB/g
s/BOB/snuffy/g

This script file is used in the following manner:

sed -f scriptfile targetfile

Many multiple procedures can be performed on a single stream, with the whole set of procedures being performed on each successive line.

Obviously, doing a large number of procedures on a given text stream can take time, but it is usually worth it because you only need to verify that it worked correctly when it’s done. It sure beats doing it all by hand in vi!

Another feature of sed is its capability to suppress or not have displayed any line that doesn’t have changes made to it.

For example, if you want to replace machine with MACHINE on all lines in a given file but display only the changed lines, use the following command with the -n option to make the command suppress normal output:

sed -n 's/machine/MACHINE/pg' watchdog.txt

The pg string at the end prints the matched or changed lines and replaces globally, for all instances on each line, rather than just the first instance per line.

To do a search and replace on a range of lines, prefix the s/ string with either a line number or a range separated by a comma, such as

sed -n '1,5s/server/SERVER/pg' sedfile
The X SERVER uses this directory to store the compiled version of the
current keymap and/or any scratch keymaps used by clients. The X SERVER
time. The default keymap for any SERVER is usually stored in:

On the exam, the sed questions are all about what will be found and replaced by a given string, with particular attention paid to global versus single replaces.

Getting a grep

One of the more fun text-processing commands is grep. Properly used, it can find almost any string or phrase in a single file, a stream of text via stdin, or an entire directory of files (such as the kernel source hierarchy).

grep (global regular expression print) uses the following syntax for its commands:

grep -options pattern file

The grep command has many useful options, including

■ -c—This option shows only a numeric count of the matches found, with no output of filenames or matching lines.

■ -C #—This option surrounds the matched string with the specified number of lines of context.

■ -H—This option prints the filename for each match; it’s useful when you want to then edit that file, and it is the default when multiple files are searched.

■ -h—This option suppresses the filename display for each match and is the default when a single file is searched.

■ -i—This option searches for the pattern with no case-sensitivity; all matches are shown.

■ -l—This option shows only the filename of the matching file; no lines of matching output are shown.

■ -L—This option displays the filename of files that don’t have a match for the string.

■ -w—This option selects only lines that have the string as a whole word, not part of another word.

■ -r—This option reads and processes all the directories specified, along with all the files in them.

■ -x—This option causes only exact line matches to be returned; every character on the line must match.

■ -v—This option shows all the lines in a file that don’t match the string; this is the exact opposite of the default behavior.

Examples of Using grep

grep can either use files and directories as the target argument or be fed stdout for parsing. An example of using grep to parse output follows:

who | grep ross

This command parses the who command’s stdout for the name ross and prints that line if found.

A more complex example of grep being used is to combine it with another command, such as find:

find / -name readme -exec grep -iw kernel {} \;

The previous command finds all the files on the system named readme and then executes the grep command on each file, searching for any instance of the whole word kernel regardless of case. A whole word search finds the string kernel but not kernels.

The key to using the -exec option is to know that, instead of the found results being returned as a series of full path and filenames on stdout, each found path/filename is substituted for the {} placeholder, and the command following -exec is executed once for each one.

Example of a find command:

find /home/rossb -iname "file*.txt"
/home/rossb/test/file.txt
/home/rossb/test/file1.txt
/home/rossb/test/file2.txt

Then, when you add the -exec and {} \; strings, it is effectively as if you had run the desired command on each of those lines of output individually. The find -exec combination keeps executing the command on each result until the list of found files is exhausted.

An innovative use of the grep command’s options for finding strings is to have it show you lines that don’t match the string. For example, you might want to check the /etc/passwd file periodically for a user that doesn’t have a shadowed password:

grep -v :x: /etc/passwd
snuffy:$1$3O238jrk$WcT15uH7V0EgxdtFTlxkK1:501:501::/home/snuffy:/bin/bash

It looks like snuffy has an encrypted password in the /etc/passwd file. You should therefore run the pwconv command to fix this and make snuffy change his password immediately.

In classes, we usually spend a reasonable amount of time searching through man pages and additional documentation, reading the friendly manuals, as it were. Now and then a student sees an amusing phrase or a few words that strike them as funny in the documentation and that’s when we take some time to show everyone how to use the grep command.

We all have some fun, there are any number of chuckles at what we find, and everyone ends up learning a lot about the use of grep and perhaps a few new ways to use its options.

Every attempt has been made to make this section a learning tool by having the reader use grep to search for certain terms, which have been chosen carefully. The ability to use grep and its options is essential to the exam.


Note

The use of the tail command and its options in the following examples is to limit the number of items of output so that it fits on a typical screen.


To search for the word “fool” in the additional documentation directories, use the following command (see Figure 6-3):

grep -ir fool /usr/share/doc | tail -n 15

Figure 6-3 Output of the grep command search for fool

Notice that you got matches with the pattern “fool” as a whole word and as part of words like “fooled” or “foolish.” Hit the up arrow and change the word “fool” to “foolish” and execute the command again (see Figure 6-4).

Figure 6-4 Output of the grep command search for “foolish”

Now let’s consider that you want to search for the exact word “fool,” not any variation of it, just that exact word. To tighten this up so that only lines containing the whole word are shown, add the -w option (see Figure 6-5):

grep -irw fool /usr/share/doc | tail -n 15

Figure 6-5 Output of the grep command search for “fool” as a whole word

And finally, let’s include a useful option to the grep command (-l) that shows, instead of the normal output of matching lines, just a listing of the files that contain a match for the search. This is useful for then taking the output and sending it to an editor, stream editor, or other file transformation command (see Figure 6-6).

Figure 6-6 Output of the grep command search with options to show only path/files

The grep command is versatile. You can discover more of its useful options via the man page or other documentation. The discussion in this section is more than enough to be useful and covers the items typically on the exam.


Note

As a final recommendation, try adding the -n option to show the line number of found items within each file, and the -v option, which shows the lines that do not match your query. The latter is useful when you know what you don’t want but don’t quite know what you do want.


Expanding grep with egrep and fgrep

The most important thing to know about egrep and fgrep is that they exist as commands primarily so you do not have to type grep -E and grep -F. Historically, they have been separate binaries, and indeed there are separate binaries on both the RPM-based distribution and the DPKG-based distribution we typically use.

The egrep command has many uses, but the main one we need to focus on is the ability to use egrep or grep -E to process search terms that feature operators, such as OR. For example, you may want to find lines in a very large /etc/passwd file that start with the letter “r” and the next letter is either “p” or “t” followed by any other letters. You can try the following:

egrep '^r(p|t)' /etc/passwd

This search finds the following lines, if they exist on your learning system:

rpc:x:490:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
rtkit:x:492:490:RealtimeKit:/proc:/bin/false

You can also search using egrep or grep -E for any line in the /etc/passwd file that contains “false” or “nologin” with the command:

egrep '(false|nologin)' /etc/passwd

This command should return a number of output lines, all of which have either “false” or “nologin” somewhere in the line.

The fgrep command is similar in execution, but it searches for fixed strings rather than regular expressions, and with the -f option it can read a file that contains the set of terms to be searched for, instead of having to specify them all separated by pipes. First create a file named filetosearch.txt and make it match the following output:

one
two
three
four
five
six
seven
eight
nine
ten

Then create a file named searchterms.txt and make it match the following output:

one
three
eight

Then run the following commands, which should produce the same output:

egrep '(one|three|eight)' filetosearch.txt
fgrep -f searchterms.txt filetosearch.txt

Essentially, it’s easy to use fgrep to refer to a file for the discrete search terms you want to use; it is a much more elegant method than packing an egrep command line with a dozen or so search terms.

Don’t forget that with egrep you can use [] ranges, the * operator, and several other special characters to help you display what you want. fgrep, by contrast, treats every search term as a literal fixed string.

Using Regular Expressions and grep

Using grep to find particular words and phrases can be difficult unless you use regular expressions. A regular expression has the capability to search for something that you don’t know exactly, either through partial strings or using the following special characters:

■ .—A period matches any single character and enforces that the character must exist (a.v is a three-character regular expression).

■ ?—A question mark matches the preceding item zero or one time; in other words, the item is optional.

■ *—An asterisk matches the preceding item zero or more times (a*v matches v, av, aav, and so on).

■ +—A plus sign means that the preceding item must be matched once and can be matched many times.

■ {n}—A curly-bracketed number means that the preceding item is matched exactly n times.

■ {n,}—A curly-bracketed number followed by a comma means the item is matched n or more times.

■ {n,m}—A curly-bracketed pair of numbers separated by a comma matches from n to m times.
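
One practical caution about the curly-brace counts just listed: they belong to the extended syntax, so plain grep needs them backslash-escaped, while grep -E (or egrep) takes them as written. A sketch of both forms against the /etc/passwd file:

grep -E '^.{32}$' /etc/passwd   # lines exactly 32 characters long
grep '^.\{32\}$' /etc/passwd    # the same match in basic grep syntax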

What’s the use for all of this? Try finding just the word “Kernel” in the source tree with the following command:

grep -rl Kernel /usr/share/doc | wc -l
138

The command finds 138 files that contain at least one match for Kernel. Now try finding just the word “Kernel” as a whole word with this command:

grep -rlw Kernel /usr/share/doc | wc -l
131

Now try the same command again, modified so that the word “Kernel” is found only when followed by a period:

grep -rwl Kernel\. /usr/share/doc | wc -l
27

Now, let’s search for the word “silly” as the search pattern:

grep -rwl silly /usr/share/doc | wc -l
93

Run the command again with the context number set to 2 lines to get more information about what is being commented on:

grep -rwn -C2 silly /usr/share/doc

The output varies on different systems, but essentially you see consecutive lines preceded with the same filename and line numbers, with two lines of context shown above and below each found search term. This can be useful when trying to find a particular instance of a search term based on the lines around it.

You need to be familiar with how to use regular expressions on the exam, particularly how to find strings that start and end with a particular letter or letters but contain other text in between.

Another example of regular expressions in action is searching for a particular phrase or word, but not another that is similar.

The following file is watch.txt and contains the following lines:

01 The first sentence contains broad
02 The second contains bring
03 The third contains brush
04 The fourth has BRIDGE as the last word: bridge
broad 05 The fifth begins with BROAD
06 The sixth contains none of the four
07 This contains bringing, broadened, brushed

To find all the words that begin with “br” but exclude any that have the third letter as “i,” use the following command:

grep "\<br[^i]" watch.txt
01 The first sentence contains broad
03 The third contains brush
broad 05 The fifth begins with BROAD

The \< string means that the word must begin with those letters. The [^i] construct matches any character except “i” in that position. Note that a ^ in front of a search term in a program such as vi anchors the search at the front of a line, but inside a set of square brackets, the ^ symbol excludes the listed characters from being found.

To find a set of words that ends with a certain set, use this command:

grep "ad\>" watch.txt
01 The first sentence contains broad
broad 05 The fifth begins with BROAD

As with the previous example, using the \> characters on the end of a search looks for words that end in that string.

Search strings that grep allows include

■ broad—Searches for exactly “broad,” including as part of other words (such as “broadway” or “broadening”), unless you use -w to cause broad to be searched for as a standalone word

■ ^broad—Searches for the word “broad” at the beginning of any line

■ broad$—Searches for the word “broad” at the end of the line

■ [bB]road—Searches for the words “broad” and “Broad”

■ br[iou]ng—Searches for “bring,” “brong,” and “brung”

■ br[^i]ng—Searches for and returns all but “bring”

■ ^......$—Searches for any line that contains exactly six characters

■ [bB][rR]in[gG]—Searches for “Bring,” “BRing,” “BRinG,” or any combination thereof

Summary

In this chapter you learned about input and output streams, how to pipe data between programs, and how to write a program’s output into a file. We also covered filtering the information that comes from programs, formatting that information to make it more useful, and using the various versions of grep and regular expressions to further separate out and find data.

Exam Preparation Tasks

As mentioned in the section “How to Use This Book” in the Introduction, you have a few choices for exam preparation: the exercises here, Chapter 21, “Final Preparation,” and the practice exams on the DVD.

Review All Key Topics

Review the most important topics in this chapter, noted with the Key Topics icon in the outer margin of the page. Table 6-3 lists a reference of these key topics and the page numbers on which each is found.

Table 6-3 Key Topics for Chapter 6

Define Key Terms

Define the following key terms from this chapter and check your answers in the glossary:

pipe

redirect

convert

standard in

standard out

standard error

catenate

append

Review Questions

The answers to these review questions are in Appendix A.

1. When executing a command, you want neither standard output nor standard error to be displayed on the console. Which of the following accomplishes this? (Choose two.)

a. command1 1> /dev/null 2> /dev/null

b. command1 1+2> /dev/null

c. command1 12>> /dev/null

d. command1 stdout> /dev/null +stderr

e. command1 > /dev/null 2>&1

2. You want to add a log entry in a file named output.txt. This entry includes the output of a particular command that includes a timestamp and unique and critical data each time it is run. You don’t want to overwrite the current contents of the output.txt file. Which of the following operators causes the proper result to happen?

a. !<

b. >>

c. <>

d. >

e. <<

3. As the sysadmin of a financial organization, you receive a real-time feed of information that your web team uses to display financial data to customers. You must keep a log of the data you receive as well as send that data on for further processing.

Which of the following commands would you use to accomplish this?

a. tac

b. split

c. tee

d. branch

e. culvert

4. While sorting a file that has numbers at the beginning of the lines, you notice that sort seems to oddly order the lines, as follows:

1

11

12

20

What option to the sort command could you use to get a more human-friendly sorting of the file?

a. -n

b. -k

c. -t

d. -h

5. You are using vi or another text editor to write a technical article for your organization’s website, and the instructions from the site’s editor are to keep the word count to 500 and the character count to less than 2,000.

Which utility would you typically use to see these statistics for a given text file?

a. count

b. num

c. wc

d. pr

e. ed