Ubuntu Linux Toolbox: 1000+ Commands for Power Users (2013)

Chapter 5 Manipulating Text

IN THIS CHAPTER

· Matching text with regular expressions

· Editing text files with vi, JOE, or nano

· Using graphical text editors

· Listing text with cat, head, and tail

· Paging text with less and more

· Paginating text with pr

· Searching for text with grep

· Counting words, lines, and characters with wc

· Sorting output with sort

· Stream editing with sed, tr, cut, and awk

· Searching binaries for text with strings

· Finding differences in files with diff

· Converting text files with unix2dos/dos2unix

With only a shell available on the first UNIX systems (on which Linux was based), using those systems meant dealing primarily with commands and plain text files. Documents, program code, configuration files, e-mail, and almost anything you created or configured was represented by text files. To work with those files, early developers created many text manipulation tools.

Despite having graphical tools for working with text, most seasoned Linux users find command line tools to be more efficient and convenient. Text editors such as vi (vim), Emacs, JOE, nano, and Pico are available with most Linux distributions. Commands such as grep, sed, and awk can be used to find, and possibly change, pieces of information within text files.

This chapter shows you how to use many popular commands for working with text files in Ubuntu. It also explores some of the less common uses of text manipulation commands that you might find interesting.

Matching Text with Regular Expressions

Many of the tools for working with text enable you to use regular expressions, sometimes referred to as regex, to identify the text you are looking for based on some pattern. You can use these patterns to find text within a text editor or use them with search commands to scan multiple files for the strings of text you want.

A regex search pattern can include a specific string of text (as in a word such as Linux) or a location (such as the end of a line or the beginning of a word). It can also be specific (find just the word hello) or more inclusive (find any word beginning with h and ending with o).

Appendix B includes reference information for shell metacharacters that can be used in conjunction with regular expressions to do the exact kinds of matches you are looking for. This section shows examples of using regular expressions with several different tools you encounter throughout this chapter.

The list that follows shows some examples that use basic regular expressions to match text strings.

Many different types of regular expressions are used in examples throughout this chapter. Keep in mind that not every command that incorporates regex uses its features the same way.

Expression Matches

a.* a, ab, abc, and aecjejich (an a followed by zero or more of any character)

^a Any "a" appearing at the beginning of a line

a$ Any "a" appearing at the end of a line

a.c Three-character strings that begin with a and end with c

[bcf]at bat, cat, or fat

[a-d]at aat, bat, cat, dat, but not Aat, Bat, and so on

[A-D]at Aat, Bat, Cat, and Dat, but not aat, bat, and so on

1[3-5]7 137, 147, and 157

\tHello A tab character preceding the word Hello

\.[tT][xX][Tt] .txt, .TXT, .TxT, or other case combinations
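
Several of the expressions in the preceding list can be tried out with the grep command, which is covered later in this chapter. The following is a minimal sketch, not a definitive test: the file words.txt and its contents are invented for illustration, and LC_ALL=C is set on the assumption that you want ranges such as [a-d] to follow plain ASCII order (in some locales a lowercase range can also match uppercase letters).

```shell
# Use the C locale so character ranges follow ASCII order (an assumption
# made for predictable results; your default locale may behave differently).
LC_ALL=C
export LC_ALL

# Build a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' bat cat dat fat Bat 137 147 167 notes.TXT notes.txt > "$tmpdir/words.txt"

# [bcf]at matches bat, cat, and fat, but not dat or Bat.
bcf=$(grep -c '^[bcf]at$' "$tmpdir/words.txt")

# [a-d]at matches the lowercase range only: bat, cat, and dat.
range=$(grep -c '^[a-d]at$' "$tmpdir/words.txt")

# 1[3-5]7 matches 137 and 147 here, but not 167.
digits=$(grep -c '^1[3-5]7$' "$tmpdir/words.txt")

# \.[tT][xX][tT]$ matches a .txt suffix in any combination of case.
suffix=$(grep -c '\.[tT][xX][tT]$' "$tmpdir/words.txt")

echo "$bcf $range $digits $suffix"
rm -rf "$tmpdir"
```

The -c option makes grep print the number of matching lines instead of the lines themselves, which keeps the comparison between patterns short.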

Editing Text Files

There are many text editors in the Linux/UNIX world. The editor that is most common is vi, which can be found virtually on any UNIX or Linux system available today. That is why knowing how to at least make minor file edits in vi is a critical skill for any Linux administrator. One day, if you find yourself in a minimalist, foreign Linux environment trying to bring a server back online, vi is the tool that will almost always be there.

On Ubuntu, make sure you have the vim package installed (the Ubuntu counterpart of the vim-enhanced package found on Fedora and Red Hat systems). Vim (Vi IMproved) from the full vim package provides the most up-to-date, feature-rich, and user-friendly vi editor. For more details about using vi, refer to Appendix A.

Note Ubuntu installs vim by default.

Traditionally, the other popular UNIX text editor has been Emacs and its more graphical variant, XEmacs. Emacs is a powerful multi-function tool that can also act as a mail/news reader or shell and perform other functions. Emacs is also known for its very complex series of keyboard shortcuts that require three arms to execute properly.

In the mid-90s, Emacs was ahead of vi in terms of features. Now that Vim is widely available, both can provide all the text editing features you’ll ever need. If you are not already familiar with either vi or Emacs, I recommend you start by learning vi.

Many other command line and GUI text editors are available for Linux. Text-based editors that you may find to be simpler than vi and Emacs include JED, JOE, and nano. Start any of those editors by typing its command name, optionally followed by the filename you want to edit. The following sections offer some quick descriptions of how to use each of those editors.

Using the JOE Editor

If you have used classic word processors such as WordStar that worked with text files, you might be comfortable with the JOE editor. To use JOE, install the joe package. To use the spell checker in JOE, make sure the Aspell package is installed. (Ubuntu installs Aspell by default.) To install JOE, run the following command:

$ sudo apt-get install joe

With JOE, instead of entering a command or text mode, you are always ready to type. To move around in the file, you can use control characters or the arrow keys. To open a text file for editing, just type joe and the filename or use some of the following options:

$ joe memo.txt Open memo.txt for editing

$ joe -wordwrap memo.txt Turn on wordwrap while editing

$ joe -lmargin 5 -tab 5 memo.txt Set left margin to 5 and tab to 5

$ joe +25 memo.txt Begin editing on line 25

To add text, just begin typing. You can use keyboard shortcuts for many functions. Use arrow keys to move the cursor left, right, up, or down. Use the Delete key to delete text under the cursor or the Backspace key to erase text to the left of the cursor. Press Enter to add a line break. Press Ctrl+k+h to see the help screen. The following list shows the most commonly used control keys for editing in JOE.

Key Combo Result

Cursor

Ctrl+b Left

Ctrl+p Up

Ctrl+f Right

Ctrl+n Down

Ctrl+z Previous word

Ctrl+x Next word

Search

Ctrl+k+f Find text

Ctrl+l Find next

Block

Ctrl+k+b Begin

Ctrl+k+k End

Ctrl+k+m Move block

Ctrl+k+c Copy block

Ctrl+k+w Write block to file

Ctrl+k+y Delete block

Ctrl+k+/ Filter

Misc

Ctrl+k+a Center line

Ctrl+t Options

Ctrl+r Refresh

File

Ctrl+k+e Open new file to edit

Ctrl+k+r Insert file at cursor

Ctrl+k+d Save

Goto

Ctrl+u Previous screen

Ctrl+v Next screen

Ctrl+a Line beginning

Ctrl+e End of line

Ctrl+k+u Top of file

Ctrl+k+v End of file

Ctrl+k+l To line number

Delete

Ctrl+d Delete character

Ctrl+y Delete line

Ctrl+w Delete word right

Ctrl+o Delete word left

Ctrl+j Delete line to right

Ctrl+- Undo

Ctrl+6 Redo

Exit

Ctrl+k+x Save and quit

Ctrl+c Abort

Ctrl+k+z Shell

Spell

Ctrl+[+n Word

Ctrl+[+l File

Using the Pico and nano Editors

Pico is a popular, very small text editor, distributed as part of the Pine e-mail client. Although Pico is free, it is not truly open source. Therefore, many Linux distributions, including Ubuntu, don’t offer Pico. Instead, they offer an open source clone of Pico called nano (nano’s another editor). This section describes the nano editor.

Note Ubuntu links the command pico to the program for the nano editor.

Nano (represented by the nano command) is a compact text editor that runs from the shell, but is screen-oriented (owing to the fact that it is based on the curses library). Nano is popular with those who formerly used the Pine e-mail client because nano’s editing features are the same as those used by Pine’s Pico editor.

On the rare occasion that you don’t have the vi editor available on a Linux system (such as when installing a minimal Gentoo Linux), nano is almost always available. Ubuntu installs nano by default. You need the spell command, rather than aspell, to perform a spelling check within nano.

As with the JOE editor, instead of having command and typing modes, you can just begin typing. To open a text file for editing, just type nano and the filename, or use some of the following options:

$ nano memo.txt Open memo.txt for editing

$ nano -B memo.txt When saving, back up previous version to filename~

$ nano -m memo.txt Turn on mouse to move cursor (if supported)

$ nano +83 memo.txt Begin editing on line 83

The -m command line option turns on support for a mouse. You can use the mouse to select a position in the text, and the cursor moves to that position. After the first click, however, nano uses the mouse to mark a block of text, which may not be what you are expecting.

As with JOE, to add text, just begin typing. Use arrow keys to move the cursor left, right, up, or down. Use the Delete key to delete text under the cursor or the Backspace key to erase text to the left of the cursor. Press Enter to add a line break. Press Ctrl+g to read help text. The following list shows the control codes for nano that are described on the help screen.

Control Code Function Key Description

Ctrl+g F1 Show help text. (Press Ctrl+x to exit help.)

Ctrl+x F2 Exit nano (or close the current file buffer).

Ctrl+o F3 Save the current file.

Ctrl+j F4 Justify the current text in the current paragraph.

Ctrl+r F5 Insert a file into the current file.

Ctrl+w F6 Search for text.

Ctrl+y F7 Go to the previous screen.

Ctrl+v F8 Go to the next screen.

Ctrl+k F9 Cut (and store) the current line or marked text.

Ctrl+u F10 Uncut (paste) the previously cut line into the file.

Ctrl+c F11 Display the current cursor position.

Ctrl+t F12 Start spell checking.

Ctrl+- Go to selected line and column numbers.

Ctrl+\ Search and replace text.

Ctrl+6 Mark text, starting at the cursor (Ctrl+6 to unset mark).

Ctrl+f Go forward one character.

Ctrl+b Go back one character.

Ctrl+Spacebar Go forward one word.

Alt+Spacebar Go backward one word.

Ctrl+p Go to the previous line.

Ctrl+n Go to the next line.

Ctrl+a Go to the beginning of the current line.

Ctrl+e Go to the end of the current line.

Alt+( Go to the beginning of the current paragraph.

Alt+) Go to the end of the current paragraph.

Alt+\ Go to the first line of the file.

Alt+/ Go to the last line of the file.

Alt+] Go to the bracket matching the current bracket.

Alt+= Scroll down one line.

Alt+- Scroll up one line.

Graphical Text Editors

Just because you are editing text doesn’t mean you have to use a text-based editor. The main advantage of using a graphical text editor is that you can use a mouse to select menus, highlight text, cut and copy text, or run special plug-ins.

You can expect to have the GNOME text editor (gedit) if your Linux system has the GNOME desktop installed. Features in gedit enable you to check spelling, list document statistics, change display fonts and colors, and print your documents. The KDE desktop also has its own KDE text editor (kedit in the kdeutils package). It includes similar features to the GNOME text editor, along with a few extras, such as the ability to send the current document with KMail or another user-configurable KDE component.

Vim itself comes with an X GUI version. It is launched with the gvim command, which is part of the vim-gnome package. If you’d like to turn GUI Vim into a more user-friendly text editor, you can download a third-party configuration called Cream. You can install it by typing sudo apt-get install cream.

Other text editors you can install include nedit (with features for using macros and executing shell commands and aimed at software developers) and leafpad (which is similar to the Windows Notepad text editor). The Scribes text editor (scribes) includes some advanced features for automatic correction, replacement, indentation, and word completion.

Listing, Sorting, and Changing Text

Instead of just editing a single text file, you can use a variety of Linux commands to display, search, and manipulate the contents of one or more text files at a time.

Listing Text Files

The most basic method to display the contents of a text file is with the cat command. The cat command concatenates (in other words, outputs as a string of characters) the contents of a text file to your display (by default). You can then use different shell metacharacters to direct the contents of that file in different ways. For example:

$ cat myfile.txt Send entire file to the screen

$ cat myfile.txt > copy.txt Direct file contents to a file

$ cat myfile.txt >> myotherfile.txt Append file contents to a file

$ cat -s myfile.txt Show consecutive blanks as one

$ cat -n myfile.txt Show line numbers with output

$ cat -b myfile.txt Show line numbers on non-blanks

However, if your block of text is more than a few lines long, using cat by itself becomes impractical. That’s when you need better tools to look at the beginning or the end, or page through the entire text.

To view the top of a file, use head:

$ head myfile.txt

$ cat myfile.txt | head

Both of these command lines use the head command to output the top ten lines of the file. You can specify the line count as a parameter to display any number of lines from the beginning of a file. For example:

$ head -n 50 myfile.txt Show the first 50 lines of a file

$ ps auwx | head -n 15 Show the first 15 lines of ps output

This can also be done using this obsolete (but shorter) syntax:

$ head -50 myfile.txt

$ ps auwx | head -15

You can use the tail command in a similar way to view the end of a file:

$ tail -n 15 myfile.txt Display the last 15 lines in a file

$ tail -15 myfile.txt Display the last 15 lines in a file

$ ps auwx | tail -n 15 Display the last 15 lines of ps output

The tail command can also be used to continuously watch the end of a file as the file is written to by another program. This is very useful for reading live log files when troubleshooting Apache, sendmail, or many other system services (some of these logs won’t appear unless the application is installed):

# tail -f /var/log/syslog Watch system messages live

# tail -f /var/log/mail.log Watch mail server messages live

# tail -f /var/log/apache2/access.log Watch web server messages live

You can press Ctrl+c to end the tail -f command.
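
Because head and tail both read standard input, you can chain them to pull a slice out of the middle of a file. The following sketch assumes a throwaway 20-line file (sample.txt is a made-up name) so that the line numbers are easy to check.

```shell
# Create a 20-line sample file (hypothetical data, one number per line).
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/sample.txt"

# Lines 6 through 10: keep the first 10 lines, then the last 5 of those.
middle=$(head -n 10 "$tmpdir/sample.txt" | tail -n 5)

# Everything from line 18 onward: tail -n +N starts output at line N.
from18=$(tail -n +18 "$tmpdir/sample.txt")

echo "$middle"
echo "$from18"
rm -rf "$tmpdir"
```

Note the plus sign in tail -n +18: without it, tail counts back from the end of the file instead of forward from the beginning.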

Paging through Text

When you have a large chunk of text and need to get to more than just its beginning or end, you need a tool to page through the text. The original UNIX system pager was the more command:

$ ps auwx | more Page through the output of ps (press spacebar)

$ more myfile.txt Page through the contents of a file

However, more has some limitations. For example, in the line with ps above, more could not scroll up. The less command was created as a more powerful and user-friendly more. The common saying when less was introduced was: “What is less? less is more!” I recommend you no longer use more, and use less instead.

Note The less command has another benefit worth noting. Unlike text editors such as vi, it does not read the entire file when it starts. This results in faster start-up times when viewing large files.

The less command can be used with the same syntax as more in the previous examples:

$ ps auwx | less Page through the output of ps

$ cat myfile.txt | less Page through the contents of a file

$ less myfile.txt Page through a text file

The less command enables you to navigate using the up and down arrow keys, PageUp, PageDown, and the spacebar. If you are using less on a file (not standard input), press v to open the current file in an editor. Which editor gets launched is determined by environment variables defined for your account. The editor is taken from the environment variable VISUAL, if defined, or EDITOR if VISUAL is not defined. If neither is defined, less invokes the JOE editor on Ubuntu.

Note Other versions of Linux invoke vi as the default editor in this case.
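
The lookup order just described (VISUAL first, then EDITOR, then a built-in default) can be mimicked in the shell with parameter expansion. This is only a sketch of the precedence, not the code that less itself runs. The function name pick_editor is made up, and the fallback value vi is an assumption, because the real default depends on how less was built for your distribution.

```shell
# Report which editor a VISUAL/EDITOR lookup would select.
# pick_editor is a hypothetical helper; the "vi" fallback is an assumption.
pick_editor() {
    echo "${VISUAL:-${EDITOR:-vi}}"
}

# With only EDITOR set, EDITOR is used.
only_editor=$(unset VISUAL; EDITOR=nano; pick_editor)

# With both set, VISUAL takes precedence.
both_set=$(VISUAL=vim; EDITOR=nano; pick_editor)

echo "$only_editor $both_set"
```

To make your choice permanent, you could add a line such as export EDITOR=nano to the .bashrc file in your home directory.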

Press Ctrl+c to interrupt that mode. As in vi, while viewing a file with less, you can search for a string by pressing / (forward slash) followed by the string and Enter. To search for further occurrences, press / and Enter repeatedly.

To scroll forward and back while using less, use the F and B keys, respectively. For example, 10f scrolls forward 10 lines and 15b scrolls back 15 lines. Type d to scroll down half a screen and u to scroll up half a screen.

Paginating Text Files with pr

The pr command provides a quick way to format a bunch of text into a form where it can be printed. This can be particularly useful if you want to print the results of some commands without having to open up a word processor or text editor. With pr, you can format text into pages with header information such as date, time, filename, and page number. Here is an example:

$ dpkg-query -l | sort | pr --column=2 | less Paginate list in 2 cols

In this example, the dpkg-query -l command lists all software packages installed on your system and pipes that list to the sort command, to be sorted alphabetically. Next, that list is piped to the pr command, which converts the single-column list into two columns (--column=2, which GNU pr accepts as an abbreviation of --columns=2) and paginates it. Finally, the less command enables you to page through the text.

Instead of paging through the output, you can send the output to a file or to a printer. Here are examples of that:

$ dpkg-query -l | sort | pr --column=2 > pkg.txt Send pr output to file

$ dpkg-query -l | sort | pr --column=2 | lpr Print pr output

Other text manipulation you can do with the pr command includes double-spacing the text (-d), showing control characters (-c), or offsetting the text a certain number of spaces from the left margin (for example, -o 5 to indent five spaces from the left).
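
To see what the column option does without wading through a full package list, you can feed pr a handful of lines. The sketch below assumes GNU pr and adds the -t option, which omits the page header and trailer so that only the column layout remains. With six input lines and two columns, pr should fill the first column downward before starting the second.

```shell
# Format six short lines into two columns.
# -t omits the page header and trailer, leaving just the column layout.
twocol=$(seq 1 6 | pr -t --columns=2)

echo "$twocol"
```

Drop the -t option when you want the date, filename, and page number header back for printing.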

Searching for Text with grep

The grep command comes in handy when you need to perform more advanced string searches in a file. In fact, the phrase “to grep” has actually entered the computer jargon as a verb, just as “to Google” has entered the popular language. Here are examples of the grep command:

$ grep Remote /etc/services Show lines containing Remote

# grep sudo /var/log/auth.log Show lines containing sudo

$ ps auwx | grep init Show init lines from ps output

$ ps auwx | grep "\[.*\]" Show bracketed commands

$ dmesg | grep "[ ]ata\|^ata" Show ata kernel device information

These command lines have some particular uses, beyond being examples of the grep command. By searching auth.log for sudo, you can see when the sudo command was run, and what it was set to run. Displaying bracketed commands that are output from the ps command is a way to see commands for which ps cannot display options. The last command checks the kernel ring buffer for any ATA device information, such as hard disks and CD-ROM drives.

The grep command can also recursively search a few or a whole lot of files at the same time. If you have the apache2 package installed, the following command recursively searches files in the /etc/apache2/sites-enabled and /etc/apache2/conf.d directories for the string VirtualHost:

$ grep -R VirtualHost /etc/apache2/conf.d /etc/apache2/sites-enabled

Add line numbers (-n) to your grep command to find the exact lines where the search terms occur:

# grep -Rn VirtualHost /etc/apache2/*conf*

By default, search terms are displayed in color on each line found. To explicitly ask to colorize the searched term in the search results, add the --color option:

# grep --color -Rn VirtualHost /etc/apache2/*conf*

By default, in a multi-file search, the filename is displayed for each search result. Use the -h option to disable the display of filenames. This example searches for the string sshd in the file auth.log:

# grep -h sshd /var/log/auth.log

If you want to ignore case when you search messages, use the -i option:

$ grep -i acpi /var/log/dmesg Search file for acpi (any case)

To display only the name of the file that includes the search term, add the -l option:

$ grep -Rl VirtualHost /etc/apache2

To display all lines that do not match the string, add the -v option:

$ grep -v " 200 " /var/log/apache2/access_* Show lines without " 200 "
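
The grep options in this section are easier to compare when they all run against the same small file. The following sketch uses an invented four-line log (access.log here is a stand-in created in a temporary directory, not a real Apache log), so the results are easy to verify by eye.

```shell
# Create a tiny stand-in log file (hypothetical data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/access.log" <<'EOF'
10.0.0.1 "GET /index.html" 200
10.0.0.2 "GET /missing.html" 404
10.0.0.1 "get /about.html" 200
10.0.0.3 "POST /form" 500
EOF

# -c counts matching lines instead of printing them.
ok=$(grep -c ' 200$' "$tmpdir/access.log")

# -v inverts the match: lines that do not end in 200.
not_ok=$(grep -v ' 200$' "$tmpdir/access.log")

# -i ignores case, so both GET and get match.
gets=$(grep -ci 'get' "$tmpdir/access.log")

# -n prefixes each match with its line number.
numbered=$(grep -n ' 404$' "$tmpdir/access.log")

echo "$ok $gets"
echo "$not_ok"
echo "$numbered"
rm -rf "$tmpdir"
```

Single-letter options can be combined, as in grep -ci, just as -Rn and -Rl are combined in the examples above.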

Checking Word Counts with wc

There are times when you need to know the number of lines that match a search string. The wc command can be used to count the lines that it receives. For example, the following command lists how many hits in an Apache log file come from a specific IP address:

$ grep 192.198.1.1 /var/log/apache2/access.log | wc -l

The wc command has other uses as well. By default, wc prints the number of lines, words, and bytes in a file:

$ wc /var/log/dmesg List counts for a single file

436 3847 27984 /var/log/dmesg

$ wc /var/log/*.log List single/totals for many files

305 3764 25772 /var/log/auth.log

780 3517 36647 /var/log/bootstrap.log

350 4405 39042 /var/log/daemon.log

10109 60654 669687 /var/log/dpkg.log

71 419 4095 /var/log/fontconfig.log

1451 19860 135252 /var/log/kern.log

0 0 0 /var/log/lpr.log

0 0 0 /var/log/mail.log

0 0 0 /var/log/pycentral.log

0 0 0 /var/log/scrollkeeper.log

108 1610 13864 /var/log/user.log

0 0 0 /var/log/uucp.log

12 43 308 /var/log/wvdialconf.log

890 6717 46110 /var/log/Xorg.0.log

14076 100989 970777 total
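
When you want just one of those three numbers, for example to use it in a script, you can ask wc for it by option. The sketch below assumes a small made-up file (notes.txt) and reads it from standard input so that wc leaves the filename out of its output. The tr -d ' ' step is a precaution for wc versions that pad the number with spaces.

```shell
# Create a three-line sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'one two\nthree\nfour five six\n' > "$tmpdir/notes.txt"

# Reading from standard input keeps the filename out of the output.
lines=$(wc -l < "$tmpdir/notes.txt" | tr -d ' ')
words=$(wc -w < "$tmpdir/notes.txt" | tr -d ' ')
bytes=$(wc -c < "$tmpdir/notes.txt" | tr -d ' ')

echo "$lines lines, $words words, $bytes bytes"
rm -rf "$tmpdir"
```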

Sorting Output with sort

It can also be useful to sort the contents of a file or the output of a command. This can be helpful in bringing order to disorderly output. The following examples list service names and numbers from the /etc/services file and sort the results in alphanumeric order (forward and reverse):

$ cat /etc/services | sort Sort in alphanumeric order

$ cat /etc/services | sort -r Sort in reverse alphanumeric order

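The following command sorts processes based on descending memory usage (fourth field of ps output). The -k option specifies the key field to use for sorting. 4,4 indicates that the fourth field, and only the fourth field, is a key field. The -n option makes sort compare that field as a number rather than as text, and -r reverses the order so the largest values come first.

$ ps auwx | sort -r -n -k 4,4 | less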

The following command line sorts loaded kernel modules in increasing size order. The n option tells sort to treat the second field as a number and not a string:

$ lsmod | sort -k 2,2n
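
The difference the n modifier makes is easiest to see on a tiny file. The sketch below uses three invented module names and sizes (mods.txt is a made-up file, not real lsmod output) and sets LC_ALL=C on the assumption that you want plain byte-order comparison for the text sort.

```shell
# Use the C locale so the text comparison is plain byte order (an assumption).
LC_ALL=C
export LC_ALL

# Create sample data: a name and a size on each line (hypothetical values).
tmpdir=$(mktemp -d)
printf 'ext4 9\nloop 100\nusbhid 25\n' > "$tmpdir/mods.txt"

# Without n, field 2 is compared as text, so "100" sorts before "25" and "9".
as_text=$(sort -k 2,2 "$tmpdir/mods.txt" | cut -d' ' -f1 | tr '\n' ' ')

# With n, field 2 is compared as a number.
as_number=$(sort -k 2,2n "$tmpdir/mods.txt" | cut -d' ' -f1 | tr '\n' ' ')

# Adding r reverses the order, so the largest value comes first.
largest=$(sort -k 2,2nr "$tmpdir/mods.txt" | head -n 1)

echo "text order:    $as_text"
echo "numeric order: $as_number"
echo "largest:       $largest"
rm -rf "$tmpdir"
```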

Finding Text in Binaries with Strings

Sometimes you need to read the ASCII text that is inside a binary file. Occasionally, you can learn a lot about an executable that way. For those occurrences, use strings to extract all the human-readable ASCII text. The strings command is part of the binutils package and is installed by default on Ubuntu. Here are some examples:

$ strings /bin/ls | grep -i libc Find occurrences of libc in ls

$ cat /bin/ls | strings List all ASCII text in ls

$ strings /bin/ls List all ASCII text in ls

$ strings /usr/sbin/sshd | grep libwrap List TCP wrapper library

Replacing Text with sed

Finding text within a file is sometimes the first step toward replacing text. Editing streams of text is done using the sed command. The sed command is actually a full-blown scripting language. The examples in this chapter cover basic text replacement with the sed command.

If you are familiar with text replacement commands in vi, sed has some similarities. In the following example, you would replace only the first occurrence per line of tcp with TEST. Here, sed takes its input from a pipe, while sending its output to stdout (your screen):

$ cat /etc/services | sed s/tcp/TEST/ | head -n 20

TESTmux 1/tcp # TCP port service multiplexer

echo 7/TEST

echo 7/udp

discard 9/TEST sink null

Adding a g to the end of the substitution line, as in the following command, causes every occurrence of tcp to be changed to TEST. Also, in the following example, input is directed from the file /etc/services and output is directed to mynewfile.txt:

$ sed s/tcp/TEST/g < /etc/services > mynewfile.txt

$ head -20 mynewfile.txt

TESTmux 1/TEST # TCP port service multiplexer

echo 7/TEST

echo 7/udp

discard 9/TEST sink null

To make the substitution case-insensitive, add the i flag to the end of the substitution expression (the i flag is a GNU sed extension):

$ cat /etc/services | sed s/tcp/TEST/gi | head -n 20

TESTmux 1/TEST # TEST port service multiplexer

The next example changes occurrences of the text /home/chris to /home2/chris in the output from the /etc/passwd file. (Note that this command does not change that file, but outputs the changed text.) This is useful when user accounts are migrated to a new directory (presumably on a new disk), named with much deliberation, home2. Here, you have to use quotes and backslashes to escape the forward slashes so they are not interpreted as delimiters:

$ sed 's/\/home\/chris/\/home2\/chris/g' < /etc/passwd | grep chris

chris:x:1000:1000:Chris Negus,,,:/home2/chris:/bin/bash

Although the forward slash is the sed command’s default delimiter, you can change the delimiter to any other character of your choice. Changing the delimiter can make your life easier when the string contains slashes. For example, the previous command line that contains a path could be replaced with this command:

$ sed 's./home/chris./home2/chris.g' < /etc/passwd | grep chris

chris:x:1000:1000:Chris Negus,,,:/home2/chris:/bin/bash

In the line shown, a period (.) is used as the delimiter.

The sed command can run multiple substitutions at once by preceding each one with -e. Here, in the text streaming from /etc/services, all occurrences of tcp are changed to LOWER and occurrences of TCP are changed to UPPER:

$ sed -e s/tcp/LOWER/g -e s/TCP/UPPER/g /etc/services | head -n 20

LOWERmux 1/LOWER # UPPER port service multiplexer

echo 7/LOWER

echo 7/udp

discard 9/LOWER sink null

You can use sed to add newline characters to a stream of text. Where Enter appears, press the Enter key. The > on the second line is generated by bash, not typed in.

$ echo aaabccc | sed 's/b/\Enter

> /'

aaa

ccc

The trick just shown does not work on the left side of the sed substitution command. When you need to substitute newline characters, it’s easier to use the tr command.
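
The substitution forms covered in this section can be compared side by side on a single line of text. The following sketch assumes a one-line sample file (svc.txt, invented for illustration) modeled on the first entry shown from /etc/services. The sed scripts are wrapped in single quotes so that the shell passes them to sed untouched.

```shell
# Create a one-line sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'tcpmux 1/tcp # TCP multiplexer\n' > "$tmpdir/svc.txt"

# No flag: only the first tcp on the line changes.
first=$(sed 's/tcp/TEST/' "$tmpdir/svc.txt")

# g flag: every tcp on the line changes.
all=$(sed 's/tcp/TEST/g' "$tmpdir/svc.txt")

# A different delimiter (here |) avoids escaping the slashes in a path.
path=$(echo '/home/chris' | sed 's|/home/|/home2/|')

# Two -e expressions are applied in order to each line.
both=$(sed -e 's/tcp/LOWER/g' -e 's/TCP/UPPER/g' "$tmpdir/svc.txt")

echo "$first"
echo "$all"
echo "$path"
echo "$both"
rm -rf "$tmpdir"
```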

Translating or Removing Characters with tr

The tr command is an easy way to do simple character translations on the fly. In the following example, new lines are replaced with spaces, so all the files listed from the current directory are output on one line:

$ ls | tr '\n' ' ' Replace newline characters with spaces

The tr command can be used to replace one character with another, but does not work with strings like sed does. The following command replaces all instances of the lowercase letter f with a capital F.

$ tr f F < /etc/services Replace every f in the file with F

You can also use the tr command to simply delete characters. Here are two examples:

$ ls | tr -d '\n' Delete new lines (resulting in one line)

$ tr -d f < /etc/services Delete every letter f from the file

The tr command can do some nifty tricks when you specify ranges of characters to work on. Here’s an example of changing lowercase letters to uppercase letters:

$ echo chris | tr a-z A-Z Translate chris into CHRIS

CHRIS

The same result can be obtained with the following syntax:

$ echo chris | tr '[:lower:]' '[:upper:]' Translate chris into CHRIS
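
Here is a short sketch that gathers the three uses of tr just described (translating, deleting, and working with character classes) and captures each result in a variable. The input strings are arbitrary examples.

```shell
# Translate newlines to spaces: three lines become one.
oneline=$(printf 'a\nb\nc\n' | tr '\n' ' ')

# Delete every occurrence of a character.
deleted=$(echo 'fluffy' | tr -d 'f')

# Translate by character class rather than by explicit range.
upper=$(echo 'chris' | tr '[:lower:]' '[:upper:]')

echo "$oneline"
echo "$deleted"
echo "$upper"
```

Because the final newline is translated too, the first result ends with a space instead of a line break.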

Checking Differences between Two Files with diff

When you have two versions of a file, it can be useful to know the differences between the two files. For example, when upgrading a software package, you may save your old configuration file under a new filename, such as config.old or config.bak, so you preserve your configuration. When that occurs, you can use the diff command to discover which lines differ between your configuration and the new configuration in order to merge the two. For example:

$ diff config config.old

You can change the output of diff to what is known as unified format. Unified format can be easier to read by human beings. It adds three lines of context before and after each block of changed lines that it reports, and then uses + and - to show the difference between the files. The following set of commands creates a file (f1.txt) containing a sequence of numbers (1-7), creates a file (f2.txt) with one of those numbers changed (using sed), and compares the two files using the diff command:

$ seq 1 7 > f1.txt Send a sequence of numbers to f1.txt

$ cat f1.txt Display contents of f1.txt

$ sed s/4/FOUR/ < f1.txt > f2.txt Change 4 to FOUR and send to f2.txt

$ diff f1.txt f2.txt

4c4 Shows line 4 was changed in file

< 4

---

> FOUR

$ diff -u f1.txt f2.txt Display unified output of diff

--- f1.txt 2007-09-07 18:26:06.000000000 -0500

+++ f2.txt 2007-09-07 18:26:39.000000000 -0500

@@ -1,7 +1,7 @@

 1

 2

 3

-4

+FOUR

 5

 6

 7

The diff -u output just displayed adds information such as modification dates and times to the regular diff output. The sdiff command can be used to give you yet another view. The sdiff command shows the two files side by side, and can also merge them interactively. Its side-by-side output looks like this:

$ sdiff f1.txt f2.txt

1 1

2 2

3 3

4 | FOUR

5 5

6 6

7 7

Another variation on the diff theme is vimdiff, which opens the two files side by side in Vim and outlines the differences in color. Similarly, gvimdiff opens the two files in gVim.

Note You need to install the vim-gnome package to run the gvim or gvimdiff program.

The output of diff -u can be fed into the patch command. The patch command takes an old file and a diff file as input and outputs a patched file. Following on the previous example, I use the diff command between the two files to generate a patch and then apply the patch to the first file:

$ diff -u f1.txt f2.txt > patchfile.txt

$ patch f1.txt < patchfile.txt

patching file f1.txt

$ cat f1.txt

1

2

3

FOUR

5

6

7

That is how many OSS developers (including kernel developers) distribute their code patches. The patch and diff commands can also be run on entire directory trees. For example, this set of commands recursively copies the /etc/pam.d directory to /tmp/newpam.d and /tmp/oldpam.d, and then makes changes to two files in the newpam.d directory:

# cp -r /etc/pam.d /tmp/oldpam.d

# cp -r /etc/pam.d /tmp/newpam.d

# echo hello >> /tmp/newpam.d/atd

# echo goodbye >> /tmp/newpam.d/sshd

# diff -r /tmp/newpam.d/ /tmp/oldpam.d/

diff -r /tmp/newpam.d/atd /tmp/oldpam.d/atd

10d9

< hello

diff -r /tmp/newpam.d/sshd /tmp/oldpam.d/sshd

40d39

< goodbye

By running a recursive diff (diff -r) on the directories, you can see that hello appears in the /tmp/newpam.d/atd file but not in the /tmp/oldpam.d/atd file (< hello). Likewise, the word goodbye is only in the /tmp/newpam.d/sshd file.

If the output is too much detail for you, and you only want to see which files changed and not how, you can add the -q option:

# diff -rq /tmp/newpam.d/ /tmp/oldpam.d/

Files /tmp/newpam.d/atd and /tmp/oldpam.d/atd differ

Files /tmp/newpam.d/sshd and /tmp/oldpam.d/sshd differ

Using diff -r is a good technique to keep track of changes to configuration files or software projects. For example, if you copied all your /etc files to another directory, and then later ran diff -rq on the two directories, you could see which configuration files have changed since you made the copy.
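
In a script, you often care only about whether two files differ, not how. The diff command reports that through its exit status: 0 when the files match and 1 when they differ. The following sketch reuses the f1.txt and f2.txt idea from earlier in this section, inside a temporary directory; f3.txt is an extra copy added here for comparison.

```shell
# Recreate the sample files in a temporary directory.
tmpdir=$(mktemp -d)
seq 1 7 > "$tmpdir/f1.txt"
sed 's/4/FOUR/' "$tmpdir/f1.txt" > "$tmpdir/f2.txt"
cp "$tmpdir/f1.txt" "$tmpdir/f3.txt"

# diff exits with 0 when the files match and 1 when they differ.
# -q suppresses the line-by-line report.
if diff -q "$tmpdir/f1.txt" "$tmpdir/f3.txt" > /dev/null; then
    same="identical"
else
    same="different"
fi

if diff -q "$tmpdir/f1.txt" "$tmpdir/f2.txt" > /dev/null; then
    changed="identical"
else
    changed="different"
fi

# Count changed lines in unified output, skipping the ---/+++ header lines.
# (This simple pattern would miss changed text that itself starts with - or +.)
changes=$(diff -u "$tmpdir/f1.txt" "$tmpdir/f2.txt" | grep -c '^[-+][^-+]')

echo "f1 vs f3: $same"
echo "f1 vs f2: $changed ($changes changed lines)"
rm -rf "$tmpdir"
```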

Using awk and cut to Process Columns

Another massive text processing tool is the awk command. The awk command is a full-blown programming language. Although there is much more you can do with the awk command, the following examples show you a few tricks related to extracting columns of text:

$ ps auwx | awk '{print $1,$11}' Show columns 1, 11 of ps

$ ps auwx | awk '/chris/ {print $11}' Show chris' processes

$ ps auwx | grep chris | awk '{print $11}' Same as above

The first example displays the contents of the first column (username) and eleventh column (command name) from currently running processes output from the ps command (ps auwx). The next two commands produce the same output, with one using the awk command and the other using the grep command to find all processes owned by the user named chris. In each case, when processes owned by chris are found, column 11 (command name) is displayed for each of those processes.

By default, the awk command assumes that columns are separated by whitespace (any run of spaces or tabs). You can specify a different delimiter with the -F option as follows:

$ awk -F: '{print $1,$5}' /etc/passwd Use colon delimiter to print cols

You can get similar results with the cut command. As with the previous awk example, you can specify a colon (:) as the column delimiter to process information from the /etc/passwd file:

$ cut -d: -f1,5 /etc/passwd Use colon delimiter to print cols

The cut command can also be used with ranges of fields. The following command prints columns 1 through 5 of the /etc/passwd file:

$ cut -d: -f1-5 /etc/passwd Show columns 1 through 5

Instead of using a dash (-) to indicate a range of numbers, you can use it to print all columns from a particular column number and above. The following command displays all columns from column 5 and above from the /etc/passwd file:

$ cut -d: -f5- /etc/passwd Show columns 5 and later

I prefer to use the awk command when columns are separated by a varying number of spaces, such as the output of the ps command. And I prefer the cut command when dealing with files delimited by commas (,) or colons (:), such as the /etc/passwd file.
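
To compare the two tools directly, you can run them on the same colon-delimited input. The following sketch uses a two-line stand-in for /etc/passwd (passwd.sample is a made-up file). The last command shows something that cut cannot do: selecting lines based on the value of a field, in this case a user ID of 1000 or higher.

```shell
# Create a two-line stand-in for /etc/passwd (hypothetical data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
chris:x:1000:1000:Chris Negus,,,:/home/chris:/bin/bash
EOF

# cut: fields 1 and 7, keeping the colon between them.
with_cut=$(cut -d: -f1,7 "$tmpdir/passwd.sample")

# awk: same fields, but the output separator defaults to a space.
with_awk=$(awk -F: '{print $1, $7}' "$tmpdir/passwd.sample")

# awk can also filter on a field value before printing.
regular=$(awk -F: '$3 >= 1000 {print $1}' "$tmpdir/passwd.sample")

echo "$with_cut"
echo "$with_awk"
echo "$regular"
rm -rf "$tmpdir"
```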

Converting Text Files to Different Formats

Text files in the UNIX world use a different end-of-line character (\n) than those used in the DOS/Windows world (\r\n). You can view these special characters in a text file with the od command:

$ echo hello > myunixfile.txt

$ od -c -t x1 myunixfile.txt

0000000 h e l l o \n

68 65 6c 6c 6f 0a

0000006

It is necessary to convert the files so they will appear properly when copied from one environment to the other. The dos2unix package contains several tools for converting files between formats (type sudo apt-get install dos2unix to get the package). Here are some examples:

$ unix2dos < myunixfile.txt > mydosfile.txt

$ cat mydosfile.txt | dos2unix > myunixfile.txt

The unix2dos example just shown converts a Linux or UNIX plain text file (myunixfile.txt) to a DOS or Windows text file (mydosfile.txt). The dos2unix example does the opposite by converting a DOS/Windows file to a Linux/UNIX file.
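
If the dos2unix package is not installed, the same conversions can be approximated with tools covered earlier in this chapter. The following is a sketch under that assumption, not a replacement for the real commands: tr -d '\r' removes every carriage return in the file (including any that are not part of a line ending), and the awk command adds one before each newline. The file roundtrip.txt is a name invented for this example.

```shell
tmpdir=$(mktemp -d)

# Write one line with a DOS/Windows ending (carriage return plus newline).
printf 'hello\r\n' > "$tmpdir/mydosfile.txt"
dos_bytes=$(wc -c < "$tmpdir/mydosfile.txt" | tr -d ' ')

# Strip every carriage return to get UNIX line endings.
tr -d '\r' < "$tmpdir/mydosfile.txt" > "$tmpdir/myunixfile.txt"
unix_bytes=$(wc -c < "$tmpdir/myunixfile.txt" | tr -d ' ')

# Go the other way: add a carriage return before each newline.
awk '{ printf "%s\r\n", $0 }' "$tmpdir/myunixfile.txt" > "$tmpdir/roundtrip.txt"
round_bytes=$(wc -c < "$tmpdir/roundtrip.txt" | tr -d ' ')

echo "DOS: $dos_bytes bytes, UNIX: $unix_bytes bytes, back to DOS: $round_bytes bytes"
rm -rf "$tmpdir"
```

The one-byte difference per line is the carriage return (0d in the od output format shown earlier).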

Summary

Linux and UNIX systems traditionally use plain text files for system configuration, documentation, output from commands, and many forms of stored information. As a result, many commands have been created to search, edit, and otherwise manipulate plain text files. Even with today’s graphical interfaces, the ability to manipulate plain text files is critical to becoming a Linux power user.

This chapter explores some of the most popular commands for working with plain text files in Linux. Those commands include text editors (such as vi, nano, and JOE), as well as commands that can edit streaming data (such as sed and awk commands). There are also commands for sorting text (sort), counting text (wc), and translating characters in text (tr).