
CHAPTER 4. Working with Text Files

An important part of working on the Linux command line consists of working with text files. If you need to configure services, they’ll store their configuration in text files. If you need to write program code, you’ll do that in a text file as well. Linux text files are all over your computer, and to be good at the Linux command line, you’ll have to know how to handle them. In this chapter, you’ll learn how to work with text files. Different methods for manipulating their contents are discussed. First, you’ll learn about the only editor that matters on Linux, Vi. Next, I’ll explain different ways of displaying the contents of text files. After that, we’ll talk about some useful utilities that help you in sorting and comparing the contents of different text files—and more. You’ll then learn how regular expressions can help you in finding text patterns in a file in a clever way. You’ll also read how the programmable filters sed and awk can help you batch-manipulate text files. At the end of this chapter, you’ll also get familiar with some of the most useful commands in command-line printing.

Working with Vi

For your day-to-day management tasks from the command line, you’ll often need a text editor to change ASCII text files. Although many editors are available for Linux, Vi is still the most popular and probably the most used editor as well. It is a rather complicated editor, however, and most Linux distributions fortunately include Vim, which stands for Vi Improved, the user-friendly version of Vi. When talking about Vi in this book, I’ll assume that you are using Vim.

Note Most distributions use Vim, not Vi, and will start Vim when you enter the command vi. Clear, huh? If the commands that I describe in this chapter don’t work for you, you’re probably working with Vi, not Vim. In that case, use the following command as root: echo alias vi=vim >> /etc/profile. This makes sure that the next time you log in to your computer, Vim is started, not Vi.

Even if Vi looks quite difficult to first-time users, seen in its historical context it was quite an improvement when it was invented in 1976. In those days, only line editors such as ex were available. These editors didn’t give a complete overview of the text file a user was working with, but just the current line the user was at, like an old typewriter. Vi, which stands for visual, was the first editor that worked in a mode where the complete text file was displayed, which made it possible to move back and forth between lines. To do this, different commands were introduced to make it possible to address individual lines, commands that are still used in modern Vi.

Everyone who wants to work from the Linux command line should be capable of working with Vi. Why? You’ll find it on every Linux distribution and every version of UNIX. Another important reason why you should get used to working with Vi is that some other commands, especially commands that are important for a Linux administrator, are based on it. For example, to edit quota (which limits available disk space) for the users on your server, you would use edquota, which is just a macro built on Vi. If you want to set permissions for the sudo command, use visudo, which, as you likely guessed, is another macro that is built on top of Vi. Or if you want to schedule a task to run at a given moment in time, use crontab -e, which is based on Vi as well.

Note Well, to tell you the truth, there is a variable setting. The name of the variable is VISUAL. Only when this variable is set to vi (VISUAL=vi) will commands like edquota and visudo use Vi. If it is set to something else, they will use that something else instead. This is how, on Ubuntu for instance, nano is used as the default editor to accomplish many tasks.

In this section, I’ll provide a minimal amount of information that you need to work with Vi. The goal here is just to get you started. You’ll learn more about Vi if you really start working with it on a daily basis. Some people walk around with Vi cheat sheets, containing long lists of commands that can be used with Vi. I don’t want to scare you away immediately, which is why I’d rather focus on the essential commands that help you do what needs to be done.

Vi Modes

Vi uses two modes: command mode, which is used to enter new commands, and insert mode (also referred to as the input mode), which is used to enter text. Before being able to enter text, you need to enter insert mode, because, as its name suggests, command mode will just allow you to enter commands. Notice that these commands also include cursor movement. As mentioned before, Vi has some pretty useful commands to address specific lines of text. While working with Vi, there often are several options for doing what needs to be done. This adds to the perception of Vi as being difficult: there’s just too much to remember. The best way to deal with all those different options is to focus on one option only. It makes no sense knowing five different commands that help you do the exact same thing. For example, you can use many methods to enter insert mode. I’ll list just four of them:

· Press i to insert text at the current position of the cursor.

· Use a to append text after the current position of the cursor.

· Use o to open a new line under the current position of the cursor (my favorite option).

· Use O to open a new line above the current position of the cursor.

After entering insert mode, you can enter text, and Vi will work just like any other editor. Now if you want to save your work, you should next get back to command mode and use the appropriate commands. Pressing the Esc key returns you to command mode from insert mode, and command mode is where you want to be to save text files.

Tip When starting Vi, always give as an argument the name of the file you want to create with it or the name of an existing file you would like to modify. If you don’t do that, Vi will display help text, and you will have the problem of finding out how to get out of this help text. Of course, you can always just read the entire help text to find out how that works (or just type :q to get out of there).

Saving and Quitting

After activating command mode, you can use commands to save your work. The most common method is to use the :wq! command, which performs several tasks at once. First, a colon is used just because it is part of the command. Then, w is used to save the text you have typed so far. Because no file name is specified after the w, the text will be saved under the same file name that was used when opening the file. If you want to save it under a new file name, just enter the new name after the :w command (note that you have to start the command with a colon here as well); for instance, the following would save your file with the name newfile:

:w newfile

Next in the :wq! command is q, which makes sure that the editor is quit as well. Last, the exclamation mark tells Vi that it shouldn’t complain, but just do its work. Vi has a tendency to get smart with remarks like “A file with this name already exists” (see Listing 4-1), so you are probably going to like the exclamation mark. After all, this is Linux, and you want your Linux system to do as you tell it, not to second-guess you all the time. If you would rather see warnings like this, just omit the ! and use :wq to write your file to disk.

Listing 4-1. Vi Will Tell You If It Doesn’t Understand What You Want It to Do

#
# hosts This file describes a number of hostname-to-address
# mappings for the TCP/IP subsystem. It is mostly
# used at boot time, when no name servers are running.
# On small systems, this file can be used instead of a
# "named" name server.
# Syntax:
#
# IP-Address Full-Qualified-Hostname Short-Hostname
#

127.0.0.1 localhost

# special IPv6 addresses
::1 localhost ipv6-localhost ipv6-loopback

fe00::0 ipv6-localnet

ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
127.0.0.2 nuuk.sander.gl nuuk
E13: File exists (add ! to override) 1,1 All

As you have just learned, you can use :wq! to write and quit Vi. You can also use the parts of this command separately. For example, use :w if you just want to write changes while working on a file without quitting it, or use :q! to quit the file without writing changes. The latter option is a nice panic key if something has happened that you absolutely don’t want to store on your system. This is useful because Vi will sometimes work magic with the content of your file when you hit the wrong keys. Alternatively, you can recover by using the u command to undo the most recent changes you made to the file.

Cutting, Copying, and Pasting

You don’t need a graphical interface to use cut, copy, and paste features; Vi could do this back in the ’70s. But you have two ways of using cut, copy, and paste: the easy way and the hard way. If you want to do it the easy way, you can use the v command to enter visual mode, from which you can select a block of text by using the arrow keys. After selecting the block, you can cut, copy, and paste it.

· Use d to cut (in fact, delete) the selection. This will remove the selection and place it in a buffer.

· Use y to copy the selection to the area designated for that purpose in your server’s memory.

· Use p to paste the selection. This will copy the selection you have just placed in the reserved area of your server’s memory back into your document. It will always paste the selection at the cursor’s current position.

Deleting Text

Deleting text is another thing you’ll have to do often when working with Vi, and you can use many different methods to delete text. The easiest, however, is from insert mode: just use the Delete key to delete any text. This works in the exact same way as in a word processor. As usual, you have some options from Vi command mode as well:

· Use x to delete a single character. This has the same effect as using the Delete key while in insert mode.

· Use dw to delete the rest of the word. That is, dw will delete everything from the cursor’s current position to the end of the word.

· Use d$ to delete from the current cursor position until the end of the line.

· Use d to delete the current selection.

· Use dd to delete a complete line. This is a very useful option that you will probably like a lot.

Moving Through Text Files

Vi also offers some possibilities to move through text files. The following commands are used to search for text and to manipulate your cursor through a text file:

· Use the g key twice to go to the beginning of a text file.

· Use the G key to go directly to the end of a text file.

· To search text, you can use /, followed by the text you are searching. For instance, the command /root would find the first occurrence of the text root in the current file. This command would search from the current position down in the text file. To repeat this search action, use n (for next). To repeat the search in the opposite direction, use N.

· Use ?, followed by the text you are searching for, to search from the current position upward in the text file. For example, the command ?root would search for the text “root” from the current position in the text upward. To repeat this search action, use n (for next). To repeat the search in the opposite direction, use N.

· Use :5 to go directly to line number 5.

· Use ^ to go to the first position in the current line.

· Use $ to go to the last position in the current line.

Tip To work with advanced search patterns, Vi supports regular expressions as well. Read the section “Working with Basic Regular Expressions” later in this chapter to find out all about these.

Changing All Occurrences of a String in a Text File

When working with Vi, it may happen that you need to change all occurrences of a given word in a text file. Vi has a useful option for this, which is referred to as the global substitute. The basic syntax of a global substitution is as follows:

:%s/old/new/g

This command starts with :%s, which tells Vi that it should make a substitution. Next, it mentions the old text string, in this case old, which in turn is followed by the new text string, new. At the end of the command, the g tells Vi that this is a global action; it will make sure that the substitution is used all over the text file.
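
For example, assuming you had opened a copy of /etc/hosts and wanted to rename the host nuuk to sydney everywhere in the file (the host names here are just examples), you would type the following from command mode:

:%s/nuuk/sydney/g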

I recommend that you analyze your text file carefully after applying a global substitution. Did it work out well? Then save the changes to your text file. If it didn’t work out so well, use the u command to undo the global substitution and restore the original situation. In case you’ve typed one u too many, it’s good to know that Ctrl-R will redo the last undo.

EXERCISE 4-1: MANIPULATING TEXT WITH VI

1. Open a root shell.

2. Type cat /etc/passwd > ~/myfile.

3. Open ~/myfile in Vi, using vi ~/myfile.

4. From within the file, type G to go to the last line.

5. Now type O to open a new line and type some random text.

6. Press Esc to go to command mode, and type ZZ to write and quit your document. This is an alternative to using :wq!

7. Repeat step 3 of this exercise.

8. Type /home to search for the first occurrence of the text home in your document.

9. Type v to enter visual mode.

10. Type :13 to select up to line 13.

11. Press y to copy the current selection.

12. Type :20 to go to line 20.

13. From here, type p to paste the selection that you’ve copied to the buffer in step 11.

14. Type u to undo this last modification.

15. Type Ctrl-r to redo.

16. Type gg to go to the first line of the text file.

17. Type :%s/home/HOME/g to replace all occurrences of “home” with “HOME”.

18. Type :wq! to write and quit the document.

Vi Summarized

In this section you’ve learned how to work with Vi. Although there are many more commands that you can use when working with Vi, the commands that I’ve covered in this section will help you perform all basic tasks with Vi. Table 4-1 summarizes all commands that were treated in this section.

Table 4-1. Summary of Vi Commands

Command      Explanation
i            Opens insert mode for editing. Inserts text at the current cursor position.
Esc          Returns to command mode.
a            Opens insert mode for editing. Appends text after the current cursor position.
o            Opens insert mode for editing. Opens a new line below the line the cursor is on.
O            Opens insert mode for editing. Opens a new line above the line the cursor is on.
:wq!         Writes and quits the current document. Suppresses any warnings.
:w           Writes the current file using the same name. Add a file name to write the file with another name.
:q!          Quits without saving. Ignores any warnings.
u            Undoes the last command.
Ctrl-r       Redoes the last undo.
v            Enters visual mode to mark a block on which you can use commands.
d            Deletes the current selection.
y            Yanks (copies) the current selection.
p            Pastes the current selection.
gg           Goes to the top of the current text file.
G            Goes to the bottom of the current text file.
/text        Searches for text from the current position of the cursor downward.
?text        Searches for text from the current position of the cursor upward.
^            Goes to the first position in the current line.
$            Goes to the last position in the current line.
:nn          Goes directly to line number nn.

Displaying Contents of Text Files

When working on the command line, you will find that you often need to modify configuration files, which take the form of ASCII text files. Therefore, it’s very important to be able to browse the content of these files. You have several ways of doing this:

· cat: Displays the contents of a file

· tac: Does the same as cat, but displays the contents in reverse order

· tail: Shows just the last lines of a text file

· head: Displays the first lines of a file

· less: Opens an advanced file viewer

· more: Like less, but not as advanced

Showing File Contents with cat and tac

First is the cat command. This command just dumps the contents of a file on the screen (see Listing 4-2). This can be useful, but, if the contents of the file do not fit on the screen, you’ll see some text scrolling by, and when it stops, you’ll only see the last lines of the file displayed on the screen. As an alternative to cat, you can use tac as well. Not only is its name the opposite of cat, its result is too. This command will dump the contents of a file to the screen, but it reverses the file contents.

Listing 4-2. The cat Command Is Used to Display the Contents of a Text File

root@RNA:/boot# cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 RNA.lan RNA

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Showing a File’s Last Lines with tail

Another very useful command is tail. If no options are used, this command will show the last ten lines of a text file. You can also modify the command to show any number of lines on the bottom of a file; for example, tail -2 /etc/passwd will display the last two lines of the configuration file in which usernames are stored.
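
Note that tail -2 uses the traditional short syntax; the GNU version of tail that you will find on Linux also accepts the more explicit -n option, so the following two commands are equivalent:

tail -2 /etc/passwd
tail -n 2 /etc/passwd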

Also very useful for monitoring what happens on your system in real time is the option -f, which keeps tail open and refreshes the output as soon as new lines are added. For example, if you use tail -f /var/log/messages, the most generic log file on your system is opened, and, when a new line is written to the bottom of that file, you will see it immediately. Use Ctrl+C to get out of a file that you’ve opened using tail -f. Listing 4-3 shows you what the result of tail -f /var/log/messages may look like. In particular, the last two lines here are of interest; you can see that user sander has tried to work as root using the su command, but failed in doing so.

Listing 4-3. Monitoring System Events in Real Time with tail -f

BTN:~ # tail -f /var/log/messages
Nov 11 08:57:27 BTN sshd[11993]: Accepted keyboard-interactive/pam for root from
192.168.1.53 port 62992 ssh2
Nov 11 09:00:01 BTN su: (to beagleindex) root on none
Nov 11 09:00:01 BTN su: (to beagleindex) root on none
Nov 11 09:02:53 BTN su: (to nobody) root on none
Nov 11 09:02:58 BTN syslog-ng[2407]: last message repeated 3 times
Nov 11 09:02:58 BTN su: (to cyrus) root on none
Nov 11 09:02:58 BTN ctl_mboxlist[12745]: DBERROR: reading /var/lib/imap/db/skipstamp
, assuming the worst: No such file or directory
Nov 11 09:02:59 BTN ctl_mboxlist[12745]: skiplist: recovered /var/lib/imap/mailboxes
.db (0 records, 144 bytes) in 0 seconds
Nov 11 09:03:59 BTN sux: FAILED SU (to root) sander on /dev/pts/1
Nov 11 09:03:08 BTN sux: (to root) sander on /dev/pts/1

Displaying the First Lines in a File with head

The opposite of tail is the head command, which displays the top lines of a text file. As with tail, this command is useful in a shell script, if you want to display only the first line of a file, for instance. You can even combine head and tail to specify exactly which line in a file you want to display. Consider the example file that you see in Listing 4-4.

Listing 4-4. Example Text File

Username Current status
Linda enabled
Lori disabled
Lisa enabled
Laura enabled

Imagine that, for some reason, you need to see the name of the first user only. You wouldn’t get that by just using tail or by just using head. If, however, you first take the head of the first two lines, and next the tail of the result of that, you would get the required result:

head -n 2 textfile | tail -n 1

As you can see in this example, once again, by using a pipe you get a command that has some powerful additional options.
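
For instance, to display only line 20 of /etc/passwd (assuming the file has at least 20 lines), you could combine the two commands as follows:

head -n 20 /etc/passwd | tail -n 1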

Browsing File Contents with less and more

The last two commands used to view the contents of text files are less and more. The most important thing you need to remember about them is that you can do more with less. Contrary to common sense, the less command is actually the improved version of more. Both commands will open your ASCII text file in a viewer as you can see in Listing 4-5, which shows the contents of the /etc/hosts file (which contains mappings between node names and IP addresses). In this viewer, you can browse down in the file by using the Page Down key or the spacebar. Only less offers the option to browse up as well. Also, both commands have a search facility. If the less utility is open and displays the contents of your file, use /sometext from within the less viewer to locate sometext in the file. Useful to remember: both utilities are based on the Vi editor; therefore, many keystrokes that you can use in Vi will work in less and more as well. To quit both utilities, use the q command.

Listing 4-5. You Can Use the less Command As a Viewer to View File Contents

BTN:~ # less /etc/hosts
127.0.0.1 localhost
127.0.1.1 RNA.lan RNA
192.168.1.100 RNA.lan RNA
192.168.1.101 ZNA.lan ZNA
192.168.1.102 BTN.lan BTN
192.168.1.103 XTN.lan XTN
/etc/hosts (END)

Cool Text File Manipulation Tools

To change the contents of text files, you can use an editor. Apart from editors that oblige you to make changes word by word, you can also use some automated tools to do batch changes. The tools mentioned in the following text are all classical tools from the UNIX era, and you can use them to apply batch changes. You will notice though that these tools don’t make their changes in the files you’re working on but show the modifications on the standard output. This means that in all cases, you’ll have to work with redirection to write the changes to a new file. You will see some examples explaining how this works.

Changing Contents in a Batch with tr

The tr utility is used to translate or delete characters from a file. Since it doesn’t have any options to work with input or output files, you have to use piping and redirection to apply changes to files when using tr. A classical use of tr is to translate lowercase into uppercase. In the example in Listing 4-6, you can see the contents of the ~/users file before and after it is translated with tr.

Listing 4-6. Changing Lowercase into Uppercase with tr

BTN:~ # cat users
linda
sanne
anja
sylvia
zeina
BTN:~ # cat users | tr a-z A-Z
LINDA
SANNE
ANJA
SYLVIA
ZEINA

As you can see, in this example the cat command is used first to display the contents of the file users, and the result of the cat command is piped to the tr command, which translates a–z into A–Z. The result, however, is written to the standard output only, and not to a file. To write the result of the command from Listing 4-6 to a text file with the name users2, you can apply redirection as follows:

cat users | tr a-z A-Z > users2

Instead of working with cat and a pipe that has tr process the results of the cat command, you can also work with the input redirector, <. The next command shows an alternative for the preceding command that translates and next writes the results to a new text file:

tr a-z A-Z < users > users2
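
Since tr can delete characters as well as translate them, here is one more small sketch: assuming a hypothetical file dosfile.txt with DOS-style line endings, the following strips the carriage return characters and writes the result to unixfile.txt:

tr -d '\r' < dosfile.txt > unixfile.txt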

Sorting Text Files with sort

Imagine that you have a list of users, and you want to sort that list. In this case, you can use the sort command. For instance, if applied to the users file from Listing 4-6, sort users would give you the result that you see in Listing 4-7.

Listing 4-7. Sorting File Contents with sort

BTN:~ # sort users

anja
linda
sanne
sylvia
zeina

At first sight, sort appears to be a simple utility that is pretty straightforward. You may be surprised, though. For instance, consider the example in Listing 4-8, in which another users file is sorted.

Listing 4-8. Sorting in Alphabetical Order?

BTN:~ # sort users

Angy
Caroline
Susan
anja
linda
sanne
sylvia
zeina

As you can see, in the example from Listing 4-8, sort first gives names that start in uppercase, and next it gives all lowercase names. This is because by default it doesn’t respect alphabetical order, but it takes the order as defined in the ASCII table. Fortunately, sort has the -f option, which allows you to apply real alphabetical order and ignore case. Also useful is the option -n, which makes sure that numbers are sorted correctly. Without the -n option, sort would consider 8, 88, 9 the correct order. With this option applied, you can make sure that the numbers are sorted as 8, 9, 88.
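
As a quick sketch of both options, assume the users file shown in Listing 4-8 and a hypothetical file named sizes that contains the numbers 8, 88, and 9 on separate lines. The first command below sorts the names alphabetically while ignoring case; the second sorts the numbers as 8, 9, 88:

sort -f users
sort -n sizes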

Finding Differences Between Text Files with diff

If you want to find out differences between files, the diff utility is very useful. Typically, you would use diff to compare an old version with a newer version of a file. For instance, if you make a copy of the user database in /etc/passwd to /etc/passwd.old, you can compare these files later by using the diff utility to see whether any differences have occurred. Listing 4-9 gives an easy-to-understand example of the diff utility.

Listing 4-9. Comparing Files with diff

BTN:~ # diff users users2
7d6
< pleuni

In the example in Listing 4-9, there is only one difference between the two files that are compared: one file contains a line that reads pleuni, whereas the other file doesn’t. The diff utility uses the coordinates 7d6 to indicate where it has found differences. In these coordinates, it uses a d to indicate that a line was deleted from the first file. The following indicators can be used in the coordinates:

· d: Line was deleted from the first file

· a: Line was added to the first file

· c: Line was changed in the first file

The number to the left of the letter corresponds to the line number found in the first file. The number to the right of the letter corresponds to the line in the second file used during comparison. Since diff finds the longest common sequence in both files, 7d6 means that the line pleuni was deleted from the first file to make it the same as the second file.

< and > are also clear indications of where the differences can be found. < refers to the first file, while > refers to the second file.

Another way of presenting the output given by diff is to use the --side-by-side option as well, to show the contents of both files and where exactly the differences are. You can see an example of this in Listing 4-10.

Listing 4-10. Use the --side-by-side Option to Clearly See Differences

BTN:~ # diff --side-by-side users users2
linda linda
Angy Angy
Susan Susan
sanne sanne
anja anja
Caroline Caroline
pleuni <
sylvia sylvia
zeina zeina

Checking Whether a Line Exists Twice with uniq

When working on a text configuration file, it is a rather common error to have a given configuration parameter twice. The risk of getting this is very real, especially if you have a configuration file that contains hundreds of lines of configuration parameters. By using the uniq utility, you’ll find these lines easily. Let’s consider the input file users, which is displayed in Listing 4-11.

Listing 4-11. Test Input File

BTN:~ # cat users
linda
Angy
Susan
sanne
anja
Caroline
pleuni
linda
sylvia
zeina
sylvia

As you can see, some of the lines in this input file occur twice. If, however, you use the uniq users command, the command shows you unique lines only. That is, if a given line occurs twice, you will only see the first occurrence of that line as you can see in Listing 4-12.

Listing 4-12. Displaying Unique Lines Only

BTN:~ # uniq users
linda
Angy
Susan
sanne
anja
Caroline
pleuni
sylvia
zeina

Like most other commands, uniq has some specific switches as well that allow you to tell it exactly what you need it to do. For instance, use uniq --repeated yourfile to find out which lines occur repeatedly in yourfile.
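
One thing to be aware of is that uniq compares adjacent lines only. If duplicate lines may be scattered throughout a file, it is common practice to sort the file first and pipe the result to uniq, as in the following sketch:

sort users | uniq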

Getting Specific Information with cut

Another very useful command is cut. This command allows you to get fields from structured files. To do this, it helps if you tell cut what the field delimiter is. In Listing 4-13, you see an example. First, I’ve displayed the last seven lines of the /etc/passwd file in which user accounts are stored, and next, I’ve piped this to the cut command to filter out the third column.

Listing 4-13. Filtering a Specific Column from the passwd User Database

nuuk:~ # tail -n 7 /etc/passwd
lori:x:1006:100::/home/lori:/bin/bash
laura:x:1007:100::/home/laura:/bin/bash
lucy:x:1008:100::/home/lucy:/bin/bash
lisa:x:1009:100::/home/lisa:/bin/bash
lea:x:1010:100::/home/lea:/bin/bash
leona:x:1011:100::/home/leona:/bin/bash
lilly:x:1012:100::/home/lilly:/bin/bash
nuuk:~ # tail -n 7 /etc/passwd | cut -d : -f 3
1006
1007
1008
1009
1010
1011
1012

In this example command, the option -d : is used with cut to specify the field delimiter, which is a : in the /etc/passwd file. Next, with the option -f 3, cut learns that it should filter out the third field. You can really benefit from the options that cut has to offer, if you combine it with other commands in a pipe. Listing 4-13 already shows an example of this, but you can go beyond this example. For instance, the command cut -d : -f 3 /etc/passwd | sort -n would display a sorted list of user IDs from the /etc/passwd file.

EXERCISE 4-2: WORKING WITH TEXT PROCESSING TOOLS

1. Open a shell.

2. Type tr '[:lower:]' '[:upper:]' < /etc/passwd > ~/myfile to create myfile.

3. Type less ~/myfile to open the contents of the file in the less viewer. Type q to quit it.

4. Type sort ~/myfile and observe the results. Notice that the contents of the file are sorted on the first character.

5. Type sort -t: -k3 -n ~/myfile. This sorts the contents of the file based on the numeric order in the third column. Notice the part -t:, which defines the field separator as a :.

6. Type head -n 5 ~/myfile. It shows the first 5 lines of the file.

7. Type tail -f /var/log/messages. You’ll see the last lines of /var/log/messages, with new lines scrolling by as they are added.

8. Press Ctrl-c to interrupt the tail -f command.

Advanced Text File Filtering and Processing

Up to now, we’ve talked about the simple text-processing tools only. There are some advanced tools as well, among which are the old and versatile sed and awk. Although these are complicated tools, you may benefit from some basic knowledge about these tools. In the next sections, you’ll learn about their basics. Before diving into sed and awk details, you’ll read about an advanced way to work with text patterns by using regular expressions. Each of these three subjects merits a book on its own; consider what I give here just a very basic introduction to these complex matters.

Working with Basic Regular Expressions

Many programs discussed in this chapter are used to search for and work with text patterns in files. Because working with text patterns is so important in Linux, a method is needed to refer to text patterns in a flexible way that goes beyond just quoting the text pattern literally. For instance, try a command like grep -r host /; it will give you a huge result because every word that contains the text “host” (think, for example, about words like ghostscript) would give a match. By using a regular expression, you can be much more specific about what you are looking for. For instance, you can tell grep that it should look only for lines that start with the word host.

Regular expressions are not available for all commands; the command that you use must be programmed to work with regular expressions. The most common examples of such commands are the grep and vi utilities. Other utilities, like sed and awk, which are covered later in this section, can also work with regular expressions.

An example of the use of a regular expression is in the following command:

grep 'lin.x' *

In this example, the dot in the regular expression 'lin.x' has a special meaning; it means every character at that particular position in the text string is seen as a match. To prevent interpretation problems, I advise you to always put regular expressions between single quotes. By doing this, you’ll prevent the shell from interpreting the regular expression.

As mentioned in the introduction of this section, you can do many things with regular expressions. In the following list, I give examples of some of the most common and useful regular expressions:

· ^: Indicates that the text string has to be at the beginning of a line. For instance, to find only lines that have the text hosts at the beginning of a line, use the following command:

grep -ls '^hosts' *

· $: Refers to the end of a line. For instance, to find only lines that have the text hosts at the end of the line, use the following command:

grep -ls 'hosts$' *

Tip You can combine ^ and $ in a regular expression. For instance, to find lines that contain only the word “yes,” you would use grep -ls '^yes$' *.

· .: Serves as a wildcard to refer to any character, with the exception of a newline character. To find lines that contain the text tex, tux, tox, or tix, for instance, use the following command:

grep -ls 't.x' *

· []: Indicates characters in the regular expression that should be interpreted as alternatives. For instance, you would use the following command to find users who have the name pinda or linda:

grep -ls '[pl]inda' *

· [^ ]: Ignores all characters that you put between square brackets after the ^ sign. For instance, the following command would find all lines that have the text inda in them, but not lines that contain the text linda or pinda:

grep -ls '[^pl]inda' *

· -: Refers to a class or a range of characters. You have already seen an example of this in the tr command where the following was used to translate all lowercase letters into uppercase letters:

tr a-z A-Z < mytext

Likewise, you could use a regular expression to find all files that have lines that start with a number, using the following command:

grep -ls '^[0-9]' *

· \< and \>: Search for patterns at the beginning of a word or at the end of a word. For instance, the following would show lines that have text beginning with san:

grep '\<san' *

These regular expressions have two disadvantages, though. The first is that they don’t find lines that start with the provided regular expression. The other disadvantage is that they are not supported by all utilities, though Vi and grep do work with them.

· \: Makes sure that a character that has a special meaning in a regular expression is not interpreted. For instance, the following command will search a text string that starts with any character, followed by the text host:

grep -ls '.host' *

If, however, you need to find a text string that has a dot at the first position, which is followed by the text host, you need the following regular expression:

grep -ls '\.host' *

The regular expressions just discussed help you find words that contain certain text strings. You can also use regular expressions to specify how often a given string should occur in a word by using regular expression repetition operators. For instance, you can use a regular expression to search for files containing the username linda exactly three times. When working with repetition operators, you must make sure that the entire regular expression is in quotes; otherwise, you may end up with the shell interpreting your repetition operator. Next is a list of the most important repetition operators:

· *: The asterisk is used to indicate that the preceding regular expression may occur once, more than once, or not at all. It is not the most useful character in a regular expression, but I mainly mention it so that you don’t confuse it with the * in the shell. In a shell environment, * stands for any string of characters; in a regular expression, it just indicates that the preceding item may occur any number of times.

· ?: The question mark is used to indicate that the preceding character may be present once or not at all. Consider the following example, where both the words “color” and “colour” will be found (GNU grep accepts \? in basic regular expressions):

grep -ls 'colou\?r' *

· +: The preceding character or regular expression has to be present at least once.

· \{n\}: The preceding character or regular expression occurs exactly n times. This is useful if you are looking for, say, a three-digit number, as in the following command:

grep -ls '[0-9]\{3\}' *
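
As an aside, GNU grep also supports extended regular expressions through the -E option. With the extended syntax, the backslashes before the braces (and before characters such as ? and +) are no longer needed, so the previous search could also be written as follows:

grep -lsE '[0-9]{3}' *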

Working with Programmable Filters

In the first part of this chapter, you’ve read about utilities that you can use to manipulate text files. Most of the utilities discussed so far are in some way limited in use. If they just don’t do what you need them to do, you may need more powerful utilities. In that case, programmable filters such as sed and awk may offer what you need.

Once you start working with power tools like sed and awk, you may end up using programming languages such as Perl and Python. You could consider languages like these as a further extension to the powerful sed and awk, with more options and more possibilities that allow you to process text files in real time, something that is quite important if, for instance, you want to offer dynamic web pages to end users. In this chapter, we won’t go that far. You’ll just get a basic introduction to working with sed and awk, with the purpose of making text file processing easier for you.

Working with sed

In fact sed, which stands for Stream EDitor, is just a further development of the old editor ed. With sed, you can automate commands on text files. To do this, sed processes the text file line by line to see whether a command has to be executed on these lines. By default, sed will write its result to standard output. This means you must redirect it somewhere else if you also really need to do something with this standard output.

The basic sed syntax is as follows:

sed 'list of commands' file ...

Normally, sed will walk through the files it has to work on line by line, apply its commands to each line, and then write the output to the standard output. Let’s have a look at an example involving a file with the name users, shown in Listing 4-14.

Listing 4-14. Example Text File

nuuk:~ # cat users
lori:x:1006:100::/home/lori:/bin/bash
laura:x:1007:100::/home/laura:/bin/bash
lucy:x:1008:100::/home/lucy:/bin/bash
lisa:x:1009:100::/home/lisa:/bin/bash
lea:x:1010:100::/home/lea:/bin/bash
leona:x:1011:100::/home/leona:/bin/bash
lilly:x:1012:100::/home/lilly:/bin/bash

If you just want to display, say, the first two lines from this file, you can use the sed command 2q. With this command, you tell sed to show two lines, and then quit (q). Listing 4-15 shows the results of this command.

Listing 4-15. Showing the First Two Lines with sed and Quitting

nuuk:~ # sed 2q users
lori:x:1006:100::/home/lori:/bin/bash
laura:x:1007:100::/home/laura:/bin/bash

Basically, to edit lines with sed automatically, you need to find the proper way to address lines. To do this, you can just refer to the line number you want to display, but far more useful is to have sed search for lines that contain a certain string and execute an operation on that line. To refer to a string in a certain line, you can use regular expressions, which have to be between slashes. An example of this is in the following sed command, where only lines containing the string or are displayed:

sed -n /or/p users

In this example, the option -n is used to suppress automatic printing of pattern space. Without this option, you would see every matching line twice. Next, /or/ specifies the text you are looking for, and the command p is used on this text to print it. As the last part, the name of the file on which sed should do its work is mentioned. Following is a list of examples where regular expressions are used in combination with sed on the example text file from Listing 4-14:

· sed -n /or/p users: Gives the line that contains the text lori; only those lines that contain the literal string or are displayed.

· sed -n /^or/p users: Doesn’t give any result, as there are no lines starting with the text or.

· sed -n /./p users: Gives all lines; the dot refers to any character, so all lines give a match.

· sed -n /\./p users: Still gives all lines. Since no quotes are used in the regular expression, the shell interprets the \ sign before sed can treat it as part of the regular expression. Therefore, the dot refers to any character, and all lines from the example file are displayed.

· sed -n '/\./p' users: Shows only lines that contain a dot. Because quotes are used this time, the \ reaches sed, which therefore treats the dot as a literal character. Since there are no dots in the example file, no result is given.

· sed -n '/me\/le/p' users: Shows the lines containing the text lea and leona. The regular expression in this example uses me\/le, which means that in this case sed searches for the literal string me/le (which occurs in /home/lea and /home/leona). Note that this command would fail without the quotes.
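
Apart from patterns, sed can also address lines by number or by a range of numbers. A small sketch, again using the users file from Listing 4-14, prints lines 2 through 4 only:

sed -n '2,4p' users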

Up to now, you have read about line addressing only, and just one command was displayed, which is the command p for print. sed has many other commands as well, of which the s (substitute) command is without a doubt the single most popular. By using the s command, you can substitute a string with another string. In the next example you can see how the s command is used to replace /home with /users in the example file from Listing 4-14. See also Listing 4-16 for the complete results of this command:

sed s/home/users/g users

Note that in this command, the first element that is used is the s command itself. Then follow the string to search for (home) and the string it should be replaced with (users). Next, the g command tells sed this is a global command, meaning that it will perform the replace action all over the file. Last, the name of the file on which sed should work is given.

The result of this command is written to STDOUT by default, and therefore is not saved in any file. If you want to save it, make sure to use redirection to write the result to a file (e.g., sed s/home/users/g users > newusers). Alternatively, you can write the results to the source file, using the -i option. But you should be sure about what you’re doing, as the modifications will be applied immediately.
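
If you do decide to use -i, GNU sed lets you play it a little safer by specifying a backup suffix directly after the option. The following sketch edits users in place and keeps a copy of the original as users.bak:

sed -i.bak 's/home/users/g' users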

Listing 4-16. Using the sed Substitute Command to Replace Text

nuuk:~ # sed s/home/users/g users
lori:x:1006:100::/users/lori:/bin/bash
laura:x:1007:100::/users/laura:/bin/bash
lucy:x:1008:100::/users/lucy:/bin/bash
lisa:x:1009:100::/users/lisa:/bin/bash
lea:x:1010:100::/users/lea:/bin/bash
leona:x:1011:100::/users/leona:/bin/bash
lilly:x:1012:100::/users/lilly:/bin/bash

Manipulating Text Files with awk

Another powerful tool to manipulate text files is awk. Like sed, awk is also a programming language by itself, with many possibilities. Personally I like it a lot, because it is a very versatile utility that helps you to get the information you need fast and easy.

As is the case with sed, each awk command also works with a pattern that specifies what to look for. Next, you’ll use a command to specify what to do with it. Typically, the patterns are put between slashes, and the actions that you want to perform are put in braces. Since awk also works with regular expressions, it is wise to put awk patterns between single quotes as well, to prevent the shell from interpreting them by accident. The global structure of an awk command is as follows:

awk '/pattern/{action}' file

In case you don’t specify a pattern, the action is performed on every line in the file. You can interpret this as “every line that matches the pattern null.” If no action is specified, awk just shows you the lines that match the pattern; hence, there is no big difference with a tool such as grep. An example of this is shown in Listing 4-17, where awk displays lines containing the text lori.

Listing 4-17. Displaying Lines That Contain a Given Text Pattern with awk

nuuk:~ # awk '/lori/' users
lori:x:1006:100::/home/lori:/bin/bash

The awk utility becomes really interesting combined with its abilities to filter columns or fields out of a text file. The default field separator is a space, but you can tell awk to use something else instead by using the option -F followed by the character you want to use as a separator. In the next example line, the awk print command and the colon field separator are used to find the user ID of user lori from the users file:

awk -F : '/lori/{print $3}' users

In the preceding example, you see that $3 is used to refer to the third field in the file. You can also use $0 to refer to the entire record. Because awk is able to refer to specific fields, it’s possible as well to compare fields with one another. The following operators are available for this purpose:

· ==: Equals (searches for a field that has the same value)

· !=: Not equals

· <: Smaller than

· <=: Smaller than or equal to

· >: Bigger than

· >=: Bigger than or equal to

With these operators, you can make some useful calculations on text files. For instance, the following example would search the /etc/passwd file and show the first field of all lines in which the third field contains a value bigger than 999 (note the -F : option, which is needed because the fields in /etc/passwd are separated by colons):

awk -F : '$3 > 999 { print $1 }' /etc/passwd

Tip The preceding example allows you to find all names of user accounts that have a UID bigger than 999 (you’ll learn more about commands like this in Chapter 6, which discusses user management). Typically, this gives you real usernames, and not the names of system accounts.
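
The == operator can be used in a similar way. For instance, the following sketch lists the names of all accounts in /etc/passwd whose login shell is /bin/bash:

awk -F : '$7 == "/bin/bash" { print $1 }' /etc/passwd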

EXERCISE 4-3: USING SED AND AWK

1. Type awk '/root/' ~/myfile to show all lines that contain the text root.

2. Type awk -F : '/root/{ print $3 } ' ~/myfile. This prints the third field of all lines that contain the text root.

3. Type sed -n '2p' ~/myfile. This prints the second line in the ~/myfile file.

4. Type sed -i -e '2d' ~/myfile. This command deletes the second line in the file and writes the contents to the file immediately.

Printing Files

On Linux, the CUPS print system is used to print files. Typically, you would set up a CUPS printing environment with the management tools that are provided with your distribution, so I won’t cover that here. Once installed, you can use several command-line tools to send jobs to CUPS printers. You can find examples of some of these in the following text.

Managing CUPS Print Queues

CUPS offers a lot of tools from the command line that you can use to manage print jobs and queues. The flow of a print job is easy: a print job is placed in the printer queue, where it waits for the printer process to get it out of there and have it served by a printer. If you have worked with older UNIX print systems, I have good news for you: CUPS works with tools from the Berkeley UNIX dialect as well as the System V UNIX dialect. Since the Berkeley UNIX dialect is more common, in this subsection I will focus on the Berkeley tools.

Creating Print Jobs

To create a print job from the command line, you need the lpr tool. With this tool, you can send a file directly to a printer. In its most basic configuration, you can issue the command lpr somefile; this command will send somefile to the default printer. If you want to specify the printer where the file is sent to, you can use the -P option followed by the name of the print queue. For example, use lpr -P hplj4l somefile to send somefile to the queue for hplj4l. Want to print to a remote printer? That’s also possible using lpr; use lpr -P hplj4l@someserver somefile to send somefile to the queue named hplj4l at someserver.
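
The lpr command takes some other useful options as well; for instance, you can print multiple copies by passing the number of copies with -# (a quick sketch; see the lpr man page on your system for the complete list of options):

lpr -# 3 -P hplj4l somefile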

Tuning Print Jobs

From time to time, as an administrator it is useful to display print job information. For this purpose, you can use the lpq command. To get a list of all print jobs in the default queue, just issue lpq. Want to show print jobs in another queue? Specify the name of the queue you want to monitor, like lpq -P somequeue. This will get you a fairly high-level overview of the jobs and their properties. Want to see more detail? Use lpq -l -P somequeue. The option -a lets you check print jobs in all queues; just issue lpq -a.

Removing Print Jobs

Have you ever sent a print job to a queue that wasn’t supposed to be sent after all? Good news: if you are fast enough, you can remove that job using the lprm command. This command can be used in many different ways. The most brute-force way of using it is with the - option and nothing else. This will remove all jobs that you have submitted to the queue, and if you are the root user, it will remove all jobs from the queue. You can be more specific as well; for example, lprm -P hplj4l 3 would remove job number 3 from the queue hplj4l. To find out which job numbers are currently in a queue, you can use the lpq command.

Tip When hacking CUPS from the command line, it can happen that changes are not automatically activated. If you’ve made a change, but you don’t see any result, use the rccups restart command to restart CUPS.

Finding Files

Since Linux is a very file-oriented operating system, it is important that you know how to find files. The utility used for this purpose, find, allows you to find files based on any of the file properties that were used when storing the file on disk. Let’s start with an example: the following find command helps you find all files with names that start with hosts on the entire hard drive of the computer:

find / -name "hosts*"

One cool thing about find is that it allows you to do a lot more than just find files based on their file names. For instance, you can find files based on their size, owner, permissions, and much more. Following is a short list of file properties that you can use to find files:

· -amin n: Finds all files that were last accessed n minutes ago. Use -amin -n to find files that were accessed less than n minutes ago; for instance, find -amin -5 would give all files that were accessed less than five minutes ago.

· -executable: Finds all files that are executable.

· -group gname: Shows all files that have gname as their group owner. (Read Chapter 7 for more information about ownership.)

· -mmin n: Shows all files that were last modified n minutes ago. As with -amin, use -mmin -n to find files modified less than n minutes ago.

· -newer file: Shows all files that are newer than file.

· -nogroup, -nouser: Show all files that do not have a group or a user owner.

· -perm [+|-]mode: Finds all files that have a specific permission mode set. (See Chapter 7 for more details about permissions.)

· -size n: Finds all files of a specific size. With this parameter, you can also find files bigger than or smaller than a specific size. For instance, find / -size +2G would find all files larger than 2 gigabytes. When using this parameter, use K, M, and G for kilobytes, megabytes, and gigabytes, respectively. Use the + sign to indicate that you want to see files greater than a specific size.

· -type t: Finds files of a specific type. The most interesting file types that you can search for using this option are d for directory or f for a regular file (which is any file that is not a directory).

The interesting part of find is that you can combine different options as well. For example, you can run a find command that finds all files owned by user linda that are larger than 100MB using the following command:

find / -user linda -size +100M

Even more interesting is that you can issue any other command on the result of your find command using the -exec statement. Let’s have a look at an example where find is used to find all files owned by jerry and next moves these files to the directory /root:

find / -user jerry -exec mv {} /root \;

Here you can see that some specific items are used with the command that -exec starts. For instance, normally the mv command would refer to the name of some files, as in mv * /root. In this specific case, mv has to work on the result of the previous find command. You refer to this result by using {}. Next, you have to close the -exec statement. To do this, use \; at the end each time you open -exec.

Let’s have a look at one more example. This command first looks up all files that are owned by user linda and next executes grep to look in these files to see whether any of them contains the text blah:

find / -user linda -exec grep -l blah {} \;

As you can see, find is a useful tool that helps you in finding files, no matter what properties the file may have.
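
As a side note, most current versions of find also accept + instead of \; to terminate -exec. In that case, the files that were found are passed to the command in batches rather than one at a time, which is usually a lot faster:

find / -user linda -exec grep -l blah {} +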

EXERCISE 4-4: FINDING FILES

1. Type which passwd. This command searches the $PATH for the presence of a binary with the name passwd. Notice that it does not show the /etc/passwd file.

2. Type find / -name "passwd". This command searches the entire file system for files that have the name passwd.

3. Type find / -user "root". This searches the file system for files that are owned by user root.

4. Type find / -user "root" -size +500M. This searches the file system for files that are bigger than 500MB and owned by user root.

Summary

In this chapter, you’ve learned about commands that help you in manipulating text files. Apart from these commands, you have learned how to work with regular expressions that help you in finding text patterns in a clever way. Following is a short list in which all commands that are covered in this chapter are summarized:

· vi: Brings up a text editor that allows you to create and modify text files

· cat: Displays the contents of a text file

· tac: Displays the contents of a text file, but in reverse order

· tail: Shows the last n lines of a text file

· head: Shows the first n lines of a text file

· less: Allows you to walk page by page through a text file

· tr: Substitutes characters, for instance, changing all lowercase letters to uppercase

· diff: Finds differences between files

· sort: Sorts files into alphabetical or any other order

· uniq: Finds a line that has multiple occurrences in a file

· cut: Filters fields from a structured file with clearly marked field separators

· sed: Brings up a stream editor, especially useful for finding and replacing text

· awk: Applies a programmable filter, especially useful for displaying specific fields from files that contain specific text

· lpr: Allows you to send files to a printer

· lpq: Helps you in monitoring files that are waiting to be printed

· lprm: Removes jobs from the print queue

In the next chapter, you’ll learn how to manage a Linux file system.