Advanced Shell Scripting - Scripting - Linux All-in-One For Dummies, 5th Edition (2014)

Book VII. Scripting

Chapter 2. Advanced Shell Scripting

In This Chapter

· Trying out the sed command

· Working with the awk and sed commands

· Reading some final notes on shell scripting

The preceding chapter introduces you to some of the power available through shell scripting. All the scripts in that chapter are simple bash routines that allow you to run commands and repeat operations a number of times.

This chapter builds upon that knowledge by showing how to incorporate two powerful tools — sed and awk — into your scripts. These two utilities move your scripts to the place where the only limit to what you can do becomes your ability to figure out how to ask for the output you need. Although sed is the stream editor and awk is a quick programming language, they complement each other so well that it’s not uncommon to use one with the other. The best way to show how these tools work is to walk through some examples.

Trying Out sed

The following are sample lines of a colon-delimited employee database that has five fields: unique id number, name, department, phone number, and address.

1218:Kris Cottrell:Marketing:219.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:219.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:219.555.5555:984 Bash Lane

This database has been in existence since the beginning of the company and has grown to include everyone who now works, or has ever worked, for the company. A number of proprietary scripts read from the database, and the company cannot afford to be without it. The problem is that the telephone company has changed the 219 prefix to 260, so all entries in the database need to be changed.

This is precisely the task for which sed was created. As opposed to standard (interactive) editors, a stream editor works its way through a file and makes changes based on the rules it is given. The rule in this case is to change 219 to 260. It’s not quite that simple, however, because if you use the command

sed 's/219/260/'

the result is not quite what you want (only the first 219 on each line becomes 260):

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1260:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:26074 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

The changes in the first, fourth, and fifth lines are correct. But in the second line, the first occurrence of 219 appears in the employee id number rather than in the phone number, so the id was changed to 1260 and the phone number was left alone. If you wanted to change more than the very first occurrence in a line, you could slap a g (for global) into the command:

sed 's/219/260/g'

That is not what you want to do in this case, however, because the employee id number should not change. Similarly, in the third line, a change was made to the address because it contains the value that is being searched for; no change should have been made because the employee does not have the 219 telephone prefix.
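To see how far the g flag overshoots, you can feed a single record to sed. This is only a minimal sketch; the variable names record and changed are made up for the illustration.

```shell
# One sample record from the employee database above.
record='1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue'

# With g, every 219 on the line is replaced: id, phone prefix, and address.
changed=$(printf '%s\n' "$record" | sed 's/219/260/g')
printf '%s\n' "$changed"
# prints 1260:Nate Eichhorn:Sales:260.555.5555:1260 Locust Avenue
```

The phone number is now right, but the id number and the street address have been damaged along the way.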

The first rule of using sed is to identify what makes the location of the string you are looking for unique. If the telephone prefix were encased in parentheses, it would be much easier to isolate. In this database, though, that is not the case; the task becomes a bit more complicated.

If you said that the telephone prefix must appear at the beginning of the field (denoted by a colon), the result would be much closer to what you want:

sed 's/:219/:260/'

This time the output is:

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:26074 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

The accuracy has increased, but there is still the problem of the third line. Because the colon helped to identify the start of the string, it may be tempting to turn to the period to identify the end:

sed 's/:219./:260./'

But the result still isn’t what was hoped for (note the third line):

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:260.4 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

Because the period has the special meaning of any single character, a match is found whether the 219 is followed by a period, a 7, or any other single character. Whatever the character, it is replaced with a period. The replacement side of things isn’t the problem; the search needs to be tweaked. By using the \ character, you can override the special meaning of the period and specify that you are indeed looking for a literal period and not any single character:

sed 's/:219\./:260./'

The result becomes:

1218:Kris Cottrell:Marketing:260.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:260.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:260.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:260.555.5555:984 Bash Lane

And the mission is accomplished.
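In practice, the command would be pointed at the database file. The sketch below assumes the database lives in a file named employees (a hypothetical name; the text does not give one). It writes the edited copy to a new file and checks it with grep before anything replaces the original.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Sample data copied from the listing above; "employees" is a made-up file name.
cat > "$workdir/employees" <<'EOF'
1218:Kris Cottrell:Marketing:219.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:219.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:219.555.5555:984 Bash Lane
EOF

# Write the edited copy to a new file instead of overwriting the original.
sed 's/:219\./:260./' "$workdir/employees" > "$workdir/employees.new"

# Count the records that now carry the 260 prefix; four are expected here.
changed_lines=$(grep -c ':260\.' "$workdir/employees.new")
echo "records with 260 prefix: $changed_lines"

# Joe Gunn's record (317 prefix, 21974 address) should be untouched.
gunn=$(grep '^1220:' "$workdir/employees.new")
echo "$gunn"

rm -rf "$workdir"
```

Only after the new file looks right would you move it over the original.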

Working with awk and sed

The second example involves a database of books that includes the ISBN number of each title. In the old days, ISBN numbers were ten digits and included an identifier for the publisher and a unique number for each book. ISBN numbers are now thirteen digits for new books. Old books (those published before the first of 2007) have both the old 10-digit and a new 13-digit number that can be used to identify them. For this example, the existing 10-digit number will stay in the database and a new field — holding the ISBN-13 number — will be added to the end of each entry.

To come up with the ISBN-13 number for the existing entries in the database, you start with 978, then use the first 9 digits of the old ISBN number. The thirteenth digit is a mathematical calculation (a check digit) obtained by doing the following:

1. Add all odd-placed digits (the first, the third, the fifth, and so on).

2. Multiply all even-placed digits by 3 and add them.

3. Add the total of Step #2 to the total of Step #1.

4. Find out what you need to add to round the number up to the nearest 10. This value becomes the thirteenth digit.

For example, consider the 10-digit ISBN 0743477103. It first becomes 978074347710, and then the steps work out like this:

1. 9+8+7+3+7+1=35

2. 7*3=21; 0*3=0; 4*3=12; 4*3=12; 7*3=21; 0*3=0; 21+0+12+12+21+0=66

3. 66+35=101

4. 110-101=9. The ISBN-13 thus becomes 9780743477109.
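The four steps above can be sketched as a small shell function. This is an illustration, not the method used later in the chapter: the function name isbn13_check is hypothetical, and it assumes it is handed exactly 12 digits.

```shell
# isbn13_check: print the check digit for a 12-digit ISBN-13 prefix.
# (Hypothetical helper; assumes its argument is exactly 12 digits.)
isbn13_check() {
    digits=$1
    sum=0
    i=1
    while [ "$i" -le 12 ]; do
        d=$(printf '%s' "$digits" | cut -c"$i")
        if [ $(( i % 2 )) -eq 1 ]; then
            sum=$(( sum + d ))        # odd-placed digits count once
        else
            sum=$(( sum + 3 * d ))    # even-placed digits count three times
        fi
        i=$(( i + 1 ))
    done
    # Distance up to the next multiple of 10; the outer % 10 turns 10 into 0.
    echo $(( (10 - sum % 10) % 10 ))
}

isbn13_check 978074347710    # prints 9
```

The total for this example is 101, so the function reports 9, matching the hand calculation.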

The beginning database resembles:

0743477103:Macbeth:Shakespeare, William
1578518520:The Innovator's Solution:Christensen, Clayton M.
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise

And you want the resulting database to change so each line resembles something like this:

0743477103:Macbeth:Shakespeare, William:9780743477109

The example that follows accomplishes this goal. It’s not the prettiest thing ever written, but it walks through the process of tackling this problem, illustrating the use of awk and sed. I have also included writing to temporary files so you can examine those files and see the contents at various stages. Clean programming would avoid temporary files wherever possible, but that practice also makes the action harder to follow at times. That said, here is one solution out of dozens. Read on.

Step 1: Pull out the ISBN

Given the database as it now exists, the first order of business is to pull out the existing ISBN — only the first nine digits because the tenth digit, which was just a checksum, no longer matters — and slap 978 onto the beginning. The nine digits we want are the first nine characters of each line, so we can pull them out by using the cut utility:

cut -c1-9 books

Because a mathematical operation will be performed on the numbers comprising this value, and that operation works with each digit, I’ll add a space between each number and the next in the new entry:

sed 's/[0-9]/& /g'

Now it’s time to add the new code to the beginning of each entry (the start of every line):

sed 's/^/9 7 8 /'

And finally, I do an extra step: removing the white space at the end of the line just to make the entry a bit cleaner:

sed 's/ $//'

Then I write the results to a temporary file that can be examined to make sure all is working as it should. The full first step then becomes

cut -c1-9 books | sed 's/[0-9]/& /g' | sed 's/^/9 7 8 /' | sed 's/ $//' > isbn2

Note: the sed operations could be combined in a script file to increase speed and decrease cycles. However, I am walking through each operation step-by-step to show what’s going on, and am not worried about creating script files for this one-time-only operation.

Examining the temporary file, the contents are as follows:

9 7 8 0 7 4 3 4 7 7 1 0
9 7 8 1 5 7 8 5 1 8 5 2
9 7 8 0 3 2 1 3 4 9 9 4
9 7 8 1 5 8 7 0 5 2 4 1
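The note above mentions that the sed operations could be combined. One way to do that without a script file is to hand a single sed several -e expressions, which are applied in order to each line. The sketch below rebuilds the sample database in a scratch directory and keeps the result in a variable (combined, a made-up name) instead of writing isbn2.

```shell
workdir=$(mktemp -d)

# The four sample records from the database shown earlier.
cat > "$workdir/books" <<'EOF'
0743477103:Macbeth:Shakespeare, William
1578518520:The Innovator's Solution:Christensen, Clayton M.
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise
EOF

# One sed process, three expressions applied in order to each line.
combined=$(cut -c1-9 "$workdir/books" |
    sed -e 's/[0-9]/& /g' -e 's/^/9 7 8 /' -e 's/ $//')
printf '%s\n' "$combined"

rm -rf "$workdir"
```

The output should match the contents of isbn2 listed above.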

Step 2: Calculate the 13th digit

We’ve taken care of the first 12 digits of the ISBN number. Now we need to run a calculation on those 12 digits to figure out the thirteenth. Because the numbers are separated by a space, awk can interpret them as fields. The calculation will take several steps:

1. Add all the odd-placed digits: x=$1+$3+$5+$7+$9+$11.

2. Add all the even-placed digits and multiply by 3: y=($2+$4+$6+$8+$10+$12)*3.

3. Add the total of Step #2 to the total of Step #1: x=x+y.

4. Find out what you need to add to round the number up to the nearest 10 by computing the modulo when divided by 10, and then subtracting it from 10.

The following awk command gets everything in place except the transformation:

awk '{ x=$1+$3+$5+$7+$9+$11 ; y=$2+$4+$6+$8+$10+$12 ; y=y*3 ; x=x+y ; y=x%10 ; print y }'

Everything is finished except subtracting the final result from 10. This is the hardest part. If the modulo is 7, for example, the check digit is 3. If the modulo is 0, however, the check digit does not become 10 (10 – 0), but stays 0. My solution is to use the transform function of sed, which swaps each digit for its complement and leaves 0 and 5 alone (5 is its own complement):

sed 'y/12346789/98764321/'

Combining the two operations into one, the second step thus becomes

awk '{ x=$1+$3+$5+$7+$9+$11 ; y=$2+$4+$6+$8+$10+$12 ; y=y*3 ; x=x+y ; y=x%10 ; print y }' isbn2 | sed 'y/12346789/98764321/' > isbn3

Examining the temporary file, the contents are

9
4
1
5
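The sed transform is one way to handle the subtraction. Another, offered here as an alternative sketch and not as the chapter's method, is to let awk finish the job with a second modulo. The input lines are the isbn2 values from Step 1; the variable name check_digits is made up.

```shell
# Feed the four spaced-out 12-digit values from isbn2 straight to awk.
check_digits=$(printf '%s\n' \
    '9 7 8 0 7 4 3 4 7 7 1 0' \
    '9 7 8 1 5 7 8 5 1 8 5 2' \
    '9 7 8 0 3 2 1 3 4 9 9 4' \
    '9 7 8 1 5 8 7 0 5 2 4 1' |
    awk '{ x = $1+$3+$5+$7+$9+$11
           y = ($2+$4+$6+$8+$10+$12) * 3
           # (10 - remainder) % 10 maps a remainder of 0 to 0 rather than 10.
           c = (10 - (x + y) % 10) % 10
           print c }')
printf '%s\n' "$check_digits"
```

This prints 9, 4, 1, and 5 on separate lines, the same values as isbn3.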

Step 3: Add the 13th digit to the other 12

The two temporary files can now be combined to get the correct 13-digit ISBN number. Just as cut was used in the earlier step, paste can be used now to combine the files. The default delimiter for paste is a tab, but we can change that to anything with the -d option. I use a space as the delimiter, and then use sed to strip the spaces (remember that the isbn2 file has spaces between the digits so that they can be read as fields):

paste -d" " isbn2 isbn3 | sed 's/ //g'

Finally, I want to add a colon as the first character of each entry to make it easier to append the newly computed ISBN to the existing file:

sed 's/^/:/'

And the entire command becomes

paste -d" " isbn2 isbn3 | sed 's/ //g' | sed 's/^/:/' > isbn4

Examining the temporary file, the contents are

:9780743477109
:9781578518524
:9780321349941
:9781587052415

Step 4: Finish the process

The only operation remaining is to append the values in the temporary file to the current database. I’ll use the default tab delimiter in the entry, and then strip it out. Technically, I could specify a colon as the delimiter and skip the last part of the previous step (adding the leading colon). However, I would rather have my value complete there and be confident that I am stripping characters that don’t belong (tabs) instead of running the risk of adding more characters than should be there. The final command is

paste books isbn4 | sed 's/\t//g' > newbooks

The final file looks like this:

0743477103:Macbeth:Shakespeare, William:9780743477109
1578518520:The Innovator's Solution:Christensen, Clayton M.:9781578518524
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik:9780321349941
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise:9781587052415
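For comparison, here is roughly what the colon-delimiter alternative mentioned above might look like. It is a sketch under assumptions: the file name isbn13 is hypothetical and holds the new numbers without a leading colon, and only two records are used to keep it short.

```shell
workdir=$(mktemp -d)

# Two records from the original database.
cat > "$workdir/books" <<'EOF'
0743477103:Macbeth:Shakespeare, William
1578518520:The Innovator's Solution:Christensen, Clayton M.
EOF

# The matching ISBN-13 values without a leading colon
# ("isbn13" is a made-up file name, not one of the temporary files above).
printf '%s\n' 9780743477109 9781578518524 > "$workdir/isbn13"

# paste joins line 1 with line 1, line 2 with line 2, using : between them.
joined=$(paste -d: "$workdir/books" "$workdir/isbn13")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

The trade-off is the one described earlier: this saves a sed step, but the delimiter is added by paste instead of being part of a value you have already inspected.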

Again, this result can be accomplished in many ways. This solution is not the cleanest, but it does illustrate the down-and-dirty use of sed and awk.
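As one example of a cleaner route, the whole job can be done in a single awk pass with no temporary files. This is a sketch of one possible approach, not the chapter's solution; the variable names inside the awk program (isbn12, weight, check) are made up, and it assumes every line starts with a 10-digit ISBN followed by a colon.

```shell
workdir=$(mktemp -d)

cat > "$workdir/books" <<'EOF'
0743477103:Macbeth:Shakespeare, William
1578518520:The Innovator's Solution:Christensen, Clayton M.
0321349946:(SCTS) Symantec Certified Technical Specialist:Alston, Nik
1587052415:Cisco Network Admission Control, Volume I:Helfrich, Denise
EOF

newbooks=$(awk -F: '{
    # 978 plus the first nine digits of the old ISBN (field 1).
    isbn12 = "978" substr($1, 1, 9)
    sum = 0
    for (i = 1; i <= 12; i++) {
        # Odd positions weigh 1, even positions weigh 3.
        weight = (i % 2 == 1) ? 1 : 3
        sum += weight * substr(isbn12, i, 1)
    }
    check = (10 - sum % 10) % 10
    print $0 ":" isbn12 check
}' "$workdir/books")
printf '%s\n' "$newbooks"

rm -rf "$workdir"
```

The output should be identical to the newbooks file shown above. The step-by-step version is easier to inspect along the way; this one is easier to rerun.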

Final Notes on Shell Scripting

As with any other aspect of computing, it takes a while to get used to shell scripting. After you become comfortable writing scripts, however, you’ll find that you can automate any number of operations and simplify your task as an administrator. The following tips can be helpful to keep in mind:

· After you create a script, you can run it automatically on a one-time basis by using at, or on a regular basis by using cron.

· You can use conditional expressions, such as if, while, and until, to look for events to occur (such as certain users accessing a file they should not) or to let you know when something that should be there goes away (for example, a file is removed or a user terminates).

· You can set permissions on shell scripts in the same way you set permissions for other files. For example, you can create scripts that are shared by all members of your administrative group (use case to create menus based upon LOGNAME).
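As a small illustration of the first two tips, the sketch below uses if to report when a file that should be there goes away. The function name check_exists and the script path in the comment are hypothetical.

```shell
# check_exists: report whether a file that should be present is still there.
check_exists() {
    if [ -e "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Demonstrate with a scratch file that is created and then removed.
scratch=$(mktemp)
before=$(check_exists "$scratch")
rm -f "$scratch"
after=$(check_exists "$scratch")
echo "$before"
echo "$after"

# A crontab entry along these lines would run a saved copy every 15 minutes
# (the script path is an assumption for illustration):
#   */15 * * * * /usr/local/bin/checkfile.sh
```

For a one-time run, the same saved script could be handed to at instead of cron.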