UNIX: The Complete Reference (2007)

Part V: Tools and Programming

Chapter 22: Perl

Perl is a scripting language, also described as an interpreted language. It combines the best features of shell scripting, awk, and sed into one package. Perl is particularly well suited to processing and manipulating text, but can also be used for applications such as network and database programming. Partly because text manipulation is such a common task, Perl has become incredibly popular. It is particularly common for CGI scripting; in fact, the majority of CGI scripts are written in Perl.

Perl was first released in 1987 by Larry Wall. It is open source and can be downloaded for many platforms, including Linux, Solaris, HP-UX, AIX, Microsoft Windows, and Mac OS X. Another reason for the popularity of Perl is the ease with which Perl scripts can be run on different platforms.

The basic syntax of Perl will feel familiar to C/C++ programmers as well as to shell scripters. Unlike many scripting languages, the perl interpreter completely parses each script before executing it. Thus, a Perl program will not abort in the middle of an execution with a syntax error. Perl programs are generally faster and more portable than shell scripts. At the same time, Perl scripts are faster to write and often shorter than comparable C programs.

This chapter is only an introduction to the many uses of Perl. It gives you all the information you need to get started writing your own Perl scripts. However, if you really want to understand Perl, you will need to devote some time to a longer reference. See the section “How to Find Out More” at the end of this chapter for suggested sources.

Obtaining Perl

Most modern UNIX systems come with perl already installed, usually at /usr/bin/perl. If it is installed on your system, the command perl -v should tell you which version you have. If you do not already have perl installed, or if you want to confirm that you have the latest version, go to http://www.perl.org/. This site, which is a great general resource for Perl information, has links to the various web sites where you can download Perl for your system, including http://www.activestate.com/Products/ActivePerl/. You will also be able to find installation instructions, either on the web site when you download Perl, or included with the Perl distribution.

Running Perl Scripts

The quickest way to run a command in Perl is on the command line. The -e switch is used to run one statement at a time. The statement must be enclosed in single quotes:

$ perl -e 'print "Hello, world\n";'

Hello, world

Although there are many useful perl one-liners, this is not going to get you far with the language.

A more common way to use Perl is to create a script by entering lines of Perl code in a file. You can then use the perl command to run the script:

$ cat hello

print "Hello, world\n";

$ perl hello

Hello, world

As you can see, the Perl function print sends a line of text to standard output. The “\n” adds a newline at the end.

You can also create scripts that automatically use perl when they run. To do this, add the line #!/usr/bin/perl (or whatever the path for perl is on your system; use which perl to find out) to the top of your file. This instructs the shell to use /usr/bin/perl as the interpreter for the script. You will also need to make sure that you have execute permission for the file. You can then run the script by typing its name:

$ cat hello.pl

#!/usr/bin/perl

print "Hello, world\n";

$ chmod u+x hello.pl

$ ./hello.pl

Hello, world

If the directory containing the script is not in your PATH, you will need to enter the pathname in order to run the script. In the preceding example, ./hello.pl was used to run the script in the current directory.

The extension .pl is commonly given to Perl scripts. Although it is not required, using .pl when you name your Perl scripts can help you organize your files.

Perl Syntax

You may notice that Perl looks a bit like a combination of shell scripting and C programming. Like C, Perl requires a semicolon at the end of statements. It uses “\n” to represent a newline. Perl includes some familiar C functions like printf, and as you will see later, for statements in Perl use the C syntax. Like shell scripts, Perl scripts are interpreted rather than compiled. They do not require you to explicitly declare variables, which are global by default. As with shell scripting, Perl makes it easy to integrate UNIX System commands into your script. Comments in Perl scripts start with #, again, just like in the shell.

One thing that’s important to understand about Perl is that the language is very flexible. Many Perl functions allow you to leave out syntax elements that other languages would require. For example, parentheses are often optional, and sometimes you can even leave out the name of a variable. When you read Perl scripts written by other people, you will often see familiar commands being used in new ways. If you’re not sure how something will work, experiment with it.

Scalar Variables

The simplest type of variable in Perl is called a scalar variable. These hold a single value, either a number or a string. A scalar variable name always starts with a $, as in

$pi = 3.14159;

$name = "Hamlet\n";

print $name;

Note that this is different from shell scripting, in which you only need the $ when you want the value of the variable. Variable names are case sensitive, as is the language in general.

As in shell scripting, you do not need to explicitly declare variables in Perl. In addition, Perl will interpret whether a variable contains a string or a number according to the context. For example, the following program will print the number 54:

$string = "27";

$product = $string * 2;

print "$product \n";

The default value for a variable is 0 (for a number) or “” (for a string). You can take advantage of this by using variables without first initializing them, as in

$x = $x + 1;

If this is the first time $x has been used, it will start with the value 0. This line will add 1 to that value and assign the result back to $x. If you print $x, you will find that it now equals 1.

Working with Numbers

A shorter way to write the previous example is

$x += 1; # add 1 to the current value of $x

or even

$x++; # increment $x

C programmers will recognize these handy shortcuts. This works with subtraction, too:

$posX = 7.5;

$posY = 10;

$posX -= $posY; # subtract $posY from $posX, so that $posX equals -2.5

$posY--; # $posY is now 9

In addition to += and -=, Perl supports *= (for multiplication) and /= (for division). Exponentiation is done with **, as in

$x = 2**3; # $x is now 8

$x **= 2; # $x equals $x ** 2, or 8 ** 2, which is 64

and modular division is done with %.

The function int converts a number to an integer. Other math functions include sqrt (square root), log (natural logarithm), exp (e to a power), and sin (sine of a number). For example,

$roll = int (rand(6))+1; # random integer from 1 to 6

print exp 1; # prints the value of e, approx 2.718281828

$pi = 4 * atan2(1, 1); # atan2($x, $y) returns the arctan of $x/$y

Entering Numbers

There are many ways to enter numbers in Perl, including scientific notation. All of the following declarations are equivalent:

$num = 156.451;

$num = 1.56451e2; # 1.56451 * (10 ** 2)

$num = 1.56451E2; # same as previous statement

$num = 156451e-3; # 156451 * (10 ** -3)

Perl can also interpret numbers in hex, octal, or binary. See http://perldoc.perl.org/perldata.html for details.

Perl performs all internal arithmetic operations with double-precision floating-point numbers. This means that you can mix floating-point values with integers in your calculations.
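As a brief illustration of the alternative notations mentioned above, the following sketch writes the value 255 in several ways. The variable names are made up for this example; hex and oct are Perl built-in functions for converting strings.

```perl
use strict;
use warnings;

# The same value, 255, written four different ways.
my $decimal = 255;
my $hex     = 0xff;         # hexadecimal literal (leading 0x)
my $octal   = 0377;         # octal literal (leading 0)
my $binary  = 0b11111111;   # binary literal (leading 0b)

print "$decimal $hex $octal $binary\n";   # prints 255 255 255 255

# Strings are not treated as hex or octal automatically;
# the built-in functions hex and oct convert them.
my $from_hex_string = hex("ff");      # 255
my $from_oct_string = oct("0377");    # 255
print "$from_hex_string $from_oct_string\n";

# Integers and floating-point values can be mixed in one expression.
my $mixed = 7 / 2 + 1;                # 4.5
print "$mixed\n";
```

Note that a leading 0 changes the meaning of a literal, so 0377 and 377 are different numbers.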

Working with Strings

String manipulation is one of Perl’s greatest strengths. This section introduces some of the simplest and most common string operations. More powerful tools for working with strings are discussed in the section “Regular Expressions” later in this chapter.

The . (dot) operator concatenates strings. This can be used for assignment

$concat = $str1 . $str2;

or when printing, as in

$name1 = "Rosencrantz";

$name2 = "Guildenstern";

print $name1 . " and " . $name2 . "\n";

which prints Rosencrantz and Guildenstern.

Variables can be included in a string by enclosing the whole string in double quotes. This example is a more compact way of writing the preceding print statement:

print "$name1 and $name2\n";

Here the values of $name1 and $name2 are substituted into the line of text before it is printed. The \n in this example is an escape sequence (for the newline character) that is interpreted before printing as well.

Another example of an escape sequence is \t, which stands for the tab character. Other escape sequences that are interpreted in double-quoted strings include \u and \l, which convert the next character to upper- or lowercase, and \U and \L, which convert all of the following characters to upper- or lowercase. For example,

$ perl -e '$name1 = "Rosencrantz"; $name2 = "Guildenstern"; print "\U$name1 \L$name2\n"'

ROSENCRANTZ guildenstern

To turn off variable substitution, use single quotes. The line

print '$name1 and $name2\n';

will print, literally, $name1 and $name2\n, without a newline at the end. Alternatively, you could use a \ (backslash) to quote the special characters $ and \ itself.

The x operator is the string repetition operator. For example,

print '*' x 80; # repeat the character '*' 80 times

prints out a row of 80 asterisks.

As its name implies, the length function returns the length of a string, that is, the number of characters in a given string:

$length = length ("Words, words, words.\n") ;

In this example, $length is 21, which includes the newline at the end as one character.

The index and rindex functions return the position of the first and last occurrences, respectively, of a substring in a string. The position in the string is counted starting from 0 for the first character. In this example,

$posFirst = index ("To be, or not to be", "be");

$posLast = rindex ("To be, or not to be", "be");

$posFirst is 3 and $posLast is 17.

These two functions are commonly combined with a third, called substr. This function can be used to get a substring from a string, or to insert a new substring. The first argument is the current string. The next argument is the position from which to start, and the optional third argument is the length of the substring (if omitted, substr will continue to the end of the original string). When substr is used for inserting, a fourth argument is included, with the replacement substring. In this case, the function modifies the original string in place and returns the portion that was replaced.

$name = "Laurence Kerr Olivier";

# Get the substring that starts at 0 and stops before the first space:

$firstname = substr($name, 0, index ($name,' ')) ; # $firstname = "Laurence"

# Substring that starts after the last space and continues to the end of $name:

$lastname = substr($name, rindex ($name, ' ')+1) ; # $lastname = "Olivier"

substr($name, 9, 5, "");

In the last line of this example, the function substr starts at index 9 and replaces five characters (“Kerr ”) with the empty string, so the five characters are removed. If you were to print the variable $name at this point, you would see “Laurence Olivier”.

Here’s another way to modify an existing string with substr, by assigning the new substring to the result of the function:

$path = "/usr/bin/perl";

# Replace the characters after the last / with "tclsh":

substr($path, rindex ($path, "/") + 1) = "tclsh"; # $path = "/usr/bin/tclsh"

# Insert the string "local/" at index 5 in $path:

substr($path, 5, 0) = "local/"; # "/usr/local/bin/tclsh"

Perl includes many more ways to manipulate strings, such as the reverse function, which reverses the order of characters in a string, or the sprintf function, which can be used to format strings. See the sources listed at the end of this chapter for further details about these functions and other useful string operations.
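A minimal sketch of the two functions just mentioned may help. The sample strings and variable names are invented for illustration. One detail worth knowing is that reverse only reverses characters when it is called in scalar context, which the example forces with the scalar keyword.

```perl
use strict;
use warnings;

my $word = "stressed";

# In scalar context, reverse returns the characters in reverse order.
my $backward = scalar reverse($word);
print "$backward\n";                          # prints desserts

# sprintf builds a formatted string instead of printing it.
my $rounded = sprintf("%.2f", 2.71828);       # two decimal places: "2.72"
my $row = sprintf("%-10s|%5d", "perl", 42);   # left-justified text, right-justified number
print "$rounded\n";
print "$row\n";                               # prints "perl      |   42"
```

The format codes used by sprintf are the same ones that printf accepts.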

Variable Scope

By default, Perl variables are global, meaning that they can be accessed from any part of the script, including from inside a procedure. This can be a bad thing, especially if you reuse common variable names like $i or $x. You can declare local variables with the keyword my, as in

my $pi = 3.14159;

It is generally considered good practice to do this for any variable that you don’t specifically need to use globally. If you add the line

use strict;

to the top of your script, perl will enforce the use of my to declare variables, and generate an error if you forget to do so.
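The following sketch shows what "local" means in practice. A variable declared with my inside a block is a separate variable that disappears when the block ends. The variable name $count is only an example.

```perl
use strict;
use warnings;

my $count = 10;            # visible from here to the end of the file

{
    my $count = 20;        # a different variable, visible only inside this block
    print "inside the block: $count\n";    # prints 20
}

print "outside the block: $count\n";       # prints 10
```

With use strict in effect, removing the first my would cause perl to refuse to run the script, because the $count used outside the block would no longer be declared.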

Reading in Variables from Standard Input

To read input from the keyboard (actually, from standard input), just use <STDIN> where you want to get the input, as in

print "Please enter your name: ";

my $name = <STDIN>;

print "Hello, $name.\n";

When you run this script, the output might look something like

Please enter your name: Ophelia

Hello, Ophelia

.

Note that the period ended up on its own line. That’s because when you typed in the name Ophelia and pressed ENTER, <STDIN> included the newline at the end of your string, so the print statement actually printed Hello, Ophelia\n.\n. To fix this, use the command chomp to remove a newline from the end of a string, as shown:

my $name = <STDIN>;

chomp($name);

If for some reason there is no newline at the end, chomp will do nothing.

You will almost always want to chomp data as you read it in. Because chomp is used so frequently, the following shortcut is common:

chomp(my $name = <STDIN>);

Arrays and Lists

Arrays and lists are pretty much interchangeable concepts in Perl. A list can be entered as a set of scalar values enclosed in parentheses, as shown:

(1, "Branagh", 2.71828, $players)

Lists can contain any type of scalar value. If you place one list inside another, Perl flattens them into a single list. Perl does not impose a limit on the size or number of elements in a list.

An array is just an ordered list in which you can refer to each element using its position. To assign the value of the preceding list to an array, you would write

my @array = (1, "Branagh", 2.71828, $players);

Note that, where scalar variable names all start with $, array variable names start with @. You do not need to tell Perl how big you want the array to be; it will automatically make the array big enough to hold all the elements you add.

Once an array has been assigned a list, each element in the array can be accessed by referring to its index (starting from 0 for the first element):

print "$array[1]\n"; # prints "Branagh"

Here the @ has been replaced by a $. That’s because $array[1] is the string “Branagh”, which is a scalar. It’s only a piece of @array.

The index of the last element in an array is the number $#arrayname. You can also use the index -1 as a shortcut to get the last element. Other negative indices count backward from the end in the same way, so -2 is the second-to-last element. To get the size of an array, you can use the expression scalar @arrayname. This causes Perl to use the scalar value of the array, which is its size.

my @flowers = ("Rosemary", "Rue", "Daisies", "Violets");

print "The " . scalar(@flowers) . "th flower ";

print "(at index $#flowers) is $flowers[-1].\n";

This example will print the line “The 4th flower (at index 3) is Violets.”

You can create and initialize an array with the x operator. The list to be repeated must be enclosed in parentheses; otherwise x repeats a single string.

my @newarray = ("0") x 10;

is shorthand for

my @newarray = ("0", "0", "0", "0", "0", "0", "0", "0", "0", "0") ;

You can also create a list with the range operator, .. (dot dot). For example,

my @newarray = ('A'..'Z');

The range operator simply creates a list containing all the values from one point to another. So, for example, (0..9) is a list with 10 elements, the integers from 0 to 9.

Reading and Printing Arrays

You can assign input from the keyboard to an array, just as you would a scalar variable.

my @lines = <STDIN>;

This time, Perl will continue to read in lines as you enter them. Each line will be one entry in the array. To finish entering data, type CTRL-D on a line by itself. Remember that the newline character will be included at the end of each line of text. To get rid of the newlines, use chomp:

chomp (@lines);

or the shortcut

chomp (my @lines = <STDIN>);

Printing an entire array works just like a scalar variable, too:

print @lines;

If you didn’t use chomp to remove the trailing newlines, this will echo back the strings in the array just as you entered them, each on its own line. If you did remove the newlines, the strings will be concatenated together. You can use the command

print "$_\n" foreach (@lines);

to print each one on a separate line. This is an example of a foreach loop, which will be explained later in this chapter.

Modifying Arrays

You can add one or more new elements to the end of an array with push.

my @actors = ("Gielgud", "Olivier", "Branagh");

push (@actors, "Gibson", "Jacobi");

To remove the last element, use pop:

print pop (@actors) . "\n"; # remove "Jacobi"

# @actors now: ("Gielgud", "Olivier", "Branagh", "Gibson")

This will remove the last element from the array and print it at the same time.

The functions shift and unshift operate on the beginning of the array. shift removes the first element and shifts all the others back one index, while unshift adds a new first element and moves everything else up one index.

shift (@actors); # remove the first element, "Gielgud"

unshift (@actors, pop (@actors)); # move "Gibson" to the beginning

# @actors now: ("Gibson", "Olivier", "Branagh")

The second line here removes the last element of the array with pop, and then it adds it at the beginning with unshift.

Array Slices

Perl allows you to assign part of an array, called a slice, to another array. The following example creates a new array containing six elements from @players.

my @subset = @players[0, 3, 6..9];

You can also use slices to assign new values to parts of an array. For example, you could change the elements at indices 1 and 4 of @players with

@players[1, 4] = ($playerK, $playerQ);

Another use for lists is to assign values to a group of variables all at once. For example, you could initialize the variables $x, $y, and $z with

my ($x, $y, $z) = (.707, 1.414, 0) ;
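List assignment has a couple of other handy uses, sketched here with made-up values. Because the whole right-hand list is evaluated before anything is assigned, two variables can be swapped without a temporary. An array on the left side collects whatever elements remain.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# Swap two values in one statement.
($x, $y) = ($y, $x);
print "$x $y\n";              # prints 2 1

# The scalar takes the first element; the array takes the rest.
my ($first, @rest) = (10, 20, 30);
print "$first\n";             # prints 10
print "@rest\n";              # prints 20 30
```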

Sorting Arrays

The sort function uses ASCII order (in which uppercase and lowercase letters are treated separately) to sort the elements of a list.

my @newlist = sort (@oldlist) ;

The original list is not changed.

Somewhat unfortunately, sort treats numbers as strings, which may not be what you want:

my @numlist = sort (3, 25, 40, 100);

will put the numbers in ASCII order as 100, 25, 3, 40. To sort numerically, use the line

my @sortednumlist = sort {$a <=> $b} @numlist;

This example uses a feature of sort that allows you to write your own comparison for the elements of your list. It uses the built-in numeric comparison operator, <=>, for the comparison. The web page http://perldoc.perl.org/functions/sort.html has more examples of custom sort routines.
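Two more comparison blocks are sketched below as an illustration. The first swaps $a and $b to sort numbers in descending order. The second uses cmp, the string counterpart of <=>, together with the built-in lc function to ignore case. The sample data is invented for the example.

```perl
use strict;
use warnings;

my @numlist = (3, 25, 40, 100);

# Swapping $a and $b reverses the direction of the sort.
my @descending = sort { $b <=> $a } @numlist;
print "@descending\n";        # prints 100 40 25 3

my @words = ("pear", "Banana", "apple");

# Default sort: uppercase letters come before lowercase ones.
my @ascii = sort @words;
print "@ascii\n";             # prints Banana apple pear

# Compare lowercased copies so that case is ignored.
my @nocase = sort { lc($a) cmp lc($b) } @words;
print "@nocase\n";            # prints apple Banana pear
```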

The reverse function reverses the order of the elements in a list. It is often used after sort:

chomp (my @wordlist = <STDIN>);

my @revsort = reverse (sort (@wordlist)) ;

Hashes

A hash (also known as an associative array) is like an array, but it uses strings instead of integers for indices. These index strings are called keys. As an example, suppose you want to be able to look up each user’s home directory. You could use a hash with the usernames as the keys. The following example creates a hash with two entries, one for user kcb and one for mgibson:

my %homedirs = ("kcb", "/home/kcb", "mgibson", "/home/mgibson");

As you can see, hashes look a bit like arrays. A hash is a list in which the keys alternate with the corresponding values. Hash variable names start with a %. Adding values to a hash is similar to adding elements to an array:

$homedirs{"johng"} = "/home/johng";

Note that hashes use curly braces ({}) instead of square brackets ([]) around the key. You can look up values in a hash by using the key as an index:

my $homedir = $homedirs{"johng"}; # the home directory for user johng

Here’s a longer example of a hash:

my %dayabbr = (

"Sunday", "Sun",

"Monday", "Mon",

"Tuesday", "Tues",

"Wednesday", "Wed",

"Thursday", "Thurs",

"Friday", "Fri",

"Saturday", "Sat"

);

print "The abbreviation for Tuesday is $dayabbr{'Tuesday'}\n";

This hash links the days of the week to their abbreviations.

Note that since the keys are used to look up values, each key must be unique. (The values can be duplicated.)
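One consequence of unique keys is that assigning to a key that already exists replaces the old value rather than adding a second entry. A minimal sketch, in which the replacement path /export/home/kcb is purely hypothetical:

```perl
use strict;
use warnings;

my %homedirs = ("kcb", "/home/kcb", "mgibson", "/home/mgibson");

# kcb is already a key, so this overwrites its value.
$homedirs{"kcb"} = "/export/home/kcb";   # hypothetical new location

my $entries = scalar(keys %homedirs);
print "$entries entries\n";              # prints 2 entries
print "$homedirs{'kcb'}\n";              # prints /export/home/kcb
```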

If you have never used associative arrays, then hashes may seem strange at first. But they are remarkably useful, especially for working with text. We will see examples of how convenient hashes can be later in this chapter, when we have discussed foreach loops and a few other language features.

Working with Hashes

The reverse function swaps the keys of a hash with the values. In this example, %abbrdays is a reverse of the hash %dayabbr. It translates abbreviations into the full names for the days of the week:

my %abbrdays = reverse (%dayabbr) ;

Not all hashes reverse well. If a hash contains some duplicate values, when it is reversed it will have some duplicate keys. But duplicate keys are not allowed, so the "extras" are removed. For example, if you reverse the following hash,

my %roles = (

"McKellen", "Hamlet",

"Jacobi", "Hamlet",

"Stewart", "Claudius",

) ;

my %actors = reverse(%roles);

the new hash, %actors, will contain only two elements, one with the key "Hamlet" and the other "Claudius". It can be difficult to predict which entry Perl will remove, so you should be careful when reversing a hash that might have duplicate values.
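One simple way to notice that entries were lost is to compare the number of keys before and after reversing. This is only a sketch of that check, using the same data as above; the variable names $before and $after are made up.

```perl
use strict;
use warnings;

my %roles = (
    "McKellen", "Hamlet",
    "Jacobi",   "Hamlet",
    "Stewart",  "Claudius",
);

my %actors = reverse(%roles);

# In scalar context, keys returns the number of keys in the hash.
my $before = scalar(keys %roles);
my $after  = scalar(keys %actors);

print "$before entries before reversing, $after after\n";   # prints 3 and 2
print "Some entries were lost.\n" if ($after < $before);
```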

The function keys returns a list (in no particular order) of the keys for a hash, as in

my @fullnames = keys (%dayabbr) ; # "Sunday", "Monday", etc

Similarly, values returns a list of the values in the hash. For example,

my @shortnames = values (%dayabbr) ; # "Sun", "Mon", etc

The list may include duplicate values, if the hash contains two or more keys that have the same value.

The delete function removes a key (and the associated value):

delete $dayabbr{"Wednesday"};

delete also returns the value it removes, so you could write

print "Enter a day to delete.\n";

chomp(my $deleteday = <STDIN>); # must remember to chomp here

print "Deleting the pair $deleteday, " . delete($dayabbr{$deleteday}) . "\n";

Control Structures

In order to write more interesting scripts, you will need to know about control structures.

if Statements

An if statement tests to see if a condition is true. If it is, the following block of code is executed. This example tests to see if the value of $x is less than 0. If so, it multiplies by -1 to make it positive:

if ($x < 0) {

$x *= -1;

}

Perl is not sensitive to line breaks, so you can write short statements like the one just shown all on one line. The following example checks to see if $x or $y is equal to 0:

if ($x == 0 || $y == 0) {print "Cannot divide by 0.\n";}

There are a few things to notice here. The comparison == is used to see if two numbers are equal. Be careful not to use =, which in this case would set $x to 0. Also, || means “or”. If we wanted to know if both $x and $y were 0, we could use && for “and”.

Another way to write a one-line if statement is to put the test last:

print "Error: input expected\n" if (! defined $input);

In this example, the ! stands for “not”. The function defined is used to determine if a variable has been assigned a value. This statement says “print an error message if $input has not been defined”.

In some cases you may find it more natural to write this type of statement as

print "Error: input expected\n" unless (defined $input);

if statements can have an else clause that gets executed if the initial condition is not met. This example checks whether a hash contains a particular key:

if (exists $hash{$key}){

print "$key is $hash{$key}\n";

} else {

print "$key could not be found\n";

}

You can also include elsif clauses that test additional conditions if the first one is false. Note the spelling: Perl uses elsif, not elseif or else if. This example has one elsif clause. It uses the keyword eq to see if two strings are equal:

if ($str eq "\L$str") {

print "$str is all lowercase.\n";

} elsif ($str eq "\U$str") {

print "$str IS ALL UPPERCASE.\n";

} else {

print "$str Combines Upper And lower case letters.\n";

}

Comparison Operators

Table 22–1 lists the operators used for comparison. Notice that there are different operators, depending on whether you are comparing numbers or strings. Be careful to use the appropriate operators for your comparisons. For example, “0.67” == “.67” is true, because the two numbers are equal, but “0.67” eq “.67” is false, because the strings are not identical.

Table 22–1: Comparison Operators

Numerical    String    Meaning
==           eq        is equal to
!=           ne        does not equal
>            gt        is greater than
<            lt        is less than
>=           ge        is greater than or equal to
<=           le        is less than or equal to
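The difference between the two sets of operators is easy to see with a short experiment. In this sketch (the variable names are only examples), the same pairs of values give different answers depending on whether they are compared as numbers or as strings.

```perl
use strict;
use warnings;

my ($first, $second) = ("0.67", ".67");

print "equal as numbers\n"      if ($first == $second);   # printed
print "different as strings\n"  if ($first ne $second);   # printed

my ($small, $large) = (9, 10);

# Numerically, 9 is less than 10.
print "$small < $large\n"  if ($small < $large);          # printed

# As strings, "10" sorts before "9" because "1" comes before "9".
print "$large lt $small\n" if ($large lt $small);         # printed
```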

while Loops

The while loop repeats a block of code as long as a particular condition is true. For example, this loop will repeat five times, until the value of $n is 0:

my ($n, $sum) = (5, 0);

while ($n > 0) {

$sum += $n;

$n--;

}

print "$sum\n";

The first line of the loop could also have been written as

until ($n == 0) {

A common use of while loops is to process input. The assignment $input=<STDIN> will have a value of true as long as there is data coming from standard input. The following example will center each line of input from the keyboard, stopping when CTRL-D signals the end of input:

while (my $input = <STDIN>) {

$indent = (80 - length ($input))/2;

print " " x $indent;

print "$input";

}

The $_ Variable

The $_ variable is a shortcut you can use to make scripts like the one just shown even more compact. Many Perl functions operate on $_ by default. When <STDIN> is the only thing in the condition of a while loop, each line that is read is assigned to $_. print sends the value of $_ to standard output if no argument is specified. Similarly, chomp works on $_ by default.

With $_, the preceding centering script could be rewritten as

while (<STDIN>){

print " " x ((80 - length())/2) . $_;

}

Note that this use of length returns the length of $_. This could even be written on a single line, as

print " " x ((80 - length())/2) . $_ while (<STDIN>);

Iterating Through Hashes

You can use a while loop to iterate through the elements in a hash with the each function. This function returns a key/value pair each time it is called. For example, you could print the elements of the hash %userinfo as shown:

while (my ($key, $value) = each %userinfo) {

print "$key -> $value\n";

}

foreach Loops

The foreach loop iterates through the elements of a list. This example will print each list element on its own line:

foreach $line (@list){

print "$line\n";

}

The syntax here could be read as “for each line in the list, print.”

If you leave out the variable, foreach will use $_:

foreach (@emailaddr) {

print "Email sent to $_\n";

}

This example could be written on a single line as

print "Email sent to $_\n" foreach (@emailaddr);

The foreach loop is also handy for working with hashes. This loop will print the contents of a hash:

foreach $key (keys %userinfo) {

print "$key -> $userinfo{$key}\n";

}
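Because keys returns the keys in no particular order, it is common to sort them first when the output order matters. A short sketch follows; the contents of %userinfo are invented for the example, and @ordered exists only to record the order in which keys were visited.

```perl
use strict;
use warnings;

# Sample data: usernames mapped to home directories.
my %userinfo = (
    "mgibson", "/home/mgibson",
    "kcb",     "/home/kcb",
    "johng",   "/home/johng",
);

my @ordered;

# sort puts the keys in ASCII order before the loop sees them.
foreach my $key (sort keys %userinfo) {
    print "$key -> $userinfo{$key}\n";
    push(@ordered, $key);
}
# The loop visits johng, then kcb, then mgibson.
```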

for Loops

The Perl for loop syntax is just like the syntax in C. The loop

for (my $i=0; $i<=10; $i++) {

print $i**2 . "\n";

}

prints the squares of the integers from 0 to 10. This is the same as

foreach (0..10) {

print $_**2 . "\n";

}

Defining Your Own Procedures

The keyword sub is used to define a procedure. Procedures are called with an & in front of their name. This example shows a procedure named &arrayprint that prints the contents of the array @data with one element on each line:

sub arrayprint {

print "$_\n" foreach (@data);

}

@data = (0..9);

&arrayprint;

Notice that @data has been declared without the keyword my, so it is a global variable. This allows the procedure to use @data. A better way to write this procedure would be to make @data a local variable and pass it to the procedure as an argument.

Variables passed to a procedure are stored in the array @_. In this example, the value of $n will be sent to &factorial as the first element in @_:

sub factorial {

my $x = shift(@_) ;

my $fact = 1 ;

$fact *= $_ foreach (1..$x) ;

print "$x factorial is $fact.\n";

}

chomp (my $n = <STDIN>);

&factorial ($n) ;
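The arrayprint procedure shown earlier can be rewritten along the same lines so that it no longer relies on a global variable. This is one possible sketch; the local name @items and the returned count are additions made for the example.

```perl
use strict;
use warnings;

sub arrayprint {
    my @items = @_;                  # copy the arguments into a local array
    print "$_\n" foreach (@items);   # one element per line
    return scalar(@items);           # report how many elements were printed
}

my @data = (0..9);                   # @data can now be declared with my
my $printed = &arrayprint(@data);
print "$printed elements printed\n"; # prints 10 elements printed
```

Since the procedure works on whatever list it is given, the same code can print any array, not just @data.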

You can get values back from a procedure, as well. By default, the procedure returns the value of the last statement. You can also use the keyword return to immediately exit the procedure and return a value:

sub pythagorean {

my ($x, $y) = @_;

if (! defined $x || ! defined $y) { # check that the input is defined

return 0; # and return 0 if it isn't

}

($x**2 + $y**2) ** .5;

}

print &pythagorean(3, 4) . "\n";
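A procedure can also return a list, which the caller can capture with a list assignment. The procedure below is a hypothetical example named minmax, not something defined elsewhere in this chapter.

```perl
use strict;
use warnings;

# Return the smallest and largest of the numbers passed in.
sub minmax {
    my @sorted = sort { $a <=> $b } @_;     # numeric sort of the arguments
    return ($sorted[0], $sorted[-1]);       # first and last elements
}

my ($min, $max) = &minmax(25, 3, 100, 40);
print "min is $min, max is $max\n";         # prints min is 3, max is 100
```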

File I/O

At this point, you should be comfortable printing to standard output with print and reading from standard input with <STDIN>. You may be wondering how to work with input and output from other sources. To do this, you will need to know about filehandles.

Standard I/O

STDIN is an example of a filehandle. When you use <STDIN> in a variable assignment, Perl knows to get the value from standard input. STDOUT is another filehandle. The print command uses STDOUT automatically, although if you wanted to you could use print STDOUT to explicitly print to standard output.

There’s a third default filehandle in Perl, STDERR, which points to standard error. To print to standard error, use

print STDERR "Error: filename argument expected.\n" unless (@ARGV);

Using Filename Arguments

The NULL filehandle <> (also called the diamond operator) allows you to read in the contents of files listed on the command line. <> can be used just like <STDIN>. For example, you could implement the UNIX command cat in a script called cat.pl like this:

#!/usr/bin/perl -w

use strict;

print "$_" while (<>);

This script will print the contents of any filenames that are given as arguments, just like cat. Also like cat, cat.pl will wait for standard input if there are no arguments.

Perl recognizes the command-line argument - (a single hyphen) as a reference to standard input. So you could enter the command line

$ grep " " files | ./cat.pl header - footer

to print the file header, followed by all the output from grep, followed by footer.

The names of the arguments to your script are in the variable @ARGV. If you are a C programmer, note that the first element of @ARGV is the first command-line argument, not the name of the Perl script itself. You can use the special variable $0 to get the script name.
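A small script makes the contents of these two variables visible. This is just a sketch; you might save it under any name and run it with a few arguments to see the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $0 holds the name the script was run as.
print "Script name: $0\n";

# An array in scalar context gives the number of elements.
my $argcount = scalar(@ARGV);
print "Number of arguments: $argcount\n";

# $#ARGV is the index of the last argument (-1 if there are none).
foreach my $i (0..$#ARGV) {
    print "Argument $i is $ARGV[$i]\n";
}
```

Run with the arguments header and footer, the loop would report header as argument 0 and footer as argument 1.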

Using perl -p

The -p option causes perl to enclose your script in a while (<>) loop. It also prints $_ at the end of each iteration. So cat.pl could actually be written as the single line

#!/usr/bin/perl -p

Better yet, it could be entered directly at the command line as

$ perl -pe ''

The centering program from the earlier section “The $_ Variable” could be implemented with the -p switch as well. To center the words in the file quotations, use the command line

$ perl -pe 'print " " x ((80 - length())/2)' quotations

Speak the speech

I pray you

as I pronounced it to you

trippingly on the tongue

Opening Files

You also need to be able to open your own files. The command

open MYFILE, "myinputfile";

will open myinputfile for reading and assign it the filehandle MYFILE (it’s common to pick names in all caps for filehandles). Now you can use MYFILE to get data from this file just as you would use STDIN. For example,

$firstline = <MYFILE>;

In some cases, Perl may not be able to open myinputfile. For example, the file may not exist, or you may not have permission to read from it. The line

open MYFILE, "myinputfile" or die "Error opening myinputfile: $!";

is a safer way to try to open a file. It tells Perl that, if it can’t open the file, it should die, meaning stop the script. Before it exits, it will print the error message to standard error. The $! variable contains the system error caused by the open statement. Printing this may help determine what went wrong.

When you open a file, you can specify the type of access with the familiar UNIX operators <, >, and >>, as shown here:

open READFILE, "< inputfile"; # Open inputfile for reading.

open WRITEFILE, "> outputfile"; # Open outputfile for writing.

open APPENDFILE, ">> appendfile"; # Open appendfile to append data.

To write to a file, use

print MYFILE "$output\n";

Or, if you plan to write a lot of output to one file, use

select MYFILE;

print "$output\n";

The select makes MYFILE the default filehandle for print. When you’re done printing to that file, use select STDOUT to reset STDOUT as the default print destination.

When you’ve finished using a file, you can close it. For example,

close MYFILE;
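The following sketch puts these pieces together: it creates a file, appends to it, and reads it back. The scratch filename under /tmp is an assumption made for this example, and the script removes the file when it is done.

```perl
#!/usr/bin/perl -w
use strict;

# The filename is an assumption for this example; $$ is the process
# ID, which makes a clash with an existing file unlikely.
my $file = "/tmp/perl_open_demo.$$";

# Create the file (">" discards any existing contents).
open OUT, "> $file" or die "Error opening $file for writing: $!";
print OUT "first line\n";
close OUT;

# Add a line to the end of the file.
open APP, ">> $file" or die "Error opening $file for appending: $!";
print APP "second line\n";
close APP;

# Read everything back; in list context <IN> returns all the lines.
open IN, "< $file" or die "Error opening $file for reading: $!";
my @lines = <IN>;
close IN;

print "Read ", scalar(@lines), " lines; the first is: $lines[0]";

# Remove the scratch file.
unlink $file;
```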

Opening Command Pipes

Perl also lets you open pipes to or from other commands. For example,

open LSIN, "ls |";

will let you read the files in the current directory with <LSIN>. Alternatively,

open LPOUT, "| lpr";

will let you send output to lpr.

You can run a command directly by enclosing it in backquotes. For example,

my @files = `ls`;

would assign a list of files in the current directory to @files.
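As a small sketch, the two approaches can be compared side by side. This assumes a UNIX system where the ls command is available; listing the root directory is an arbitrary choice for the example.

```perl
#!/usr/bin/perl -w
use strict;

# Read the output of a command through a pipe.
open LSIN, "ls / |" or die "Error starting ls: $!";
my @from_pipe = <LSIN>;
close LSIN;
chomp @from_pipe;       # each line arrives with a trailing newline

# Backquotes collect the same output in one step.
chomp(my @from_backquotes = `ls /`);

print "The pipe returned ", scalar(@from_pipe), " names; ",
      "backquotes returned ", scalar(@from_backquotes), ".\n";
```

Either way, each element ends in a newline until you chomp it.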

Working with Files

This script shows some of the other ways to work with files. The script takes a list of directories and makes backup copies of the directories and the files they contain.

Don’t be scared by the length of this script. It could actually be much shorter, but it was written to be as clear and understandable as possible, not as short as possible.

#!/usr/bin/perl -w

use strict;

foreach my $olddir (@ARGV) {

# Check that each argument is a valid directory name.

if (! defined $olddir || ! -e $olddir) {

die "Error: Enter directory to back up. \n";

}

# The backup directory will have .bk at the end.

# Check that it doesn't yet exist.

my $newdir="$olddir.bk";

if (-e $newdir) {

die "Error: Backup directory $newdir already exists.\n";

}

# Call the function that does all the work.

&backupdir ($olddir, $newdir);

}

sub backupdir {

my ($olddir, $newdir)=@_;

# Use the Perl mkdir command to create the new directory.

print "Creating backup directory $newdir ... ";

mkdir $newdir;

print "Done.\n";

# Iterate through all the files in the source directory.

foreach my $oldfile (glob "$olddir/*") {

# The new filename has .bk at the end.

my $newfile="$oldfile.bk";

# Change the path to include the new directory.

substr($newfile, 0, rindex($newfile, "/")) = $newdir;

# Running the UNIX command cp to copy the file.

print "Copying $oldfile to $newfile ... ";

`cp $oldfile $newfile`;

print "Done.\n";

}

}

There are a few new commands in here. The test -e filename checks to see if a file exists. Other file tests, such as how to tell what type a file is, are listed in http://perldoc.perl.org/functions/-X.html. The glob command expands a filename that contains wildcards, such as *, into a list of matching files.

The mkdir command is built in to Perl. It works just like the UNIX command with the same name. Perl does not have a built-in command to copy a file, however. In this script, we used backquotes to run the UNIX cp command. Of course, this will only work on a UNIX system. Once you know how to work with modules, you could use the copy function from File::Copy instead, which will work on any system your Perl script runs on.

If the directory you are backing up contains other directories, the cp command won’t be able to copy them. You could use cp -r, but it still wouldn’t change the names of the files in those directories. In order to do that, you would need to use the procedure backupdir recursively. If you know how to use recursion, this might be a good exercise.
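One possible approach to that exercise is sketched below. It is not the only way to do it. The subroutine name backuptree is hypothetical, and the sketch uses copy from File::Copy in place of cp. It shares the limits of glob "*": names that begin with a dot are skipped, and directory names containing spaces are not handled.

```perl
#!/usr/bin/perl -w
use strict;
use File::Copy;     # provides the copy function

# backuptree is a hypothetical recursive variant of backupdir. It
# copies $olddir to $newdir, adding .bk to every name along the way.
sub backuptree {
    my ($olddir, $newdir) = @_;
    mkdir $newdir or die "Error creating $newdir: $!";
    foreach my $oldpath (glob "$olddir/*") {
        # Keep only the part of the path after the last slash.
        my $name = substr($oldpath, rindex($oldpath, "/") + 1);
        my $newpath = "$newdir/$name.bk";
        if (-d $oldpath) {
            backuptree($oldpath, $newpath);     # recurse into a subdirectory
        } else {
            copy($oldpath, $newpath) or die "Error copying $oldpath: $!";
        }
    }
}

foreach my $olddir (@ARGV) {
    backuptree($olddir, "$olddir.bk");
}
```

The test -d checks whether a path is a directory, which decides between recursing and copying.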

Regular Expressions

A regular expression is a string used for pattern matching. Expressions can be used to search for strings that match a certain pattern, and sometimes to manipulate those strings. Many UNIX System commands (including grep, vi, emacs, sed, and awk) use regular expressions for searching and for text manipulation. Perl has taken the best features for pattern matching from these commands and made them even more powerful.

Pattern Matching

Here’s an example of using a pattern to match a string:

my @emails = ('derek@rsc.org', 'johng@elsinore.dk',

'kcb@rsc.org', 'olivier@elsinore.dk');

foreach $addr (@emails) {

if ($addr =~ /elsinore/) {

print "$addr matches elsinore.\n";

}

}

This example will produce output for johng@elsinore.dk and olivier@elsinore.dk, but not for the other two strings. (By the way, note that single quotes are used around the e-mail addresses. This is to prevent Perl from trying to interpret the @ signs as indicating an array.)

As you can see, a string is compared to a regular expression pattern with the =~ operator. The pattern itself is enclosed in a pair of forward slashes. The string is considered a match for the pattern if any part of the string matches the pattern.

If no other string is specified, the pattern is compared to $_. So

foreach (@emails) {

if (/$pattern/i) {

print "$_ matches $pattern\n";

}

}

will look for elements in @emails that match $pattern. The i after /$pattern/ causes Perl to ignore case when matching, so that /elsinore/i will match “Elsinore”.

Constructing Patterns

As you have seen, a string by itself is a regular expression. It matches any string that contains it. For example, elsinore matches “in elsinore castle”. However, you can create far more interesting regular expressions.

Certain characters have special meanings in regular expressions. Table 22–2 lists these characters, with examples of how they might be used.

Table 22–2: Perl Regular Expressions

Char    Definition                                                              Example         Matches
.       Matches any single character.                                           th.nk           think, thank, thunk, etc.
\       Quotes the following character.                                         script\.pl      script.pl
*       Previous item may occur zero or more times in a row.                    .*              any string, including the empty string
+       Previous item occurs at least once, and maybe more.                     \*+             *, *****, etc.
?       Previous item may or may not occur.                                     index\.html?    index.htm, index.html
{n,m}   Previous item must occur at least n times but no more than m times.     \*{3,5}         ***, ****, *****
( )     Group a portion of the pattern.                                         script(\.pl)?   script, script.pl
|       Matches either the value before or after the |.                         (R|r)af         Raf, raf
[ ]     Matches any one of the characters inside. Frequently used with ranges.  [0-9]*          0110, 27, 9876, etc.
[^ ]    Matches any character not inside the brackets.                          [^A-Za-z]       any nonalphabetic character, such as 2
\s      Matches any white-space character.                                      \s              space, tab, newline
\S      Matches any non-white-space character.                                  the\S           then, they, etc. (but not the)
\d      Matches any digit.                                                      \d*             same as [0-9]*
\D      Matches anything that’s not a digit.                                    \D+             same as [^0-9]+
\w      Matches any letter, digit, or underscore.                               \w+             Q, Oph3L1A, R_and_G, etc.
\W      Matches anything that \w doesn’t match.                                 \W+             &#*$%, etc.
^       Anchor the pattern to the beginning of a string.                        ^Words          any string beginning with Words
$       Anchor the pattern to the end of the string.                            \.$             any string ending in a period
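A quick way to get a feel for the entries in Table 22–2 is to try them against a few strings. The sketch below does that with a hypothetical helper named report; the patterns are taken from the table or are close variants of its examples, and the test strings are made up for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# report is a hypothetical helper that says whether a string
# matches a pattern.
sub report {
    my ($pattern, $string) = @_;
    my $verdict = ($string =~ /$pattern/) ? "matches" : "does not match";
    return "'$string' $verdict /$pattern/";
}

# Single quotes keep the backslashes in the patterns intact.
print report('th.nk', 'I thank you'), "\n";         # . stands for the "a"
print report('script(\.pl)?$', 'script.pl'), "\n";  # optional group, anchored at the end
print report('^\d{3,5}$', '12'), "\n";              # too few digits
print report('(R|r)af', 'giraffe'), "\n";           # "raf" occurs inside the word
print report('[^A-Za-z]', 'Hamlet'), "\n";          # no nonalphabetic character
```

The first, second, and fourth strings match; the third and fifth do not.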

Saving Matches

One use of regular expressions is to parse strings by saving the portions of the string that match your pattern. To save part of a string, put parentheses around the corresponding part of the pattern. The matches to the portions in parentheses are saved in the variables $1, $2, and so on. For example, suppose you have an e-mail address, and you want to get just the username part of the address:

my $email = 'derek@rsc.org';

if ($email =~ /(\w+)@/) {

print "Username: $1\n"; # $1 is "derek", which matched (\w+)

}

You can assign the matches to your own variables, as well. Another way to parse the address is

(my $username, my $domain) = ($email =~ /(.*)@(.*)/);

print "Username: $username\nDomain: $domain\n";

Substitutions

Regular expressions can also be used to modify strings, by substituting text for the part of the string matched by the pattern. The general form for this is

$string =~ s/$pattern/$replacement/;

In this example, the string “Hello, world” is transformed into “Hello, sailor”:

my $hello = "Hello, world";

$hello =~ s/world/sailor/;

The flag g causes all occurrences of the pattern to be replaced. For example,

chomp (my @input = <STDIN>);

foreach $line(@input) {

$line =~ s/\d/X/g;

}

will replace all the digits in the input with the letter X.

You can include the variables $1, $2, etc., in the replacement, meaning that

foreach (@input) {

s/(.*)/\L$1/; # same as $_=~ s/(.*)/\L$1/;

}

will convert all the input to lowercase.

The flag e will cause the replacement string to be evaluated as an expression before the replacement occurs. You could double all the integers in $mathline with the statement

$mathline =~ s/(\d+)/2*$1/ge;

To double all the numbers, including decimals, is just a little more complicated. Here’s one way to do it:

$mathline =~ s/(\d+(\.\d+)?)/2*$1/ge;
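To see why the second pattern is needed, here is a small sketch that applies both substitutions to the same string. The sample text is made up for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $mathline = "3 apples at 0.25 each, 12 pears at 1.5";

# Copy the string, then double each run of digits separately.
(my $integers = $mathline) =~ s/(\d+)/2*$1/ge;

# Copy the string, then double each number, treating an optional
# decimal part as belonging to the number before it.
(my $decimals = $mathline) =~ s/(\d+(\.\d+)?)/2*$1/ge;

print "$integers\n";    # 6 apples at 0.50 each, 24 pears at 2.10
print "$decimals\n";    # 6 apples at 0.5 each, 24 pears at 3
```

The first version doubles the digits on each side of a decimal point independently, so 0.25 becomes 0.50 instead of 0.5.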

Translations

The translation operator is similar to the substitution operator. Its purpose is to translate one set of characters into another set. For example, you could use it to switch uppercase and lowercase letters, as shown:

$switchchars =~ tr/A-Za-z/a-zA-Z/;

Or you could convert letters into their scrabble values (with a value of 10 represented by 0):

$scrabbleword =~ tr/a-z/13321424185131130111144840/;

This would convert “vanquisher” into “4110111411”.

More examples of the translation operator can be found at http://perldoc.perl.org/perlop.html.
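As a brief sketch, here are the two translations above applied to sample strings, along with one further use that this chapter does not otherwise cover: tr returns the number of characters it matched, so an empty replacement list turns it into a counter.

```perl
#!/usr/bin/perl -w
use strict;

my $text = "Hello, World";

# Swap uppercase and lowercase letters in a copy of the string.
(my $swapped = $text) =~ tr/A-Za-z/a-zA-Z/;
print "$swapped\n";             # hELLO, wORLD

# Convert a word to its Scrabble values.
(my $values = "vanquisher") =~ tr/a-z/13321424185131130111144840/;
print "$values\n";              # 4110111411

# With an empty replacement list, the string is left alone and
# tr simply counts the matching characters.
my $vowels = ($text =~ tr/aeiouAEIOU//);
print "$vowels vowels\n";       # 3 vowels
```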

More Uses for Regular Expressions

Regular expressions can be used in several functions for working with strings and lists. These include split and grep. The related function join, which takes a plain string rather than a pattern, is covered here as well.

The split Function

The split function breaks a string at each occurrence of a certain pattern. It takes a regular expression and a string. (If the string is omitted, split operates on $_).

Consider the following line from the file /etc/passwd:

kcb:x:3943:100:Kenneth Branagh:/home/kcb:/bin/bash

We can use split to turn the fields from this line into a list:

@passwd = split(/:/, $line);

# @passwd = ("kcb", "x", 3943, 100, "Kenneth Branagh", "/home/kcb", "/bin/bash")

Better yet, we can assign a variable name to each field:

($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/, $line);

The join Function

The join function concatenates a series of strings into one string separated by a given separator. It takes a string (not a regular expression) to use as the separator and a list of values to combine.

Given the previously defined array @passwd, we can recreate $line with the following statement:

$line = join(':', @passwd);

We can also join individual scalar values together:

$line = join("\n", $login, $gcos);

Here, $line will contain the user name and full name separated by a newline.
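The two functions are often used as a pair: split a record into fields, change one of them, and join the result back together. A minimal sketch, using the sample /etc/passwd line from above and an arbitrary new shell:

```perl
#!/usr/bin/perl -w
use strict;

my $line = "kcb:x:3943:100:Kenneth Branagh:/home/kcb:/bin/bash";

# Break the record into its seven fields.
my @fields = split(/:/, $line);
print "Full name: $fields[4]\n";

# Change the last field, then rebuild the record.
$fields[6] = "/bin/sh";
my $newline = join(":", @fields);
print "$newline\n";
```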

The grep Function

The grep function is used to extract elements of a list that match a given pattern. This function works in much the same way as the UNIX System grep family of commands. However, perl’s grep function has a number of new features and is usually more efficient.

To extract all the elements of @data that contain numbers, you can write

@data = ("sjf8", "rlf", "ehb3", "pippin", "13");

@numeric = grep(/[0-9]/, @data);

# Same as @numeric = ("sjf8", "ehb3", "13")

The grep function sets $_ to each value in the list as it searches, so we can give it an expression containing $_ to evaluate. For example, we can search an array for numbers less than 50 with the line:

@numbers = (1..100);

@numbers = grep(($_ < 50), @numbers);

# Same as @numbers = (1..49)

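Because $_ is an alias for each element, the expression can also modify the list. For example, we could double each value in an array by saying

@numbers = (1, 2, 3, 4);

@numbers = grep(($_ *= 2), @numbers);

# Same as @numbers = (2, 4, 6, 8)

This works only because every doubled value is true, so grep keeps all of them; a 0 in the list would be dropped. The values must also be stored in an array, because grep cannot modify literal constants.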
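A few more variations on grep are sketched here, reusing the sample list from above. The last one relies on the fact that grep returns a count of matching elements, rather than the elements themselves, when it is assigned to a scalar.

```perl
#!/usr/bin/perl -w
use strict;

my @data = ("sjf8", "rlf", "ehb3", "pippin", "13");

# Negate the pattern to keep the elements that do not match.
my @no_digits = grep(!/[0-9]/, @data);      # ("rlf", "pippin")

# Anchors restrict the match to elements made up only of digits.
my @all_digits = grep(/^\d+$/, @data);      # ("13")

# Assigned to a scalar, grep returns how many elements matched.
my $count = grep(/[0-9]/, @data);           # 3

print "Without digits: @no_digits\n";
print "Only digits: @all_digits\n";
print "$count elements contain a digit\n";
```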

A Sample Program

This program demonstrates the uses of regular expressions, hashes, and some of the other Perl language features that you’ve learned about. It counts the frequency of each word in the input. The words are saved as the keys in a hash; the number of times the words appear are the values.

#!/usr/local/bin/perl -w

use strict;

my (%count, $totalwords);

while( <>){

my @line = split(/\s/, $_);

foreach my $word (@line) {

$count{$word}++;

$totalwords++;

}

}

print "$count{$_} $_\n" foreach (sort keys (%count));

print "$totalwords total words found.\n";

The tricky part here is how to split the input lines to find words. The current program uses a regular expression escape sequence “\s”, which splits each line at every white-space character. Take a look at the results with the following test input:

$ cat raven.input

Once upon a midnight dreary, while I pondered, weak and weary,

Over many a quaint and curious volume of forgotten lore;

While I nodded, nearly napping, suddenly there came a tapping,

As of someone gently rapping, rapping at my chamber door.

"'Tis some visitor", I muttered, "tapping at my chamber door;

Only this and nothing more."

$ wordcount.pl raven.input

17

1 "'Tis

1 "tapping

1 As

3 I

1 Once

1 Only

1 Over

1 While

3 a

3 and

2 at

1 came

2 chamber

1 curious

1 door.

1 door;

.

.

.

1 volume

1 weak

1 weary,

1 while

73 total words found.

The output of this program shows a few flaws in its design. First of all, it is counting 17 of something that doesn’t seem to be a word. Second, words are not stripped of punctuation, so “door.” and “door;” are counted as two separate words. Finally, words that are capitalized differently are also counted separately, as in “While” and “while”. In order to get an accurate count of how often a word occurs in a document, we should arrange that all forms of the same word get counted as one word.

Let’s try this version of the word frequency program:

#!/usr/local/bin/perl -w

use strict;

my (%count, $totalwords);

while (<>) {

tr/A-Z/a-z/;

s/^\W*//;

my @line = split(/\W*\s+\W*/, $_);

foreach my $word (@line) {

$count{$word}++;

$totalwords++;

}

}

print "$count{$_} $_\n" foreach (sort keys (%count));

print "$totalwords total words found.\n";

The translation operator is used to convert everything to lowercase. The substitution operator then removes leading punctuation for each line. The split pattern is a little more complicated now. It looks for patterns of at least one white-space character, with optional nonword characters on either side. This enables us to correctly count words with punctuation around them.

Here is the new output:

$ wordcount2.pl raven.input

3 a

3 and

1 as

2 at

1 came

2 chamber

1 curious

2 door

.

.

.

1 volume

1 weak

1 weary

2 while

56 total words found.
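Another way to approach the same problem, shown here only as an alternative sketch, is to match the words themselves instead of splitting on what lies between them. The subroutine name count_words is hypothetical. Because \w does not match an apostrophe, this version treats 'Tis as tis and would break a word such as don't into two pieces.

```perl
#!/usr/bin/perl -w
use strict;

# count_words is a hypothetical helper: it takes a list of lines and
# returns a hash of lowercased words and their counts.
sub count_words {
    my %count;
    foreach my $line (@_) {
        # In list context, a match with the g flag returns every
        # match found in the string.
        foreach my $word (lc($line) =~ /\w+/g) {
            $count{$word}++;
        }
    }
    return %count;
}

my %count = count_words(
    "As of someone gently rapping, rapping at my chamber door.\n",
    "\"'Tis some visitor\", I muttered, \"tapping at my chamber door;\n",
);
print "$count{$_} $_\n" foreach (sort keys (%count));
```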

Perl Modules

Perl has the capacity to use modules to perform specialized functions. Many modules are included in the standard Perl distribution. The command perldoc perlmodlib documents these modules. In addition, more modules can be downloaded from http://www.cpan.org/.

To see a list of all the available modules on your system, type the following at the command prompt:

$ find `perl -e 'print "@INC"'` -name '*.pm' -print

The variable @INC contains a list of the directories that perl searches to find modules. Module names end in the extension .pm.

To use functions from a module in your script, include it at the beginning with use, as in

use Math::Complex;

This module includes support for complex numbers. The file for this module is Math/Complex.pm.
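As a brief sketch of what this looks like in practice, the following script uses a few of the functions that Math::Complex provides. The particular values are arbitrary.

```perl
#!/usr/bin/perl -w
use strict;
use Math::Complex;

# cplx builds a complex number from its real and imaginary parts.
my $z = cplx(3, 4);
print "z = $z\n";
print "Absolute value of z: ", abs($z), "\n";

# With the module loaded, sqrt accepts negative numbers.
my $root = sqrt(-9);
print "Square root of -9: $root\n";
print "Real part: ", Re($root), ", imaginary part: ", Im($root), "\n";
```

Without the use line, the call to cplx would be an error, and sqrt(-9) would fail instead of returning a complex number.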

Most modules come with their own documentation. For example, to view the documentation for the module Socket, type the command

$ perldoc Socket

You may find it easier to consult the documentation on the web at http://perldoc.perl.org/.

Using Perl for CGI Scripting

Perl is an excellent language for writing web-based CGI scripts. In fact, the majority of the CGI scripts on the web are written in Perl. Because this is such a popular application of the language, the module CGI is included to give you a very accessible interface for writing CGI scripts. The documentation for this module is available on the web at http://perldoc.perl.org/CGI.htm. It includes many examples of how to use the functions in the module for writing scripts.

Here is one example of a CGI script written in Perl with the CGI module. This script displays a form for entering a bug report. The form has a text field for entering a name, a select box for choosing an operating system, and a large text area for entering a description. Pressing the Submit button at the bottom of the form causes the data in those fields to be sent to the CGI script. The script will then display a message to indicate that the data has been received.

#!/usr/bin/perl

use CGI qw/:standard/;

print header,

start_html("Report a Bug (Duckpond Software)"),

b(i("Duckpond Software")), p,

b("Report a Bug"), hr;

if (! param()) {

print "Fill out this form and click submit.", p,

start_form,

table(Tr([

td([

"Name",

textfield(-name=>"name", -size=>34),

]), td([

"System",

popup_menu(-name=>"system",

-values=>["", "UNIX Variant", "MS Windows", "Mac OS X"]),

]), td([

"Problem Description",

textarea(-name=>"descript", -cols=>30, -rows=>4),

]), td([

"",

submit("Submit"),

])

])),

end_form, p, "Thank you!";

} else {

print br,

"Thank you for your submission, ", param("name"), ".", br,

"We will respond within 24 hours.", br, br, br, br;

}

print hr,

a({-href=>"http://www.duckpond-software.com"}, "Back to Home Page"),

end_html;

For more information about CGI scripting, including how to run CGI scripts, see Chapter 27.

Troubleshooting

The following is a list of problems that you may run into when running your scripts, and suggestions for how to fix them. In addition, one good general tip for troubleshooting is to always use perl -w to execute scripts. The warnings it prints can help you find errors or typos in your code.

Problem: You can’t find perl on your machine.

Solution: From the command prompt, try typing the following:

$ perl -v

If you get back a "command not found" message, try typing

$ ls /usr/bin/perl

or

$ ls /usr/local/bin/perl

If one of those commands shows that you do have perl on your system, check your PATH variable and make sure it includes the directory containing perl. Also check your scripts to make sure you entered the full pathname correctly. If you still can’t find it, you may have to download and install perl yourself.

Problem: You get “Permission denied” when you try to run a script.

Solution: Check the permissions on your script.

For a perl script to run, it needs both read and execute permission. For instance,

$ ./hello.pl

Can't open perl script "./hello.pl": Permission denied

$ ls -l hello.pl

---x------ 1 kili 46 Apr 23 13:14 hello.pl

$ chmod 500 hello.pl

$ ls -l hello.pl

-r-x------ 1 kili 46 Apr 23 13:14 hello.pl

$ ./hello.pl

Hello, World

Problem: You get a syntax error.

Solution: Make sure each statement is terminated by a semicolon.

Unlike shell and some other scripting languages, Perl requires a semicolon at the end of every statement.

Problem: You still get a syntax error.

Solution: Make sure all parentheses match correctly and all blocks are enclosed in curly braces.

You can use the showmatch option in vi, or blink-matching-paren in emacs, to help you make sure you always close your parentheses and braces.

Remember to enclose all blocks with curly braces. Unlike C, perl does not allow one-line statements to represent a block. For instance, you can’t say

while (<>)

if ( ! /^$/)

print "$_\n";

Problem: You get a syntax error when assigning a value to a scalar variable.

Solution: Make sure you use a “$” in front of all scalar variable names.

Unlike most other programming languages, Perl requires all variable names to start with an identifying character: $ for scalar variables, @ for arrays, and % for hashes. Also remember to use a $ when getting a scalar value from a hash or an array.

Problem: You get incorrect results when comparing numbers or strings.

Solution: Make sure you are using the right test operators.

Remember that the operators eq and ne are string comparisons, and == and != are numeric comparisons.
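The difference is easy to see with two values that are the same number but not the same string. In this sketch, the helper name compare is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# compare is a hypothetical helper that applies both kinds of test
# to the same pair of values.
sub compare {
    my ($first, $second) = @_;
    my $numeric = ($first == $second) ? "equal" : "different";
    my $string  = ($first eq $second) ? "equal" : "different";
    return "as numbers: $numeric, as strings: $string";
}

print compare("1.0", "1"), "\n";    # as numbers: equal, as strings: different
print compare("10", "10"), "\n";    # as numbers: equal, as strings: equal
```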

Problem: Data received from external sources (such as STDIN) causes unexpected behavior.

Solution: Make sure you chomp your input to remove the newline at the end of strings.

If you forget to chomp data, you will get unexpected newlines when printing and test comparisons will fail.

Problem: Values outside parentheses seem to get lost.

Solution: Group all arguments to a function in parentheses.

Remember that many commands in Perl are functions. Although they are not always required, parentheses are used to group the input to functions. For example, just as

sqrt (1+2)*3

will take the square root of 1+2 and then multiply the result by 3,

$ perl -e 'print (1+2)*3'

3

will print 1+2 and then try to multiply the result by 3. Adding parentheses around the arguments to print, as in

$ perl -e 'print ((1+2)*3)'

9

will solve this problem.

Running your scripts with perl -w will help detect these errors.

Problem: You get the warning “Use of uninitialized value”.

Solution: You may be trying to use a variable that’s undefined.

Some operations should not be done to a variable that’s undefined. For example,

die "Error: filename argument expected" if (! -e $ARGV[0]);

looks for a file named $ARGV[0]. If it is undefined (because there were no command-line arguments), perl -w will generate a warning.
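One way to avoid the warning, sketched below, is to test whether the value is defined before using it. The helper name check_file_arg and its messages are assumptions made for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# check_file_arg is a hypothetical helper that tests for a missing
# argument before testing for the file itself.
sub check_file_arg {
    my ($name) = @_;
    return "Error: filename argument expected" unless defined $name;
    return "Error: $name does not exist" unless -e $name;
    return "OK: $name exists";
}

# $ARGV[0] is undefined if there were no command-line arguments.
print check_file_arg($ARGV[0]), "\n";
```

Because the defined test comes first, the -e test is never applied to an undefined value.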

Problem: Running perl from the command line gives an error message or no output at all.

Solution: Make sure you are enclosing your instructions in single quotes, as in

$ perl -e 'print "Hello, World!\n"'

Problem: Running your perl script gives unexpected output.

Solution: Make sure you are running the right script!

This might sound silly, but one classic mistake is to name your script “test” and then run it at the command line only to get nothing:

$ test

$

The reason is that you are actually running /bin/test instead of your script. Try running your script with the full pathname (e.g., /home/kili/PerlScripts/test.pl) to see if that fixes the problem.

Problem: Your program still doesn’t work correctly.

Solution: Try running the perl debugger with the -d switch.

The perl debugger is used to monitor the execution of your code in a step-by-step fashion. When using the debugger, you can set breakpoints at exact lines in your script and then see exactly what is going on at any point during the program execution. Debugging a program can be very useful in locating logical errors.

Alternatively, you can try running your program with perl -MO=Lint scriptname, which will use the module B::Lint to check for problems that perl -w might miss.

You could also try posting to the newsgroup comp.lang.perl.misc. The readers of that newsgroup can often be very helpful in diagnosing problems. Be sure to read the newsgroup FAQ before posting, to avoid asking questions that have already been answered.

Summary

Although it is not the easiest language to learn, once you are comfortable using associative arrays, regular expressions, and other key features of Perl, you will find that it is very easy to write short but powerful scripts. It is said that Perl programs are generally shorter, easier to write, and faster to develop than corresponding C programs. They are often less buggy, more portable, and more efficient than shell scripts. According to www.perl.org, Perl is the most popular web programming language. Hopefully you now have a sense of why all this might be true.

Table 22–3 lists some of the most important Perl functions introduced in this chapter. Details about these functions, and many more, can be found in perldoc perlfunc. Table 22–4 summarizes the special characters used in Perl scripting.

Table 22–3: Basic Perl Functions

Function        Use
print           Print a string
chomp           Remove terminal newlines
my              Declare a local variable
reverse         Reverse the order of characters in a string or elements in a list, or swap the keys and values in a hash
push, pop       Add or remove elements at the end of an array
unshift, shift  Add or remove elements at the beginning of an array
sort            Sort a list in ASCII order
keys, values    Get a list of keys or values for a hash
if, unless      Conditional statements
while, until    Loop while a condition is true (or until it becomes true)
foreach, for    Loop through the elements in a list
defined         Check whether a variable has a value other than undef
open, close     Open or close a filehandle
die             Exit with an error message
sub             Define a procedure
return          Exit from a procedure, returning a value

Table 22–4: Special Characters Used in Perl

Symbol  Use                     Symbol  Use
#       Comment                 <>      Read input from a filehandle
$       Scalar variable name    $_      Default variable
@       Array name              @_      Values passed to a procedure
$#      Last index in an array  //      Enclose a regular expression
%       Name of a hash          !       Not
&       Procedure name          &&, ||  And, or

How to Find Out More

The classic Perl reference is known as the “Camel book” (because of the picture on the cover). It is very thorough and would be a good choice for an experienced programmer who wants to really understand Perl.

Wall, Larry, Tom Christiansen, and Jon Orwant. Programming Perl. 3rd ed. Sebastopol, CA: O’Reilly Media, 2000.

The “Llama book” is a shorter and more introductory work. If you are relatively new to programming, this might be more approachable, although it does not cover the language in the same depth as the Camel book.

Schwartz, Randal L., Tom Phoenix, and brian d foy. Learning Perl. 4th ed. Sebastopol, CA: O’Reilly Media, 2005.

Perl comes with extensive documentation. The command perldoc can be used to access this documentation. For example, perldoc perlintro displays an overview of Perl, and perldoc perl includes a list of the other documentation pages. The same documentation, with a more user-friendly interface, is available on the web at

· http://perldoc.perl.org/

Three excellent web sites for Perl information are

· http://www.perl.org/

· http://www.cpan.org/

· http://www.perl.com/

ActivePerl, a Perl implementation that can be downloaded for many platforms, is found at

· http://www.activestate.com/

Another good place to learn about perl is the newsgroup comp.lang.perl.misc. This is a good place to ask questions about the language. Be sure to read the newsgroup FAQ before posting.