UNIX: The Complete Reference (2007)

Part V: Tools and Programming

Chapter 23: Python

Python is a scripting language. It was first released in 1991 by Guido van Rossum, who is still actively involved in maintaining and improving the language. Python is open source and runs on virtually all UNIX variants, including Linux, BSD, HP-UX, AIX, Solaris, and Mac OS X, as well as on Windows. (There’s even a version of Python for the PSP.) Python has been gaining popularity ever since it was released, and although Perl is still more widely used, Python is certainly one of the most popular scripting languages. One reason for its popularity is the large set of libraries available for Python, including interfaces for writing graphical applications, and for network and database programming.

Like most scripting languages, Python has built-in memory management. Scripts are compiled to bytecode before they are interpreted, which makes execution fairly efficient. Python can be used to write either object-oriented or procedural code. It even supports some features of functional programming languages.

Writing programs in Python is typically faster and easier than writing in C. Python is known for being easy to combine with C libraries, however, and because it can be object-oriented, it works well with C++ and Java also. Python is significantly more readable than Perl. In particular, Python code strongly resembles pseudocode. It uses English words rather than punctuation whenever possible. Like Perl, Python is used extensively in developing applications for the web. Chapter 27 shows how Python can be used for CGI scripting and web development.

This chapter is only an introduction to the many uses of Python. It gives you all the information you need to get started writing your own Python scripts, but if you really want to understand Python, you will need to devote some time to a longer reference. See the section “How to Find Out More” at the end of this chapter for suggested sources.

Installing Python

Most modern UNIX systems come with python already installed, usually at /usr/bin/python. If it is installed on your system, the command python -V will tell you which version you have. If you do not have python installed, you can download it from http://www.python.org/, which is the official web site for Python. The download page includes instructions for unpacking and installing the source files.

Running Python Commands

The easiest way to use Python is with the interactive interpreter. Just like the UNIX shell, the interpreter allows you to execute one line at a time. This is an excellent way to test small blocks of code. The command python starts the interactive interpreter, and CTRL-D exits it. The interpreter prompts you to enter commands with “>>>”. In this chapter, examples starting with “>>>” show how the interpreter would respond to certain commands.

$ python

>>> print "Hello, world"

Hello, world

>>> [CTRL-D]

$

As you can see, the command print sends a line of text to standard output. It automatically adds a newline at the end of every line.

You can also use python to run commands that have been saved in a file. For example,

$ cat hello-script

print "Hello, world"

$ python hello-script

Hello, world

To make a script that automatically uses python when it is run, add the line #!/usr/bin/python (or whatever the path for python is on your system; use which python to find out) to the top of your file. This instructs the shell to use /usr/bin/python as the interpreter for the script. You will also need to make sure the file is executable, after which you can run the script by typing its name.

$ cat hello.py

#!/usr/bin/python

print "Hello, world"

$ chmod u+x hello.py

$ ./hello.py

Hello, world

If the directory containing the script is not in your PATH, you will need to enter the pathname in order to run it. In the preceding example, ./hello.py was used to run a script in the current directory. The extension .py indicates a Python script. Although it is not required, using .py when you name your Python scripts can help you organize your files.

The quickest way to execute a single command in Python is with the -c option. The command must be enclosed in single quotes, like this:

$ python -c 'print "Hello, world"'

Hello, world

Python Syntax

One of the first things many newcomers to Python notice is how readable the code is. This makes Python a good choice for group projects, where many people will have to share and maintain programs. The ease of reading and maintaining Python code is one reason it is popular with so many programmers.

One very notable feature of Python that tends to startle experienced developers is the mandatory indentation. Instead of using keywords like do/done or punctuation like {}, you group statements (e.g., the block following an if statement) by indenting your code. Python is sensitive to white space, so to end a block you just return to the previous level of indentation. The section “Control Structures” shows exactly how this works.

Unlike C and Perl, Python does not require semicolons at the end of lines (although they can be used to separate multiple statements on a single line). Comments in Python start with a #, as in shell and Perl. You do not need to declare variables before using them, and memory management is done automatically.

Using Python Modules

Python ships with a rich set of core libraries, called modules. Modules contain useful functions that you can call from your code. Some modules, like sys (for system functions like input and output) are very commonly used, but there are also more specialized modules, like socket and ftplib (for networking). To use a module, you have to import it with a line at the top of your script, after which you can use any of the functions or objects it contains. For example, you could import the math module, which includes the variable pi. Here’s how you would print the value of pi while in the interactive interpreter:

>>> import math

>>> print math.pi

3.14159265359

This chapter describes the most commonly used Python modules. You can find out more about the core modules at http://docs.python.org/modindex.html. And, just for fun, you might want to try entering import this in the Python interpreter.

Variables

Python variables do not have to be declared before you can use them. You do not need to add a $ in front of variable names, as you do in Perl or in shell scripts when getting the value of a variable. In fact, variable names cannot start with a special character. Python variable names, and the language in general, are case-sensitive.

This section explains how you can use numbers, strings, lists, and dictionaries. Python also has a file data type, which is described in the section “Input and Output.” Data types that are not covered here include tuples, which are very similar to lists, and sets, which are like unordered lists. For information about tuples and sets, see http://docs.python.org/tut/node7.html.

Numbers

Python supports integers and floating-point numbers, as well as complex numbers. Variables are assigned with = (equals), as shown.

x = y = -10 # Set both x and y equal to the integer -10

dist = 13.7 # This is a floating-point decimal

z = 7 - 3j # The letter j marks the imaginary part of a complex number

Python also allows you to enter numbers in scientific notation, hexadecimal, or octal. See http://docs.python.org/ref/integers.html and http://docs.python.org/ref/floating.html for more information.

You can perform all of the usual mathematical operations in Python:

>>> print 5 + 3

8

>>> print 2.5 - 1.5

1.0

>>> print (6-4j) * (6+4j) # Can multiply complex numbers just like integers.

(52+0j)

>>> print 7 / 2 # Division of two integers returns an integer!

3

>>> print 7.0 / 2

3.5

>>> print 2 ** 3 # The operator ** is used for exponentiation

8

>>> print 13 % 5 # Modulus is done with the % operator

3

In newer versions of Python, if you run your script with python -Qnew, integer division will return a floating-point number when appropriate.
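The division rules above can be checked with a short sketch. It uses the floor-division operator // and the built-in function divmod(), neither of which is introduced in this chapter, and it avoids the plain / operator because that result depends on the Python version and the -Q option. The print call is written with parentheses and a single argument so that it behaves the same way on newer Python versions.

```python
# Floor division always discards the fractional part.
whole = 7 // 2                        # 3

# Converting one operand to a float forces floating-point division.
exact = float(7) / 2                  # 3.5

# divmod() returns the quotient and the remainder together.
quotient, remainder = divmod(13, 5)   # (2, 3)

print("%d %.1f %d %d" % (whole, exact, quotient, remainder))
```

Writing // whenever you want an integer result keeps a script's behavior the same regardless of how / is configured.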

Variables must be initialized before they can be used. For example,

>>> n = n + 1

NameError: name 'n' is not defined

will cause an error if n has not yet been set. This is true of all Python variable types, not just numbers.

Python supports some C-style assignment shortcuts:

>>> x = 4

>>> x += 2 # Add 2 to x

>>> print x

6

>>> x /= 1.0 # Divide x by 1.0, which converts it to a floating-point

>>> print x

6.0

But not the increment or decrement operators:

>>> x++

SyntaxError: invalid syntax

Functions such as float, int, and hex can be used to convert numbers. For example,

>>> print float(26)/5

5.2

>>> print int(5.2)

5

>>> print hex(26)

0x1a

You can assign several variables at once if you put them in parentheses:

(x, y) = (1.414, 2.828) # Same as x = 1.414 and y = 2.828

This works for any type of variable, not just numbers.
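As a small sketch of what this allows (the variable names title and year are only illustrative), the same syntax can swap two variables or assign values of different types in one statement:

```python
(x, y) = (1.414, 2.828)

# The right-hand side is evaluated first, so this swaps x and y
# without needing a temporary variable.
(x, y) = (y, x)

# The values being assigned may have different types.
(title, year) = ("Alice's Adventures in Wonderland", 1865)

print("%s %s %d" % (x, y, year))
```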

Useful Modules for Numbers

The math module provides many useful mathematical functions, such as sqrt (square root), log (natural logarithm), and sin (sine of a number). It also includes the constants e and pi. For example,

>>> import math

>>> print math.pi, math.e

3.14159265359 2.71828182846

>>> print math.sqrt(3**2 + 4**2)

5.0

The module cmath includes similar functions for working with complex numbers.

The random module includes functions for generating random numbers.

import random

x = random.random() # Random number, with 0 <= x < 1

diceroll = int(random.random() * 6) + 1 # Random integer from 1 to 6

dice2 = int(random.uniform(1, 7)) # Equivalent to previous statement.
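For whole numbers, the random module also provides randint(), which includes both endpoints. The following is a minimal sketch, assuming a helper function named roll_die (a made-up name) and an arbitrary seed value; it uses def, which is described later in this chapter.

```python
import random

def roll_die(sides=6):
    """Return a random integer from 1 to sides, inclusive."""
    # randint() includes both endpoints, so no int() conversion is needed.
    return random.randint(1, sides)

random.seed(42)   # a fixed, arbitrary seed makes repeated runs reproducible

rolls = []
for i in range(1000):
    rolls.append(roll_die())

# Every roll should fall between 1 and 6.
print("%d %d" % (min(rolls), max(rolls)))
```

Seeding is optional. Without the call to random.seed(), the rolls differ from one run to the next.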

Strings

Python allows you to use either single or double quotes around strings. Strings can contain escape sequences, including \n for newline and \t for tab. Here is an example of a string:

>>> title = "Alice's Adventures\nin Wonderland"

>>> print title

Alice's Adventures

in Wonderland

Printing Strings

Since variable names do not start with a distinguishing character (such as $), Python does not expand variables if you embed them in a string. One way to print variables as part of a string is with the concatenation operator, +, which allows you to combine several strings:

>>> name = "Alice"

>>> print "The value of name is " + name

The value of name is Alice

There are some drawbacks to this method. The concatenation operator only works on strings, so variables of other data types (such as numbers) must be converted to a string with str() in order to concatenate them. When many variables and strings are combined in one statement, the result can be messy:

>>> (n1, n2, n3) = (5, 7, 2)

>>> print "First: " + str(n1) + " Second: " + str(n2) + " Third: " + str(n3)

First: 5 Second: 7 Third: 2

You can shorten this a little bit by giving print a list of arguments separated by commas. This allows you to print numbers without first converting them to strings. In addition, print automatically adds a space between each term. So you could replace the print statement in the previous example with

>>> print "First:", n1, "Second:", n2, "Third:", n3

First: 5 Second: 7 Third: 2

If you add a comma at the end of a print statement, print will not add a newline at the end, so this example will print the same line as the previous two:

print "First:", n1,

print "Second:", n2,

print "Third:", n3

Another way to include variables in a string is with variable interpolation, like this:

>>> print "How is a %s like a %s?" % ('raven', 'writing desk')

How is a raven like a writing desk?

>>> year = 1865

>>> print "It was published in %d." % year

It was published in 1865.

>>> print "%d times %f is %f" % (4, 0.125, 4 * 0.125)

4 times 0.125000 is 0.500000

The operator % (which works a little like the C command printf) replaces the format codes embedded in a string with the values that follow it. The format codes include %s for a string, %d for an integer, and %f for a floating-point value. The full set of codes you can use to format strings is documented at http://docs.python.org/lib/typesseq-strings.html.

Because the % formatting operator produces a string, it can be used anywhere a string can be used. So, for example,

print len("Hello, %s" % name)

will print the length of the string after the value of name has been substituted for %s.
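Format codes can also carry a field width and a precision, which is useful for lining up columns. The sketch below illustrates a few common variations; the sample values are made up, and the results are stored in variables so that they can be inspected.

```python
# A width pads the value on the left; a precision limits the decimal places.
price = "%8.2f" % 3.14159          # '    3.14' (8 characters wide)

# A leading 0 pads a number with zeros instead of spaces.
padded = "%03d" % 7                # '007'

# A minus sign left-justifies the value within its field.
left = "%-10s|" % "Alice"          # 'Alice     |'

# Several codes can be combined to build one row of a table.
row = "%-10s %5d %8.2f" % ("raven", 42, 2.5)

print(row)
```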

String Operators

As you have seen, the + operator concatenates strings. For example,

fulltitle = title + '\nby Lewis Carroll'

The * operator repeats a string some number of times, as in

print '-' * 80 # Repeat the - character 80 times.

which prints a row of dashes.

The function len returns the length of a string.

length = len("Jabberwocky\n")

The length is the total number of characters, including the newline at the end, so in this example length would be 12.

You can index the characters in strings as you would the values in an array (or a list). For example,

print name[3]

will print the fourth character in name (since the first index is 0, the fourth index is 3).

String Methods

This section lists some of the most common string methods. A complete list can be found at http://docs.python.org/lib/string-methods.html.

The method strip() removes white space from around a string. For example,

>>> print " Jabberwocky ".strip()

Jabberwocky

You can also use lstrip() or rstrip() to remove leading or trailing white space.

The methods upper() and lower() convert a string to upper- or lowercase.

>>> str = "Hello, world"

>>> print str.upper(), str.lower()

HELLO, WORLD hello, world

You can use the method center() to center a string:

>>> print "'Twas brillig, and the slithy toves".center(80)

'Twas brillig, and the slithy toves

>>> print "Did gyre and gimble in the wabe".center(80)

Did gyre and gimble in the wabe

You can split a string into a list of substrings with split(). By itself, split() will divide the string wherever there is white space, but you can also include a character by which to divide the string. For example,

wordlist = sentence.split() # Split a sentence into a list of words

passwdlist = passwdline.split(':') # Split a line from /etc/passwd

join() is an interesting method that concatenates a list of strings into a single string. The string on which join() is called is used as the separator when building the new string. For example,

print ":".join(passwdlist)

will restore the original line from /etc/passwd.

To find the first occurrence of a substring, use find(), which returns an index. Similarly, rfind() returns the index of the last occurrence.

scriptpath = "/usr/bin/python"

i = scriptpath.rfind('/') # i = 8

You can use replace() to replace a substring with a new string:

newpath = oldpath.replace('perl', 'python')

This example replaces perl with python every time it occurs in oldpath, and saves the result in newpath. The method count() can be used to count the number of times a substring occurs.
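The methods in this section are often used together. Here is a brief sketch that applies several of them to one line of text; the line is a made-up example in /etc/passwd format, not taken from a real system.

```python
# A made-up line in /etc/passwd format.
passwdline = "alice:x:1001:1001:Alice Liddell:/home/alice:/bin/sh"

fields = passwdline.split(':')           # seven fields
login = fields[0]                        # 'alice'
shell = fields[6]                        # '/bin/sh'

rebuilt = ":".join(fields)               # join() undoes split(':')
colons = passwdline.count(':')           # 6 separators between 7 fields
newline = passwdline.replace('/bin/sh', '/bin/bash')
first_slash = passwdline.find('/')       # index of the first '/', or -1 if absent

print("%s uses %s" % (login, shell))
```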

More powerful tools for working with strings are discussed in the section “Regular Expressions” later in this chapter.

Lists

A list is a sequence of values. For example,

mylist = [3, "Queen of Hearts", 2.71828, x]

Lists can contain any types of values (even other lists). There is no limit to the size or number of elements in a list, and you do not need to tell Python how big you want a list to be.

Each element in a list can be accessed or changed by referring to its index (starting from 0 for the first element):

print mylist[1] # Print "Queen of Hearts"

mylist[0] += 2 # Change the first element to 5

You can also count backward through the list with negative indices. For example, mylist[-1] is the last element in mylist, mylist[-2] is the element before that, and so on.

You can get the number of elements in a list with the function len(), like this:

size = len(mylist) # size is 4, because mylist has 4 elements

The range() function is useful for creating lists of integers. For example,

numlist = range(10) # numlist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

creates a list containing the numbers from 0 to 9.

List Slices

Python allows you to select a subsection of a list, called a slice. For example, you could create a new list containing elements 2 through 5 of numlist:

numlist = range(10)

sublist = numlist[2:6] # sublist=[2, 3, 4, 5]

In this example, the slice starts at index 2 and goes up to (but does not include) index 6. If you leave off the first number, the slice starts at the beginning of the list, so

sublist = numlist[:4] # sublist = [0, 1, 2, 3]

would assign the first 4 elements of numlist to sublist. Similarly, you can leave off the last number to include all the elements up to the end of the list, as in

sublist = numlist[4:] # sublist = [4, 5, 6, 7, 8, 9]

You can also use slices to assign new values to part of a list. For example, you could change the elements at indices 1 and 3 of mylist with

mylist[1:4] = [new1, mylist[2], new3]

Slices work on strings as well as lists. For example, you could remove the last character from a string with

print inputstr[:-1]
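The slice rules described in this section can be tried out with a short sketch. It wraps range() in list(), an addition that has no effect in the Python version used in this chapter but keeps the example working on newer versions where range() does not return a list.

```python
numlist = list(range(10))        # [0, 1, 2, ..., 9]

middle = numlist[2:6]            # [2, 3, 4, 5]
last_two = numlist[-2:]          # negative indices work in slices: [8, 9]
whole_copy = numlist[:]          # a new list with the same elements

# Slice assignment replaces the elements at indices 1 through 3 in place.
numlist[1:4] = ['a', 'b', 'c']

word = "Jabberwocky\n"
trimmed = word[:-1]              # drop the trailing newline
print(trimmed)
```

Note that whole_copy is unaffected by the slice assignment, because a slice always produces a new list.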

List Methods

The sort() method sorts the elements in a list. For strings, sort() uses ASCII order (in which all the uppercase letters go before the lowercase letters). For numbers, sort() uses numerical order (e.g., it puts “5” before “10”). sort() works in place, meaning that it changes the original list rather than returning a new list.

mylist.sort()

print mylist

The reverse() method reverses the order of the elements in a list. Like sort(), reverse() works in place. The sort() and reverse() methods can be used one after another to put a list in reverse ASCII order.

You can add a new element at the end of a list with append(). For example,

mylist.append(newvalue) # Same as mylist[len(mylist):] = [newvalue]

extend() is just like append(), but it appends all the elements from another list.

To insert an element elsewhere in a list, you can use insert():

mylist.insert(0, newvalue) # Insert newvalue at index 0

This will cause all the later elements in the list to shift up by one index. To remove an element from a list, you can use pop(). pop() removes and returns the element at a given index. If no argument is given, the last element in the list is removed.

print mylist.pop(0) # Print and remove the first element of mylist

mylist.pop() # Remove the last element
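The following sketch traces these list methods on one small list, with the state of the list noted in the comments. The list contents are arbitrary sample values.

```python
cards = ["Queen", "Ace", "King"]

cards.append("Jack")             # ['Queen', 'Ace', 'King', 'Jack']
cards.extend(["Ten", "Nine"])    # adds each element of the second list
cards.insert(0, "Joker")         # 'Joker' is now at index 0

first = cards.pop(0)             # removes and returns 'Joker'
last = cards.pop()               # removes and returns 'Nine'

cards.sort()                     # in place: ['Ace', 'Jack', 'King', 'Queen', 'Ten']
cards.reverse()                  # in place: ['Ten', 'Queen', 'King', 'Jack', 'Ace']

print(", ".join(cards))
```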

Dictionaries

A dictionary is like a list, but it uses arbitrary values, called keys, for the indices. Dictionaries are sometimes called associative arrays. (In Perl, they are called hashes. There is no equivalent intrinsic type in C, although they can be implemented using hashtables.)

As an example, suppose you want to be able to look up a user’s full name when you know that user’s login name. You could create a dictionary, like this:

fullname = {"lewisc": "L. Carroll", "mhatter": "Bertrand Russell"}

This dictionary has two entries, one for user lewisc and one for user mhatter. To create an empty dictionary, use an empty pair of braces, like this:

newdictionary = {}

Adding to a dictionary is just like adding to a list, but you use the key as the index:

fullname["aliddell"] = "Alice Liddell"

Similarly, you can look up values like this:

print fullname["mhatter"]

which will print “Bertrand Russell”.

Note that since keys are used to look up values, each key must be unique. To store more than one value for a key, you could use a list, like this:

dict={}

dict['v'] = ['verse', 'vanished', 'venture']

dict['v'].append('vorpal')

Here’s a longer example of a dictionary:

daysweek = {

"Sunday": 1,

"Monday": 2,

"Tuesday": 3,

"Wednesday": 4,

"Thursday": 5,

"Friday": 6,

"Saturday": 7,

}

print "Thursday is day number", daysweek['Thursday']

This dictionary links the names of the days to their positions in the week.

Working with Dictionaries

You can get a list of all the keys in a dictionary with the method keys, like this:

listdays = daysweek.keys() # "Sunday", "Monday", etc

To check if a dictionary contains a particular key, use the method has_key:

>>> print daysweek.has_key("Fri")

False

>>> print daysweek.has_key("Friday")

True

This is commonly used in an if statement, so that you can take some particular action depending on whether a key is defined or not. (Note that some older versions of Python will print 0 for false and 1 for true.)

The del statement removes a key (and the associated value):

>>> del daysweek["Saturday"]

>>> del daysweek["Sunday"]

>>> print daysweek

{'Monday': 2, 'Tuesday': 3, 'Wednesday': 4, 'Thursday': 5, 'Friday': 6}

The function len returns the number of entries in the dictionary:

print "There are", len(daysweek), "entries."
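Here is a compact sketch that pulls the dictionary operations together. It uses two features not covered in this chapter: the expression key in dictionary, which is an alternative to has_key(), and the method get(), which returns a default value when a key is missing. The login name cheshire is made up for the example.

```python
fullname = {"lewisc": "L. Carroll", "mhatter": "Bertrand Russell"}

fullname["aliddell"] = "Alice Liddell"       # add an entry

# 'key in dictionary' is an alternative spelling of has_key().
known = "mhatter" in fullname                # True
missing = "cheshire" in fullname             # False

# get() returns a default instead of raising KeyError for a missing key.
who = fullname.get("cheshire", "unknown user")

del fullname["lewisc"]                       # remove one entry
logins = sorted(fullname.keys())             # keys in alphabetical order

print("%d entries: %s" % (len(fullname), ", ".join(logins)))
```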

If you have never used associative arrays, then dictionaries may seem strange at first. But they are remarkably useful, especially for working with text. You will see examples of how convenient dictionaries can be later in this chapter.

Control Structures

In order to write more interesting scripts, you will need to know about control structures.

if Statements

An if statement tests to see if a condition is true. If it is, the block of code following it is executed. This example tests to see if the value of x is less than 0. If so, x is multiplied by −1 to make it positive:

if x < 0 :

    x *= -1

The important thing to note here is that there are no begin/end delimiters (such as curly braces) to group the statements following if. Instead, the test is followed by a colon (:), and the block of code is indented. As mentioned earlier, Python is sensitive to indentation. When you are done with the if statement, you simply return to entering your code in the first column.

When you enter the first line of an if statement in the interactive interpreter, it will prompt you with “…” for the remaining lines. You must indent these lines, just as you would if you were entering them in a script. To end your if block, just enter a blank line. The preceding if statement would look like this in the interpreter:

>>> if x < 0 :

...     x *= -1

...

if statements can have an else clause that gets executed if the initial condition is not met. This example checks whether a dictionary contains a particular key:

if dict.has_key(key):

    print "%s is in dict" % key

else:

    print "%s could not be found. Adding %s to dictionary..." % (key, key)

    dict[key]=value

You can also include elif clauses that test additional conditions if the first one is false. This example has one elif clause:

if str.islower() :

    print str, "is all lower case"

elif str.isupper() :

    print str, "IS ALL UPPERCASE"

else :

    print str, "Combines Upper And lower case letters"
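The same if/elif/else chain can be wrapped in a function so that it is easy to try on different strings. This is a minimal sketch, assuming a made-up function name, describe_case; it uses def, which is described later in this chapter, and it returns a description rather than printing it.

```python
def describe_case(text):
    """Return a short description of the letter case used in text."""
    if text.islower():
        return "all lower case"
    elif text.isupper():
        return "ALL UPPERCASE"
    else:
        # Reached for mixed case, and also for strings with no letters.
        return "mixed case"

print(describe_case("jabberwocky"))
```

Only the first branch whose test is true is executed, so the else clause is reached only when both tests fail.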

Comparison Operators

The comparison == is used to see if two values are equal. Unlike some other languages, Python allows you to use the same comparison operator for strings and other objects as well as for numbers. But be careful not to use =, which is the assignment operator; in a test like the one in the next example, Python would report a syntax error. To test whether two values are different, you can use !=, which also works on any type of object.

Python uses the keywords and, or, and not for the corresponding logical tests, as in this example:

if x == 0 or y == 0 :

    print "Cannot divide by 0."

elif not (x > 0 and y > 0) :

    print "Please enter positive values."

for Loops

The for loop iterates through the elements of a list. This example will print each list element on its own line:

for element in list:

    print element

The syntax here could be read as “for each element in the list, print the element”.

As you will see in the section “Input and Output”, for loops are very useful for looping through the lines in a file. They can also be used with ranges, to execute a loop a certain number of times. For example,

for i in range(10):

    print "%d squared is %d" % (i, i**2)

will use the list [0, 1, 2, 3,... 9] to print the squares of the integers from 0 to 9. To create the list [1, 2, 3, … 10], you could use range(1, 11).

The for loop is also handy for working with dictionaries. This loop will iterate through the keys of a dictionary and print each key/value pair:

for key in userinfo.keys() :

    print key, "->", userinfo[key]
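Two variations on this loop are often useful. The sketch below uses the built-in function sorted() to visit the keys in alphabetical order, and the dictionary method items() to get each key and value together; neither is covered in this chapter, and the contents of userinfo are sample data.

```python
userinfo = {"lewisc": "L. Carroll", "aliddell": "Alice Liddell"}  # sample data

lines = []
# sorted() returns the keys in alphabetical order.
for key in sorted(userinfo):
    lines.append("%s -> %s" % (key, userinfo[key]))

# items() yields (key, value) pairs, so no second lookup is needed.
total_chars = 0
for key, value in userinfo.items():
    total_chars += len(value)

print("\n".join(lines))
```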

while Loops

The while loop repeats a block of code as long as a particular condition is true. For example, this loop will repeat five times. It will halt when the value of n is 0.

(n, sum)=(5, 0)

while n > 0 :

    sum += n

    n -= 1

To create an infinite loop, you can use the keyword True as the condition. You can exit from an infinite loop with break. This loop, for example,

while True :

    print inputlist.pop(0)

    if inputlist[0] == "." :

        break

will print each element in inputlist. When the next element in the list is ".", the loop will terminate.
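Note that the loop above raises an IndexError if the list runs out before a "." is found. A slightly more defensive variation is sketched here, under the assumption that an empty list should simply end the loop. It checks the next element before removing it, and it collects the elements in a list instead of printing them. The function name take_until_dot is made up for the example.

```python
def take_until_dot(inputlist):
    """Remove and return the leading elements that come before "."."""
    taken = []
    # An empty list counts as false, so the loop also ends
    # when there is nothing left to remove.
    while inputlist:
        if inputlist[0] == ".":
            break
        taken.append(inputlist.pop(0))
    return taken

message = ["Beware", "the", "Jabberwock", ".", "ignored"]
taken = take_until_dot(message)
print(" ".join(taken))
```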

Defining Your Own Functions

The keyword def is used to define a function. This example shows a function named factorial that returns the factorial of an integer. The last line sends the value 5 to the function and prints the value it returns.

def factorial(n):

    fact=1

    for num in range(1, n+1):

        fact *= num

    return fact

print "5 factorial is", factorial(5)

Python also allows you to define small functions called lambdas that can be passed as arguments. For example, you could use the map function to apply a lambda expression to each element in a list. You can learn how to use lambda and map in sections 4 and 5 of the Python Tutorial at http://docs.python.org/tut/tut.html.
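As a brief illustration of the idea, the sketch below passes a lambda to map() and to the built-in function sorted(). The call to list() is an addition that keeps the example working on newer Python versions, where map() does not return a list directly. The sample data is arbitrary.

```python
numbers = [1, 2, 3, 4]

# map() applies the lambda to every element of the list.
squares = list(map(lambda n: n ** 2, numbers))       # [1, 4, 9, 16]

# A lambda can also serve as a sort key, here the length of each word.
# Words of equal length keep their original relative order.
words = ["slithy", "wabe", "gimble", "toves"]
by_length = sorted(words, key=lambda w: len(w))

print(squares)
```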

Variable Scope

Variables in the main body of your code (outside of functions) are global. Global variables can be accessed from any part of the code, including inside a function. While this may seem convenient, it can cause some serious problems. For example, a function that modifies a global list or dictionary in place changes it for the entire program, and a function that declares a name with the global statement can accidentally overwrite a value that other code depends on. (Plain assignment inside a function creates a new local variable, which hides any global variable of the same name.) In fact, it’s best to avoid using global variables as much as possible. One easy way to do this is to put all of your code in functions, including the main body of code. You can then include a single line to call the main function, like this:

#!/usr/bin/python

#

# wordcount.py : count the words in the filename arguments

#

import fileinput, re

def sortkeys(dict) :

    keylist=dict.keys()

    keylist.sort()

    return keylist

def printwords(wordfreq, totalwords) :

    for word in sortkeys(wordfreq) :

        print "%d %s" % (wordfreq[word], word)

    print "%d total words found." % totalwords

def countwords(splitline, wordfreq, totalwords) :

    for word in splitline :

        if not wordfreq.has_key(word) :

            wordfreq[word]=1

        else :

            wordfreq[word] += 1

        totalwords += 1

    # Return the new total: changing totalwords here does not

    # change the variable of the same name in main()

    return totalwords

def main() :

    (wordfreq, totalwords)=( {}, 0)

    for line in fileinput.input() :

        splitline=re.findall(r"\w+", line.lower())

        totalwords = countwords(splitline, wordfreq, totalwords)

    printwords(wordfreq, totalwords)

main()

This program uses a dictionary (wordfreq) to count the frequency of each word in the input. The words are saved as keys in the dictionary, and the number of times each word appears is the corresponding value. It uses two functions from modules you haven’t seen yet: fileinput.input() allows you to iterate through the lines in the input, and the function re.findall() is used to divide each line into a list of lowercase words. You will learn how to use these functions in the next two sections.

Notice that even though this program is relatively short, it has been broken into four separate functions. The functions make it easier to quickly understand what each part of the program does. For example, just by reading the code in main(), you can see that the program iterates through input, splits lines, counts words, and prints some output. Even without reading the other sections (and without knowing exactly how fileinput.input() and re.findall() work), you would be able to make a pretty good guess about what the program was for.
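One detail of passing values to functions deserves a closer look: a function can add entries to a dictionary it receives, but adding to an integer parameter only changes the function's own local variable. The condensed sketch below is a variation on the counting step, not the program above. It takes a list of lines instead of reading files, uses the dictionary method get() (not covered in this chapter) in place of has_key(), and returns both results to the caller.

```python
import re

def countwords(lines):
    """Return (wordfreq, totalwords) for a list of text lines."""
    wordfreq = {}
    totalwords = 0
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            # get() supplies 0 the first time a word is seen.
            wordfreq[word] = wordfreq.get(word, 0) + 1
            totalwords += 1
    # The total is an integer, so it has to be returned to the caller.
    return wordfreq, totalwords

freq, total = countwords(["The Walrus and the Carpenter", "The time has come"])
print("%d total words, 'the' appears %d times" % (total, freq["the"]))
```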

Input and Output

You know how to print to standard output with print, but by now you are probably wondering how to read from standard input or print to standard error, and how to work with filename arguments and other files.

Getting Input from the User

The simplest way to read input from the keyboard (actually, from standard input) is with the function raw_input(). For example,

print "Please enter your name:",

name = raw_input()

print "Hello, " + name

In this example, the comma after the first print statement prevents it from including a newline at the end of the string.

Another way to write the same thing is to include the prompt string as an argument to raw_input:

>>> name=raw_input("Please enter your name: ")

Please enter your name: Alice

>>> print "Hello, " + name

Hello, Alice

raw_input() does not return the newline at the end of the input.

File I/O

To open a file for reading input, you can use

filein=open(filename, 'r')

where filename is the name of the file and r indicates that the file can be used for reading. The variable filein is a file object, which includes the methods read(), readline(), and readlines().

To get just one line of text at a time, you can use readline(), as in

#!/usr/bin/python

filein = open("input.txt", 'r')

print "The first line of input.txt is"

print filein.readline()

print "The second line is"

print filein.readline()

which will print the first two lines of input.txt.

When you run this script, the output might look something like

The first line of input.txt is

The second sentence is false.

The second line is

The first sentence was true.

As you can see, there is an extra newline after each line from the file. That’s because readline() includes the newline at the end of strings, and the print statement also adds a newline. To fix this, you can use the rstrip method to remove white space (including a newline) from the end of the string, like this:

print filein.readline().rstrip()

Alternatively, you could use a comma to prevent print from appending a newline:

print filein.readline(),

To read all the lines from a file into a list, use readlines(). For example, you could center the lines in a file like this:

for line in filein.readlines() :

    print line.rstrip().center(80)

This script uses the center method for strings to center each line (assuming a width of 80 characters). The readlines method also includes the newline at the end of each line, so line.rstrip() is used to strip the newline from line before centering it.

To read the entire file into a single string, use read():

for filename in filelist :

    print "*** %s ***" % filename # Display the name of the file.

    filein=open(filename, 'r') # Open the file.

    print filein.read(), # Print the contents.

    filein.close() # Close the file.

This script will print the contents of each file in filelist to standard output. The comma at the end of the print statement will prevent print from appending an extra newline at the end of the output.

To open a file for writing output, you can use

fileout=open(filename, 'w')

If you use ‘a’ instead of ‘w’, it will append to the file instead of overwriting the existing contents.

You can write to the file with write(), which writes a string to the file, or writelines(). This example uses the time module to add the current date and time to a log file:

import time

logfile = open(logname, 'a')

logfile.write(time.asctime() + "\n")

Note that write does not automatically add a newline to the end of strings.

You can also use the method writelines(), which copies the strings in a list to the file. As with write(), you must include a newline at the end of each string if you want them to be on separate lines in the file.

To close a file when you are done using it, use the close method:

filehandle.close()
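The following sketch walks through a complete cycle of writing, appending, and reading. So that it can be run safely, it creates its file in a temporary directory using the standard tempfile and os modules, which are not covered in this chapter; the file name notes.txt is arbitrary.

```python
import os
import tempfile

# Work in a throwaway directory; the file name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

fileout = open(path, 'w')
fileout.write("first line\n")                  # write() adds no newline itself
fileout.writelines(["second line\n", "third line\n"])
fileout.close()

fileout = open(path, 'a')                      # 'a' appends to the existing file
fileout.write("appended line\n")
fileout.close()

filein = open(path, 'r')
lines = []
for line in filein.readlines():
    lines.append(line.rstrip())                # strip the trailing newline
filein.close()

print(lines)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(os.path.dirname(path))
```

Closing the file after writing matters here: it makes sure the data has been written out before the file is opened again.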

Standard Input, Output, and Error

The sys (system) module has objects for working with standard input, output, and error. As with any module, to use sys you must import it by including the line import sys at the top of your script.

The file object sys.stdin lets you read from standard input. You can use the methods read(), readline(), and readlines() to read from sys.stdin just as you would any normal file.

print "Type in a message. Enter Ctrl-D when you are finished."

message = sys.stdin.read()

The object sys.stderr allows you to print to standard error with write() or writelines(). For example,

sys.stderr.write("Error: testing standard error\n")

Similarly, the file object sys.stdout allows you to print to standard output. You could use sys.stdout.write as a replacement for print, as in

sys.stdout.write("An alternate way to print\n")

Using Filename Arguments

The sys module also lets you read command-line arguments. The variable sys.argv is a list of the arguments to your script. The name of the script itself is in sys.argv[0]. For example,

$ cat showargs.py

#!/usr/bin/python

import sys

print "You ran %s with %d arguments:" % (sys.argv[0], len (sys.argv[1:]))

print sys.argv[1:]

$ ./showargs.py here are 4 arguments

You ran ./showargs.py with 4 arguments:

['here', 'are', '4', 'arguments']

Note that this script uses the slice sys.argv[1:] to skip the first entry in sys.argv (the name of the script itself) when it prints the command-line arguments.

To read from filename arguments, you can use the module fileinput, which allows you to iterate through all the lines in the files in a list. By default, fileinput.input() opens each command-line argument and iterates through the lines the files contain. A typical use might look something like

#!/usr/bin/python

import fileinput

for line in fileinput.input():

    print "%s: %s" % (fileinput.filename(), line),

This will display the contents of each filename argument, along with the name of the file. It will interpret the argument - as a reference to standard input, and it will use standard input if no filename arguments are given. For other uses of fileinput, see http://docs.python.org/lib/module-fileinput.html.

Alternatively, you can open filename arguments just as you would any other files. For example, this script will append the contents of the first argument to the second argument:

#!/usr/bin/python

import sys

filein=open (sys.argv[1], 'r')

fileout=open (sys.argv[2], 'a')

fileout.write(filein.read())

Using Command-Line Options

The getopt module contains the getopt function, which works rather like the shell scripting command with the same name (described in Chapter 20). You can use getopt to write scripts that take command-line options, as in

$ ./optionScript.py -ab -c4 -d filename

To learn how to use getopt, see the Python documentation at http://docs.python.org/lib/module-getopt.html.
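As a rough sketch of how the command line shown above might be parsed, the following calls getopt.getopt() on a sample argument list. The function name parse_options and the choice of which options take values are assumptions made for this illustration.

```python
import getopt

def parse_options(arguments):
    # "abc:d" declares the flags -a, -b, -c, and -d. The colon after c
    # means that -c takes a value (as in -c4).
    options, filenames = getopt.getopt(arguments, "abc:d")
    return options, filenames

# In a real script you would pass sys.argv[1:] instead of this sample list.
options, filenames = parse_options(["-ab", "-c4", "-d", "filename"])
```

Here options is a list of (flag, value) pairs, with an empty string as the value for flags that take no argument, and filenames holds whatever is left over after the options.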

Interacting with the UNIX System

The module os (operating system) allows Python to interact directly with the files on your system and to run UNIX commands from within a script.

File Manipulation

One of the commonly used functions in os is os.path.isfile(), which checks if a file exists:

import os, sys

if not os.path.isfile(sys.argv[1]) :

    sys.stderr.write("Error: %s is not a valid filename\n" % sys.argv[1])

Similarly, os.path.isdir() can be used to see if a string is a valid directory name. The function os.path.exists() checks if a pathname (for either a file or a directory) is valid.
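The three tests can be combined to classify a pathname. This is only a sketch, and the function name describe and the labels it returns are invented for the example.

```python
import os

def describe(pathname):
    # Return a short label for what pathname refers to.
    if os.path.isfile(pathname):
        return "file"
    elif os.path.isdir(pathname):
        return "directory"
    elif os.path.exists(pathname):
        return "other"        # e.g., a device file or a socket
    else:
        return "missing"

current = describe(os.getcwd())
```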

To get a list of the files in a directory, you can use os.listdir(), as in

for filename in os.listdir("/home/alice") :

    print filename

The list will include hidden files such as .profile. You can get the path of the current directory with os.getcwd(), so you can get a list of the files in the current directory with

filelist = os.listdir(os.getcwd())

A few of the other useful functions included in the os module are mkdir(), which creates a directory, rename(), which moves a file, and remove(), which deletes a file. To copy files, you can use the module shutil, which has the functions copy(), to copy a single file, and copytree(), to recursively copy a directory. For example,

shutil.copytree(projectdir, projectdir + ".bak")

will copy the files in projectdir to a backup directory.
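The following sketch strings several of these functions together. The directory and file names are hypothetical, and everything happens inside a scratch directory created with tempfile.mkdtemp() so that nothing important is touched. The final call to shutil.rmtree(), which removes a directory tree, is there only to clean up.

```python
import os
import shutil
import tempfile

# Work inside a scratch directory (illustrative only).
workdir = tempfile.mkdtemp()

projectdir = os.path.join(workdir, "project")
os.mkdir(projectdir)                      # Create a directory.

draft = os.path.join(projectdir, "draft.txt")
fileout = open(draft, 'w')
fileout.write("Down the rabbit hole\n")
fileout.close()

final = os.path.join(projectdir, "final.txt")
os.rename(draft, final)                   # Move draft.txt to final.txt.

shutil.copy(final, final + ".bak")        # Copy a single file.
os.remove(final)                          # Delete the original.

remaining = os.listdir(projectdir)        # Only the backup is left.

shutil.rmtree(workdir)                    # Remove the scratch directory.
```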

Running UNIX Commands

You can run a UNIX command in your script with os.system(). For example, you could call uname -a (which displays the details about your machine, including the operating system, hostname, and processor type) like this:

os.system("uname -a") # Print system information

However, os.system() does not return the output from uname -a, which is sent directly to standard output. To work with the output from a command, you must open a pipe.

Opening Pipelines

Python lets you open pipes to or from other commands with os.popen(). For example,

readpipe = os.popen("ls -la", 'r') # Similar to ls -la | pythonscript.py

will allow you to read the output from ls -la with readpipe, just as you would read input from sys.stdin. For example, you could print each line of input from readpipe:

for line in readpipe.readlines() :

    print line.rstrip()

You can also open a command to accept output. For example,

writepipe=os.popen("lpr", 'w') # Similar to pythonscript.py | lpr

writepipe.write(printdata) # Send printdata to printer

will let you send output to lpr.
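To capture the output of a command in a variable (which os.system() cannot do), you might wrap the read pipe in a small helper. This is a sketch only, and the function name command_output is made up for the example.

```python
import os

def command_output(command):
    # Open a pipe from the command and collect its output lines,
    # with the trailing newlines removed.
    readpipe = os.popen(command, 'r')
    lines = [line.rstrip() for line in readpipe.readlines()]
    readpipe.close()
    return lines

greeting = command_output("echo hello")
```

Calling command_output("uname -a") in the same way would return the system information as a list of lines instead of sending it straight to standard output.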

Regular Expressions

A regular expression is a string used for pattern matching. Regular expressions can be used to search for strings that match a certain pattern, and sometimes to manipulate those strings. Many UNIX System commands (including grep, vi, emacs, sed, and awk) use regular expressions for searching and for text manipulation.

The re module in Python gives you many powerful ways to use regular expressions in your scripts. Only some of the features of re will be covered here. For more information, see the documentation pages at http://docs.python.org/lib/module-re.html

Pattern Matching

In Python, a regular expression object is created with re.compile(). Regular expression objects have many methods for working with strings, including search(), match(), findall(), split(), and sub(). Here’s an example of using a pattern to match a string:

import re

maillist = ["alice@wonderland.gov", "mgardner@sciam.bk",

            "smullyan@puzzleland.bk"]

emailre = re.compile(r"land")

for email in maillist :

    if emailre.search(email) :

        print email, "is a match."

This example will print the addresses alice@wonderland.gov and smullyan@puzzleland.bk, but not mgardner@sciam.bk. It uses re.compile(r"land") to create an object that can search for the string land. (The r is used in front of a regular expression string to prevent Python from interpreting any escape sequences it might contain.) This script then uses emailre.search(email) to search each e-mail address for land, and prints the ones that match.

You can also use the regular expression methods without first creating a regular expression object. For example, the command re.search(r"land", email) could be used in the if statement in the preceding example, in place of emailre.search(email). In short scripts it may be convenient to eliminate the extra step of calling re.compile(), but using a regular expression object (emailre, in this example) is generally more efficient.

The method match() is just like search(), except that it only looks for the pattern at the beginning of the string. For example,

regexp = re.compile(r'kn', re.I)

for element in ["Knight", "knave", "normal"] :

    if regexp.match(element) :

        print regexp.match(element).group()

will find strings that start with “kn”. The re.I option in re.compile(r'kn', re.I) causes the match to ignore case, so this example will also find strings starting with “KN”. The method group() returns the part of the string that matched. The output from this example would look like

Kn

kn
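To see the difference between search() and match() side by side, here is a small sketch. The sample words are chosen only for illustration.

```python
import re

regexp = re.compile(r"land")

word = "wonderland"

# search() looks for the pattern anywhere in the string.
found_anywhere = regexp.search(word) is not None

# match() only succeeds if the pattern starts at the first character.
found_at_start = regexp.match(word) is not None

# Both methods succeed when the string begins with the pattern.
both = (regexp.search("landlord") is not None and
        regexp.match("landlord") is not None)
```

Both methods return None when there is no match, which is why the results can be used directly in an if statement.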

Constructing Patterns

As you have seen, a string by itself is a regular expression. It matches any string that contains it. For example, venture matches “Adventures”. However, you can create far more interesting regular expressions.

Certain characters have special meanings in regular expressions. Table 23–1 lists these characters, with examples of how they might be used.

Table 23–1: Python Regular Expressions

Char   Definition                                                              Example        Matches
.      Matches any single character.                                           th.nk          think, thank, thunk, etc.
\      Quotes the following character.                                         script\.py     script.py
*      Previous item may occur zero or more times in a row.                    .*             any string, including the empty string
+      Previous item occurs at least once, and maybe more.                     \*+            *, *****, etc.
?      Previous item may or may not occur.                                     web\.html?     web.htm, web.html
{n,m}  Previous item must occur at least n times but no more than m times.     \*{3,5}        ***, ****, *****
( )    Group a portion of the pattern.                                         script(\.pl)?  script, script.pl
|      Matches either the value before or after the |.                         (R|r)af        Raf, raf
[ ]    Matches any one of the characters inside. Frequently used with ranges.  [QqXx]         Q, q, X, or x
[^]    Matches any character not inside the brackets.                          [^A-Za-z]      any nonalphabetic character, such as 2
\n     Matches whatever was in the nth set of parentheses.                     (croquet)\1    croquetcroquet
\s     Matches any white space character.                                      \s             space, tab, newline
\S     Matches any non-white space.                                            the \S         then, they, etc. (but not the)
\d     Matches any digit.                                                      \d*            0110, 27, 9876, etc.
\D     Matches anything that’s not a digit.                                    \D+            same as [^0-9]+
\w     Matches any letter, digit, or underscore.                               \w+            t, AL1c3, Q_of_H, etc.
\W     Matches anything that \w doesn’t match.                                 \W+            &#*$%, etc.
\b     Matches the beginning or end of a word.                                 \bcat\b        cat, but not catenary or concatenate
^      Anchor the pattern to the beginning of a string.                        ^If            any string beginning with If
$      Anchor the pattern to the end of the string.                            \.$            any string ending in a period

Remember that it is usually a good idea to add the character r in front of a regular expression string. Otherwise, Python may perform substitutions that change the expression.
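The following sketch shows what can go wrong without the r prefix, using the \b pattern from the table. The sample sentence is made up for illustration.

```python
import re

# Without the r prefix, Python turns "\b" into a backspace character
# before the re module ever sees the pattern.
plain = "\bcat\b"
raw = r"\bcat\b"

plain_length = len(plain)    # 5: backspace, c, a, t, backspace
raw_length = len(raw)        # 7: both backslashes are kept

# The plain pattern looks for literal backspaces, so it finds nothing.
plain_match = re.search(plain, "the cat sat")

# The raw pattern reaches re intact and matches the word cat.
raw_match = re.search(raw, "the cat sat")
```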

Saving Matches

One use of regular expressions is to parse strings by saving the portions of the string that match your pattern. For example, suppose you have an e-mail address, and you want to get just the username part of the address:

email = 'alice@wonderland.gov'

parsemail = re.compile(r"(.*)@(.*)")

(username, domain)=parsemail.search(email).groups()

print "Username:", username, "Domain:", domain

This example uses the regular expression pattern “(.*)@(.*)” to match the e-mail address. The pattern contains two groups enclosed in parentheses. One group is the set of characters before the @, and the other is the set of characters following the @. The method groups() returns a tuple of the strings that matched each of these groups. In this example, those strings are alice and wonderland.gov.

Finding a List of Matches

In some cases, you may want to find and save a list of all the matches for an expression. For example,

regexp = re.compile(r"ap*le")

matchlist = regexp.findall(inputline)

searches for all the substrings of inputline that match the expression "ap*le". This includes strings like ale or apple. If you also want to match capitalized words like Apple, you could use the regular expression

regexp = re.compile(r"ap*le", re.I)

instead.

One common use of findall() is to divide a line into sections. For example, the sample program in the earlier section “Variable Scope” used

splitline = re.findall (r"\w+", line.lower())

to get a list of all the words in line.lower().

Splitting a String

The split() method breaks a string at each occurrence of a certain pattern.

Consider the following line from the file /etc/passwd:

line = "lewisc:x:3943:100:L. Carroll:/home/lewisc:/bin/bash"

We can use split() to turn the fields from this line into a list:

passre = re.compile(r":")

passlist = passre.split(line)

# passlist = ['lewisc', 'x', '3943', '100', 'L. Carroll', '/home/lewisc', '/bin/bash']

Better yet, we can assign a variable name to each field:

(logname, passwd, uid, gid, gcos, home, shell) = re.split (r":", line)
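Every field that split() returns is a string, including the numeric ones. As a brief sketch, the user ID can be converted with int() before it is used as a number. The variable name uid_number is introduced here only for illustration.

```python
import re

line = "lewisc:x:3943:100:L. Carroll:/home/lewisc:/bin/bash"

(logname, passwd, uid, gid, gcos, home, shell) = re.split(r":", line)

# uid arrives as the string '3943'; convert it before doing arithmetic
# or numeric comparisons.
uid_number = int(uid)
```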

Substitutions

Regular expressions can also be used to substitute text for the part of the string matched by the pattern. In this example, the string “Hello, world” is transformed into “Hello, sailor”:

hello = "Hello, world"

hellore = re.compile(r"world")

newhello = hellore.sub("sailor", hello)

This could also be written as

hello = "Hello, world"

newhello = re.sub (r"world", "sailor", hello)

Here's a slightly more interesting example of a substitution. This will replace all the digits in the input with the letter X:

import re, fileinput

matchdigit = re.compile(r"\d")

for line in fileinput.input():

    print matchdigit.sub('X', line),

Creating Simple Classes

In all of the examples so far, we have been using Python as a procedural programming language, like C or shell scripting (or most Perl scripts). You can also use Python for object-oriented programming. If you are not familiar with object-oriented programming, see Chapter 25, which explains the concepts and terminology. Most of the books on Python listed at the end of this chapter also cover object-oriented programming.

To define a class, you can use the form

class MyClass (ParentClass) :

    def method1(self) :

        # insert code here, such as

        self.x = 1024

    def method2 (self, newx) :

        self.x = newx

    def method3(self) :

        return self.x

This creates a class named MyClass. The (ParentClass) is optional. If it is included, MyClass inherits from ParentClass. (Python also supports multiple inheritance.)

Classes typically contain one or more methods. In the previous example, MyClass has three methods. method1 can be called without any arguments. It sets the member variable x to 1024. The second method, method2, is called with an argument, which it uses to set the value of x. The last method in this example, method3, returns the value of x.

Here’s how you might use this class in the Python interpreter:

>>> obj = MyClass()

>>> obj.method1()

>>> print obj.x

1024

>>> obj.method2("Hello, world")

>>> print obj.method3()

Hello, world
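Inheritance can be sketched with two small classes. The class names Point and LabeledPoint and their methods are hypothetical, chosen only to show a child class using a method defined in its parent.

```python
class Point:
    def setx(self, newx):
        self.x = newx

    def getx(self):
        return self.x

class LabeledPoint(Point):        # LabeledPoint inherits from Point.
    def describe(self):
        # getx() is not defined here; it comes from the parent class.
        return "x is %d" % self.getx()

obj = LabeledPoint()
obj.setx(1024)                    # Inherited from Point.
description = obj.describe()      # Defined in LabeledPoint.
```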

For more information about classes and objects in Python, see the Classes section of the Python Tutorial at http://docs.python.org/tut/node11.html.

Exceptions

Like C++ and Java, Python supports exception handling. For example, if you attempt to open a file that does not exist, or a file you do not have permission to read, Python will raise an IOError. You can handle this in your code. For example,

import sys

try :

    filein = open (inputfile, 'r')

except IOError :

    sys.stderr.write( "Error: cannot open %s\n" % inputfile)

    sys.stderr.write( "%s: %s\n" % (sys.exc_type, sys.exc_value))

    sys.exit(2)

In this example, Python will try to open inputfile. If it successfully opens the file, execution will continue after the try/except block. If it cannot open the file, however, Python will raise an IOError exception. The except statement catches the exception, and executes a block of code to handle it. The variable sys.exc_type gives the type of exception (although in this case, we already know that it was an IOError). The variable sys.exc_value has an error message generated by Python that may help to determine what went wrong. Finally, sys.exit() causes the script to terminate. In this example, sys.exit(2) returns the exit code 2 to indicate that there was an error.
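The same pattern can be wrapped in a function that reports the problem and lets the caller decide what to do next. This is a minimal sketch rather than a recommended design: the function name read_or_report and the sample path are made up, and the except IOError as err form assumes Python 2.6 or later.

```python
import sys

def read_or_report(inputfile):
    # Return the contents of inputfile, or None if it cannot be opened.
    try:
        filein = open(inputfile, 'r')
    except IOError as err:
        # err holds the message Python generated for this failure.
        sys.stderr.write("Error: cannot open %s\n" % inputfile)
        sys.stderr.write("%s\n" % err)
        return None
    contents = filein.read()
    filein.close()
    return contents

# A path that is assumed not to exist, to trigger the exception.
result = read_or_report("/no/such/directory/no-such-file.txt")
```

A script that cannot continue without the file could test for None and call sys.exit(2), as in the example above.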

Because Python automatically generates detailed error messages, exception handling isn’t always necessary. Chapter 25 has a detailed description of how exception handling works in Java, which may be helpful if you want to learn more about exceptions. In addition, the books in the section “How to Find Out More” at the end of this chapter have more complete coverage of exception handling in Python.

Troubleshooting

The following is a list of problems that you may run into when running your scripts, and suggestions for how to fix them.

Problem: You can’t find python on your machine.

Solution: From the command prompt, try the command which python.

If you get back a “command not found” message, try typing

$ ls /usr/bin/python

or

$ ls /usr/local/bin/python

If one of those commands shows that you do have python on your system, check your PATH variable and make sure it includes the directory containing python. Also check your scripts to make sure you entered the full pathname correctly. If you still can’t find it, you may have to download and install python yourself, from http://www.python.org.

Problem: You get “Permission denied” when you try to run a script.

Solution: Check the permissions on your script.

For a python script to run, it needs both read and execute permissions. For instance,

$ ./hello.py

Can't open python script "./hello.py": Permission denied

$ ls -l hello.py

---x------ 1 kili 46 Apr 23 13:14 hello.py

$ chmod 500 hello.py

$ ls -l hello.py

-r-x------ 1 kili 46 Apr 23 13:14 hello.py

$ ./hello.py

Hello, World

Problem: You get a SyntaxError (“invalid syntax”).

Solution: Remember to use a colon (:) in your if statements, loops, and function definitions.

Although Python does not require curly braces around blocks of code, or semicolons at the end of each line, it does require the : before each indented block of code.

Problem: You still get an error.

Solution: Check that you are using tabs or spaces, but not both, to indent your blocks.

A common mistake is to use tabs to indent some lines and spaces to indent others. Depending on the width your editor uses to display tabs, code that appears to line up may actually be indented incorrectly. If you are working on code that may be shared with other programmers, spaces are usually a safer choice than tabs. They will look the same in any editor, and a simple command such as

$ grep " " *.py

can be used to quickly find any cases where tabs have been used by mistake. (To include a TAB character in a command line, as in the preceding example, use CTRL-V TAB.) In addition, scripts run with python -t will generate a warning if tabs and spaces are used in the same block of code.

The command

$ sed 's/\t/ /g' tabfile > spacefile

replaces all the tabs in tabfile with spaces, and saves the result in spacefile. An even better solution is this simple Python program, which will replace the tabs in a list of files:

#!/usr/bin/python

# replace tabs with spaces

# usage: tabreplace.py n filenames

# where n is the number of spaces to use instead of each tab

import sys, fileinput

# Loop through the lines in the files (starting from sys.argv[2])

# Use a special flag to send the output from print directly to the files

# In each line, replace all tab characters with sys.argv[1] spaces

for line in fileinput.input(sys.argv[2:], True):

    print line.expandtabs(int(sys.argv[1])),

The method expandtabs() actually replaces each tab with up to n spaces, so that the text will still line up correctly in columns.
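A short sketch makes the "up to n spaces" behavior concrete. Each tab advances to the next multiple of the tab size, so the number of spaces inserted depends on where the tab falls in the line. The sample strings are made up for illustration.

```python
# With a tab size of 4, a tab always moves to the next column that is
# a multiple of 4.
short = "a\tb".expandtabs(4)       # Tab at column 1 becomes 3 spaces.
longer = "abc\tb".expandtabs(4)    # Tab at column 3 becomes 1 space.
full = "\tb".expandtabs(4)         # Tab at column 0 becomes 4 spaces.
```

In all three results the letter b that follows the tab ends up at a column that is a multiple of 4, which is what keeps columns of text lined up.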

Problem: You get a NameError (“name is not defined”).

Solution: Remember to import modules, and to include the module name when using its objects.

In order to use standard I/O, regular expressions, and other important language features, you must import Python modules. You must also include the module name when you use objects or functions from the module. For example, in order to use the math module to compute a square root, you would have to write

import math

sqroot = math.sqrt (121)

Problem: Test comparisons with strings fail unexpectedly.

Solution: Make sure you remove the newline at the end of strings.

Remember that strings that have been read in from files or from standard input typically have newlines at the end. If you do not remove the newlines, not only will your print statements tend to add an extra line, but your test comparisons will often fail. You can use

str = str.rstrip ()

to remove white space (including newlines) from the end of your strings.
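The following sketch shows the kind of comparison that fails. The string answer stands in for a line as it would arrive from readline() or sys.stdin.

```python
# Simulate a line of input, complete with its trailing newline.
answer = "yes\n"

raw_equal = (answer == "yes")              # False: the newline is still there.
clean_equal = (answer.rstrip() == "yes")   # True once it is stripped.
```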

Problem: Running python from the command line gives an error message or no output at all.

Solution: Make sure you are enclosing your instructions in single quotes, as in

$ python -c 'print "Hello, world"'

Problem: Running your Python script gives unexpected output.

Solution: Make sure you are running the right script!

This might sound silly, but one classic mistake is to name your script "test" and then run it at the command line only to get nothing:

$ test

$

The reason is that you are actually running /bin/test instead of your script. Try running your script with the full pathname (e.g., /home/kili/PythonScripts/test.py) to see if that fixes the problem.

Problem: Your program still doesn’t work correctly.

Solution: Try running your code with a debugger.

python comes with a command-line debugger called pdb. One way to run your script with pdb is with the -m flag, like this:

$ python -m pdb myscript.py command-line-argument

pdb will display the first line of code in your file and wait for input. For information about the commands pdb recognizes and how to use it for debugging, see the documentation at http://docs.python.org/lib/debugger-commands.html. A list of other debuggers for Python can be found at http://wiki.python.org/moin/PythonDebuggers/.

Alternatively, you could use a Python IDE that has a graphical debugger. There are quite a few IDEs for Python, some free and some commercial, including IDLE, which ships with python. The Python wiki has a list of IDEs at http://wiki.python.org/moin/IntegratedDevelopmentEnvironments/. Many of these IDEs also have syntax checking features that can help you spot errors in your code as you work.

Summary

There are many features of Python that were not covered in this chapter. Chapter 27 has some information about using the CGI module of Python to write CGI scripts, but for further information about the language you will need to find a book devoted to Python. Several good references are mentioned in the later section “How to Find Out More.”

Table 23–2 lists some of the most important Python functions introduced in this chapter.

Table 23–2: Python Keywords

Function                  Use
print                     Print a string to standard output.
raw_input()               Read a string from standard input.
import                    Load a module.
rstrip()                  Remove trailing white space from a string. Other string methods: lower(), center(), split(), join(), find(), replace(), count()
sort()                    Sort the items in a list. Other list methods: reverse(), append(), extend(), insert(), pop()
keys()                    Get a list of keys in a dictionary. Use has_key() to test for keys.
len()                     Get the length of a string, list, or dictionary.
del                       Delete an element from a list or dictionary.
range()                   Generate a list of integers.
if . . . elif . . . else  Conditional statement.
for . . . in              Loop through the elements in a list.
while                     Loop while a condition is true. Can use break to exit.
open()                    Open a file. File methods: read(), readline(), readlines(), write(), writelines(), close().
def                       Define a procedure.
return                    Exit from a procedure, returning a value.
class                     Define a class.
try . . . except          Catch exceptions.

Table 23–3 lists the Python modules mentioned in this chapter. See the Python documentation at http://docs.python.org/modindex.html for details.

Table 23–3: Python Modules

Module     Use
sys        Standard I/O, command-line arguments
fileinput  Iterate through files, especially command-line arguments
getopt     Parse command-line options
os         UNIX commands and files
shutil     Copy files
re         Regular expressions
math       Mathematical functions
cmath      Complex number support
random     Generate random numbers
time       System time and date
cgi        CGI scripting

How to Find Out More

A very good introduction to Python for new programmers is

· Fehily, Chris. Python: Visual QuickStart Guide. 1st ed. Berkeley, CA: Peachpit Press, 2001.

For more experienced programmers who are interested in a faster introduction to Python, Dive into Python, by Mark Pilgrim, is a good choice. It is available either as a book or as a free download.

· Pilgrim, Mark. Dive into Python. 1st ed. Berkeley, CA: Apress, 2004.

· http://diveintopython.org/

This book is a very thorough guide to Python, from the most basic beginner’s material all the way through advanced topics such as web development:

· Norton, Peter, et al. Beginning Python. 1st ed. Indianapolis, Indiana: Wrox-Wiley, 2005.

Like all books in the Nutshell series, Python in a Nutshell is a very good reference to the language. It is often easier to use than the online documentation.

· Martelli, Alex. Python in a Nutshell. 1st ed. Sebastopol, CA: O’Reilly Media, 2003.

This is an interesting way to explore new uses of Python, and also a helpful reference:

· Martelli, Alex, and David Ascher, ed. Python Cookbook. 2nd ed. Sebastopol, CA: O’Reilly Media, 2005.

The official web site for Python is

· http://www.python.org/

Documentation, including a tutorial, can be found at

· http://docs.python.org/