Overview of Python Topics - DOING MATH WITH PYTHON Use Programming to Explore Algebra, Statistics, Calculus, and More! (2015)

B
Overview of Python Topics

The aim of this appendix is twofold: to provide a quick refresher on some Python topics that weren’t thoroughly introduced in the chapters and to introduce topics that will help you write better Python programs.

if __name__ == '__main__'

Throughout the book, we’ve used the following block of code, where func() is a function we’ve defined in the program:

if __name__ == '__main__':
    # Do something
    func()

This block of code ensures that the statements within the block are executed only when the program is run on its own.

When a program runs, the special variable __name__ is set to __main__ automatically, so the if condition evaluates to True and the function func() is called. However, __name__ is set differently when you import the program into another program (see “Reusing Code” on page 235).

Here’s a quick demonstration. Consider the following program, which we’ll call factorial.py:

# Find the factorial of a number
def fact(n):
    p = 1
    for i in range(1, n+1):
        p = p*i
    return p

➊ print(__name__)

if __name__ == '__main__':
    n = int(input('Enter an integer to find the factorial of: '))
    f = fact(n)
    print('Factorial of {0}: {1}'.format(n, f))

The program defines a function, fact(), that calculates the factorial of the integer passed to it. When you run it, it prints __main__, which corresponds to the print statement at ➊, because __name__ is automatically set to __main__. Then, it asks for an integer, calculates the factorial, and prints it:

__main__
Enter an integer to find the factorial of: 5
Factorial of 5: 120

Now, say you need to calculate the factorial in another program. Instead of writing the function again, you decide to reuse this function by importing it:

from factorial import fact
if __name__ == '__main__':
    print('Factorial of 5: {0}'.format(fact(5)))

Note that both the programs must be in the same directory. When you run this program, you’ll get the following output:

factorial
Factorial of 5: 120

When your program is imported by another program, the value of the variable __name__ is set to your program’s filename, without the extension. In this case, the value of __name__ is factorial instead of __main__. Because the condition __name__ == '__main__' now evaluates to False, the program doesn’t ask for the user’s input anymore. Remove the condition to see for yourself what happens!

To summarize, it’s good practice to use if __name__ == '__main__' in your programs so that the statements you want executed when your program is run as a standalone are not executed when your program is imported into another program.

List Comprehensions

Let’s say we have a list of integers and we want to create a new list containing the squares of the elements of the original list. Here’s one way that we could do this that’s already familiar to you:

>>> x = [1, 2, 3, 4]
>>> x_square = []
➊ >>> for n in x:
➋         x_square.append(n**2)
>>> x_square
[1, 4, 9, 16]

Here, we used a code pattern that we’ve used in various programs throughout the book. We create an empty list, x_square, and then successively append to it as we calculate the square. We can do this in a more efficient way using list comprehensions:

➌ >>> x_square = [n**2 for n in x]
>>> x_square
[1, 4, 9, 16]

The statement at ➌ is referred to as a list comprehension in Python. It consists of an expression—here, n**2—followed by a for loop, for n in x. Note that it basically allows us to combine the two statements at ➊ and ➋ into one to create a new list in one statement.

As another example, consider one of the programs we wrote in “Drawing the Trajectory” on page 51 to draw the trajectory of a body in projectile motion. In these programs, we have the following block of code to calculate the x- and y-coordinates of the body at each time instant:

# Find time intervals
intervals = frange(0, t_flight, 0.001)
# List of x and y coordinates
x = []
y = []
for t in intervals:
    x.append(u*math.cos(theta)*t)
    y.append(u*math.sin(theta)*t - 0.5*g*t*t)

Using list comprehension, you can rewrite the block of code as follows:

# Find time intervals
intervals = frange(0, t_flight, 0.001)
# List of x and y coordinates

x = [u*math.cos(theta)*t for t in intervals]
y = [u*math.sin(theta)*t - 0.5*g*t*t for t in intervals]

The code is more compact now, as you didn’t have to create the empty lists, write a for loop, and append to the lists. List comprehension lets you do this in a single statement.

You can also add conditionals to a list comprehension in order to selectively choose which list items are evaluated in the expression. Consider, once again, the first example:

>>> x = [1, 2, 3, 4]
>>> x_square = [n**2 for n in x if n%2 == 0]
>>> x_square
[4, 16]

In this list comprehension, we use the if condition to explicitly tell Python to evaluate the expression n**2 only on the even list items of x.
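To make the filtering explicit, here is a minimal sketch that builds the same list twice, once with a loop and once with the conditional comprehension. The names squares_loop and squares_comp are only illustrative.

```python
x = [1, 2, 3, 4]

# Loop version: append the square only when the item is even
squares_loop = []
for n in x:
    if n % 2 == 0:
        squares_loop.append(n**2)

# Comprehension version: the trailing "if" filters the items of x
squares_comp = [n**2 for n in x if n % 2 == 0]

# Both approaches build the same list
print(squares_loop == squares_comp)
```

The condition is checked before the expression is evaluated, so n**2 is never computed for the odd items.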

Dictionary Data Structure

We first used a Python dictionary in Chapter 4 while implementing the subs() method in SymPy. Let’s explore Python dictionaries in more detail. Consider a simple dictionary:

>>> d = {'key1': 5, 'key2': 20}

This code creates a dictionary with two keys, 'key1' and 'key2', with values 5 and 20, respectively. Dictionary keys must be hashable, which in practice means immutable values such as strings, numbers, and tuples whose items are themselves immutable. Once created, these values can’t be changed. A list can’t be a key because we can add and remove elements from it.
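The restriction on keys can be checked directly. The following sketch uses a string, a number, and a tuple as keys and then tries a list. The dictionary contents and the list_key_allowed flag are made up for illustration.

```python
d = {}
d['name'] = 1      # a string key
d[42] = 2          # a number key
d[(1, 2)] = 3      # a tuple key

try:
    d[[1, 2]] = 4  # a list is mutable, so Python rejects it as a key
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
    print('A list cannot be used as a dictionary key')

print(len(d))
```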

We already know that to retrieve the value corresponding to 'key1' in the dictionary, we need to specify it as d['key1']. This is one of the most common use cases of a dictionary. A related use case is checking whether the dictionary contains a certain key, 'x'. We can check that as follows:

>>> d = {'key1': 5, 'key2': 20}
>>> 'x' in d
False

Once we create a dictionary, we can add a new key-value pair to it, similar to how we can append elements to a list. Here’s an example:

>>> d = {'key1': 5, 'key2': 20}
>>> if 'x' in d:
        print(d['x'])
else:
        d['x'] = 1

>>> d
{'key1': 5, 'x': 1, 'key2': 20}

This code snippet checks whether the key 'x' already exists in the dictionary, d. If it does, it prints the value corresponding to it; otherwise, it adds the key to the dictionary with 1 as the corresponding value. In Python 3.4, the version this book refers to, a dictionary behaves like a set in that Python doesn’t guarantee a particular order of the key-value pairs: they can be in any order, irrespective of the order of insertion. Starting with Python 3.7, dictionaries preserve insertion order, so on a newer interpreter the key 'x' would appear last.

Besides specifying the key as an index to the dictionary, we can also use the get() method to retrieve the value corresponding to the key:

>>> d.get('x')
1

If you specify a nonexistent key to the get() method, None is returned. On the other hand, if you do so while using the index style of retrieving, you’ll get an error.
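The difference between the two retrieval styles can be sketched as follows. The key 'z' and the raised flag are chosen only for illustration.

```python
d = {'key1': 5, 'key2': 20}

# get() quietly returns None when the key is missing
missing = d.get('z')
print(missing)

# Index-style access raises a KeyError for the same key
raised = False
try:
    d['z']
except KeyError:
    raised = True
    print("No key 'z' in the dictionary")
```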

The get() method also lets you set a default value for nonexistent keys:

>>> d.get('y', 0)
0

There’s no key 'y' in the dictionary d, so 0 is returned. If there is a key, however, the value is returned instead:

>>> d['y'] = 1
>>> d.get('y', 0)
1

The keys() and values() methods each return a list-like data structure of all the keys and values, respectively, in a dictionary:

>>> d.keys()
dict_keys(['key1', 'x', 'key2', 'y'])
>>> d.values()
dict_values([5, 1, 20, 1])

To iterate over the key and value pairs in a dictionary, use the items() method:

>>> d.items()
dict_items([('key1', 5), ('x', 1), ('key2', 20), ('y', 1)])

This method returns a view of tuples, and each tuple is a key-value pair. We can use the following code snippet to print them nicely:

>>> for k, v in d.items():
        print(k, v)

key1 5
x 1
key2 20
y 1

Views are more memory efficient than lists, and they don’t let you add or remove items.
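A short sketch of how a view behaves: it reflects later changes to the dictionary, and you can copy it into a list when you need something you can modify. The names keys and key_list are illustrative.

```python
d = {'key1': 5, 'key2': 20}

keys = d.keys()            # a view onto the dictionary, not a copy
d['x'] = 1                 # change the dictionary afterward
print('x' in keys)         # the view reflects the new key

key_list = list(keys)      # copy the keys into an independent list
key_list.append('extra')   # allowed on the list; d itself is unaffected
print('extra' in d)
```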

Multiple Return Values

In the programs we’ve written so far, most of the functions return a single value, but functions sometimes return multiple values. We saw an example of such a function in “Measuring the Dispersion” on page 71, where in the program to find the range, we returned three numbers from the find_range() function. Here’s another example of the approach we took there:

import math
def components(u, theta):
    x = u*math.cos(theta)
    y = u*math.sin(theta)
    return x, y

The components() function accepts a velocity, u, and an angle, theta, in radians as parameters, and it calculates the x and y components and returns them. To return the calculated components, we simply list the corresponding Python labels in the return statement separated by a comma. This creates and returns a tuple consisting of the items x and y. In the calling code, we receive the multiple values:

if __name__ == '__main__':
    u = 5  # example velocity, chosen only for illustration
    theta = math.radians(45)
    x, y = components(u, theta)

Because the components() function returns a tuple, we can also retrieve the returned values using tuple indices:

c = components(u, theta)
x = c[0]
y = c[1]

This has advantages because we don’t have to know all the different values being returned. For one, you don’t have to write x,y,z = myfunc1() when the function returns three values or a,x,y,z = myfunc1() when the function returns four values, and so on.

In either of the preceding cases, the code calling the components() function must know which of the return values correspond to which component of the velocity, as there’s no way to know that from the values themselves.

A user-friendly approach is to return a dictionary object instead, as we saw in the case of SymPy’s solve() function when used with the dict=True keyword argument. Here’s how we can rewrite the preceding components function to return a dictionary:

import math

def components(theta):
    x = math.cos(theta)
    y = math.sin(theta)

    return {'x': x, 'y': y}

Here, we return a dictionary with the keys 'x' and 'y' referring to the x and y components and their corresponding numerical values. With this new function definition, we don’t need to worry about the order of the returned values. We just use the key 'x' to retrieve the x component and the key 'y' to retrieve the y component:

if __name__ == '__main__':
    theta = math.radians(45)
    c = components(theta)
    y = c['y']
    x = c['x']
    print(x, y)

This approach eliminates the need to use indices to refer to a specific returned value. The following code rewrites the program to find the range (see “Measuring the Dispersion” on page 71) so that the results are returned as a dictionary instead of a tuple:

'''
Find the range using a dictionary to return values
'''
def find_range(numbers):
    lowest = min(numbers)
    highest = max(numbers)
    # Find the range
    r = highest-lowest
    return {'lowest':lowest, 'highest':highest, 'range':r}

if __name__ == '__main__':
    donations = [100, 60, 70, 900, 100, 200, 500, 500, 503, 600, 1000, 1200]
    result = find_range(donations)
➊   print('Lowest: {0} Highest: {1} Range: {2}'.
          format(result['lowest'], result['highest'], result['range']))

The find_range() function now returns a dictionary with the keys lowest, highest, and range and with the lowest number, highest number, and the range as their corresponding values. At ➊, we simply use the corresponding key to retrieve the corresponding value.

If we were just interested in the range of a group of numbers and we didn’t care about the lowest and highest numbers, we’d just use result['range'] and not worry about what other values were returned.

Exception Handling

In Chapter 1, we learned that trying to convert a string such as '1.1' to an integer using the int() function results in a ValueError exception. But with a try...except block, we can print a user-friendly error message:

>>> try:
        int('1.1')
except ValueError:
        print('Failed to convert 1.1 to an integer')

Failed to convert 1.1 to an integer

When any statement in the try block raises an exception, the type of exception raised is matched with the one specified by the except statement. If there’s a match, the program resumes in the except block. If the exception doesn’t match, the program execution halts and displays the exception. Here’s an example:

>>> try:
        print(1/0)
except ValueError:
        print('Division unsuccessful')

Traceback (most recent call last):
  File "<pyshell#66>", line 2, in <module>
    print(1/0)
ZeroDivisionError: division by zero

This code block attempts a division by 0, which results in a ZeroDivisionError exception. Although the division is carried out in a try...except block, the exception type is incorrectly specified, and the exception isn’t handled correctly. The correct way to handle this exception is to specify ZeroDivisionError as the exception type.

Specifying Multiple Exception Types

You can also specify multiple exception types. Consider the function reciprocal(), which prints the reciprocal of the number passed to it:

def reciprocal(n):
    try:
        print(1/n)
    except (ZeroDivisionError, TypeError):
        print('You entered an invalid number')

We defined the function reciprocal(), which prints the reciprocal of its argument. We know that if the function is called with 0, it’ll cause a ZeroDivisionError exception. If you pass a string, however, it’ll cause a TypeError exception. The function considers both these cases as invalid input and specifies both ZeroDivisionError and TypeError in the except statement as a tuple.

Let’s try calling the function with a valid input—that is, a nonzero number:

>>> reciprocal(5)
0.2

Next, we call the function with 0 as the argument:

>>> reciprocal(0)
You entered an invalid number

The 0 argument raises the ZeroDivisionError exception, which is in the tuple of exception types specified to the except statement, so the code prints an error message.

Now, let’s enter a string:

>>> reciprocal('1')
You entered an invalid number

In this case, we passed a string instead of a number, which raises the TypeError exception. This exception is also in the tuple of specified exceptions, so the code prints an error message. If you want to give a more specific error message, you can specify multiple except statements as follows:

def reciprocal(n):
    try:
        print(1/n)
    except TypeError:
        print('You must specify a number')
    except ZeroDivisionError:
        print('Division by 0 is invalid')

>>> reciprocal(0)
Division by 0 is invalid
>>> reciprocal('1')
You must specify a number

In addition to TypeError, ValueError, and ZeroDivisionError, there are a number of other built-in exception types. The Python documentation at https://docs.python.org/3.4/library/exceptions.html#bltin-exceptions lists the built-in exceptions for Python 3.4.

The else Block

The else block is used to specify which statements to execute when there’s no exception. Consider an example from the program we wrote to draw the trajectory of a projectile (see “Drawing the Trajectory” on page 51):

if __name__ == '__main__':
    try:
        u = float(input('Enter the initial velocity (m/s): '))
        theta = float(input('Enter the angle of projection (degrees): '))
    except ValueError:
        print('You entered an invalid input')
➊   else:
        draw_trajectory(u, theta)
        plt.show()

If the input for u or theta couldn’t be converted to a floating point number, it doesn’t make sense for the program to call the draw_trajectory() and plt.show() functions. Instead, we specify these two statements in the else block at ➊. Using try...except...else will let you manage different types of errors during runtime and take appropriate action when there is an error or when there is none:

1. If there’s an exception and there’s an except statement corresponding to the exception type raised, the execution is transferred to the corresponding except block.

2. If there’s no exception, the execution is transferred to the else block.
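The two cases above can be traced with a small sketch. The function describe() is hypothetical and stands in for any code that converts input and then acts on it.

```python
def describe(text):
    try:
        value = float(text)
    except ValueError:
        # Case 1: float() raised ValueError, so execution moves here
        return 'invalid input'
    else:
        # Case 2: no exception was raised, so the else block runs
        return 'valid input: {0}'.format(value)

print(describe('abc'))
print(describe('2.5'))
```

Keeping only the conversion inside the try block means that a ValueError raised by later code isn't mistaken for bad input.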

Reading Files in Python

Opening a file is the first step to reading data from it. Let’s start with a quick example. Consider a file that consists of a collection of numbers with one number per line:

100
60
70
900
100
200
500
500
503
600
1000
1200

We want to write a function that reads the file and returns a list of those numbers:

def read_data(path):
    numbers = []
➊   f = open(path)
➋   for line in f:
        numbers.append(float(line))
    f.close()
    return numbers

First, we define the function read_data() and create an empty list to store all of the numbers. At ➊, we use the open() function to open the file whose location has been specified via the argument path. An example of the path would be /home/username/mydata.txt on Linux, C:\mydata.txt on Microsoft Windows, or /Users/Username/mydata.txt on OS X. The open() function returns a file object, which we use the label f to refer to. We can go over each line of the file using a for loop at ➋. Because each line is returned as a string, we convert it into a number and append it to the list numbers. The loop stops executing once all the lines have been read, and we close the file using the close() method. Finally, we return the numbers list.

This is similar to how we read the numbers from a file in Chapter 3, although we didn’t have to close the file explicitly because we used a different approach there. Using the approach we took in Chapter 3, we would rewrite the preceding function as follows:

def read_data(path):
    numbers = []
➊   with open(path) as f:
        for line in f:
            numbers.append(float(line))
➋   return numbers

The key statement here is at ➊. It’s only partially similar to writing f = open(path). Besides opening the file and assigning the file object returned by open() to f, it also sets up a new context with all the statements in that block (in this case, all the statements before the return statement). When all the statements in the body have been executed, the file is automatically closed. That is, when the execution reaches the statement at ➋, the file is closed without needing an explicit call to the close() method. This approach also means that if there are any exceptions while working with the file, it’ll still be closed before the program exits. This is the preferred approach to working with files.
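The claim that the file is closed even when an exception occurs can be checked with the file object's closed attribute. This sketch writes a throwaway file with the standard tempfile module. The file contents are made up, and the second line is deliberately invalid.

```python
import os
import tempfile

# Create a small temporary data file for the demonstration
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('100\nbad\n')
    path = tmp.name

try:
    with open(path) as f:
        for line in f:
            float(line)   # raises ValueError on the second line
except ValueError:
    pass

# The with statement closed the file despite the exception
print(f.closed)

os.remove(path)           # clean up the temporary file
```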

Reading All the Lines at Once

Instead of reading the lines one by one to build a list, we can use the readlines() method to read all the lines into a list at once. This results in a more compact function:

def read_data(path):
    with open(path) as f:
➊       lines = f.readlines()
        numbers = [float(n) for n in lines]
    return numbers

We read all the lines of the file into a list using the readlines() method at ➊. Then, we convert each of the items in the list into a floating point number using the float() function and list comprehension. Finally, we return the list numbers.
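It can help to look at what readlines() actually returns. In this sketch, which uses a temporary file with made-up contents, each string still ends with a newline character. float() ignores surrounding whitespace, so the conversion works without stripping it first.

```python
import os
import tempfile

# Write a small temporary data file with one number per line
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('100\n60\n70\n')
    path = tmp.name

with open(path) as f:
    lines = f.readlines()

print(lines)                          # each item keeps its trailing '\n'
numbers = [float(n) for n in lines]   # float() tolerates the newline
print(numbers)

os.remove(path)                       # clean up the temporary file
```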

Specifying the Filename as Input

The read_data() function takes the file path as an argument. If your program allows you to specify the filename as an input, this function should work for any file as long as the file contains data we expect to read. Here’s an example:

if __name__=='__main__':
    data_file = input('Enter the path of the file: ')
    data = read_data(data_file)
    print(data)

Once you’ve added this code after the definition of the read_data() function and run the program, it’ll ask you to input the path to the file. Then, it’ll print the numbers it reads from the file:

Enter the path of the file: /home/amit/work/mydata.txt
[100.0, 60.0, 70.0, 900.0, 100.0, 200.0, 500.0, 500.0, 503.0, 600.0, 1000.0, 1200.0]

Handling Errors When Reading Files

There are a couple of things that can go wrong when reading files: (1) the file can’t be read, or (2) the data in the file isn’t in the expected format. Here’s an example of what happens when a file can’t be read:

Enter the path of the file: /home/amit/work/mydata2.txt
Traceback (most recent call last):
  File "read_file.py", line 11, in <module>
    data = read_data(data_file)
  File "read_file.py", line 4, in read_data
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/amit/work/mydata2.txt'

Because I entered a file path that doesn’t exist, the FileNotFoundError exception is raised when we try to open the file. We can make the program display a user-friendly error message by modifying our read_data() function as follows:

def read_data(path):
    numbers = []
    try:
        with open(path) as f:
            for line in f:
                numbers.append(float(line))
    except FileNotFoundError:
        print('File not found')
    return numbers

Now, when you specify a nonexistent file path, you’ll get an error message instead:

Enter the path of the file: /home/amit/work/mydata2.txt
File not found

The second source of errors can be that the data in the file isn’t what your program expects to read. For example, consider a file that has the following:

10
20
3o
1/5
5.6

The third line in this file isn’t convertible to a floating point number because it has the letter o in it instead of the number 0, and the fourth line consists of 1/5, a fraction in string form, which float() can’t handle.

If you supply this data file to the earlier program, it’ll produce the following error:

Enter the path of the file: bad_data.txt
Traceback (most recent call last):
  File "read_file.py", line 13, in <module>
    data = read_data(data_file)
  File "read_file.py", line 6, in read_data
    numbers.append(float(line))
ValueError: could not convert string to float: '3o\n'

The third line in the file is 3o, not the number 30, so when we attempt to convert it into a floating point number, the result is ValueError. There are two approaches you can take when such data is present in a file. The first is to report the error and exit the program. The modified read_data() function would appear as follows:

def read_data(path):
    numbers = []
    try:
        with open(path) as f:
            for line in f:
➊               try:
➋                   n = float(line)
                except ValueError:
                    print('Bad data: {0}'.format(line))
➌                   break
➍               numbers.append(n)
    except FileNotFoundError:
        print('File not found')
    return numbers

We insert another try...except block in the function starting at ➊, and we convert the line into a floating point number at ➋. If the program raises the ValueError exception, we print an error message with the offending line and exit out of the for loop using break at ➌. The program then stops reading the file. The returned list, numbers, contains all the data that was successfully read before encountering the bad data. If there’s no error, we append the floating point number to the numbers list at ➍.

Now when you supply the file bad_data.txt to the program, it’ll read only the first two lines, display the error message, and exit:

Enter the path of the file: bad_data.txt
Bad data: 3o

[10.0, 20.0]

Returning partial data may not be desirable, so we could just replace the break statement at ➌ with return and no data would be returned.
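Here is a minimal sketch of that variation. To stay self-contained, the hypothetical function read_numbers() takes a list of strings instead of a file path. A bare return makes the function return None, so the calling code has to check for that value.

```python
def read_numbers(lines):
    numbers = []
    for line in lines:
        try:
            n = float(line)
        except ValueError:
            print('Bad data: {0}'.format(line))
            return    # bare return: the caller receives None
        numbers.append(n)
    return numbers

result = read_numbers(['10\n', '20\n', '3o\n', '5.6\n'])
if result is None:
    print('No data returned')
```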

The second approach is to ignore the error and continue with the rest of the file. Here’s a modified read_data() function that does this:

def read_data(path):
    numbers = []
    try:
        with open(path) as f:
            for line in f:
                try:
                    n = float(line)
                except ValueError:
                    print('Bad data: {0}'.format(line))
➊                   continue
                numbers.append(n)
    except FileNotFoundError:
        print('File not found')
    return numbers

The only change here is that instead of breaking out of the for loop, we just continue with the next iteration using the continue statement at ➊. The output from the program is now as follows:

Bad data: 3o

Bad data: 1/5

[10.0, 20.0, 5.6]

The specific application where you’re reading the file will determine which of the above approaches you want to take to handle bad data.

Reusing Code

Throughout this book, we’ve used classes and functions that were either part of the Python standard library or available after installing third-party packages, such as matplotlib and SymPy. Now we’ll look at a quick example of how we can import our own programs into other programs.

Consider the function find_corr_x_y() that we wrote in “Calculating the Correlation Between Two Data Sets” on page 75. We’ll create a separate file, correlation.py, which has only the function definition:

'''
Function to calculate the linear correlation coefficient
'''

def find_corr_x_y(x,y):
    # Size of each set
    n = len(x)

    # Find the sum of the products
    prod=[]
    for xi,yi in zip(x,y):
        prod.append(xi*yi)

    sum_prod_x_y = sum(prod)
    sum_x = sum(x)
    sum_y = sum(y)
    squared_sum_x = sum_x**2
    squared_sum_y = sum_y**2

    x_square=[]
    for xi in x:
        x_square.append(xi**2)
    x_square_sum = sum(x_square)

    y_square=[]
    for yi in y:
        y_square.append(yi**2)
    y_square_sum = sum(y_square)

    numerator = n*sum_prod_x_y - sum_x*sum_y
    denominator_term1 = n*x_square_sum - squared_sum_x
    denominator_term2 = n*y_square_sum - squared_sum_y
    denominator = (denominator_term1*denominator_term2)**0.5

    correlation = numerator/denominator

    return correlation

A Python file, referred to by its name without the .py extension, is called a module. The term is usually reserved for files that define classes and functions that’ll be used in other programs. The following program imports the find_corr_x_y() function from the correlation module we just defined:

from correlation import find_corr_x_y
if __name__ == '__main__':
    high_school_math = [83, 85, 84, 96, 94, 86, 87, 97, 97, 85]
    college_admission = [85, 87, 86, 97, 96, 88, 89, 98, 98, 87]
    corr = find_corr_x_y(high_school_math, college_admission)
    print('Correlation coefficient: {0}'.format(corr))

This program finds the correlation between the high school math grades and college admission scores of students we considered in Table 3-3 on page 80. We import the find_corr_x_y() function from the correlation module, create the lists representing the two sets of grades, and call the find_corr_x_y() function with the two lists as arguments. When you run the program, it’ll print the correlation coefficient. Note that the two files must be in the same directory; this is strictly to keep things simple.
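As an aside, the loops in correlation.py are a natural fit for the list comprehensions covered earlier in this appendix. The following is one possible sketch of a more compact version that uses the same formula. The name find_corr_x_y_compact is hypothetical and isn't part of the book's programs.

```python
def find_corr_x_y_compact(x, y):
    # Size of each set
    n = len(x)

    # Sums built with list comprehensions instead of explicit loops
    sum_prod_x_y = sum([xi*yi for xi, yi in zip(x, y)])
    x_square_sum = sum([xi**2 for xi in x])
    y_square_sum = sum([yi**2 for yi in y])
    sum_x = sum(x)
    sum_y = sum(y)

    numerator = n*sum_prod_x_y - sum_x*sum_y
    denominator_term1 = n*x_square_sum - sum_x**2
    denominator_term2 = n*y_square_sum - sum_y**2
    denominator = (denominator_term1*denominator_term2)**0.5

    return numerator/denominator

if __name__ == '__main__':
    # Perfectly linear data, so the coefficient should be very close to 1
    print(find_corr_x_y_compact([1, 2, 3, 4], [2, 4, 6, 8]))
```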