Py Ingredients: Numbers, Strings, and Variables - Introducing Python (2014)

Chapter 2. Py Ingredients: Numbers, Strings, and Variables

In this chapter we’ll begin by looking at Python’s simplest built-in data types:

§ booleans (which have the value True or False)

§ integers (whole numbers such as 42 and 100000000)

§ floats (numbers with decimal points such as 3.14159, or sometimes exponents like 1.0e8, which means one times ten to the eighth power, or 100000000.0)

§ strings (sequences of text characters)

In a way, they’re like atoms. We’ll use them individually in this chapter. Chapter 3 shows how to combine them into larger “molecules.”

Each type has specific rules for its usage and is handled differently by the computer. We’ll also introduce variables (names that refer to actual data; more on these in a moment).

The code examples in this chapter are all valid Python, but they’re snippets. We’ll be using the Python interactive interpreter, typing these snippets and seeing the results immediately. Try running them yourself with the version of Python on your computer. You’ll recognize these examples by the >>> prompt. In Chapter 4, we start writing Python programs that can run on their own.

Variables, Names, and Objects

In Python, everything—booleans, integers, floats, strings, even large data structures, functions, and programs—is implemented as an object. This gives the language a consistency (and useful features) that some other languages lack.

An object is like a clear plastic box that contains a piece of data (Figure 2-1). The object has a type, such as boolean or integer, that determines what can be done with the data. A real-life box marked “Pottery” would tell you certain things (it’s probably heavy, and don’t drop it on the floor). Similarly, in Python, if an object has the type int, you know that you could add it to another int.

Figure 2-1. An object is like a box

The type also determines if the data value contained by the box can be changed (mutable) or is constant (immutable). Think of an immutable object as a closed box with a clear window: you can see the value but you can’t change it. By the same analogy, a mutable object is like an open box: not only can you see the value inside, you can also change it; however, you can’t change its type.

Python is strongly typed, which means that the type of an object does not change, even if its value is mutable (Figure 2-2).

Figure 2-2. Strong typing does not mean push the keys harder

Programming languages allow you to define variables. These are names that refer to values in the computer’s memory that you can define for use with your program. In Python, you use = to assign a value to a variable.

NOTE

We all learned in grade school math that = means equal to. So why do many computer languages, including Python, use = for assignment? One reason is that standard keyboards lack logical alternatives such as a left arrow key, and = didn’t seem too confusing. Also, in computer programs you use assignment much more than you test for equality.

The following is a two-line Python program that assigns the integer value 7 to the variable named a, and then prints the value currently associated with a:

>>> a = 7

>>> print(a)

7

Now, it’s time to make a crucial point about Python variables: variables are just names. Assignment does not copy a value; it just attaches a name to the object that contains the data. The name is a reference to a thing rather than the thing itself. Think of a name as a sticky note (see Figure 2-3).

Figure 2-3. Names stick to objects

Try this with the interactive interpreter:

1. As before, assign the value 7 to the name a. This creates an object box containing the integer value 7.

2. Print the value of a.

3. Assign a to b, making b also stick to the object box containing 7.

4. Print the value of b.

>>> a = 7

>>> print(a)

7

>>> b = a

>>> print(b)

7
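
Here is a minimal sketch of the sticky-note idea. It leans on two things this chapter has not introduced yet, the is operator and the built-in id() function, so treat it as a preview rather than something you need to remember now. It shows that a and b are two names for one object, not two copies.

```python
a = 7
b = a  # attach a second name to the same object; nothing is copied

# 'is' tests whether two names refer to the very same object
same_object = a is b
print(same_object)

# id() returns an integer that identifies an object; both names report the same one
same_id = id(a) == id(b)
print(same_id)
```

Both lines print True, because b = a added a second sticky note to the same box.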

In Python, if you want to know the type of anything (a variable or a literal value), use type(thing). Let’s try it with different literal values (58, 99.9, 'abc') and different variables (a, b):

>>> type(a)

<class 'int'>

>>> type(b)

<class 'int'>

>>> type(58)

<class 'int'>

>>> type(99.9)

<class 'float'>

>>> type('abc')

<class 'str'>

A class is the definition of an object; Chapter 6 covers classes in greater detail. In Python, “class” and “type” mean pretty much the same thing.

Variable names can only contain these characters:

§ Lowercase letters (a through z)

§ Uppercase letters (A through Z)

§ Digits (0 through 9)

§ Underscore (_)

Names cannot begin with a digit. Also, Python treats names that begin with an underscore in special ways (which you can read about in Chapter 4). These are valid names:

§ a

§ a1

§ a_b_c___95

§ _abc

§ _1a

These names, however, are not valid:

§ 1

§ 1a

§ 1_

Finally, don’t use any of these for variable names, because they are Python’s reserved words:

False class finally is return

None continue for lambda try

True def from nonlocal while

and del global not with

as elif if or yield

assert else import pass

break except in raise

These words, and some punctuation, are used to define Python’s syntax. You’ll see all of them as you progress through this book.
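
If you would rather ask Python than memorize the list, the following sketch uses the standard keyword module together with the string method isidentifier(). The candidate names are just the examples from this section. One caveat: isidentifier() also accepts non-ASCII letters, which the character rules above do not mention, and the exact keyword list depends on your Python version.

```python
import keyword

# keyword.kwlist is the interpreter's own list of reserved words
print(keyword.kwlist)

candidates = ['a1', '_abc', '1a', 'class', 'a_b_c___95']
results = {}
for name in candidates:
    # isidentifier() checks the character rules; iskeyword() checks reserved words
    results[name] = name.isidentifier() and not keyword.iskeyword(name)

print(results)
```

Here '1a' fails because it begins with a digit, and 'class' fails because it is reserved; the other three are usable names.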

Numbers

Python has built-in support for integers (whole numbers such as 5 and 1,000,000,000) and floating point numbers (such as 3.1416, 14.99, and 1.87e4). You can calculate combinations of numbers with the simple math operators in this table:

Operator   Description                     Example   Result
+          addition                        5 + 8     13
-          subtraction                     90 - 10   80
*          multiplication                  4 * 7     28
/          floating point division         7 / 2     3.5
//         integer (truncating) division   7 // 2    3
%          modulus (remainder)             7 % 3     1
**         exponentiation                  3 ** 4    81

For the next few pages, I’ll show simple examples of Python acting as a glorified calculator.

Integers

Any sequence of digits in Python is assumed to be a literal integer:

>>> 5

5

You can use a plain zero (0):

>>> 0

0

But don’t put it in front of other digits:

>>> 05

File "<stdin>", line 1

05

^

SyntaxError: invalid token

NOTE

This is the first time you’ve seen a Python exception (a program error). In this case, it’s an error reporting that 05 is an “invalid token.” I’ll explain what this means in Bases. You’ll see many more examples of exceptions in this book because they’re Python’s main error handling mechanism.

A sequence of digits specifies a positive integer. If you put a + sign before the digits, the number stays the same:

>>> 123

123

>>> +123

123

To specify a negative integer, insert a - before the digits:

>>> -123

-123

You can do normal arithmetic with Python, much as you would with a calculator, by using the operators listed in the table shown earlier. Addition and subtraction work as you’d expect:

>>> 5 + 9

14

>>> 100 - 7

93

>>> 4 - 10

-6

You can include as many numbers and operators as you’d like:

>>> 5 + 9 + 3

17

>>> 4 + 3 - 2 - 1 + 6

10

A style note: you’re not required to have a space between each number and operator:

>>> 5+9 + 3

17

It just looks better and is easier to read.

Multiplication is also straightforward:

>>> 6 * 7

42

>>> 7 * 6

42

>>> 6 * 7 * 2 * 3

252

Division is a little more interesting, because it comes in two flavors:

§ / carries out floating-point (decimal) division

§ // performs integer (truncating) division

Even if you’re dividing an integer by an integer, using a / will give you a floating-point result:

>>> 9 / 5

1.8

Integer division gives you an integer answer, throwing away any remainder (strictly speaking, // rounds down, which differs from truncation only when the result is negative):

>>> 9 // 5

1
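
The difference between rounding down and truncating only shows up with negative numbers. This short sketch compares // with int(), which truncates toward zero, and with math.floor() from the standard math module (not otherwise covered in this chapter).

```python
import math

positive = 9 // 5           # 1: same answer either way for positive numbers
floored = -9 // 5           # -2: // rounds down, toward negative infinity
truncated = int(-9 / 5)     # -1: int() chops toward zero instead
via_floor = math.floor(-9 / 5)  # -2: matches //

print(positive, floored, truncated, via_floor)
```

If you only ever divide positive integers, you can keep thinking of // as "divide and drop the remainder."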

Dividing by zero with either kind of division causes a Python exception:

>>> 5 / 0

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ZeroDivisionError: division by zero

>>> 7 // 0

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ZeroDivisionError: integer division or modulo by zero

All of the preceding examples used literal integers. You can mix literal integers and variables that have been assigned integer values:

>>> a = 95

>>> a

95

>>> a - 3

92

Earlier, when we said a - 3, we didn’t assign the result to a, so the value of a did not change:

>>> a

95

If you wanted to change a, you would do this:

>>> a = a - 3

>>> a

92

This usually confuses beginning programmers. Again, because of our ingrained grade school math training, we see that = sign and think of equality. In Python, the expression on the right side of the = is calculated first, then assigned to the variable on the left side.

If it helps, think of it this way:

§ Subtract 3 from a

§ Assign the result of that subtraction to a temporary variable

§ Assign the value of the temporary variable to a:

>>> a = 95

>>> temp = a - 3

>>> a = temp

So, when you say:

>>> a = a - 3

Python is calculating the subtraction on the righthand side, remembering the result, and then assigning it to a on the left side of the = sign. It’s faster and neater than using a temporary variable.

You can combine the arithmetic operators with assignment by putting the operator before the =. Here, a -= 3 is like saying a = a - 3:

>>> a = 95

>>> a -= 3

>>> a

92

This is like a = a + 8:

>>> a += 8

>>> a

100

And this is like a = a * 2:

>>> a *= 2

>>> a

200

Here’s a floating-point division example, such as a = a / 3:

>>> a /= 3

>>> a

66.66666666666667

Let’s assign 13 to a, and then try the shorthand for a = a // 4 (truncating integer division):

>>> a = 13

>>> a //= 4

>>> a

3

The % character has multiple uses in Python. When it’s between two numbers, it produces the remainder when the first number is divided by the second:

>>> 9 % 5

4

Here’s how to get both the (truncated) quotient and remainder at once:

>>> divmod(9,5)

(1, 4)

Otherwise, you could have calculated them separately:

>>> 9 // 5

1

>>> 9 % 5

4

You just saw some new things here: a function named divmod is given the integers 9 and 5 and returns a two-item result called a tuple. Tuples will take a bow in Chapter 3; functions will make their debut in Chapter 4.
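
As a quick check on how the quotient and remainder relate, the sketch below unpacks the tuple that divmod() returns into two names (tuple unpacking is a Chapter 3 topic, so this is a preview) and puts the original number back together.

```python
a, b = 9, 5
quotient, remainder = divmod(a, b)  # unpack the two-item tuple

# The quotient and remainder always reassemble into the original number
rebuilt = quotient * b + remainder
print(quotient, remainder, rebuilt)

# The same identity holds for a negative number, because // rounds down
q2, r2 = divmod(-9, 5)
print(q2, r2, q2 * 5 + r2)
```

The second call returns (-2, 1), and -2 * 5 + 1 is -9 again.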

Precedence

What would you get if you typed the following?

>>> 2 + 3 * 4

If you do the addition first, 2 + 3 is 5, and 5 * 4 is 20. But if you do the multiplication first, 3 * 4 is 12, and 2 + 12 is 14. In Python, as in most languages, multiplication has higher precedence than addition, so the second version is what you’d see:

>>> 2 + 3 * 4

14

How do you know the precedence rules? There’s a big table in Appendix F that lists them all, but I’ve found that in practice I never look up these rules. It’s much easier to just add parentheses to group your code as you intend the calculation to be carried out:

>>> 2 + (3 * 4)

14

This way, anyone reading the code doesn’t need to guess its intent or look up precedence rules.

Bases

Integers are assumed to be decimal (base 10) unless you use a prefix to specify another base. You might never need to use these other bases, but you’ll probably see them in Python code somewhere, sometime.

We generally have 10 fingers or toes (one of my cats has a few more, but rarely uses them for counting). So, we count 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Next, we run out of digits and carry the one to the “ten’s place” and put a 0 in the one’s place: 10 means “1 ten and 0 ones”. We don’t have a single digit that represents “ten.” Then, it’s 11, 12, up to 19, carry the one to make 20 (2 tens and 0 ones), and so on.

A base is how many digits you can use until you need to “carry the one.” In base 2 (binary), the only digits are 0 and 1. 0 is the same as a plain old decimal 0, and 1 is the same as a decimal 1. However, in base 2, if you add a 1 to a 1, you get 10 (1 decimal two plus 0 decimal ones).

In Python, you can express literal integers in three bases besides decimal:

§ 0b or 0B for binary (base 2).

§ 0o or 0O for octal (base 8).

§ 0x or 0X for hex (base 16).

The interpreter prints these for you as decimal integers. Let’s try each of these bases. First, a plain old decimal 10, which means 1 ten and 0 ones:

>>> 10

10

Now, binary (base 2), which means 1 (decimal) two and 0 ones:

>>> 0b10

2

Octal (base 8) for 1 (decimal) eight and 0 ones:

>>> 0o10

8

Hexadecimal (base 16) for 1 (decimal) 16 and 0 ones:

>>> 0x10

16

In case you’re wondering what “digits” base 16 uses, they are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f. 0xa is a decimal 10, and 0xf is a decimal 15. Add 1 to 0xf and you get 0x10 (decimal 16).

Why use a different base from 10? It’s useful in bit-level operations, which are described in Chapter 7, along with more details about converting numbers from one base to another.
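
Until then, here is a small sketch of going the other way: turning an integer back into a string in another base, and parsing a string of digits in a given base. It uses the built-in functions bin(), oct(), and hex(), plus the optional second argument of int(); none of these has been introduced in the text yet.

```python
n = 0x10  # hexadecimal literal for decimal 16

# bin(), oct(), and hex() return strings that carry the matching prefix
as_binary, as_octal, as_hex = bin(n), oct(n), hex(n)
print(as_binary, as_octal, as_hex)

# int() accepts a base as its second argument when given a string of digits
from_binary = int('10', 2)
from_octal = int('10', 8)
from_hex = int('10', 16)
print(from_binary, from_octal, from_hex)
```

The first print shows 0b10000 0o20 0x10, and the second shows 2 8 16, matching the literals you typed earlier.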

Type Conversions

To change other Python data types to an integer, use the int() function. This will keep the whole number and discard any fractional part.

Python’s simplest data type is the boolean, which has only the values True and False. When converted to integers, they represent the values 1 and 0:

>>> int(True)

1

>>> int(False)

0

Converting a floating-point number to an integer just lops off everything after the decimal point:

>>> int(98.6)

98

>>> int(1.0e4)

10000

Finally, here’s an example of converting a text string (you’ll see more about strings in a few pages) that contains only digits, possibly with + or - signs:

>>> int('99')

99

>>> int('-23')

-23

>>> int('+12')

12

Converting an integer to an integer doesn’t change anything but doesn’t hurt either:

>>> int(12345)

12345

If you try to convert something that doesn’t look like a number, you’ll get an exception:

>>> int('99 bottles of beer on the wall')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: invalid literal for int() with base 10: '99 bottles of beer on the wall'

>>> int('')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: invalid literal for int() with base 10: ''

The first text string started with valid digit characters (99), but it kept on going with others that the int() function just wouldn’t stand for. The second, an empty string, contained no digits at all.

NOTE

We’ll get to exceptions in Chapter 4. For now, just know that it’s how Python alerts you that an error occurred (rather than crashing the program, as some languages might do). Instead of assuming that things always go right, I’ll show many examples of exceptions throughout this book, so you can see what Python does when they go wrong.

int() will make integers from floats or strings of digits, but won’t handle strings containing decimal points or exponents:

>>> int('98.6')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: invalid literal for int() with base 10: '98.6'

>>> int('1.0e4')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: invalid literal for int() with base 10: '1.0e4'
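
One way around this, sketched here with the float() function that the Floats section describes a little later, is to convert in two steps: first from the string to a float, and then from the float to an integer.

```python
text = '98.6'

# float() understands the decimal point; int() then drops the fraction
whole = int(float(text))
print(whole)

# The same two-step conversion handles exponent notation
big = int(float('1.0e4'))
print(big)
```

This prints 98 and 10000, the same results as int(98.6) and int(1.0e4) on the float values themselves.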

If you mix numeric types, Python will sometimes try to automatically convert them for you:

>>> 4 + 7.0

11.0

The boolean value False is treated as 0 or 0.0 when mixed with integers or floats, and True is treated as 1 or 1.0:

>>> True + 2

3

>>> False + 5.0

5.0

How Big Is an int?

In Python 2, the size of an int was limited by the platform’s native word size, typically 32 or 64 bits. A 32-bit int has enough room to store any integer from -2,147,483,648 to 2,147,483,647.

A Python 2 long had even more room: it could grow to any size, and Python switched to it automatically when a value outgrew an int. In Python 3, long is long gone, and an int can be any size, even greater than 64 bits. Thus, you can say things like the following (10**100 is called a googol, and was the original name of Google before they decided on the easier spelling):

>>> googol = 10**100

>>> googol

100000000000000000000000000000000000000000000000000000000000000000000000000000

00000000000000000000000

>>> googol * googol

100000000000000000000000000000000000000000000000000000000000000000000000000000

000000000000000000000000000000000000000000000000000000000000000000000000000000

000000000000000000000000000000000000000000000

In many languages, trying this would cause something called integer overflow, where the number would need more space than the computer allowed for it, causing various bad effects. Python handles humungous integers with no problem. Score one for Python.
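
To get a feel for how far past 64 bits a googol goes, this sketch uses the integer method bit_length() and counts the digits with str() and len(), both of which appear later in this chapter.

```python
googol = 10**100

# bit_length() reports how many binary digits the value needs
bits = googol.bit_length()
print(bits)

# Converting to a string and measuring it counts the decimal digits
digits = len(str(googol))
print(digits)
```

A googol needs 333 bits and 101 decimal digits (a 1 followed by 100 zeros), and Python stores it without complaint.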

Floats

Integers are whole numbers, but floating-point numbers (called floats in Python) have decimal points. Floats are handled similarly to integers: you can use the operators (+, -, *, /, //, **, and %) and the divmod() function.

To convert other types to floats, you use the float() function. As before, booleans act like tiny integers:

>>> float(True)

1.0

>>> float(False)

0.0

Converting an integer to a float just makes it the proud possessor of a decimal point:

>>> float(98)

98.0

>>> float('99')

99.0

And, you can convert a string containing characters that would be a valid float (digits, signs, decimal point, or an e followed by an exponent) to a real float:

>>> float('98.6')

98.6

>>> float('-1.5')

-1.5

>>> float('1.0e4')

10000.0

Math Functions

Python has the usual math functions such as square roots, cosines, and so on. We’ll save them for Appendix C, in which we also discuss Python uses in science.

Strings

Nonprogrammers often think that programmers must be good at math because they work with numbers. Actually, most programmers work with strings of text much more than numbers. Logical (and creative!) thinking is often more important than math skills.

Because of its support for the Unicode standard, a Python 3 string can contain characters from any written language in the world, plus a lot of symbols. Its handling of that standard was a big reason for its split from Python 2. It’s also a good reason to use version 3. I’ll get into Unicode in various places, because it can be daunting at times. In the string examples that follow, I’ll mostly use ASCII examples.

Strings are our first example of a Python sequence. In this case, they’re a sequence of characters.

Unlike strings in some other languages, strings in Python are immutable. You can’t change a string in-place, but you can copy parts of strings to another string to get the same effect. You’ll see how to do this shortly.

Create with Quotes

You make a Python string by enclosing characters in either single quotes or double quotes, as demonstrated in the following:

>>> 'Snap'

'Snap'

>>> "Crackle"

'Crackle'

The interactive interpreter echoes strings with a single quote, but all are treated exactly the same by Python.

Why have two kinds of quote characters? The main purpose is so that you can create strings containing quote characters. You can have single quotes inside double-quoted strings, or double quotes inside single-quoted strings:

>>> "'Nay,' said the naysayer."

"'Nay,' said the naysayer."

>>> 'The rare double quote in captivity: ".'

'The rare double quote in captivity: ".'

>>> 'A "two by four" is actually 1 1⁄2" × 3 1⁄2".'

'A "two by four" is actually 1 1⁄2" × 3 1⁄2".'

>>> "'There's the man that shot my paw!' cried the limping hound."

"'There's the man that shot my paw!' cried the limping hound."

You can also use three single quotes (''') or three double quotes ("""):

>>> '''Boom!'''

'Boom!'

>>> """Eek!"""

'Eek!'

Triple quotes aren’t very useful for short strings like these. Their most common use is to create multiline strings, like this classic poem from Edward Lear:

>>> poem = '''There was a Young Lady of Norway,

... Who casually sat in a doorway;

... When the door squeezed her flat,

... She exclaimed, "What of that?"

... This courageous Young Lady of Norway.'''

>>>

(This was entered in the interactive interpreter, which prompted us with >>> for the first line and ... until we entered the final triple quotes and went to the next line.)

If you tried to create that poem with single quotes, Python would make a fuss when you went to the second line:

>>> poem = 'There was a young lady of Norway,

File "<stdin>", line 1

poem = 'There was a young lady of Norway,

^

SyntaxError: EOL while scanning string literal

>>>

If you have multiple lines within triple quotes, the line ending characters will be preserved in the string. If you have leading or trailing spaces, they’ll also be kept:

>>> poem2 = '''I do not like thee, Doctor Fell.

...     The reason why, I cannot tell.

...     But this I know, and know full well:

...     I do not like thee, Doctor Fell.

... '''

>>> print(poem2)

I do not like thee, Doctor Fell.

    The reason why, I cannot tell.

    But this I know, and know full well:

    I do not like thee, Doctor Fell.

>>>

By the way, there’s a difference between the output of print() and the automatic echoing done by the interactive interpreter:

>>> poem2

'I do not like thee, Doctor Fell.\n    The reason why, I cannot tell.\n    But this I know, and know full well:\n    I do not like thee, Doctor Fell.\n'

print() strips quotes from strings and prints their contents. It’s meant for human output. It helpfully adds a space between each of the things it prints, and a newline at the end:

>>> print(99, 'bottles', 'would be enough.')

99 bottles would be enough.

If you don’t want the space or newline, you’ll see how to avoid them shortly.

The interpreter prints the string with single quotes and escape characters such as \n, which are explained in Escape with \.

Finally, there is the empty string, which has no characters at all but is perfectly valid. You can create an empty string with any of the aforementioned quotes:

>>> ''

''

>>> ""

''

>>> ''''''

''

>>> """"""

''

>>>

Why would you need an empty string? Sometimes you might want to build a string from other strings, and you need to start with a blank slate.

>>> bottles = 99

>>> base = ''

>>> base += 'current inventory: '

>>> base += str(bottles)

>>> base

'current inventory: 99'

Convert Data Types by Using str()

You can convert other Python data types to strings by using the str() function:

>>> str(98.6)

'98.6'

>>> str(1.0e4)

'10000.0'

>>> str(True)

'True'

Python uses the str() function internally when you call print() with objects that are not strings and when doing string interpolation, which you’ll see in Chapter 7.

Escape with \

Python lets you escape the meaning of some characters within strings to achieve effects that would otherwise be hard to express. By preceding a character with a backslash (\), you give it a special meaning. The most common escape sequence is \n, which means to begin a new line. With this you can create multiline strings from a one-line string.

>>> palindrome = 'A man,\nA plan,\nA canal:\nPanama.'

>>> print(palindrome)

A man,

A plan,

A canal:

Panama.

You will see the escape sequence \t (tab) used to align text:

>>> print('\tabc')

        abc

>>> print('a\tbc')

a       bc

>>> print('ab\tc')

ab      c

>>> print('abc\t')

abc

(The final string has a terminating tab which, of course, you can’t see.)

You might also need \' or \" to specify a literal single or double quote inside a string that’s quoted by the same character:

>>> testimony = "\"I did nothing!\" he said. \"Not that either! Or the other thing.\""

>>> print(testimony)

"I did nothing!" he said. "Not that either! Or the other thing."

>>> fact = "The world's largest rubber duck was 54'2\" by 65'7\" by 105'"

>>> print(fact)

The world's largest rubber duck was 54'2" by 65'7" by 105'

And if you need a literal backslash, just type two of them:

>>> speech = 'Today we honor our friend, the backslash: \\.'

>>> print(speech)

Today we honor our friend, the backslash: \.
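
It can help to remember that an escape sequence takes two keystrokes to type but is stored as a single character. The following sketch checks this with len() and the string method count(), both of which are introduced later in this chapter.

```python
palindrome = 'A man,\nA plan,\nA canal:\nPanama.'

# Each escape sequence is one character in the string, not two
newline_length = len('\n')
backslash_length = len('\\')
print(newline_length, backslash_length)

# Three \n sequences mean three line breaks, so four printed lines
line_breaks = palindrome.count('\n')
print(line_breaks)
```

This prints 1 1 and then 3.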

Combine with +

You can combine literal strings or string variables in Python by using the + operator, as demonstrated here:

>>> 'Release the kraken! ' + 'At once!'

'Release the kraken! At once!'

You can also combine literal strings (not string variables) just by having one after the other:

>>> "My word! " "A gentleman caller!"

'My word! A gentleman caller!'

Python does not add spaces for you when concatenating strings, so in the preceding example, we needed to include spaces explicitly. It does add a space between each argument to a print() call, and a newline at the end:

>>> a = 'Duck.'

>>> b = a

>>> c = 'Grey Duck!'

>>> a + b + c

'Duck.Duck.Grey Duck!'

>>> print(a, b, c)

Duck. Duck. Grey Duck!

Duplicate with *

You use the * operator to duplicate a string. Try typing these lines into your interactive interpreter and see what they print:

>>> start = 'Na ' * 4 + '\n'

>>> middle = 'Hey ' * 3 + '\n'

>>> end = 'Goodbye.'

>>> print(start + start + middle + end)
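
If you want to check your answer, here is the same snippet written as a small script, with the combined string saved under the name song (a name chosen just for this sketch). Notice that * is applied before +, the same precedence as with numbers, so each '\n' is added once per line rather than once per word.

```python
start = 'Na ' * 4 + '\n'     # 'Na ' repeated four times, then one newline
middle = 'Hey ' * 3 + '\n'   # 'Hey ' repeated three times, then one newline
end = 'Goodbye.'

song = start + start + middle + end
print(song)
```

The output is four lines: two lines of Na, one line of Hey, and a final Goodbye. Each repeated word keeps its trailing space, because * copies the string exactly.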

Extract a Character with []

To get a single character from a string, specify its offset inside square brackets after the string’s name. The first (leftmost) offset is 0, the next is 1, and so on. The last (rightmost) offset can be specified with -1 so you don’t have to count; going to the left are -2, -3, and so on.

>>> letters = 'abcdefghijklmnopqrstuvwxyz'

>>> letters[0]

'a'

>>> letters[1]

'b'

>>> letters[-1]

'z'

>>> letters[-2]

'y'

>>> letters[25]

'z'

>>> letters[5]

'f'

If you specify an offset that is the length of the string or longer (remember, offsets go from 0 to length - 1), you’ll get an exception:

>>> letters[100]

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

IndexError: string index out of range

Indexing works the same with the other sequence types (lists and tuples), which we cover in Chapter 3.

Because strings are immutable, you can’t insert a character directly into one or change the character at a specific index. Let’s try to change 'Henny' to 'Penny' and see what happens:

>>> name = 'Henny'

>>> name[0] = 'P'

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

TypeError: 'str' object does not support item assignment

Instead you need to use some combination of string functions such as replace() or a slice (which you’ll see in a moment):

>>> name = 'Henny'

>>> name.replace('H', 'P')

'Penny'

>>> 'P' + name[1:]

'Penny'
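
Neither of those expressions touched the original string. The sketch below makes that visible: the result has to be given a name (new_name is just a label picked for this example), or assigned back to name, before you can use it again.

```python
name = 'Henny'

# replace() builds and returns a new string; 'name' still refers to the old one
new_name = name.replace('H', 'P')
print(name, new_name)

# To "change" name, attach the name to the newly built string
name = 'P' + name[1:]
print(name)
```

The first print shows Henny Penny, and the second shows Penny.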

Slice with [ start : end : step ]

You can extract a substring (a part of a string) from a string by using a slice. You define a slice by using square brackets, a start offset, an end offset, and an optional step size. Some of these can be omitted. The slice will include characters from offset start to one before end.

§ [:] extracts the entire sequence from start to end.

§ [ start :] specifies from the start offset to the end.

§ [: end ] specifies from the beginning to the end offset minus 1.

§ [ start : end ] indicates from the start offset to the end offset minus 1.

§ [ start : end : step ] extracts from the start offset to the end offset minus 1, skipping characters by step.

As before, offsets go 0, 1, and so on from the start to the right, and -1, -2, and so forth from the end to the left. If you don’t specify start, the slice uses 0 (the beginning). If you don’t specify end, it uses the end of the string.

Let’s make a string of the lowercase English letters:

>>> letters = 'abcdefghijklmnopqrstuvwxyz'

Using a plain : is the same as 0: (the entire string):

>>> letters[:]

'abcdefghijklmnopqrstuvwxyz'

Here’s an example from offset 20 to the end:

>>> letters[20:]

'uvwxyz'

Now, from offset 10 to the end:

>>> letters[10:]

'klmnopqrstuvwxyz'

And another, offset 12 to 14 (Python does not include the end offset, 15):

>>> letters[12:15]

'mno'

The three last characters:

>>> letters[-3:]

'xyz'

In this next example, we go from offset 18 to the fourth before the end; notice the difference from the previous example, in which starting at -3 gets the x, but ending at -3 actually stops at -4, the w:

>>> letters[18:-3]

'stuvw'

In the following, we extract from 6 before the end to 3 before the end:

>>> letters[-6:-2]

'uvwx'

If you want a step size other than 1, specify it after a second colon, as shown in the next series of examples.

From the start to the end, in steps of 7 characters:

>>> letters[::7]

'ahov'

From offset 4 to 19, by 3:

>>> letters[4:20:3]

'ehknqt'

From offset 19 to the end, by 4:

>>> letters[19::4]

'tx'

From the start to offset 20 by 5:

>>> letters[:21:5]

'afkpu'

(Again, the end needs to be one more than the actual offset.)

And that’s not all! Given a negative step size, this handy Python slicer can also step backward. This starts at the end and ends at the start, skipping nothing:

>>> letters[-1::-1]

'zyxwvutsrqponmlkjihgfedcba'

It turns out that you can get the same result by using this:

>>> letters[::-1]

'zyxwvutsrqponmlkjihgfedcba'

Slices are more forgiving of bad offsets than are single-index lookups. A slice offset earlier than the beginning of a string is treated as 0, and one after the end is treated as the length of the string, as is demonstrated in this next series of examples.

From 50 before the end to the end:

>>> letters[-50:]

'abcdefghijklmnopqrstuvwxyz'

From 51 before the end to 50 before the end:

>>> letters[-51:-50]

''

From the start to 69 after the start:

>>> letters[:70]

'abcdefghijklmnopqrstuvwxyz'

From 70 after the start to 70 after the start:

>>> letters[70:71]

''
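
To put the two behaviors side by side, here is a sketch that uses the same out-of-range offset once in a slice and once as a single index. It catches the exception with try and except, which are not covered until Chapter 4, so read that part as a preview.

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

# In a slice, an end offset past the end is treated as the end of the string
clamped = letters[20:100]
explicit = letters[20:len(letters)]
print(clamped, explicit)

# The same offset as a single index is an error
try:
    letters[100]
    message = 'no error'
except IndexError as err:
    message = str(err)
print(message)
```

Both slices give uvwxyz, whereas the single-index lookup reports that the string index is out of range.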

Get Length with len()

So far, we’ve used special punctuation characters such as + to manipulate strings. But there are only so many of these. Now, we start to use some of Python’s built-in functions: named pieces of code that perform certain operations.

The len() function counts characters in a string:

>>> len(letters)

26

>>> empty = ""

>>> len(empty)

0

You can use len() with other sequence types, too, as is described in Chapter 3.

Split with split()

Unlike len(), some functions are specific to strings. To use a string function, type the name of the string, a dot, the name of the function, and any arguments that the function needs: string.function(arguments). You’ll see a longer discussion of functions in Functions.

You can use the built-in string split() function to break a string into a list of smaller strings based on some separator. You’ll see lists in the next chapter. A list is a sequence of values, separated by commas and surrounded by square brackets.

>>> todos = 'get gloves,get mask,give cat vitamins,call ambulance'

>>> todos.split(',')

['get gloves', 'get mask', 'give cat vitamins', 'call ambulance']

In the preceding example, the string was called todos and the string function was called split(), with the single separator argument ','. If you don’t specify a separator, split() uses any sequence of white space characters—newlines, spaces, and tabs.

>>> todos.split()

['get', 'gloves,get', 'mask,give', 'cat', 'vitamins,call', 'ambulance']

You still need the parentheses when calling split with no arguments—that’s how Python knows you’re calling a function.

Combine with join()

In what may not be an earthshaking revelation, the join() function is the opposite of split(): it collapses a list of strings into a single string. It looks a bit backward because you specify the string that glues everything together first, and then the list of strings to glue: string.join(list). So, to join the list lines with separating newlines, you would say '\n'.join(lines). In the following example, let’s join some names in a list with a comma and a space:

>>> crypto_list = ['Yeti', 'Bigfoot', 'Loch Ness Monster']

>>> crypto_string = ', '.join(crypto_list)

>>> print('Found and signing book deals:', crypto_string)

Found and signing book deals: Yeti, Bigfoot, Loch Ness Monster
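
Because split() and join() undo each other, you can chain them. This sketch reuses the todos string from the previous section to show a round trip, and then uses split() with no arguments to tidy up some messy spacing (the messy string is made up for this example).

```python
todos = 'get gloves,get mask,give cat vitamins,call ambulance'

# Splitting on a separator and joining with the same one gives back the original
parts = todos.split(',')
rebuilt = ','.join(parts)
print(rebuilt == todos)

# split() with no arguments discards runs of whitespace; join() puts back single spaces
messy = 'get   gloves\tget\nmask'
tidy = ' '.join(messy.split())
print(tidy)
```

The first print shows True, and the second shows get gloves get mask.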

Playing with Strings

Python has a large set of string functions. Let’s explore how the most common of them work. Our test subject is the following string containing the text of the immortal poem “What Is Liquid?” by Margaret Cavendish, Duchess of Newcastle:

>>> poem = '''All that doth flow we cannot liquid name

Or else would fire and water be the same;

But that is liquid which is moist and wet

Fire that property can never get.

Then 'tis not cold that doth the fire put out

But 'tis the wet that makes it die, no doubt.'''

To begin, get the first 13 characters (offsets 0 to 12):

>>> poem[:13]

'All that doth'

How many characters are in this poem? (Spaces and newlines are included in the count.)

>>> len(poem)

250

Does it start with the letters All?

>>> poem.startswith('All')

True

Does it end with That's all, folks!?

>>> poem.endswith('That\'s all, folks!')

False

Now, let’s find the offset of the first occurrence of the word the in the poem:

>>> word = 'the'

>>> poem.find(word)

73

And the offset of the last the:

>>> poem.rfind(word)

214

How many times does the three-letter sequence the occur?

>>> poem.count(word)

3

Are all of the characters in the poem either letters or numbers?

>>> poem.isalnum()

False

Nope, there were some punctuation characters.
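The offsets returned by find() plug straight into slices. The following sketch uses just the second line of the poem (stored in a name of my choosing, line) so that it runs on its own. It also shows what find() returns when the substring is absent.

```python
line = 'Or else would fire and water be the same;'
word = 'the'

# find() returns the offset of the first match...
offset = line.find(word)
print(offset)  # 32

# ...which you can use to slice the match back out.
print(line[offset:offset + len(word)])  # the

# A substring that isn't there gives -1 rather than an error.
missing = line.find('ice')
print(missing)  # -1

print(line.startswith('Or'), line.endswith(';'))  # True True
```

Because -1 is also a valid offset (the last character), it is worth checking for it before you slice.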

Case and Alignment

In this section, we’ll look at some more uses of the built-in string functions. Our test string is the following:

>>> setup = 'a duck goes into a bar...'

Remove . sequences from both ends:

>>> setup.strip('.')

'a duck goes into a bar'

NOTE

Because strings are immutable, none of these examples actually changes the setup string. Each example just takes the value of setup, does something to it, and returns the result as a new string.
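A quick sketch of what that means in practice (stripped is a name I made up for the result): the original string is untouched, and if you want to keep the new value, you need to assign it to a name.

```python
setup = 'a duck goes into a bar...'

# strip() returns a new string; setup itself is not modified.
stripped = setup.strip('.')
print(setup)     # a duck goes into a bar...
print(stripped)  # a duck goes into a bar

# To "change" setup, point the name at the new string.
setup = setup.strip('.')
print(setup)     # a duck goes into a bar
```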

Capitalize the first word:

>>> setup.capitalize()

'A duck goes into a bar...'

Capitalize all the words:

>>> setup.title()

'A Duck Goes Into A Bar...'

Convert all characters to uppercase:

>>> setup.upper()

'A DUCK GOES INTO A BAR...'

Convert all characters to lowercase:

>>> setup.lower()

'a duck goes into a bar...'

Swap upper- and lowercase:

>>> setup.swapcase()

'A DUCK GOES INTO A BAR...'

Now, we’ll work with some layout alignment functions. The string is aligned within the specified total number of spaces (30 here).

Center the string within 30 spaces:

>>> setup.center(30)

'  a duck goes into a bar...   '

Left justify:

>>> setup.ljust(30)

'a duck goes into a bar...     '

Right justify:

>>> setup.rjust(30)

'     a duck goes into a bar...'
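Padding with spaces can be hard to see on the page. As a sketch, the alignment functions also take an optional second argument, a single fill character, which makes the padding visible. The names left, right, and centered are mine.

```python
setup = 'a duck goes into a bar...'
print(len(setup))  # 25

# Pad out to 30 characters with a visible fill character.
left = setup.ljust(30, '-')
right = setup.rjust(30, '-')
print(left)   # a duck goes into a bar...-----
print(right)  # -----a duck goes into a bar...

# center() splits the five padding characters between both sides.
centered = setup.center(30, '-')
print(len(centered))  # 30

# A width no longer than the string leaves the string as it was.
print(setup.center(10) == setup)  # True
```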

I have much more to say about string formatting and conversions in Chapter 7, including how to use % and format().

Substitute with replace()

You use replace() for simple substring substitution. You give it the old substring, the new one, and how many instances of the old substring to replace. If you omit this final count argument, it replaces all instances. In this example, only one string is matched and replaced:

>>> setup.replace('duck', 'marmoset')

'a marmoset goes into a bar...'

Change up to 100 of them:

>>> setup.replace('a ', 'a famous ', 100)

'a famous duck goes into a famous bar...'

When you know the exact substring(s) you want to change, replace() is a good choice. But watch out. In the second example, if we had substituted for the single character string 'a' rather than the two character string 'a ' (a followed by a space), we would have also changed a in the middle of other words:

>>> setup.replace('a', 'a famous', 100)

'a famous duck goes into a famous ba famousr...'

Sometimes, you want to ensure that the substring is a whole word, or the beginning of a word, and so on. In those cases, you need regular expressions, which are described in detail in Chapter 7.
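Before reaching for regular expressions, it can help to see the count argument do some real work. In this sketch (the names once and everything are mine), a count of 1 replaces only the first match, and a substring that never appears leaves the string as it was.

```python
setup = 'a duck goes into a bar...'

# Replace only the first 'a ' (a followed by a space).
once = setup.replace('a ', 'a famous ', 1)
print(once)        # a famous duck goes into a bar...

# Omit the count to replace every match.
everything = setup.replace('a ', 'a famous ')
print(everything)  # a famous duck goes into a famous bar...

# No match means no change, and no error either.
print(setup.replace('goose', 'marmoset') == setup)  # True
```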

More String Things

Python has many more string functions than I’ve shown here. Some will turn up in later chapters, but you can find all the details in the standard Python documentation on string methods.

Things to Do

This chapter introduced the atoms of Python: numbers, strings, and variables. Let’s try a few small exercises with them in the interactive interpreter.

2.1 How many seconds are in an hour? Use the interactive interpreter as a calculator and multiply the number of seconds in a minute (60) by the number of minutes in an hour (also 60).

2.2 Assign the result from the previous task (seconds in an hour) to a variable called seconds_per_hour.

2.3 How many seconds are in a day? Use your seconds_per_hour variable.

2.4 Calculate seconds per day again, but this time save the result in a variable called seconds_per_day.

2.5 Divide seconds_per_day by seconds_per_hour. Use floating-point (/) division.

2.6 Divide seconds_per_day by seconds_per_hour, using integer (//) division. Did this number agree with the floating-point value from the previous question, aside from the final .0?