Coding for Beginners in Easy Steps: Basic Programming for All Ages (2015)
9. Managing text
This chapter demonstrates how to manipulate text strings in your programs and how to store text in files.
Manipulating strings
Formatting strings
Modifying strings
Accessing files
Manipulating content
Updating content
Summary
Manipulating strings
String values can be manipulated in a Python program using the various operators listed in the table below:
Operator | Description | Example
+ | Concatenate – join strings together | 'Hello' + 'Mike'
* | Repeat – multiply the string | 'Hello' * 2
[ ] | Slice – select a character at a specified index position | 'Hello' [0]
[ : ] | Range Slice – select characters in a specified index range | 'Hello' [ 0 : 4 ]
in | Membership Inclusive – return True if character exists in the string | 'H' in 'Hello'
not in | Membership Exclusive – return True if character doesn’t exist in string | 'h' not in 'Hello'
r/R | Raw String – suppress meaning of escape characters | print( r'\n' )
''' ''' | Docstring – describe a module, function, class, or method | def sum( a,b ) : ''' Add Args '''
The membership operators perform a case-sensitive match, so 'A' in 'abc' returns False.
The [ ] slice operator and [ : ] range slice operator treat a string as a sequence of individual characters, much like a list with one character in each element, so each character can be referenced by its index number.
Similarly, the in and not in membership operators iterate through each element seeking to match the specified character.
The raw string operator r ( or uppercase R ) must be placed immediately before the opening quote mark to suppress escape characters in the string and is useful when the string contains the backslash character.
The Range Slice returns the string up to, but not including, the final specified index position.
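The operators described above can be tried out in a few lines. This is only a minimal sketch; the variable names are illustrative assumptions and do not come from the book's own listings.

```python
# Illustrative values only; the variable names are assumptions.
word = 'Hello'

first = word[0]            # the single character at index 0
head = word[0:4]           # index 0 up to, but not including, index 4
has_upper_h = 'H' in word  # membership test
has_lower_h = 'h' in word  # case-sensitive, so this is False
raw = r'\n'                # two characters: a backslash and the letter n

print(first)        # H
print(head)         # Hell
print(has_upper_h)  # True
print(has_lower_h)  # False
print(len(raw))     # 2
```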
A “docstring” is a descriptive string literal that occurs as the first statement in a module, a function, a class, or a method definition. It should be enclosed within triple quote marks; the examples here use triple single quotes, although triple double quotes work equally well. Uniquely, the docstring becomes the __doc__ special attribute of that object, so it can be referenced using the object’s name and dot-suffixing. All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings.
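As a small sketch of how a docstring is attached to its object, the hypothetical add() function below (not part of the book's examples) carries a description that can be read back through its __doc__ attribute.

```python
def add(a, b):
    '''Return the sum of two arguments.'''
    return a + b

# The docstring is stored on the function object itself.
description = add.__doc__
print(description)  # Return the sum of two arguments.
```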
manipulate.py
Start a new program by defining a simple function that includes a docstring description
def display( s ) :
    '''Display an argument value.'''
    print( s )
Next, add a statement to display the function description
display( display.__doc__ )
Now, add a statement to display a raw string value that contains the backslash character
display( r'C:\Program Files' )
Then, add a statement to display a concatenation of two string values that include an escape character and a space
display( '\nHello' + ' Python' )
Next, add a statement to display a slice of a specified string within a range of element index numbers
display( 'Python In Easy Steps\n' [ 7 : ] )
Finally, display the results of seeking characters within a specified string
display( 'P' in 'Python' )
display( 'p' in 'Python' )
Save then run the program – to see manipulated strings get displayed
Remember that strings must be enclosed within either single quote marks or double quote marks.
With range slice, if the start index number is omitted zero is assumed and if the end index number is omitted the string length is assumed.
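The defaults for omitted index numbers can be checked with a short sketch like the one below. The title variable is an assumption made for illustration.

```python
title = 'Python In Easy Steps'

start_omitted = title[:6]  # same as title[0:6]
end_omitted = title[7:]    # same as title[7:len(title)]
both_omitted = title[:]    # a copy of the whole string

print(start_omitted)  # Python
print(end_omitted)    # In Easy Steps
print(both_omitted)   # Python In Easy Steps
```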
Formatting strings
The Python built-in dir() function can be useful to examine the names of functions and variables defined in a module by specifying the module name within its parentheses. Interactive mode can easily be used for this purpose by importing the module name then calling the dir() function. The example below examines the “dog” module created in Chapter Six:
Notice that the __doc__ attribute introduced in the previous example appears listed here by the dir() function.
Those defined names that begin and end with a double underscore are Python objects, whereas the others are programmer-defined. The __builtins__ module can also be examined using the dir() function, to reveal the names of functions and variables defined by default, such as the print() function and a str object.
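The same inspection can be done from a script. The sketch below is one possible approach: it imports the standard builtins module by name, which is an assumption chosen here because __builtins__ is not guaranteed to be a module in every context.

```python
import builtins

# List every name defined by default.
names = dir(builtins)

# Names that begin and end with a double underscore.
dunder_names = [n for n in names
                if n.startswith('__') and n.endswith('__')]

print('print' in names)           # True
print('str' in names)             # True
print('__doc__' in dunder_names)  # True
```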
The str object defines several useful functions for string formatting, including an actual format() function that performs replacements. A string to be formatted by the format() function can contain both text and “replacement fields” marking places where text is to be inserted from an ordered comma-separated list of values. Each replacement field is denoted by { } braces, which may optionally contain the index number position of the replacement in the list.
Do not confuse the str object described here with the str() function that converts values to the string data type.
Strings may also be formatted using the C-style %s substitution operator to mark places in a string where text is to be inserted from a comma-separated ordered list of values.
format.py
Start a new program by initializing a variable with a formatted string
snack = '{} and {}'.format( 'Burger' , 'Fries' )
Next, display the variable value to see the text replaced in their listed order
print( '\nReplaced:' , snack )
Now, assign a differently formatted string to the variable
snack = '{1} and {0}'.format( 'Burger' , 'Fries' )
Then, display the variable value again to see the text now replaced in the order given by the index numbers
print( 'Replaced:' , snack )
Assign another formatted string to the variable
snack = '%s and %s' % ( 'Milk' , 'Cookies' )
Finally, display the variable value once more to see the text substituted in their listed order
print( '\nSubstituted:' , snack )
Save then run the program – to see formatted strings get displayed
You cannot leave spaces around the index number in the replacement field.
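The following sketch shows what happens when spaces are placed around the index number. It assumes the standard behavior of str.format(), where a field name that is not a plain number is looked up as a keyword and so raises a KeyError.

```python
# Index numbers without spaces are accepted.
good = '{1} and {0}'.format('Burger', 'Fries')
print(good)  # Fries and Burger

# Index numbers padded with spaces are rejected.
try:
    bad = '{ 1 } and { 0 }'.format('Burger', 'Fries')
except KeyError as error:
    bad = None
    print('Rejected replacement field:', error)
```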
Other data types can be substituted using %d for a decimal integer, %c for a character, and %f for a floating-point number.
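A brief sketch of these other specifiers follows. The values are made up for illustration, and the '%.2f' precision form is an addition not covered in the text above.

```python
count = 3
grade = 'A'
price = 2.5

# %d takes an integer, %c a single character, %f a floating-point number.
line = '%d items, grade %c, price %f' % (count, grade, price)
print(line)  # 3 items, grade A, price 2.500000

# A precision can be given to limit the decimal places shown.
short_price = '%.2f' % price
print(short_price)  # 2.50
```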
Modifying strings
The Python str object has many useful functions that can be dot-suffixed to a string’s name to modify the string or to examine its contents. The most commonly used of these functions are listed in the table below together with a brief description:
Method | Description
capitalize( ) | Change string’s first letter to uppercase
title( ) | Change the first letter of each word to uppercase
upper( ), lower( ), swapcase( ) | Change the case of all letters to uppercase, to lowercase, or to the inverse of the current case respectively
join( seq ) | Insert the string as a separator between the elements of sequence seq
lstrip( ), rstrip( ), strip( ) | Remove leading whitespace, trailing whitespace, or both leading and trailing whitespace respectively
replace( old, new ) | Replace all occurrences of old with new
ljust( w, c ), rjust( w, c ) | Pad string to right or left respectively to total column width w with character c
center( w, c ) | Pad string each side to total column width w with character c ( default is space )
count( sub ) | Return the number of occurrences of sub
find( sub ) | Return the index number of the first occurrence of sub or return -1 if not found
startswith( sub ), endswith( sub ) | Return True if sub is found at start or end respectively – otherwise return False
isalpha( ), isnumeric( ), isalnum( ) | Return True if all characters are letters only, are numbers only, are letters or numbers only respectively – otherwise return False
islower( ), isupper( ), istitle( ) | Return True if string characters are lowercase, uppercase, or all first letters are uppercase only respectively – otherwise return False
isspace( ) | Return True if string contains only whitespace – otherwise return False
isdigit( ), isdecimal( ) | Return True if string contains only digits or decimals respectively – otherwise return False
A space character is not alphanumeric so isalnum() returns False when examining strings that contain spaces.
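The effect of a space on these tests can be seen in a short sketch. The sample strings are assumptions chosen for illustration.

```python
no_space = 'Python3'
with_space = 'Python 3'

print(no_space.isalnum())    # True
print(with_space.isalnum())  # False, because of the space

# Removing the spaces first lets the remaining characters be tested.
squeezed = with_space.replace(' ', '')
print(squeezed.isalnum())    # True
```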
modify.py
Start a new program by initializing a variable with a string of lowercase characters and spaces
string = 'coding for beginners in easy steps'
Next, display the string capitalized, titled, and centered within a width wider than the string
print( '\nCapitalized:\t' , string.capitalize() )
print( '\nTitled:\t\t' , string.title() )
print( '\nCentered:\t' , string.center( 40 , '*' ) )
Now, display the string in all uppercase and merged with a sequence of two asterisks
print( '\nUppercase:\t' , string.upper() )
print( '\nJoined:\t\t' , string.join( '**' ) )
Then, display the string padded with asterisks on the left
print( '\nJustified:\t' , string.rjust( 40 , '*' ) )
Finally, display the string with all occurrences of the 's' character replaced by asterisks
print( '\nReplaced:\t' , string.replace( 's' , '*' ) )
Save then run the program – to see modified strings get displayed
With the rjust() function a RIGHT-justified string gets padding added to its LEFT, and with the ljust() function a LEFT-justified string gets padding added to its RIGHT.
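The padding behavior can be sketched with a short string, as below. The label variable and the widths are illustrative assumptions; note that no padding is added when the requested width is not greater than the string length.

```python
label = 'steps'  # five characters

right = label.rjust(10, '*')   # padding goes on the left
left = label.ljust(10, '*')    # padding goes on the right
middle = label.center(11, '*') # padding is shared between both sides
too_narrow = label.rjust(3, '*')

print(right)       # *****steps
print(left)        # steps*****
print(middle)      # ***steps***
print(too_narrow)  # steps
```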
Accessing files
The __builtins__ module can be examined using the dir() function to reveal that it contains an open() function. This returns a file object that defines several methods for working with files, including read(), write(), and close().
Before a file can be read or written it must first be opened using the open() function. This requires two string arguments: one to specify the name and location of the file, and one of the following “mode” specifiers to determine how the file is opened:
File mode | Operation
r | Open an existing file to read
w | Open a file to write. Creates a new file if none exists or opens an existing file and discards all its previous contents
a | Append text. Opens or creates a text file for writing at the end of the file
r+ | Open an existing text file to read from or write to
w+ | Open a text file to write to or read from, creating the file or discarding its previous contents as with w
a+ | Open or create a text file to read from or write to at the end of the file
Where the mode includes a b after any of the file modes listed above, the operation relates to a binary file rather than a text file. For example, rb or w+b
File mode parameters are string values so must be surrounded by quotes.
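The difference between the w and a modes can be seen in the following sketch. It writes to a scratch file in a temporary folder, which is an assumption made here so that no real file is affected; the file name is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

file = open(path, 'w')  # creates the file
file.write('first\n')
file.close()

file = open(path, 'a')  # keeps 'first' and writes at the end
file.write('second\n')
file.close()

file = open(path, 'r')
after_append = file.read()
file.close()

file = open(path, 'w')  # discards both earlier lines
file.write('third\n')
file.close()

file = open(path, 'r')
after_rewrite = file.read()
file.close()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```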
Once a file is opened you have a file object, and can get various information related to that file from its properties:
Property | Description
name | Name of the opened file
mode | Mode in which the file was opened
closed | Status Boolean value of True or False
readable( ) | Read permission Boolean value of True or False
writable( ) | Write permission Boolean value of True or False
You can also use a readlines() function that returns a list of all lines.
access.py
Start a new program by creating a file object for a new text file named “example.txt” in which to write content
file = open( 'example.txt' , 'w' )
Next, add statements to display the file name and mode
print( 'File Name:' , file.name )
print( 'File Open Mode:' , file.mode )
Now, add statements to display the file access permissions
print( 'Readable:' , file.readable() )
print( 'Writable:' , file.writable() )
Then, define a function to determine the file’s status
def get_status( f ) :
    if f.closed :
        return 'Closed'
    else :
        return 'Open'
Finally, add statements to display the current file status then close the file and display the file status once more
print( 'File Status:' , get_status( file ) )
file.close()
print( '\nFile Status:' , get_status( file ) )
Save then run the program – to see a file get opened for writing, then see the file get closed
If your program tries to open a non-existent file in r mode the interpreter will report an error.
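One way to handle that error is sketched below, assuming the standard FileNotFoundError exception raised by open(). The missing file path is hypothetical and points into a fresh temporary folder so that it is certain not to exist.

```python
import os
import tempfile

# Hypothetical path to a file that does not exist.
missing = os.path.join(tempfile.mkdtemp(), 'missing.txt')

try:
    file = open(missing, 'r')
except FileNotFoundError as error:
    opened = False
    print('Could not open file:', error)
else:
    opened = True
    file.close()
```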
Manipulating content
Once a file has been successfully opened it can be read, or added to, or new text can be written in the file, depending on the mode specified in the call to the open() function. Following this, the open file must always be closed by calling the close() function.
As you might expect, the read() function returns the entire content of the file and the write() function adds content to the file.
You can quickly and efficiently read the entire contents in a loop, iterating line by line.
readwrite.py
Start a new program by initializing a variable with a concatenated string containing newline characters
poem = 'I never saw a man who looked\n'
poem += 'With such a wistful eye\n'
poem += 'Upon that little tent of blue\n'
poem += 'Which prisoners call the sky\n'
Next, add a statement to create a file object for a new text file named “poem.txt” to write content into
file = open( 'poem.txt' , 'w' )
Now, add statements to write the string contained in the variable into the text file, then close that file
file.write( poem )
file.close()
Then, add a statement to create a file object for the existing text file “poem.txt” to read from
file = open( 'poem.txt' , 'r' )
Now, add statements to display the contents of the text file, then close that file
for line in file :
    print( line , end = '' )
file.close()
Save then run the program – to see the file created and read out to display
Launch the Notepad text editor to confirm the new text file exists and reveal its contents written by the program
Now, add statements at the end of the program to append a citation to the text file then save the script file again
file = open( 'poem.txt' , 'a' )
file.write( '(Oscar Wilde)' )
file.close()
Save then run the program again to re-write the text file then view its contents in Notepad – to see the citation now appended after the original text content
Opening an existing file in w mode will automatically discard its previous contents!
Suppress the default newline provided by the print() function where the strings themselves contain newlines.
You can also use the file object’s readlines() function that returns a list of all lines in a file – one line per element.
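A minimal sketch of readlines() is shown below. It uses a hypothetical scratch file in a temporary folder rather than the poem file, purely as an assumption to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

file = open(path, 'w')
file.write('one\ntwo\nthree\n')
file.close()

file = open(path, 'r')
lines = file.readlines()  # one list element per line, newline included
file.close()

print(lines)       # ['one\n', 'two\n', 'three\n']
print(len(lines))  # 3
```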
Updating content
A file object’s read() function will, by default, read the entire contents of the file from the very beginning, at index position zero, to the very end – at the index position of the final character. Optionally, the read() function can accept an integer argument to specify how many characters it should read.
The position within the file, from which to read or at which to write, can be finely controlled by the file object’s seek() function. This accepts an integer argument specifying how many characters to move position as an offset from the start of the file.
The current position within a file can be discovered at any time by calling the file object’s tell() function to return an integer location.
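The read(), seek(), and tell() functions described above can be sketched together as follows. The scratch file and its short, single-line content are assumptions chosen so that positions are easy to follow.

```python
import os
import tempfile

# Hypothetical scratch file containing a single short line.
path = os.path.join(tempfile.mkdtemp(), 'position.txt')

file = open(path, 'w')
file.write('Hello Python')
file.close()

file = open(path, 'r')
first = file.read(5)    # read only five characters
position = file.tell()  # position is now just after 'Hello'
file.seek(6)            # move to offset 6 from the start
rest = file.read()      # read from there to the end
file.seek(0)            # return to the very beginning
whole = file.read()
file.close()

print(first)     # Hello
print(position)  # 5
print(rest)      # Python
print(whole)     # Hello Python
```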
When working with file objects it is good practice to use the Python with keyword to group the file operational statements within a block. This technique ensures that the file is properly closed after operations end, even if an exception is raised on the way, and is much shorter than writing equivalent try except blocks.
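To compare the two approaches, the sketch below writes to a hypothetical scratch file twice: once with an explicit try block that closes the file in a finally clause, and once with a with block. The long form shown is only one possible equivalent.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'note.txt')

# Long form: the finally clause runs even if write() raises an exception.
file = open(path, 'w')
try:
    file.write('first line\n')
finally:
    file.close()
closed_after_try = file.closed

# Short form: the with block closes the file when the block ends.
with open(path, 'a') as file:
    file.write('second line\n')
    closed_inside_with = file.closed
closed_after_with = file.closed

print(closed_after_try)    # True
print(closed_inside_with)  # False
print(closed_after_with)   # True
```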
update.py
Start a new program by assigning a string value to a variable containing text to be written in a file
text = 'The political slogan "Workers Of The World Unite!" \nis from The Communist Manifesto.'
Next, add statements to write the text string into a file and display the file’s current status in the “with” block
with open( 'update.txt' , 'w' ) as file :
    file.write( text )
    print( '\nFile Now Closed?:' , file.closed )
Now, add a non-indented statement after the “with” code block to display the file’s new status
print( 'File Now Closed?:' , file.closed )
Then, re-open the file and display its contents to confirm it now contains the entire text string
with open( 'update.txt' , 'r+' ) as file :
    text = file.read()
    print( '\nString:' , text )
Next, add indented statements to display the current file position, then reposition and display that new position
    print( '\nPosition In File Now:' , file.tell() )
    position = file.seek( 33 )
    print( 'Position In File Now:' , file.tell() )
Now, add an indented statement to overwrite the text from the current file position
    file.write( 'All Lands' )
Then, add indented statements to reposition in the file once more and overwrite the text from the new position (the offset of 61 assumes Windows, where the newline is stored as two characters; on other systems the same text begins at offset 60)
    file.seek( 61 )
    file.write( 'the tombstone of Karl Marx.' )
Finally, add indented statements to return to the start of the file and display its entire updated contents
    file.seek( 0 )
    text = file.read()
    print( '\nString:' , text )
Save then run the program – to see the file strings get updated
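The seek() function may optionally accept a second argument value of 0, 1, or 2 to move the specified number of characters from the start, current, or end position respectively – zero is the default start position. In Python 3, a text file only supports non-zero offsets from the start position; non-zero offsets from the current or end position require the file to be opened in binary mode.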
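The second argument can be sketched as below. The example opens a hypothetical scratch file in binary mode, an assumption made because Python 3 restricts relative seeking in text files, so the content is written and read as bytes.

```python
import os
import tempfile

# Hypothetical scratch file holding ten bytes.
path = os.path.join(tempfile.mkdtemp(), 'digits.bin')

with open(path, 'wb') as file:
    file.write(b'0123456789')

with open(path, 'rb') as file:
    file.seek(2, 0)        # two bytes from the start
    file.seek(3, 1)        # three more bytes from the current position
    middle = file.read(1)
    file.seek(-3, 2)       # three bytes back from the end
    tail = file.read()

print(middle)  # b'5'
print(tail)    # b'789'
```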
As with strings, the first character in a file is at index position zero – not at index position one.
Summary
• Strings can be manipulated by operators for concatenation + , to join strings together, and for repetition * of strings
• Strings can be manipulated by operators for slice [ ], and range slice [ : ] , that reference the index number of string characters
• Strings can be manipulated by membership operators in and not in that seek to match a specified character within a string
• The r (or R) raw string operator can be placed immediately before a string to suppress any escape characters it contains
• A “docstring” is a descriptive string within triple quote marks at the start of a module, class, or function, to define its purpose
• The __doc__ attribute can be used to reference the string description within a docstring
• The __builtins__ module can be examined using the dir() function to reveal the names of default functions and variables
• A str object has a format() function for string formatting and many functions for string modification, such as capitalize()
• Replacement fields in a string to be formatted using the format() function are denoted by { } braces, and are filled from a comma-separated list of values
• Strings can also be formatted using the C-style %s substitution operator to mark places in a string where text is to be inserted
• The open() function returns a file object that has read(), write(), and close() functions for working with files, and features that describe the file properties
• The open() function must specify a file name string argument and a file mode string parameter, such as 'r' to read the file
• An opened file object has information properties that reveal its current status, such as mode and readable() values
• Position in a file, at which to read or write, can be specified with the seek() function and reported by the tell() function
• The Python with keyword groups file operational statements within a block and automatically closes an open file