Managing text - Coding for Beginners in Easy Steps: Basic Programming for All Ages (2015)

9. Managing text

This chapter demonstrates how to manipulate text strings in your programs and how to store text in files.

Manipulating strings

Formatting strings

Modifying strings

Accessing files

Manipulating content

Updating content

Summary

Manipulating strings

String values can be manipulated in a Python program using the various operators listed in the table below:

Operator | Description | Example
+ | Concatenate: join strings together | 'Hello' + 'Mike'
* | Repeat: multiply the string | 'Hello' * 2
[ ] | Slice: select a character at a specified index position | 'Hello' [0]
[ : ] | Range Slice: select characters in a specified index range | 'Hello' [ 0 : 4 ]
in | Membership Inclusive: return True if character exists in the string | 'H' in 'Hello'
not in | Membership Exclusive: return True if character doesn't exist in string | 'h' not in 'Hello'
r/R | Raw String: suppress meaning of escape characters | print( r'\n' )
''' ''' | Docstring: describe a module, function, class, or method | def sum( a, b ) : ''' Add Args '''

The membership operators perform a case-sensitive match, so 'A' in 'abc' evaluates to False.

The [ ] slice operator and [ : ] range slice operator recognize that a string is a sequence of individual characters, much like a list containing one character in each element, and each character can be referenced by its index number.

Similarly, the in and not in membership operators search through the string seeking to match the specified character, or a longer run of characters.

The raw string operator r ( or uppercase R ) must be placed immediately before the opening quote mark to suppress escape characters in the string and is useful when the string contains the backslash character.

The Range Slice returns the string up to, but not including, the final specified index position.
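As a quick check of the operators in the table, the short sketch below applies each one to small example strings. The variable names are chosen only for this illustration and are not part of the chapter's programs.

```python
# Each operator from the table applied to short example strings.
greeting = 'Hello' + ' ' + 'Mike'    # concatenation
echo = 'Hello' * 2                   # repetition
first = 'Hello'[0]                   # single character at index 0
part = 'Hello'[0:4]                  # indexes 0 to 3, index 4 is excluded
has_upper_h = 'H' in 'Hello'         # membership is case-sensitive
lacks_lower_h = 'h' not in 'Hello'
raw = r'\n'                          # a backslash and an n, not a newline

print(greeting)                      # Hello Mike
print(echo)                          # HelloHello
print(first, part)                   # H Hell
print(has_upper_h, lacks_lower_h)    # True True
print(len(raw))                      # 2
```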

A “docstring” is a descriptive string literal that occurs as the first statement in a module, a function, a class, or a method definition. This should be enclosed within triple quote marks, which may be triple single quotes, as used in this chapter, or triple double quotes. The docstring becomes the __doc__ special attribute of that object, so it can be referenced using the object's name and dot-suffixing. All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings.

manipulate.py

Start a new program by defining a simple function that includes a docstring description

def display( s ) :
    '''Display an argument value.'''
    print( s )

Next, add a statement to display the function description

display( display.__doc__ )

Now, add a statement to display a raw string value that contains the backslash character

display( r'C:\Program Files' )

Then, add a statement to display a concatenation of two string values that include an escape character and a space

display( '\nHello' + ' Python' )

Next, add a statement to display a slice of a specified string within a range of element index numbers

display( 'Python In Easy Steps\n' [ 7 : ] )

Finally, display the results of seeking characters within a specified string

display( 'P' in 'Python' )
display( 'p' in 'Python' )

Save then run the program to see the manipulated strings displayed

Remember that strings must be enclosed within either single quote marks or double quote marks.

With range slice, if the start index number is omitted zero is assumed and if the end index number is omitted the string length is assumed.
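The effect of omitting either index can be seen in a small sketch. The title variable is just an example name for this illustration.

```python
title = 'Python In Easy Steps'

start_omitted = title[:6]    # start defaults to zero
end_omitted = title[7:]      # end defaults to the string length
both_omitted = title[:]      # a copy of the whole string

print(start_omitted)         # Python
print(end_omitted)           # In Easy Steps
print(both_omitted == title) # True
```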

Formatting strings

The Python built-in dir() function can be useful to examine the names of functions and variables defined in a module by specifying the module name within its parentheses. Interactive mode can easily be used for this purpose by importing the module name then calling the dir() function. For example, the “dog” module created in Chapter Six can be examined in this way:

[Screenshot: an interactive session that imports the dog module and lists its names with the dir() function]

Notice that the __doc__ attribute introduced in the previous example appears listed here by the dir() function.

Those defined names that begin and end with a double underscore are special names provided by Python, whereas the others are programmer-defined. The __builtins__ module can also be examined using the dir() function, to reveal the names of functions and variables defined by default, such as the print() function and the str object.
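The “dog” module is not reproduced in this chapter, so the sketch below applies the same idea to the standard builtins module instead, which is what __builtins__ refers to in interactive mode. The names and special variables exist only for this illustration.

```python
import builtins

# dir() returns a list of the names defined in the module.
names = dir(builtins)
print('print' in names)       # True
print('str' in names)         # True

# Names wrapped in double underscores are the special ones provided by Python.
special = [n for n in names if n.startswith('__') and n.endswith('__')]
print('__doc__' in special)   # True

# dir() also accepts a type, here listing what the str object defines.
print('format' in dir(str))   # True
```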

The str object defines several useful functions for string formatting, including an actual format() function that performs replacements. A string to be formatted by the format() function can contain both text and “replacement fields” marking places where text is to be inserted from an ordered comma-separated list of values. Each replacement field is denoted by { } braces, which may optionally contain the index number position of the replacement in the list.

Do not confuse the str object described here with the str() function that converts values to the string data type.

Strings may also be formatted using the C-style % operator, with %s specifiers marking places in a string where text is to be inserted from a comma-separated ordered list of values.

format.py

Start a new program by initializing a variable with a formatted string

snack = '{} and {}'.format( 'Burger' , 'Fries' )

Next, display the variable value to see the text replaced in their listed order

print( '\nReplaced:' , snack )

Now, assign a differently formatted string to the variable

snack = '{1} and {0}'.format( 'Burger' , 'Fries' )

Then, display the variable value again to see the text now replaced by their specified index element value

print( 'Replaced:' , snack )

Assign another formatted string to the variable

snack = '%s and %s' % ( 'Milk' , 'Cookies' )

Finally, display the variable value once more to see the text substituted in their listed order

print( '\nSubstituted:' , snack )

Save then run the program to see the formatted strings displayed

You cannot leave spaces around the index number in the replacement field.

Other data types can be substituted using %d for a decimal integer, %c for a character, and %f for a floating-point number.
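A brief sketch of those other specifiers follows. The values are made up for illustration, and the %.2f form, which limits the number of decimal places, goes slightly beyond what the note above mentions.

```python
item = '%s costs %d dollars' % ('Burger', 3)   # %s for text, %d for an integer
grade = 'Grade: %c' % 'A'                      # %c for a single character
price = 'Price: %f' % 2.5                      # %f shows six decimal places by default
short = 'Price: %.2f' % 2.5                    # .2 limits the output to two places

print(item)    # Burger costs 3 dollars
print(grade)   # Grade: A
print(price)   # Price: 2.500000
print(short)   # Price: 2.50
```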

Modifying strings

The Python str object has many useful functions that can be dot-suffixed to a string for modification of the string and to examine its contents. The most commonly used string functions are listed in the table below together with a brief description:

Method | Description
capitalize( ) | Change string's first letter to uppercase
title( ) | Change all first letters to uppercase
upper( ), lower( ), swapcase( ) | Change the case of all letters to uppercase, to lowercase, or to the inverse of the current case respectively
join( seq ) | Join the items of sequence seq together, inserting the string between them as a separator
lstrip( ), rstrip( ), strip( ) | Remove leading whitespace, trailing whitespace, or both leading and trailing whitespace respectively
replace( old, new ) | Replace all occurrences of old with new
ljust( w, c ), rjust( w, c ) | Pad string to right or left respectively to total column width w with character c
center( w, c ) | Pad string each side to total column width w with character c ( default is space )
count( sub ) | Return the number of occurrences of sub
find( sub ) | Return the index number of the first occurrence of sub or return -1 if not found
startswith( sub ), endswith( sub ) | Return True if sub is found at start or end respectively, otherwise return False
isalpha( ), isnumeric( ), isalnum( ) | Return True if all characters are letters only, are numbers only, or are letters or numbers only respectively, otherwise return False
islower( ), isupper( ), istitle( ) | Return True if string characters are lowercase, uppercase, or all first letters are uppercase only respectively, otherwise return False
isspace( ) | Return True if string contains only whitespace, otherwise return False
isdigit( ), isdecimal( ) | Return True if string contains only digit characters or only decimal characters respectively, otherwise return False

A space character is not alphanumeric so isalnum() returns False when examining strings that contain spaces.
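Several of the methods in the table return a value that is easy to misread from a description alone, join() in particular. The sketch below, using example strings chosen only for this illustration, shows what a few of them return.

```python
words = ['coding', 'for', 'beginners']

joined = '-'.join(words)             # the string goes between the list items
wrapped = 'abc'.join('**')           # here the items are the two asterisks
stripped = '  padded  '.strip()      # whitespace removed from both ends
hits = 'easy steps'.count('s')       # number of times 's' occurs
where = 'easy steps'.find('steps')   # index where 'steps' begins
missing = 'easy steps'.find('z')     # -1 because 'z' is not found

print(joined)     # coding-for-beginners
print(wrapped)    # *abc*
print(stripped)   # padded
print(hits, where, missing)   # 3 5 -1
print('easysteps'.isalnum(), 'easy steps'.isalnum())   # True False
```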

modify.py

Start a new program by initializing a variable with a string of lowercase characters and spaces

string = 'coding for beginners in easy steps'

Next, display the string capitalized, titled, and centered

print( '\nCapitalized:\t' , string.capitalize() )
print( '\nTitled:\t\t' , string.title() )
# A width of 40 exceeds the 34-character string, so padding is added
print( '\nCentered:\t' , string.center( 40 , '*' ) )

Now, display the string in all uppercase and merged with a sequence of two asterisks

print( '\nUppercase:\t' , string.upper() )
print( '\nJoined:\t\t' , string.join( '**' ) )

Then, display the string padded with asterisks on the left

print( '\nJustified:\t\t' , string.rjust( 40 , '*' ) )

Finally, display the string with all occurrences of the 's' character replaced by asterisks

print( '\nReplaced:\t' , string.replace( 's' , '*' ) )

Save then run the program to see the modified strings displayed

With the rjust() function a RIGHT-justified string gets padding added to its LEFT, and with the ljust() function a LEFT-justified string gets padding added to its RIGHT.
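The direction of the padding is easiest to see with a short word. In this sketch the word and the widths are arbitrary example values.

```python
word = 'steps'

left = word.ljust(9, '*')      # text stays left, padding goes on the right
right = word.rjust(9, '*')     # text stays right, padding goes on the left
middle = word.center(9, '*')   # padding is shared between both sides
too_narrow = word.center(3, '*')   # a width below the string length adds nothing

print(left)        # steps****
print(right)       # ****steps
print(middle)      # **steps**
print(too_narrow)  # steps
```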

Accessing files

The __builtins__ module can be examined using the dir() function to reveal that it contains an open() function. This returns a file object that has several functions for working with the file, including read(), write(), and close().

Before a file can be read or written it must first be opened using the open() function. This takes two string arguments: one to specify the name and location of the file, and one of the following “mode” specifiers in which to open the file:

File mode | Operation
r | Open an existing file to read
w | Open a file to write. Creates a new file if none exists or opens an existing file and discards all its previous contents
a | Append text. Opens or creates a text file for writing at the end of the file
r+ | Open an existing text file to read from or write to
w+ | Open a text file to write to or read from. Creates a new file if none exists or discards the previous contents of an existing file
a+ | Open or create a text file to read from or write to at the end of the file

Where the mode includes a b after any of the file modes listed above, the operation relates to a binary file rather than a text file. For example, rb or w+b.

File mode parameters are string values so must be surrounded by quotes.
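The difference between the w and a modes can be tried out safely with a sketch like the one below. It assumes a throwaway folder created with the standard tempfile module, and the file name modes.txt is only a placeholder.

```python
import os
import tempfile

# A temporary folder keeps this experiment away from real files.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'modes.txt')   # placeholder file name

f = open(path, 'w')      # creates the file
f.write('first\n')
f.close()

f = open(path, 'a')      # adds to the end of the existing text
f.write('second\n')
f.close()

f = open(path, 'r')
both = f.read()
f.close()

f = open(path, 'w')      # discards everything written so far
f.write('third\n')
f.close()

f = open(path, 'r')
final = f.read()
f.close()

print(repr(both))    # 'first\nsecond\n'
print(repr(final))   # 'third\n'
```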

Once a file is opened you have a file object, and can get various information related to that file from its properties:

Property | Description
name | Name of the opened file
mode | Mode in which the file was opened
closed | Status Boolean value of True or False
readable( ) | Read permission Boolean value of True or False ( called as a function )
writable( ) | Write permission Boolean value of True or False ( called as a function )

You can also use a readlines() function that returns a list of all lines.

access.py

Start a new program by creating a file object for a new text file named “example.txt” in which to write content

file = open( 'example.txt' , 'w' )

Next, add statements to display the file name and mode

print( 'File Name:' , file.name )
print( 'File Open Mode:' , file.mode )

Now, add statements to display the file access permissions

print( 'Readable:' , file.readable() )
print( 'Writable:' , file.writable() )

Then, define a function to determine the file's status

def get_status( f ) :
    if f.closed :
        return 'Closed'
    else :
        return 'Open'

Finally, add statements to display the current file status then close the file and display the file status once more

print( 'File Status:' , get_status( file ) )
file.close()
print( '\nFile Status:' , get_status( file ) )

Save then run the program to see a file get opened for writing, then see the file get closed

If your program tries to open a non-existent file in r mode the interpreter will report an error.
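In Python 3 the error reported is a FileNotFoundError, which a program can catch rather than stopping. The following is a minimal sketch that assumes a deliberately missing file inside a temporary folder. The file name no_such_file.txt and the message text are placeholders.

```python
import os
import tempfile

# An empty temporary folder guarantees that the file does not exist.
folder = tempfile.mkdtemp()
missing = os.path.join(folder, 'no_such_file.txt')   # placeholder name

try:
    file = open(missing, 'r')
except FileNotFoundError:
    opened = False
    message = 'Cannot open file for reading'
else:
    opened = True
    message = 'File opened'
    file.close()

print(opened, message)   # False Cannot open file for reading
```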

Manipulating content

Once a file has been successfully opened it can be read, or added to, or new text can be written in the file, depending on the mode specified in the call to the open() function. Following this, the open file must always be closed by calling the close() function.

As you might expect, the read() function returns the entire content of the file and the write() function adds content to the file.

You can quickly and efficiently read the entire contents in a loop, iterating line by line.

readwrite.py

Start a new program by initializing a variable with a concatenated string containing newline characters

poem = 'I never saw a man who looked\n'
poem += 'With such a wistful eye\n'
poem += 'Upon that little tent of blue\n'
poem += 'Which prisoners call the sky\n'

Next, add a statement to create a file object for a new text file named “poem.txt” to write content into

file = open( 'poem.txt' , 'w' )

Now, add statements to write the string contained in the variable into the text file, then close that file

file.write( poem )
file.close()

Then, add a statement to create a file object for the existing text file “poem.txt” to read from

file = open( 'poem.txt' , 'r' )

Now, add statements to display the contents of the text file, then close that file

for line in file :
    print( line , end = '' )
file.close()

Save then run the program to see the file created and read out to display

Launch the Notepad text editor to confirm the new text file exists and reveal its contents written by the program

Now, add statements at the end of the program to append a citation to the text file then save the script file again

file = open( 'poem.txt' , 'a' )
file.write( '(Oscar Wilde)' )
file.close()

Save then run the program again to re-write the text file then view its contents in Notepad, to see the citation now appended after the original text content

Opening an existing file in w mode will automatically discard its contents, so anything written replaces what was there before!

Suppress the default newline provided by the print() function where the strings themselves contain newlines.

You can also use the file object’s readlines() function that returns a list of all lines in a file – one line per element.
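A short sketch of readlines() follows, again assuming a temporary folder and a placeholder file name so that no real file is touched. Each list element keeps its trailing newline character.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'lines.txt')   # placeholder file name

file = open(path, 'w')
file.write('one\ntwo\nthree\n')
file.close()

file = open(path, 'r')
lines = file.readlines()   # one list element per line in the file
file.close()

print(lines)        # ['one\n', 'two\n', 'three\n']
print(len(lines))   # 3
```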

Updating content

A file object’s read() function will, by default, read the entire contents of the file from the very beginning, at index position zero, to the very end – at the index position of the final character. Optionally, the read() function can accept an integer argument to specify how many characters it should read.

The position within the file, from which to read or at which to write, can be finely controlled by the file object’s seek() function. This accepts an integer argument specifying how many characters to move position as an offset from the start of the file.

The current position within a file can be discovered at any time by calling the file object’s tell() function to return an integer location.
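These three functions can be sketched together on a single short line of text. The example assumes a temporary folder, a placeholder file name, and plain unaccented characters with no newlines, so that each character occupies exactly one position in the file.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'position.txt')   # placeholder file name

file = open(path, 'w')
file.write('Workers Of The World')
file.close()

file = open(path, 'r')
start = file.tell()     # position zero before anything is read
first = file.read(7)    # read just seven characters
after = file.tell()     # position has moved past those characters
file.seek(11)           # jump to an offset from the start of the file
rest = file.read()      # read from there to the end
file.close()

print(start, first, after)   # 0 Workers 7
print(rest)                  # The World
```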

When working with file objects it is good practice to use the Python with keyword to group the file operational statements within a block. This technique ensures that the file is properly closed after operations end, even if an exception is raised on the way, and is much shorter than writing equivalent try except blocks.
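To compare the two approaches, the sketch below writes one line using a try block that closes the file by hand, then a second line using with. It also raises an error on purpose inside a with block to show that the file is still closed. The temporary folder, file name, and error message are all placeholders for this illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'notes.txt')   # placeholder file name

# Long form: the finally block closes the file whatever happens.
file = open(path, 'w')
try:
    file.write('first line\n')
finally:
    file.close()

# Short form: the with block closes the file when the block ends.
with open(path, 'a') as file:
    file.write('second line\n')
closed_after_block = file.closed

# The file is closed even when an error interrupts the block.
try:
    with open(path, 'r') as file:
        raise ValueError('stop early')   # deliberate error for the demonstration
except ValueError:
    pass
closed_after_error = file.closed

print(closed_after_block, closed_after_error)   # True True
```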

update.py

Start a new program by assigning a string value to a variable containing text to be written in a file

text = 'The political slogan "Workers Of The World Unite!" \nis from The Communist Manifesto.'

Next, add statements to write the text string into a file and display the file's current status in the “with” block

with open( 'update.txt' , 'w' ) as file :
    file.write( text )
    print( '\nFile Now Closed?:' , file.closed )

Now, add a non-indented statement after the “with” code block to display the file's new status

print( 'File Now Closed?:' , file.closed )

Then, re-open the file and display its contents to confirm it now contains the entire text string

with open( 'update.txt' , 'r+' ) as file :
    text = file.read()
    print( '\nString:' , text )

Next, add indented statements to display the current file position, then reposition and display that new position

    print( '\nPosition In File Now:' , file.tell() )
    position = file.seek( 33 )
    print( 'Position In File Now:' , file.tell() )

Now, add an indented statement to overwrite the text from the current file position

    file.write( 'All Lands' )

Then, add indented statements to reposition in the file once more and overwrite the text from the new position

    # Offset 61 assumes Windows, where the newline is stored as two characters; use 60 elsewhere
    file.seek( 61 )
    file.write( 'the tombstone of Karl Marx.' )

Finally, add indented statements to return to the start of the file and display its entire updated contents

    file.seek( 0 )
    text = file.read()
    print( '\nString:' , text )

Save then run the program to see the file strings get updated

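The seek() function may optionally accept a second argument value of 0, 1, or 2 to move the specified number of characters from the start, current, or end position respectively. Zero is the default start position. For files opened in text mode Python only allows a non-zero offset from the start position, so offsets from the current or end position are for files opened in binary mode.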

As with strings, the first character in a file is at index position zero – not at index position one.
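The second argument to seek() can be sketched with a file opened in binary mode, where offsets from the current and end positions are permitted. The file name and its ten-digit content are placeholders, and the b prefix marks the values as bytes rather than text.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'digits.bin')   # placeholder file name

with open(path, 'wb') as file:
    file.write(b'0123456789')

with open(path, 'rb') as file:
    file.seek(2)                  # offset 2 from the start (the default)
    from_start = file.read(1)     # reads the byte at position 2
    file.seek(3, 1)               # 3 onward from the current position, now 6
    from_current = file.read(1)
    file.seek(-1, 2)              # 1 back from the end of the file
    from_end = file.read(1)

print(from_start, from_current, from_end)   # b'2' b'6' b'9'
```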

Summary

• Strings can be manipulated by operators for concatenation + , to join strings together, and for repetition * of strings

• Strings can be manipulated by operators for slice [ ], and range slice [ : ] , that reference the index number of string characters

• Strings can be manipulated by membership operators in and not in that seek to match a specified character within a string

• The r (or R) raw string operator can be placed immediately before a string to suppress any escape characters it contains

• A “docstring” is a descriptive string within triple quote marks at the start of a module, class, or function, to define its purpose

• The __doc__ attribute can be used to reference the string description within a docstring

• The __builtins__ module can be examined using the dir() function to reveal the names of default functions and variables

• A str object has a format() function for string formatting and many functions for string modification, such as capitalize()

• Replacement fields in a string to be formatted using the format() function are denoted by { } braces and are filled from a comma-separated list of values

• Strings can also be formatted using the C-style % operator, with %s marking places in a string where text is to be inserted

• The open() function returns a file object that has read(), write(), and close() functions for working with files, and features that describe the file properties

• The open() function must specify a file name string argument and a file mode string parameter, such as 'r' to read the file

• An opened file object has information properties that reveal its current status, such as mode and readable() values

• Position in a file, at which to read or write, can be specified with the seek() function and reported by the tell() function

• The Python with keyword groups file operational statements within a block and automatically closes an open file