DATA FILE HANDLING - Python Programming Made Easy (2016)

Chapter 11: DATA FILE HANDLING

So far our programs have stored data in lists or dictionaries. Such data is held in primary memory, which is temporary: it is lost when the program ends.

Suppose, for example, that we are generating a student’s report card. It would be better if the data were kept in permanent storage. Files are used to store data permanently. A file is a sequence of bytes stored on a storage device such as a tape or magnetic disk. There are two types of files.

• Text file: a file that stores information as ASCII characters. In a text file, each line of text is terminated by a special character known as the EOL (End of Line) character or delimiter character. When this EOL character is read or written, certain internal translations take place.

• Binary file: a file that contains information in the same format in which it is held in memory. In a binary file, no delimiter is used to end a line and no translations occur. As a result, binary files are generally faster to process and often occupy less storage space.
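As a rough illustration of the storage difference, the sketch below compares the number of bytes needed for one integer written as text with the number needed for a fixed-size binary form. The struct module and the sample value are assumptions made for this illustration only; they are not used elsewhere in the chapter.

```python
import struct

number = 1234567  # hypothetical sample value

# Text form: one byte per ASCII digit.
as_text = str(number).encode("ascii")

# Binary form: a fixed 4-byte little-endian integer.
as_binary = struct.pack("<i", number)

print(len(as_text))
print(len(as_binary))

# The binary form can be turned back into the same number.
restored = struct.unpack("<i", as_binary)[0]
print(restored == number)
```

The text form grows with the number of digits, while the packed form stays at four bytes for any value that fits in a 32-bit integer.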

Open Function

The open() function creates a file object, which is then used to call the other methods associated with the file.

file_object = open(filename [, access_mode][, buffering])

filename: A string containing the name of the file that you want to access.

access_mode: The mode in which the file is to be opened, i.e. read, write, append, etc. The possible values are listed in the table below. This parameter is optional; the default access mode is read (r).

buffering: If the buffering value is 0, no buffering takes place. If it is 1, line buffering is performed while accessing the file. If it is an integer greater than 1, buffering is performed with the indicated buffer size. If it is negative, the system default buffer size is used (this is the default behavior).
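The three parameters can be tried out with a minimal sketch such as the following. The scratch directory and the text written are assumptions chosen for the example; print() is used with a single argument so that the code also runs on Python 3.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")

# Explicit access mode "w": create the file and write to it.
f = open(path, "w")
f.write("first line\n")
f.close()

# No access mode given, so the file is opened for reading ("r").
f = open(path)
default_mode = f.mode
text = f.read()
f.close()

# The third argument is the buffering value; 1 requests line buffering.
f = open(path, "a", 1)
f.write("second line\n")
f.close()

print(default_mode)
print(text)
```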

File opening Modes

Mode   Description

r      Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.

rb     Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.

r+     Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.

rb+    Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.

w      Opens a file for writing only. Overwrites the file if it exists. If the file does not exist, creates a new file for writing.

wb     Opens a file for writing only in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for writing.

w+     Opens a file for both writing and reading. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing.

wb+    Opens a file for both writing and reading in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing.

a      Opens a file for appending. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for writing.

ab     Opens a file for appending in binary format. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for writing.

a+     Opens a file for both appending and reading. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing.

ab+    Opens a file for both appending and reading in binary format. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing.
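The difference between the w, a and r+ modes can be checked with a short sketch like this one. The file name and its contents are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, "w")            # "w": create the file
f.write("old contents\n")
f.close()

f = open(path, "w")            # "w" again: the old contents are discarded
f.write("new contents\n")
f.close()

f = open(path, "a")            # "a": writes go to the end of the file
f.write("appended line\n")
f.close()

f = open(path, "r+")           # "r+": read and write, pointer at the start
start = f.tell()
first_line = f.readline()
f.close()

f = open(path, "r")
lines = f.readlines()
f.close()

print(start)
print(lines)
```

Only the text written by the second w call and by the a call survives, which is why w should be used with care on files that already hold data.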

The file object attributes

Once a file is opened and you have a file object, you can get various information about the file from it. The attributes of a file object are:

Attribute        Description

file.closed      Returns True if the file is closed, False otherwise.

file.mode        Returns the access mode with which the file was opened.

file.name        Returns the name of the file.

file.softspace   Returns False if a space is explicitly required with print, True otherwise (Python 2 only).
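A minimal sketch of reading these attributes follows. The scratch file is an assumption, and softspace is left out because it does not exist on Python 3 file objects.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")

f = open(path, "w")
name_while_open = f.name
mode_while_open = f.mode
closed_before = f.closed    # the file is still open here
f.close()
closed_after = f.closed     # close() has been called

print(name_while_open)
print(mode_while_open)
print(closed_before)
print(closed_after)
```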

Opening a file

Method 1:

file = open("Test.txt", "r+")

will open a file called Test.txt for both reading and writing.

We can use a variable instead of a constant as the name of the file.

filename = raw_input("Enter file name")

file = open(filename, "r")

The file has to be in the folder where we are currently working; otherwise we have to specify the complete path.

Method 2:

F = file("Test", "w")    # file() is available in Python 2 only

Method 3:

The with statement can also be used for the same purpose. Using with ensures that the file is closed, and the resources allocated to the file object are released, automatically once the block ends. Its syntax is:

with open(filename, access_mode) as fileobject:

Example:

with open("Test.txt", "r+") as file:

    ……………..
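The automatic clean-up performed by with can be observed through the closed attribute. The sketch below is one way to check it; the scratch file and the simulated error are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")

with open(path, "w") as f:
    f.write("Hello\n")
    inside = f.closed       # still open inside the block
outside = f.closed          # closed as soon as the block ends

# The file is closed even when the block is left because of an error.
try:
    with open(path, "r") as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed

print(inside)
print(outside)
print(closed_after_error)
```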

File Methods

S.No  Method                  Description and example

1     close()                 Closes the file. Example: fileobject.close()

2     flush()                 Flushes the internal buffer. Example: fileobject.flush()

3     next()                  Returns the next line from the file each time it is called. Example: fileobject.next()

4     read([size])            Reads at most size bytes from the file; if size is omitted, reads until the end of the file. Example: X = fileobject.read([size])

5     readline([size])        Reads one entire line from the file. The trailing newline character is kept in the string. Example: X = fileobject.readline()

6     readlines([sizehint])   Reads until EOF using readline() and returns a list containing the lines. If the sizehint argument is present, lines totalling approximately sizehint bytes are read. Example: X = fileobject.readlines()

7     seek(offset[, pos])     Sets the file's current position. Example: fileobject.seek(5)

8     tell()                  Returns the file's current position. Example: fileobject.tell()

9     truncate([size])        Truncates the file's size. Example: fileobject.truncate()

10    write(str)              Writes a string to the file. Example: text = "Hello" followed by fileobject.write(text)

11    writelines(sequence)    Writes a sequence of strings to the file. Example: strlist = ['I', 'love', 'Python'] followed by fileobject.writelines(strlist)

As the table above shows, readlines() returns all the lines of the file as a list. To write to a file, we use the write() and writelines() methods.
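The reading and writing methods can be combined as in the sketch below. Note that writelines() does not add newline characters, so they are included in each string here. The file name and its contents are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "methods.txt")

f = open(path, "w")
f.write("Hello\n")                             # write a single string
f.writelines(["I\n", "love\n", "Python\n"])    # write a sequence of strings
f.close()

f = open(path, "r")
first = f.readline()     # one line, newline included
rest = f.readlines()     # the remaining lines as a list
at_end = f.read()        # nothing is left, so this is an empty string
f.close()

print(first)
print(rest)
print(repr(at_end))
```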

os module functions

Some of the useful functions in the os module that can be used with files are as follows:

1. getcwd() - gives the name of the current working directory.

2. path.abspath(filename) - gives the complete path name of the data file.

3. path.isfile(filename) - checks whether the file exists or not.

4. remove(filename) - deletes the file.

5. rename(filename1, filename2) - renames filename1 to filename2.
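The functions listed above might be exercised as follows. This is a sketch only; the scratch directory and the file names old.txt and new.txt are assumptions.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                  # hypothetical scratch directory
old_name = os.path.join(workdir, "old.txt")
new_name = os.path.join(workdir, "new.txt")

open(old_name, "w").close()                   # create an empty file

print(os.getcwd())                            # current working directory
full_path = os.path.abspath("old.txt")        # complete path, based on getcwd()
print(full_path)

existed = os.path.isfile(old_name)            # the file exists at this point
os.rename(old_name, new_name)
renamed = os.path.isfile(new_name) and not os.path.isfile(old_name)
os.remove(new_name)
removed = not os.path.isfile(new_name)

print(existed)
print(renamed)
print(removed)
```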

Operations on text files

Writing/Creating a text file

file = open("input.txt", 'w')
while True:
    line = raw_input("Enter the line")
    file.write(line + "\n")    # raw_input() drops the newline, so add it back
    choice = raw_input("Do you have more lines? Y or N")
    if choice == 'N':
        break
file.close()

The above code creates the file “input.txt”. It reads lines from the user and writes them to the file, one per line.

Reading from a text file

file = open("input.txt", 'r')
lines = file.readlines()
file.close()
for line in lines:
    print line,    # the trailing comma avoids printing a second newline

The above code opens the file and reads its entire contents into the list lines. We then use a for loop to print the contents of the list.

Traversal / Display

file = open("story.txt", "r")
for line in file.readlines():
    print line,
file.close()

Deletion

with open('story.txt', 'r') as file:
    l = file.readlines()    # the with block closes the file automatically
print l
lineno = int(raw_input("Enter the line number to be deleted"))
del l[lineno - 1]           # line numbers start at 1, list indexes at 0
print l
file = open('story.txt', 'w')
file.writelines(l)
file.close()

Updation

with open("data.txt", "r") as file:
    contents = file.readlines()
i = 0
for line in contents:
    print line,
    ch = raw_input("Do you want to change this line?")
    if ch == 'Y':
        newline = raw_input("Enter the new line")
        contents[i] = newline + "\n"    # keep one entry per line
    i = i + 1
with open("data.txt", "w") as file:
    file.writelines(contents)
print contents

Random Access

The tell() method returns an integer giving the current position of the file object in the file, measured in bytes from the beginning of the file.

Syntax: f.tell()

The seek() method positions the file object at a particular place in the file.

Syntax: f.seek(offset [, pos])

Here offset is a number of bytes and pos is the reference point; the new position is obtained by adding offset to the reference point. The possible values of pos are:

Value   Reference point

0       beginning of the file

1       current position in the file

2       end of the file

The default value of pos is 0, i.e. the beginning of the file.

Example:

>>> f.tell() # get the current file position

34

>>> f.seek(0) # bring file cursor to initial position

0

>>> print(f.read()) # read the entire file

This is a Python file

This explains about

seek()

tell()
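The three reference points of seek() can be tried with a sketch such as the one below. The file is opened in binary mode because, in Python 3, text-mode files do not accept non-zero offsets relative to the current position or the end. The file name and contents are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")

f = open(path, "wb")
f.write(b"This is a Python file\n")    # 22 bytes in total
f.close()

f = open(path, "rb")
f.seek(5)                    # pos defaults to 0: 5 bytes from the beginning
after_seek = f.tell()
word = f.read(2)             # reads the bytes at positions 5 and 6
f.seek(3, 1)                 # pos 1: 3 bytes forward from the current position
after_relative = f.tell()
f.seek(-5, 2)                # pos 2: 5 bytes before the end of the file
tail = f.read()
end_position = f.tell()
f.close()

print(after_seek)
print(word)
print(after_relative)
print(tail)
print(end_position)
```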

Binary files

Serialization is the process of converting a data structure or object into a format (not necessarily a string) that can be stored and resurrected later. Serialization is also called deflating the data, and resurrecting it is called inflating.

The pickle module is used for serialization of data. It allows us to store data in binary form in a file. Its dump() and load() functions are used to write data to and read data from the file.
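A minimal round trip with dump() and load() might look like this. The dictionary and the scratch file are assumptions made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "data.dat")

marks = {"name": "Asha", "scores": [91, 78, 85]}    # hypothetical record

f = open(path, "wb")          # pickle needs a file opened in binary mode
pickle.dump(marks, f)
f.close()

f = open(path, "rb")
restored = pickle.load(f)
f.close()

print(restored == marks)      # same contents
print(restored is marks)      # but a new object rebuilt from the file
```

The structure of the object (a dictionary holding a list) survives the trip, which is what distinguishes pickling from writing plain text.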

Writing in a binary file

import pickle                                    # step 1
file = open('data.dat', 'wb')                    # step 2
while True:
    x = int(raw_input("Enter number"))           # step 3
    pickle.dump(x, file)                         # step 4
    ans = raw_input('want to enter more data Y / N')
    if ans.upper() == 'N':
        break
file.close()                                     # step 5

Reading from a binary file

import pickle                        # step 1
file = open('data.dat', 'rb')        # step 2
try:
    while True:
        y = pickle.load(file)        # step 3
        print y                      # step 4
except EOFError:                     # end of file reached
    pass
file.close()                         # step 5

Binary file Manipulation

Let us take an example of an inventory system. First we import the required modules.

from pickle import load, dump
import os
import sys

The next step is to create the class with the required attributes. In this case, we create an ITEM class with the attributes itemcode, itemname, price and qty. We also define the get() and display() methods.

class ITEM:
    def __init__(self):
        self.itemcode=0
        self.itemname=''
        self.price=0.0
        self.qty=0

    def get(self):
        self.itemcode=input("Enter item code")
        self.itemname=raw_input("Enter item name")
        self.price=float(raw_input("Enter price"))
        self.qty=input("Enter quantity")

    def display(self):
        print "Item Code",self.itemcode
        print "Item Name",self.itemname
        print "Price",self.price
        print "Quantity",self.qty

Let us define the file name

fname='ITEM.dat'
choice=0
while choice < 5:
    choice = input("1-create 2-display 3-search 4- update 5-exit")
    if choice==1:
        ofile =open(fname,'wb')
        print 'Going to dump'
        num = input("Enter number of items")
        for i in range(num):
            t = ITEM()
            t.get()
            dump(t,ofile)
        ofile.close()
        print "Written into file"

We open the file in ‘wb’- write binary mode. We call the get() function to get all the details of the item from the user. Each item is then written into the file using the dump() function.

    elif choice==2:
        if not os.path.isfile(fname) :
            print "file does not exist"
        else:
            ifile = open(fname,'rb')
            try :
                while True:
                    t = load(ifile)
                    t.display()
            except EOFError:
                pass
            ifile.close()

For displaying the written contents from the file, we open the file in ‘rb’ read binary mode. The load function gets the item from the file and we use the display() function to display the item. In a similar way, till the end-of-file all the contents are displayed on the screen.

    elif choice==3:
        ifile = open(fname,'rb')
        search = input("Enter item code")
        try :
            while True:
                t = load(ifile)
                if t.itemcode==search:
                    t.display()
        except EOFError:
            pass
        ifile.close()

To search for a particular item,

1. Get the item code to be searched.

2. Open the file in read mode.

3. Get the item

4. If the itemcode equals the code being searched for, then display the details of the item (item name, price and quantity)

5. Repeat steps 3 & 4 till we reach the end of file

6. Close the file

    elif choice==4:
        ifile = open(fname,'rb+')
        search = input("Enter item code")
        try :
            while True:
                # Store the position of the first byte of the record
                pos=ifile.tell()
                t = load(ifile)
                if t.itemcode==search:
                    t.itemname = raw_input("Enter new item name")
                    t.price=float(raw_input("Enter new price"))
                    t.qty=input("Enter new quantity")
                    # Move the file pointer to the stored position
                    ifile.seek(pos)
                    # Overwriting in place is reliable only when the new pickle
                    # has the same length as the old one; otherwise the
                    # records that follow may be corrupted
                    dump(t,ifile)
                    print "Updated"
                    break
        except EOFError:
            pass
        ifile.close()

To modify a file,

1. Open the file in ‘rb+’ mode which enables both reading & writing

2. Get the item code to be modified

3. Store the position of the first byte

4. Get the item

5. If itemcode = item to be modified then

a) get the new details

b) Move the file pointer back to the start of this record

c) Write the new details (Overwrite)

6. Close the file

    else:
        break
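The in-place update under choice 4 relies on the new pickled record occupying the same number of bytes as the old one. When that cannot be guaranteed (for example, when the new item name is longer), one alternative is to read every record, change the matching one in memory and rewrite the whole file. The sketch below takes that route. It is not the chapter's method: the helper names load_all, save_all and update_item are hypothetical, and plain dictionaries stand in for ITEM objects to keep the example self-contained.

```python
import os
import pickle
import tempfile

def load_all(fname):
    """Read every pickled record in fname into a list."""
    records = []
    with open(fname, "rb") as f:
        try:
            while True:
                records.append(pickle.load(f))
        except EOFError:
            pass                      # end of file reached
    return records

def save_all(fname, records):
    """Overwrite fname with the given records, one pickle per record."""
    with open(fname, "wb") as f:
        for record in records:
            pickle.dump(record, f)

def update_item(fname, itemcode, **changes):
    """Apply changes to the records with this itemcode; return how many matched."""
    records = load_all(fname)
    updated = 0
    for record in records:
        if record["itemcode"] == itemcode:
            record.update(changes)
            updated += 1
    save_all(fname, records)
    return updated

# Usage with a hypothetical scratch file and sample data.
fname = os.path.join(tempfile.mkdtemp(), "ITEM.dat")
save_all(fname, [
    {"itemcode": 1, "itemname": "Pen", "price": 10.0, "qty": 5},
    {"itemcode": 2, "itemname": "Ink", "price": 45.5, "qty": 2},
])
count = update_item(fname, 1, itemname="Fountain pen with a longer name", price=120.0)
items = load_all(fname)
print(count)
print(items)
```

Rewriting the file costs more for large files, but the records after the updated one stay intact whatever the size of the new data.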

Solved Questions

1. What is the use of read() function ?

Ans. It reads the whole file and returns its contents as a single string. If a size is given, at most that many bytes are read.

2. What is the use of readline() function?

Ans. It reads the next line of the file (the first line, if nothing has been read yet) and returns it as a string.

3. What is tell() method?

Ans. The tell() method tells you the current position within the file. In other words, the next read or write will occur at that many bytes from the beginning of the file.

4. What is the use of rename() function?

Ans. The rename() method renames a file. It takes two arguments, the current filename and the new filename.

os.rename(current_file_name, new_file_name)

5. What is the use of remove() function?

Ans. You can use the remove() method to delete files by supplying the name of the file to be deleted as the argument.

os.remove(file_name)

6. What is the use of pickle module?

Ans. The pickle module can be used to store most kinds of Python objects in a file, because it stores objects together with their structure. So for storing data in binary format, we use the pickle module.

7. What is the use of dump method?

Ans. pickle.dump() writes the object in file, which is opened in binary access mode.

Syntax of dump() method is:

dump(object, fileobject)

8. What is the use of load method?

Ans. To read data from a pickle file, we use pickle.load(), which reads one object from the file.

Syntax of load() is :

object = load(fileobject)

9. In how many ways can end of file be detected?

Ans.

1. When end of file is reached, readline() will return an empty string.

2. When pickle.load() reaches the end of the file, it raises EOFError, which can be caught:

   try :
       while True :
           y = pickle.load(file)
           -- do something --
   except EOFError :
       pass
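The first way can be sketched as follows. A blank line inside the file is read as "\n", so only the true end of file gives an empty string. The file name and contents are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file so that no real data is touched.
path = os.path.join(tempfile.mkdtemp(), "eof_demo.txt")

with open(path, "w") as f:
    f.write("first\n\nthird\n")      # the middle line is blank

lines_seen = []
with open(path, "r") as f:
    while True:
        line = f.readline()
        if line == "":               # empty string: end of file
            break
        lines_seen.append(line)      # a blank line arrives as "\n", not ""

print(lines_seen)
```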

10. What is the difference between text files and binary files? (or) What are the advantages of binary files?

Ans. A text file is made up of lines, and each line is terminated by a special character known as End of Line (EOL). Text files are stored in human-readable form and can be created using any text editor.

A binary file contains arbitrary binary data; for example, numbers are stored in the form in which they are used for numerical operations. So when we work with a binary file, we have to interpret the raw bit patterns read from the file as the correct type of data in our program. Python provides special modules for encoding and decoding the data of binary files.

11. Write a function CountYouMe() in Python which reads the contents of a text file Notes.txt and counts the words You and Me (not case sensitive).

Ans.

def CountYouMe():
    # Read the file as a list of lines without surrounding white space
    lines = [line.strip() for line in open('Notes.txt')]
    count = 0
    for line in lines:
        words = line.split(" ")
        for word in words:
            # Remove all leading and trailing white spaces
            word = word.strip().lower()
            if word == 'you' or word == 'me':
                count = count + 1
    if count == 0:
        print "not found in file"
    else:
        print "count=", count

CountYouMe()

Example: If the file contains

You are my best friend

You and me make a good team.

Output would be: count=3

12. Write a function to count the number of lines starting with a digit in a text file “Diary.txt”.

Ans.

def CountFirstDigit():
    count=0
    with open('Diary.txt','r') as f:
        while True:
            line=f.readline()
            if not line:
                break
            if line[0].isdigit():
                count = count+1
    if count == 0:
        print "no line starts with a digit"
    else:
        print "count=",count

CountFirstDigit()

13. Given a class TRAIN as follows,

class TRAIN:
    def __init__(self):
        self.trainno=0
        self.trainname=''
        self.start=''
        self.dest=''

    def get(self):
        self.trainno=input("Enter train number")
        self.trainname=raw_input("Enter train name")
        self.start=raw_input("Enter starting place")
        self.dest=raw_input("Enter destination")

    def display(self):
        print "Train Number",self.trainno
        print "Train Name",self.trainname
        print "STARTING PLACE",self.start
        print "DESTINATION",self.dest

Write functions

(i) to write an object of TRAIN type into a binary file “TRAIN.dat”

(ii) to search and display details of trains starting from “MUMBAI”

(iii) to search and display details of trains whose destination is “DELHI”

Ans.

from pickle import load, dump
import os
import sys

fname='TRAIN.dat'
choice=0
while choice < 5:
    choice = input("1-create 2-display 3-search using start 4- search using dest 5-exit")
    if choice==1:
        ofile =open(fname,'wb')
        num = input("Enter number of trains")
        for i in range(num):
            t = TRAIN()
            t.get()
            dump(t,ofile)
        ofile.close()

    elif choice==2:
        if not os.path.isfile(fname) :
            print "file does not exist"
        else:
            ifile = open(fname,'rb')
            try :
                while True:
                    t = load(ifile)
                    t.display()
            except EOFError:
                pass
            ifile.close()

    elif choice==3:
        ifile = open(fname,'rb')
        start = raw_input("Enter starting station")
        try :
            while True:
                t = load(ifile)
                if t.start==start:
                    t.display()
        except EOFError:
            pass
        ifile.close()

    elif choice==4:
        ifile = open(fname,'rb')
        dest = raw_input("Enter destination station")
        try :
            while True:
                t = load(ifile)
                if t.dest==dest:
                    t.display()
        except EOFError:
            pass
        ifile.close()
    else:
        break

Practice Questions

1. Write a function to read the content of a text file “DELHI.txt” and display all those lines on screen, which are either starting with ‘D’ or ‘M’.

2. A text file “PARA.txt” contains a paragraph. Write a function that searches for a given character and reports the no. of occurrences of the character in the file.

3. Create a dictionary having decimal equivalent of roman numerals. Store it in a binary file. Write a function to convert roman number to decimal equivalent using the binary file data.

4. Write a function CountHisHer() in Python which reads the contents of a text file “Story.txt” and counts the words His and Her (not case sensitive).

5. Write a function to count the number of lines starting with uppercase characters in a text file “Help.doc”.

6. Write a program to accept a filename from the user and display all the lines from the file which contain python comment character '#'.

7. Write a program in Python to insert, delete, search and display details of all employees from a binary file “Employee.dat”.