Practical C Programming, 3rd Edition (2011)

Part II. Simple Programming

Chapter 14. File Input/Output

I the heir of all the ages, in the foremost files of time.

—Alfred, Lord Tennyson

A file is a collection of related data. C treats a file as a series of bytes. Many files reside on disk; however, devices like terminals, printers, and magnetic tapes are also considered files.

The C library contains a large number of routines for manipulating files. The declarations for the structures and functions used by the file functions are stored in the standard include file <stdio.h>. Before doing anything with files, you must put the line:

#include <stdio.h>

at the beginning of your program.

The declaration for a file variable is:

FILE *file-variable; /* comment */

For example:

#include <stdio.h>

FILE *in_file; /* file containing the input data */

Before a file can be used, it must be opened using the function fopen. fopen returns a pointer to the file structure for the file. The format for fopen is:

file-variable = fopen(name, mode);

where:

file-variable

is a file variable. A value of NULL is returned on error.

name

is the actual name of the file (data.txt, temp.dat, etc.).

mode

indicates if the file is to be read or written. mode is "w" for writing and "r" for reading. The flag "b" can be added to indicate a binary file. Omitting the binary flag indicates an ASCII (text) file. (See Section 14.2 for a description of ASCII and binary files.)

Flags can be combined. So "wb" is used to write a binary file.

The function returns a file handle that will be used in subsequent I/O operations. If there is an I/O error, then the value NULL is returned:

FILE *in_file; /* File to read */

in_file = fopen("input.txt", "r"); /* Open the input file */

if (in_file == NULL) { /* Test for error */

fprintf(stderr, "Error: Unable to open input file 'input.txt'\n");

exit (8);

}

The function fclose will close the file. The format of fclose is:

status = fclose(file-variable);

or:

fclose(file-variable);

The variable status is zero if the fclose was successful or nonzero for an error. If you don’t care about the status, the second form closes the file and discards the return value.
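As a rough sketch of how the open, write, and close steps fit together, the helper below writes one message to a file and checks the status at each step. The name save_message and its 0/-1 return convention are our own, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write message to the file called path.
 * Returns 0 on success, -1 if any step fails. */
int save_message(const char *path, const char *message)
{
    FILE *out_file = fopen(path, "w");  /* NULL means the open failed */

    if (out_file == NULL)
        return -1;

    if (fprintf(out_file, "%s", message) < 0) {
        fclose(out_file);               /* already failing; ignore status */
        return -1;
    }

    /* fclose flushes any buffered data, so a nonzero status here can
     * mean the data never reached the file. */
    if (fclose(out_file) != 0)
        return -1;

    return 0;
}
```

Checking the fclose status matters most for output files, because that is when the last of the data is actually written.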

C provides three pre-opened files. These are listed in Table 14-1.

Table 14-1. Standard Files

File	Description
stdin	Standard input (open for reading)
stdout	Standard output (open for writing)
stderr	Standard error (open for writing)

The function fgetc reads a single character from a file. If no more data exists in the file, the function will return the constant EOF (EOF is defined in stdio.h). Note that fgetc returns an integer, not a character. This return is necessary because the EOF flag must be a noncharacter value.

Example 14-1 counts the number of characters in the file input.txt.

Example 14-1. copy/copy.c


#include <stdio.h>

const char FILE_NAME[] = "input.txt";

#include <stdlib.h>

int main()

{

int count = 0; /* number of characters seen */

FILE *in_file; /* input file */

/* character or EOF flag from input */

int ch;

in_file = fopen(FILE_NAME, "r");

if (in_file == NULL) {

printf("Cannot open %s\n", FILE_NAME);

exit(8);

}

while (1) {

ch = fgetc(in_file);

if (ch == EOF)

break;

++count;

}

printf("Number of characters in %s is %d\n",

FILE_NAME, count);

fclose(in_file);

return (0);

}

A similar function, fputc, exists for writing a single character. Its format is:

fputc(character, file);
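The two single-character routines can be combined to copy one open file to another. The following is a minimal sketch; the function name copy_stream and its return convention are assumptions made for this illustration.

```c
#include <stdio.h>

/* Copy every character from in_file to out_file.
 * Returns the number of characters copied, or -1 on a write error. */
long copy_stream(FILE *in_file, FILE *out_file)
{
    long count = 0;     /* characters copied so far */
    int ch;             /* int, not char, so EOF stays distinguishable */

    while ((ch = fgetc(in_file)) != EOF) {
        if (fputc(ch, out_file) == EOF)
            return -1;
        ++count;
    }
    return count;
}
```

Like fgetc, fputc returns an integer: the character written, or EOF if the write failed.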

The functions fgets and fputs work on one line at a time. The format of the fgets call is:

string_ptr = fgets(string, size, file);

where:

string_ptr

is equal to string if the read was successful, or NULL if end-of-file or an error is detected.

string

is a character array in which the function places the string.

size

is the size of the character array. fgets reads until it gets a line (complete with ending \n) or it reads size-1 characters. It then ends the string with a null (\0).

Problems can occur if the size specified is too big. C provides a convenient way of making sure that the size parameter is just right through the use of the sizeof operator.

The sizeof operator returns the size of its argument in bytes. For example:

long int array[10]; /* (Each element contains 4 bytes) */

char string[30];

Then sizeof(string) is 30. This size is not the same as the length. The length of string can be anywhere from 0 to 29 characters. The sizeof operator returns the number of bytes in string (used or not). On a machine where a long int takes up 4 bytes, sizeof(array) is 40.

The sizeof operator is particularly useful when you use the fgets routine. By using sizeof, you don’t have to worry about how big a string is or, worse, what happens if someone changes the dimension of the string.

For example:

char string[100];

. . .

fgets(string, sizeof(string), in_file);
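One caution: sizeof gives the array size only where the array itself is declared. Inside a function that receives the string as a parameter, the parameter is a pointer, and sizeof would return the size of the pointer. The sketch below (read_line is a hypothetical helper, not a library routine) passes the size in explicitly instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buffer.
 * Here buffer is a pointer, so sizeof(buffer) would be the size of a
 * pointer, not of the caller's array. The caller passes the size.
 * Returns the number of characters stored, or -1 at end-of-file/error. */
int read_line(char *buffer, int size, FILE *in_file)
{
    if (fgets(buffer, size, in_file) == NULL)
        return -1;
    return (int)strlen(buffer);
}
```

The caller, which can see the array, still uses sizeof:

```c
char line[100];

read_line(line, sizeof(line), in_file);
```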

fputs is similar to fgets except that it writes a string instead of reading it. The format of the fputs function is:

status = fputs(string, file);

The string and file parameters are the same as the ones for fgets. The return value, status, is a nonnegative number if the write succeeded or EOF if an error occurred. fputs needs no size because it gets the size of the line to write from the length of the string. (It keeps writing until it hits a null, '\0'.)

Conversion Routines

So far we have just discussed writing characters and strings. In this section, we will discuss some of the more sophisticated I/O operations and conversions.

In order to write a number to a printer or terminal, you must convert it to characters. The printer understands only characters, not numbers. For example, the number 567 must be converted to the three characters 5, 6, and 7 in order to be printed.

The function fprintf converts data and writes it to a file. The general form of the fprintf function is:

count = fprintf(file, format, parameter-1, parameter-2, ...);

where:

count

is the number of characters sent or a negative value if an error occurred.

format

describes how the arguments are to be printed.

parameter-1, parameter-2, ...

are parameters to be converted and sent.

fprintf has two sister functions: printf and sprintf. printf() has been seen often in this book, and is equivalent to fprintf with a first argument of stdout. sprintf is similar to fprintf, except that the first argument is a string. For example:

char string[40]; /* the filename */

int file_number = 0; /* current file number for this segment */

sprintf(string, "file.%d", file_number);

++file_number;

out_file = fopen(string, "w");
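sprintf does not know how big the destination string is, so a long result can overflow it. If your library provides the C99 routine snprintf (an assumption; older compilers may not), you can pass the size and detect truncation. The helper name make_file_name below is ours.

```c
#include <stdio.h>

/* Build "file.<number>" in name, which holds size bytes.
 * Returns 0 if the result fit, -1 if it was truncated or
 * the conversion failed. */
int make_file_name(char *name, size_t size, int file_number)
{
    /* snprintf returns the length the full result needs,
     * not counting the ending '\0' */
    int needed = snprintf(name, size, "file.%d", file_number);

    if (needed < 0 || (size_t)needed >= size)
        return -1;
    return 0;
}
```

As with fgets, the size argument is a natural place to use sizeof on the destination array.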

scanf has similar sister functions: fscanf and sscanf. The format for fscanf is:

number = fscanf(file, format, &parameter-1, ...);

where:

number

is the number of parameters successfully converted. If there was input but nothing could be converted, a zero is returned. If no data is present, EOF is returned.

file

is a file opened for reading.

format

describes the data to be read.

parameter-1

is the first parameter to be read.

sscanf is similar to fscanf, except that a string is scanned instead of a file.

scanf is very fussy about where the end-of-line characters occur in the input. Frequently, the user has to type extra returns to get scanf unstuck.

We have avoided this problem by using fgets to read a line from the file and then using sscanf to parse it. fgets almost always gets a single line without trouble.

Example 14-2 shows a program that attempts to read two parameters from the standard input (the keyboard). It then prints a message based on the number of inputs actually found.

Example 14-2. Using the sscanf Return Value

char line[100]; /* Line from the keyboard */

int count, total; /* Number of entries & total value */

int scan_count; /* Number of parameters scanned */

fgets(line, sizeof(line), stdin);

scan_count = sscanf(line,"%d %d", &count, &total);

switch (scan_count) {

case EOF:

case 0:

printf("Didn't find any number\n");

break;

case 1:

printf("Found 'count'(%d), but not 'total'\n", count);

break;

case 2:

printf("Found both 'count'(%d) and 'total'(%d)\n", count, total);

break;

default:

printf("This should not be possible\n");

break;

}

Question 14-1: No matter what filename we give Example 14-3, our program can’t find it. Why? (See Section 14.8 for the answer.)

Example 14-3. fun-file/fun-file.c

#include <stdio.h>

#include <stdlib.h>

int main()

{

char name[100]; /* name of the file to use */

FILE *in_file; /* file for input */

printf("Name? ");

fgets(name, sizeof(name), stdin);

in_file = fopen(name, "r");

if (in_file == NULL) {

fprintf(stderr, "Could not open file\n");

exit(8);

}

printf("File found\n");

fclose(in_file);

return (0);

}

Binary and ASCII Files

We have been working with ASCII files. ASCII stands for American Standard Code for Information Interchange. It is a set of 95 printable characters and 33 control codes. ASCII files are readable text. When you write a program, the prog.c file is in ASCII.

Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters ('1', '2', '3', and '4') and written. Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the sscanf routine.

The ASCII character '0' has the value of 48, '1' has the value of 49, and so on. When you want to convert a single digit from ASCII to integer, you must subtract this number. For example:

int integer;

char ch;

ch = '5';

integer = ch - 48;

printf("Integer %d\n", integer);

Rather than remember that '0' is 48, you can just subtract '0':

integer = ch - '0';
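The same subtraction, applied one digit at a time, converts a whole string of digits. This is only a sketch of the idea behind routines like sscanf, not how any particular library does it; digits_to_int is a made-up name, and the function does no overflow checking.

```c
/* Convert a string of decimal digits to an integer.
 * Stops at the first character that is not a digit.
 * Sketch only: no sign handling and no overflow checking. */
int digits_to_int(const char *text)
{
    int value = 0;      /* number built so far */

    while (*text >= '0' && *text <= '9') {
        /* shift what we have one decimal place, then add the new digit */
        value = value * 10 + (*text - '0');
        ++text;
    }
    return value;
}
```

For the string "567" the value goes from 0 to 5, then 56, then 567.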

Computers work on binary data. When reading numbers from an ASCII file, the program must process the character data through a conversion routine like sscanf. This is expensive. Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer. (If you’ve ever seen a long printout coming out of the printer displaying pages with a few characters at the top that look like “!E#(@$%@^Aa^AA^^JHC%^X,” then you know what happens when you try to print a binary file.)

ASCII files are portable (for the most part). They can be moved from machine to machine with very little trouble. Binary files are almost certainly nonportable. Unless you are an expert programmer, you will find it almost impossible to make a portable binary file.

Which file type should you use? In most cases, ASCII. If you have small-to-medium amounts of data, the conversion time will not seriously affect the performance of your program. (Who cares if it takes 0.5 seconds to start up instead of 0.3?) ASCII files allow you to easily check the data for correctness.

Only when you are using large amounts of data will the space and performance problems force you to use the binary format.

The End-of-Line Puzzle

Back in the dark ages BC (Before Computers), a magical device called a Teletype Model 33 existed. This amazing machine contained a shift register made out of a motor, with a rotor, and a keyboard ROM consisting solely of levers and springs. It contained a keyboard, a printer, and a paper tape reader/punch. It could transmit messages over the phones using a modem at the rate of 10 characters a second.

The Teletype had a problem. It took two-tenths of a second to move the printhead from the right side to the left. Two-tenths of a second is two character times. If a second character came while the printhead was in the middle of a return, that character was lost.

The Teletype people solved this problem by making end-of-line two characters: <RETURN> to position the printhead at the left margin and <LINE FEED> to move the paper up one line.

When the early computers came out, some designers realized that using two characters for end-of-line wasted storage (at this time, storage was very expensive). Some picked <LINE FEED> for their end-of-line, some <RETURN>. Some of the diehards stayed with the two-character sequence.

UNIX uses <LINE FEED> for end-of-line. The newline character, \n, is code 0x0A (LF or <LINE FEED>). MS-DOS/Windows uses the two characters: <RETURN><LINE FEED>. Classic Mac OS used <RETURN>; Mac OS X and later use <LINE FEED>, as UNIX does.

MS-DOS/Windows compiler designers had a problem: what to do about the old C programs that thought that newline was just <LINE FEED>? The solution was to add code to the I/O library that stripped out the <RETURN> characters from ASCII input files and changed <LINE FEED> to <RETURN> <LINE FEED> on output.

In MS-DOS/Windows, it makes a difference whether or not a file is opened as ASCII or binary. The flag b is used to indicate a binary file:

/* open ASCII file for reading */

ascii_file = fopen("name", "r");

/* open binary file for reading */

binary_file = fopen("name", "rb");

If you open a file that contains text as a binary file under MS-DOS/Windows, you have to handle the carriage returns in your program. If you open it as ASCII, the carriage returns are automatically removed by the read routines.

Question 14-2: The routine fputc can be used to write out a single byte of a binary file. Example 14-4 writes out the numbers 0 to 127 to a file called test.out. It works just fine on UNIX, creating a 128-byte-long file; however, on MS-DOS/Windows, the file contains 129 bytes. Why?

Example 14-4. wbin/wbin.c


#include <stdio.h>

#include <stdlib.h>

#ifndef __MSDOS__

#include <unistd.h>

#endif /* __MSDOS__ */

int main()

{

int cur_char; /* current character to write */

FILE *out_file; /* output file */

out_file = fopen("test.out", "w");

if (out_file == NULL) {

fprintf(stderr,"Cannot open output file\n");

exit (8);

}

for (cur_char = 0; cur_char < 128; ++cur_char) {

fputc(cur_char, out_file);

}

fclose(out_file);

return (0);

}

Hint: Here is a hex dump of the MS-DOS/Windows file:

000:0001 0203 0405 0607 0809 0d0a 0b0c 0d0e

010:0f10 1112 1314 1516 1718 191a 1b1c 1d1e

020:1f20 2122 2324 2526 2728 292a 2b2c 2d2e

030:2f30 3132 3334 3536 3738 393a 3b3c 3d3e

040:3f40 4142 4344 4546 4748 494a 4b4c 4d4e

050:4f50 5152 5354 5556 5758 595a 5b5c 5d5e

060:5f60 6162 6364 6566 6768 696a 6b6c 6d6e

070:6f70 7172 7374 7576 7778 797a 7b7c 7d7e

080:7f

(See Section 14.8 for the answer.)

UNIX programmers don’t have to worry about the C library automatically fixing their ASCII files. In UNIX, a file is a file and ASCII is no different from binary. In fact, you can write a half-ASCII, half-binary file if you want to.

Binary I/O

Binary I/O is accomplished through two routines: fread and fwrite. The syntax for fread is:

read_size = fread(data_ptr, 1, size, file);

where:

read_size

is the size of the data that was read. If this value is less than size, then an end-of-file or error was encountered.

data_ptr

is the pointer to the data to be read. This pointer must be cast to a character pointer (char *) if the data is any type other than a character.

size

is the number of bytes to be read.

file

is the input file.

For example:

struct {

int width;

int height;

} rectangle;

int read_size;

read_size = fread((char *)&rectangle, 1, sizeof(rectangle), in_file);

if (read_size != sizeof(rectangle)) {

fprintf(stderr,"Unable to read rectangle\n");

exit (8);

}

In this example, we are reading in the structure rectangle. The & operator makes it into a pointer. The sizeof operator is used to determine how many bytes to read in, as well as to check that the read was successful.

fwrite has a calling sequence similar to fread:

write_size = fwrite(data_ptr, 1, size, file);

NOTE

In order to make programming simpler and easier, we always use one as the second parameter to fread and fwrite. Actually, fread and fwrite are designed to read an array of objects. The second parameter is the size of the object and the third parameter is the number of objects. For a full description of these functions, see your C reference manual.
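For comparison, here is a sketch of the array form described in the note, with the object size as the second parameter and the object count as the third. The structure and the two function names are assumptions made for this example. Used this way, the return value is a count of whole objects rather than bytes.

```c
#include <stdio.h>

struct rectangle {
    int width;
    int height;
};

/* Write count rectangles from list.
 * Returns the number of rectangles actually written. */
size_t save_rectangles(const struct rectangle *list, size_t count,
                       FILE *out_file)
{
    return fwrite(list, sizeof(list[0]), count, out_file);
}

/* Read up to count rectangles into list.
 * Returns the number of complete rectangles actually read. */
size_t load_rectangles(struct rectangle *list, size_t count,
                       FILE *in_file)
{
    return fread(list, sizeof(list[0]), count, in_file);
}
```

A return value smaller than count means end-of-file or an error was encountered, just as with the byte-oriented form.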

Buffering Problems

Buffered I/O does not write immediately to the file. Instead, the data is kept in a buffer until there is enough for a big write, or until it is flushed. The following program is designed to print out a progress message as each section is finished:

printf("Starting");

do_step_1();

printf("Step 1 complete");

do_step_2();

printf("Step 2 complete");

do_step_3();

printf("Step 3 complete\n");

Instead of writing the messages as each step is completed, the printf function puts them in a buffer. Only when the program is finished does the buffer get flushed and all the messages come spilling out at once.

The routine fflush will force the flushing of the buffers. Properly written, our example should be:

printf("Starting");

fflush(stdout);

do_step_1();

printf("Step 1 complete");

fflush(stdout);

do_step_2();

printf("Step 2 complete");

fflush(stdout);

do_step_3();

printf("Step 3 complete\n");

fflush(stdout);
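Because the print-then-flush pair is repeated for every message, it can be wrapped in a small function. This is one possible sketch; report_progress is a hypothetical name, and it takes the file as a parameter so that it works for any buffered output file, not just stdout.

```c
#include <stdio.h>

/* Write a progress message and push it out of the buffer right away.
 * Returns 0 on success, EOF on failure. */
int report_progress(FILE *out_file, const char *message)
{
    if (fputs(message, out_file) == EOF)
        return EOF;
    return fflush(out_file);    /* fflush returns 0 on success */
}
```

The progress program then becomes a series of calls such as report_progress(stdout, "Step 1 complete").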

Unbuffered I/O

In buffered I/O, data is buffered and then sent to the file. In unbuffered I/O, the data is immediately sent to the file.

If you drop a number of paper clips on the floor, you can pick them up in buffered or unbuffered mode. In buffered mode, you use your right hand to pick up a paper clip and transfer it to your left hand. The process is repeated until your left hand is full, and then you dump a handful of paper clips into the box on your desk.

In unbuffered mode, you pick up a paper clip and dump it immediately into the box. There is no left-hand buffer.

In most cases, buffered I/O should be used instead of unbuffered. In unbuffered I/O, each read or write requires a system call. Any call to the operating system is expensive. Buffered I/O minimizes these calls.

Unbuffered I/O should be used only when reading or writing large amounts of binary data or when direct control of a device or file is required.

Back to our paper clip example: if we were picking up small items like paper clips, we would probably use a left-hand buffer. But if we were picking up cannonballs (which are much larger), no buffer would be used.

The open system call is used for opening an unbuffered file. The macro definitions used by this call differ from system to system. We are using both UNIX and MS-DOS/Windows, so we have used conditional compilation (#ifdef/#endif) to bring in the correct files:

#ifndef __MSDOS__ /* if we are not MS-DOS */

#define __UNIX__ /* then we are UNIX */

#endif /* __MSDOS__ */

#ifdef __UNIX__

#include <sys/types.h> /* file defines for UNIX filesystem */

#include <sys/stat.h>

#include <fcntl.h>

#endif /* __UNIX__ */

#ifdef __MSDOS__

#include <stdlib.h>

#include <fcntl.h> /* file defines for DOS filesystem */

#include <sys\stat.h>

#include <io.h>

#endif /* __MSDOS__ */

int file_descriptor;

file_descriptor = open(name, flags); /* existing file */

file_descriptor = open(name, flags, mode); /*new file */

where:

file_descriptor

is an integer that is used to identify the file for the read, write, and close calls. If file_descriptor is less than zero, an error occurred.

name

is the name of the file.

flags

are defined in the fcntl.h header file. Flags are described in Table 14-2.

mode

is the protection mode for the file. Normally, this is 0644 for most files.

Table 14-2. Open Flags

Flag	Meaning
O_RDONLY	Open for reading only
O_WRONLY	Open for writing only
O_RDWR	Open for reading and writing
O_APPEND	Append new data at the end of the file
O_CREAT	Create file (mode is required when this flag is present)
O_TRUNC	If the file exists, truncate it to zero length
O_EXCL	Fail if file exists
O_BINARY	Open in binary mode (MS-DOS/Windows only)
O_TEXT	Open in text mode (MS-DOS/Windows only)

For example, to open the existing file data.txt in text mode for reading, we use the following:

data_fd = open("data.txt", O_RDONLY);

The next example shows how to create a file called output.dat for writing only:

out_fd = open("output.dat", O_CREAT|O_WRONLY, 0644);

Notice that we combined flags using the or operator (|). This is a quick and easy way of merging multiple flags.
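As a further illustration of combining flags, the sketch below adds O_EXCL so that the open fails rather than overwriting an existing file. It assumes a UNIX system (the headers differ on MS-DOS/Windows), and create_new_file is a name invented for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Create path for writing, but refuse to touch an existing file.
 * Returns a file descriptor, or a negative value on error
 * (including the case where the file already exists). */
int create_new_file(const char *path)
{
    return open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
}
```

Because O_CREAT is present, the third parameter (the protection mode) is required.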

When any program is initially run, three files are already opened. These are described in Table 14-3.

Table 14-3. Standard Unbuffered Files

File number	Symbolic name	Description
0	STDIN_FILENO	Standard in
1	STDOUT_FILENO	Standard out
2	STDERR_FILENO	Standard err

The symbolic names are defined in the header file unistd.h. They come from the POSIX standard rather than from the C language itself and are very rarely used. (You really should use the symbolic names, but most people don’t.)

The format of the read call is:

read_size = read(file_descriptor, buffer, size);

where:

read_size

is the number of bytes read. Zero indicates end-of-file, and a negative number indicates an error.

file_descriptor

is the file descriptor of an open file.

buffer

is the pointer to the place to read the data.

size

is the size of the data to be read.

The format of a write call is:

write_size = write(file_descriptor, buffer, size);

where:

write_size

is the number of bytes written. A negative number indicates an error.

file_descriptor

is the file descriptor of an open file.

buffer

is the pointer to the data to be written.

size

is the size of the data to be written.

Finally, the close call closes the file:

flag = close(file_descriptor);

where:

flag

is zero for success, negative for error.

file_descriptor

is the file descriptor of an open file.

Example 14-5 copies a file. Unbuffered I/O is used because of the large buffer size. Using buffered I/O to read 1K of data into a buffer and then transfer it into a 16K buffer makes no sense.

Example 14-5. copy2/copy2.c


/****************************************

* copy -- Copies one file to another. *

* *

* Usage *

* copy <from> <to> *

* *

* <from> -- The file to copy from. *

* <to> -- The file to copy into. *

****************************************/

#include <stdio.h>

#ifndef __MSDOS__ /* if we are not MS-DOS */

#define __UNIX__ /* then we are UNIX */

#endif /* __MSDOS__ */

#include <stdlib.h>

#ifdef __UNIX__

#include <sys/types.h> /* file defines for UNIX filesystem */

#include <sys/stat.h>

#include <fcntl.h>

#include <unistd.h>

#endif /* __UNIX__ */

#ifdef __MSDOS__

#include <fcntl.h> /* file defines for DOS filesystem */

#include <sys\stat.h>

#include <io.h>

#endif /* __MSDOS__ */

#ifndef O_BINARY

#define O_BINARY 0 /* Define the flag if not defined yet */

#endif /* O_BINARY */

#define BUFFER_SIZE (16 * 1024) /* use 16K buffers */

int main(int argc, char *argv[])

{

char buffer[BUFFER_SIZE]; /* buffer for data */

int in_file; /* input file descriptor */

int out_file; /* output file descriptor */

int read_size; /* number of bytes on last read */

if (argc != 3) {

fprintf(stderr, "Error:Wrong number of arguments\n");

fprintf(stderr, "Usage is: copy <from> <to>\n");

exit(8);

}

in_file = open(argv[1], O_RDONLY|O_BINARY);

if (in_file < 0) {

fprintf("Error:Unable to open %s\n", argv[1]);

exit(8);

}

out_file = open(argv[2], O_WRONLY|O_TRUNC|O_CREAT|O_BINARY, 0666);

if (out_file < 0) {

fprintf("Error:Unable to open %s\n", argv[2]);

exit(8);

}

while (1) {

read_size = read(in_file, buffer, sizeof(buffer));

if (read_size == 0)

break; /* end of file */

if (read_size < 0) {

fprintf(stderr, "Error:Read error\n");

exit(8);

}

write(out_file, buffer, (unsigned int) read_size);

}

close(in_file);

close(out_file);

return (0);

}

Question 14-3: Why does Example 14-5 dump core instead of printing an error message, if it can’t open the input file? (See Section 14.8 for the answer.)

Several things should be noted about this program. First of all, the buffer size is defined as a constant, so it is easily modified. Rather than have to remember that 16K is 16384, the programmer used the expression (16 * 1024). This form of the constant is obviously 16K.

If the user improperly uses the program, an error message results. To help get it right, the message tells how to use the program.

We may not read a full buffer for the last read. Consequently, read_size is used to determine the number of bytes to write.
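The same caution applies in the other direction: on UNIX, write can transfer fewer bytes than requested (for example, when writing to a pipe), and it can fail outright. Example 14-5 ignores the value returned by write. A more careful version might use a loop like the sketch below. The name write_all is ours, the code assumes a UNIX system, and it does not retry calls that were interrupted by a signal.

```c
#include <unistd.h>

/* Keep calling write until all size bytes have been written.
 * Returns 0 on success, -1 if write reports an error or makes
 * no progress. */
int write_all(int file_descriptor, const char *buffer, size_t size)
{
    while (size > 0) {
        ssize_t written = write(file_descriptor, buffer, size);

        if (written <= 0)
            return -1;

        buffer += written;          /* skip what was written */
        size -= (size_t)written;    /* and count down what is left */
    }
    return 0;
}
```

In the copy loop, the call would be write_all(out_file, buffer, (size_t) read_size), followed by a check of the result.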

Designing File Formats

Suppose you are designing a program to produce a graph. The height, width, limits, and scales are to be defined in a graph configuration file. You are also assigned to write a user-friendly program that asks the operator questions and then writes a configuration file so that he does not have to learn the text editor. How should you design a configuration file?

One way would be as follows:

height (in inches)

width (in inches)

x lower limit

x upper limit

y lower limit

y upper limit

x scale

y scale

A typical plotter configuration file might look like:

10.0

7.0

0

100

30

300

0.5

2.0

This file does contain all the data, but in looking at it, you have difficulty telling what, for example, is the value of the y lower limit. A solution is to comment the file. That is, to have the configuration program write out not only the data, but a string describing the data.

10.0 height (in inches)

7.0 width (in inches)

0 x lower limit

100 x upper limit

30 y lower limit

300 y upper limit

0.5 x scale

2.0 y scale

Now the file is user readable. But suppose that one of the users runs the plot program and types in the wrong filename, and the program gets the lunch menu for today instead of a plot configuration file. The program is probably going to get very upset when it tries to construct a plot whose dimensions are “BLT on white” versus “Meatloaf and gravy.”

The result is that you wind up with egg on your face. There should be some way of identifying this file as a plot configuration file. One method of doing so is to put the words “Plot Configuration File” on the first line of the file. Then, when someone tries to give your program the wrong file, the program will print an error message.

This solution takes care of the wrong-file problem, but what happens when you are asked to enhance the programs and add optional logarithmic plotting? You could simply add another line to the configuration file, but what about all those old files? You cannot reasonably ask everyone to throw them away. The best thing to do (from a user’s point of view) is to accept old format files. You can make this task easier by putting a version number in the file.

A typical file now looks like:

Plot Configuration File V1.0

log Logarithmic or Normal plot

10.0 height (in inches)

7.0 width (in inches)

0 x lower limit

100 x upper limit

30 y lower limit

300 y upper limit

0.5 x scale

2.0 y scale
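One way a program might check the identification line and pick out the version number is with sscanf, as in the sketch below. The function name parse_plot_header and the split into major and minor numbers are assumptions made for this example.

```c
#include <stdio.h>

/* Check the first line of a plot configuration file.
 * If it is a valid header, store the version numbers in *major and
 * *minor and return 1. Otherwise return 0. */
int parse_plot_header(const char *line, int *major, int *minor)
{
    /* The literal text in the format must match the line exactly;
     * sscanf returns 2 only if both numbers were found. */
    return sscanf(line, "Plot Configuration File V%d.%d",
                  major, minor) == 2;
}
```

A line from the lunch menu returns 0, so the program can print an error message. When the header is valid, the program can use the version number to decide which lines to expect in the rest of the file.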

In binary files, you usually put an identification number in the first four bytes of the file. This number is called the magic number. The magic number should be different for each type of file.

One method for choosing a magic number is to start with the first four letters of the program name (e.g., list) and then convert them to hexadecimal: 0x6C697374. Then, add 0x80808080 to the number, producing a magic number of 0xECE9F3F4.

This algorithm generates a magic number that is probably unique. The high bit is set on each byte to make the byte non-ASCII and avoid confusion between ASCII and binary files.
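The calculation can be written out as a short function. This is a sketch, assuming the name has at least four characters and that the machine uses ASCII; make_magic is a made-up name. For ASCII letters, whose high bit is always clear, setting the high bit with the or operator gives the same result as adding 0x80808080.

```c
/* Pack the first four characters of name into a 32-bit value,
 * then set the high bit of each byte.
 * Assumes name has at least four characters. */
unsigned long make_magic(const char *name)
{
    unsigned long magic = 0;
    int i;

    for (i = 0; i < 4; ++i)
        magic = (magic << 8) | (unsigned char)name[i];

    return magic | 0x80808080UL;
}
```

With ASCII, make_magic("list") packs the bytes 0x6C, 0x69, 0x73, and 0x74 and returns 0xECE9F3F4.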

When reading and writing a binary file containing many different types of structures, a programmer can easily get lost. For example, you might read a name structure when you expected a size structure. This error is usually not detected until later in the program. In order to locate this problem early, the programmer can put magic numbers at the beginning of each structure.

Now, if the program reads the name structure and the magic number is not correct, it knows something is wrong.

Magic numbers for structures do not need to have the high bit set on each byte. Making the magic number just four ASCII characters allows you to easily pick out the beginning of structures in a file dump.
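A structure with its own magic number might look like the following sketch. The structure layout, the constant NAME_MAGIC (the ASCII codes for N, A, M, and E), and the function read_name are all invented for this illustration.

```c
#include <stdio.h>

#define NAME_MAGIC 0x4E414D45L  /* 'N' 'A' 'M' 'E' (hypothetical value) */

struct name_record {
    long magic;         /* must be NAME_MAGIC */
    char name[32];      /* the name itself */
};

/* Read one name record from in_file.
 * Returns 0 on success, -1 if the read came up short or the
 * magic number is wrong. */
int read_name(struct name_record *record, FILE *in_file)
{
    if (fread(record, sizeof(*record), 1, in_file) != 1)
        return -1;

    if (record->magic != NAME_MAGIC)
        return -1;      /* we are not looking at a name structure */

    return 0;
}
```

Note that the order in which the four letters appear in a file dump depends on the byte order of the machine that wrote the file.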

Answers

Answer 14-1: The problem is that fgets gets the entire line including the newline character (\n). If you have a file named sam, the program will read sam\n and try to look for a file by that name. Because there is no such file, the program reports an error.

The fix is to strip the newline character from the name:

name[strlen(name)-1] = '\0'; /* get rid of last character */
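The one-line fix assumes that the string is not empty and that its last character really is a newline. That is not the case if the input ended without a newline or if the line was too long for the array. A slightly more defensive sketch uses the standard routine strcspn, which returns the number of characters before the first newline (or the whole length if there is none). The function name strip_newline is ours.

```c
#include <string.h>

/* Cut line off at its first newline, if it has one.
 * A line with no newline (or an empty line) is left unchanged. */
void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

In Example 14-3, the call strip_newline(name) would go right after the fgets.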

The error message in this case is poorly designed. True, we did not open the file, but the programmer could supply the user with more information. Are we trying to open the file for input or output? What is the name of the file we are trying to open? We don’t even know if the message we are getting is an error, a warning, or just part of the normal operation. A better error message is:

fprintf(stderr,"Error:Unable to open %s for input\n", name);

Notice that this message would also help us detect the programming error. When we typed in sam, the error would be:

Error:Unable to open sam

for input

This message clearly shows us that we are trying to open a file with a newline in its name.

Answer 14-2: The problem is that we are writing an ASCII file, but we wanted a binary file. On UNIX, ASCII is the same as binary, so the program runs fine. On MS-DOS/Windows, the end-of-line handling gets in the way. When we write a newline character (0x0A) to the file, a carriage return (0x0D) is added in front of it. (Remember that end-of-line on MS-DOS/Windows is <RETURN> <LINE FEED>, or 0x0D, 0x0A.) Because of this editing, we get an extra carriage return (0x0D) in the output file.

In order to write binary data (without output editing), we need to open the file with the binary option:

out_file = fopen("test.out", "wb");

Answer 14-3: The problem is with the fprintf call. The first parameter of an fprintf should be a file; instead, it is the format string. Trying to use a format string when the program is expecting a file causes a core dump.

Programming Exercises

Exercise 14-1: Write a program that reads a file and then counts the number of lines in it.

Exercise 14-2: Write a program to copy a file, expanding all tabs to multiple spaces.

Exercise 14-3: Write a program that reads a file containing a list of numbers, and then writes two files, one with all numbers divisible by three and another containing all the other numbers.

Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers and writes a binary file containing the same list. Write a program that goes the other way so that you can check your work.

Exercise 14-5: Write a program that copies a file and removes all characters with the high bit set ((ch & 0x80) != 0).

Exercise 14-6: Design a file format to store a person’s name, address, and other information. Write a program to read this file and produce a set of mailing labels.