Characters - C Programming For Beginners (2015)


CHAPTER 6

Characters

In this chapter, we will explain the following:

§ Some important features of character sets

§ How to work with character constants and values

§ How to declare character variables in C

§ How you can use characters in arithmetic expressions

§ How to read, manipulate and print characters

§ How to test for end-of-line using \n

§ How to test for end-of-file using EOF

§ How to compare characters

§ How to read characters from a file

§ How to convert a number from character to integer

6.1 Character Sets

Most of us are familiar with a computer or typewriter keyboard (called the standard English keyboard). On it, we can type the letters of the alphabet (both uppercase and lowercase), the digits and other ‘special’ characters like +, =, <, >, & and %—these are the so-called printable characters.

On a computer, each character is assigned a unique integer value, called its code. This code may be different from one computer to another depending on the character set being used. For example, the code for A might be 33 on one computer but 65 on another.

Inside the computer, this integer code is stored as a sequence of bits; for example, the 6-bit code for 33 is 100001 and the 7-bit code for 65 is 1000001.

Nowadays, most computers use the ASCII (American Standard Code for Information Interchange) character set for representing characters. This is a 7-bit character standard which includes the letters, digits and special characters found on a standard keyboard. It also includes control characters such as backspace, tab, line feed, form feed and carriage return.

The ASCII codes run from 0 to 127 (the range of numbers which can be stored using 7 bits). The ASCII character set is shown in Appendix B. Interesting features to note are:

§ The digits 0 to 9 occupy codes 48 to 57.

§ The uppercase letters A to Z occupy codes 65 to 90.

§ The lowercase letters a to z occupy codes 97 to 122.

Note, however, that even though the ASCII set is defined using a 7-bit code, it is stored on most computers in 8-bit bytes—a 0 is added at the front of the 7-bit code. For example, the 7-bit ASCII code for A is 1000001; on a computer, it is stored as 01000001, occupying one byte.

In this book, as far as is possible, we will write our programs making no assumptions about the underlying character set. Where it is unavoidable, we will assume that the ASCII character set is used. For instance, we may need to assume that the uppercase letters are assigned consecutive codes; similarly for lowercase letters. This may not necessarily be true for another character set. Even so, we will not rely on the specific values of the codes, only that they are consecutive.

6.2 Character Constants and Values

A character constant is a single character enclosed in single quotes such as 'A', '+' and '5'. Some characters cannot be represented like this because we cannot type them. Others play a special role in C (e.g. ', \). For these, we use an escape sequence enclosed in single quotes. Some examples are shown in the following table:

char    description     code
'\n'    new line        10
'\f'    form feed       12
'\t'    tab             9
'\''    single quote    39
'\\'    backslash       92

The character constant '\0' is special in C; it is the character whose code is 0, normally referred to as the null character. One of its special uses is to indicate the end of a string in memory (see Chapter 8).

The character value of a character constant is the character represented, without the single quotes. Thus, the character value of 'T' is T and the character value of '\\' is \.

A character constant has an integer value associated with it—the numeric code of the character represented. Thus, the integer value of 'T' is 84 since the ASCII code for T is 84. The integer value of '\\' is 92 since the ASCII code for \ is 92. And the integer value of '\n' is 10 since the ASCII code for the newline character is 10.

We could print the character value using the specification %c in printf and we could print the integer value using %d. For example, the statement

printf("Character: %c, Integer: %d\n", 'T', 'T');

will print

Character: T, Integer: 84

6.3 The Type char

In C, we use the keyword char to declare a variable in which we wish to store a character. For example, the statement

char ch;

declares ch as a character variable. We could, for instance, assign a character constant to ch, as follows:

ch = 'R'; //assign the letter R to ch

ch = '\n'; //assign the newline character, code 10, to ch

We could print the character value of a character variable using %c in printf. And we could print the integer value of a character variable using %d. For instance,

ch = 'T';

printf("Mr. %c\n", ch);

printf("Mr. %d\n", ch);

will print

Mr. T

Mr. 84

6.4 Characters in Arithmetic Expressions

C allows us to use variables and constants of type char directly in arithmetic expressions. When we do, it uses the integer value of the character. For example, the statement

int n = 'A' + 3;

assigns 68 to n since the code for 'A' is 65.

Similarly, we can assign an integer value to a char variable. For example,

char ch = 68;

In this case, “the character whose code is 68” is assigned to ch; this character is 'D'.

For a more useful example, consider the following:

int d = '5' - '0';

The integer 5 is assigned to d since the code for '5' is 53 and the code for '0' is 48.

Note that the code for a digit in character form is not the same as the value of the digit; for instance, the code for the character '5' is 53 but the value of the digit 5 is 5. Sometimes we know that a character variable contains a digit and we want to get the (integer) value of the digit.

The above statements show how we can get the value of the digit—we simply subtract the code for '0' from the code for the digit. It does not matter what the actual codes for the digits are; it matters only that the codes for 0 to 9 are consecutive. (Exercise: check this for yourself assuming a different set of code values for the digits.)

In general, if ch contains a digit character ('0' to '9'), we can obtain the integer value of the digit with the statement

d = ch - '0';

6.4.1 Uppercase To/From Lowercase

Suppose ch contains an uppercase letter and we want to convert it to its equivalent lowercase letter. For example, assume ch contains 'H' and we want to change it to 'h'. First we observe that the ASCII codes for 'A' to 'Z' range from 65 to 90 and the codes for 'a' to 'z' range from 97 to 122. We further observe that the difference between the codes for the two cases of a letter is always 32; for example,

'r' - 'R' = 114 - 82 = 32

Hence we can convert a letter from uppercase to lowercase by adding 32 to the uppercase code. This can be done with

ch = ch + 32;

If ch contains 'H' (code 72), the above statement adds 32 to 72 giving 104; the “character whose code is 104” is assigned to ch, that is, 'h'. We have changed the value of ch from 'H' to 'h'. Conversely, to convert a letter from lowercase to uppercase, we subtract 32 from the lowercase code.

By the way, we do not really need to know the codes for the letters. All we need is the difference between the uppercase and lowercase codes. We can let C tell us what the difference is by using 'a' - 'A', like this:

ch = ch + 'a' - 'A';

This works no matter what the actual codes for the letters are. It assumes, of course, that ch contains an uppercase letter and the difference between the uppercase and lowercase codes is the same for all letters.

6.5 Read and Print Characters

Many programs revolve around the idea of reading and writing one character at a time and developing the skill of writing such programs is a very important aspect of programming. We can use scanf to read a single character from the standard input (the keyboard) into a char variable (ch, say) with:

scanf("%c", &ch);

The next character in the data is stored in ch. It is very important to note a big difference between reading a number and reading a character. When reading a number, scanf will skip over any amount of whitespace until it finds the number. When reading a character, the very next character (whatever it is, even if it's a space) is stored in the variable.

While we can use scanf, reading a character is important enough that C provides a special function getchar for reading characters from the standard input. (Strictly speaking, getchar is what’s called a macro, but the distinction is not important for our purposes.) For the most part, we can think that getchar returns the next character in the data. However, it actually returns the numeric code of the next character. For this reason, it is usually assigned to an int variable, as in:

int c = getchar(); // the brackets are required

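It can also be assigned to a char variable, as in the following statement. Be aware, though, that a char variable may be unable to tell the end-of-data value (EOF, discussed below) apart from a valid character, so use an int variable whenever the program tests for EOF.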

char ch = getchar(); // the brackets are required

To be precise, getchar returns the next byte in the data—to all intents and purposes, this is the next character. If we call getchar when there is no more data, it returns -1.

To be more precise, it returns the value designated by the symbolic constant EOF (all uppercase) defined in stdio.h. This value is usually, though not always, -1. The actual value is system-dependent but EOF will always denote the value returned on the system on which the program is run. We can, of course, always find out what value is returned by printing EOF, thus:

printf("Value of EOF is %d \n", EOF);

For an example, consider the statement:

char ch = getchar();

Suppose the data typed by the user is this:

Hello

When ch = getchar() is executed, the first character H is read and stored in ch. We can then use ch in whatever way we like. Suppose we just want to print the first character read. We could use:

printf("%c \n", ch);

This would print

H

on a line by itself. We could, of course, label our output as in the following statement:

printf("The first character is %c \n", ch);

This would print

The first character is H

Finally, we don’t even need ch. If all we want to do is print the first character in the data, we could do so with:

printf("The first character is %c \n", getchar());

If we want to print the numeric code of the first character, we could do so by using the specification %d instead of %c. These ideas are incorporated in Program P6.1.

Program P6.1

//read the first character in the data, print it,
//its code and the value of EOF
#include <stdio.h>
int main() {
    printf("Type some data and press 'Enter' \n");
    char ch = getchar();
    printf("\nThe first character is %c \n", ch);
    printf("Its code is %d \n", ch);
    printf("Value of EOF is %d \n", EOF);
}

The following is a sample run:

Type some data and press 'Enter'

Hello

The first character is H

Its code is 72

Value of EOF is -1

A word of caution: we might be tempted to write the following:

printf("The first character is %c \n", getchar());

printf("Its code is %d \n", getchar()); // wrong

But if we did, and assuming that Hello is typed as input, these statements will print:

The first character is H

Its code is 101

Why? In the first printf, getchar returns H which is printed. In the second printf, getchar returns the next character which is e; it is e’s code (101) that is printed.

In Program P6.1, we could use an int variable (n, say) instead of ch and the program would work in an identical manner. If an int variable is printed using %c, the last (rightmost) 8 bits of the variable are interpreted as a character and this character is printed. For example, the code for H is 72 which is 01001000 in binary, using 8 bits. Assuming n is a 16-bit int, when H is read, the value assigned to n will be

00000000 01001000

If n is now printed with %c, the last 8 bits will be interpreted as a character which, of course, is H.

Similarly, if an int value n is assigned to a char variable (ch, say), the last 8 bits of n will be assigned to ch.

As mentioned, getchar returns the integer value of the character read. What does it return when the user presses “Enter” or “Return” on the keyboard? It returns the newline character \n, whose code is 10. This can be seen using Program P6.1. When the program is waiting for you to type data, if you press only the “Enter” or “Return” key, the first lines of output would be as follows (note the blank line):

The first character is

Its code is 10

Why the blank line? Since ch contains \n, the statement

printf("\nThe first character is %c \n", ch);

is effectively the same as the following (with %c replaced by the value of ch)

printf("\nThe first character is \n \n");

The \n after is ends the first line and the last \n ends the second line, effectively printing a blank line. Note, however, that the code for \n is printed correctly.

In Program P6.1, we read just the first character. If we want to read and print the first 3 characters, we could do this with Program P6.2.

Program P6.2

//read and print the first 3 characters in the data
#include <stdio.h>
int main() {
    printf("Type some data and press 'Enter' \n");
    for (int h = 1; h <= 3; h++) {
        char ch = getchar();
        printf("Character %d is %c \n", h, ch);
    }
}

The following is a sample run of the program:

Type some data and press 'Enter'

Hi, how are you?

Character 1 is H

Character 2 is i

Character 3 is ,

If we want to read and print the first 20 characters, all we have to do is change 3 to 20 in the for statement.

Suppose the first part of the data line contains an arbitrary number of blanks, including none. How do we find and print the first non-blank character? Since we do not know how many blanks to read, we cannot say something like “read 7 blanks, then the next character”.

More likely, we need to say something like “as long as the character read is a blank, keep reading”. We have the notion of doing something (reading a character) as long as some ‘condition’ is true; the condition here is whether the character is a blank. This can be expressed more concisely as follows:

read a character
while the character read is a blank
    read the next character

Program P6.3 shows how to read the data and print the first non-blank character. (This code will be written more concisely later in this section.)

Program P6.3

//read and print the first non-blank character in the data
#include <stdio.h>
int main() {
    printf("Type some data and press 'Enter' \n");
    char ch = getchar(); // get the first character
    while (ch == ' ') // as long as ch is a blank
        ch = getchar(); // get another character
    printf("The first non-blank is %c \n", ch);
}

The following is a sample run of the program (◊ denotes a blank):

Type some data and press 'Enter'

◊◊◊Hello

The first non-blank is H

The program will locate the first non-blank character regardless of how many blanks precede it.

As a reminder of how the while statement works, consider the following portion of code from Program P6.3 with different comments:

char ch = getchar(); //executed once; gives ch a value

//to be tested in the while condition

while (ch == ' ')

ch = getchar(); //executed as long as ch is ' '

and suppose the data entered is (◊ denotes a space):

◊◊◊Hello

The code will execute as follows:

1. The first character is read and stored in ch; it is a blank.

2. The while condition is tested; it is true.

3. The while body ch = getchar(); is executed and the second character is read and stored in ch; it is a blank.

4. The while condition is tested; it is true.

5. The while body ch = getchar(); is executed and the third character is read and stored in ch; it is a blank.

6. The while condition is tested; it is true.

7. The while body ch = getchar(); is executed and the fourth character is read and stored in ch; it is H.

8. The while condition is tested; it is false.

9. Control goes to the printf which prints

The first non-blank is H

What if H was the very first character in the data? The code will execute as follows:

1. The first character is read and stored in ch; it is H.

2. The while condition is tested; it is false.

3. Control goes to the printf which prints

The first non-blank is H

It still works! If the while condition is false the first time it is tested, the body is not executed at all.

As another example, suppose we want to print all characters up to, but not including, the first blank. To do this, we could use Program P6.4.

Program P6.4

//print all characters before the first blank in the data
#include <stdio.h>
int main() {
    printf("Type some data and press 'Enter' \n");
    char ch = getchar(); // get the first character
    while (ch != ' ') { // as long as ch is NOT a blank
        printf("%c \n", ch); // print it
        ch = getchar(); // and get another character
    }
}

The following is a sample run of P6.4:

Type some data and press 'Enter'

Way to go

W

a

y

The body of the while consists of two statements. These are enclosed by { and } to satisfy C’s rule that the while body must be a single statement or a block. Here, the body is executed as long as the character read is not a blank—we write the condition using != (not equal to).

If the character is not a blank, it is printed and the next character read. If that is not a blank, it is printed and the next character read. If that is not a blank, it is printed and the next character read. And so on, until a blank character is read, making the while condition false, causing an exit from the loop.

We would be remiss if we didn’t enlighten you about some of the expressive power in C. For instance, in Program P6.3, we could have read the character and tested it in the while condition. We could have rewritten the following three lines:

ch = getchar(); // get the first character

while (ch == ' ') // as long as ch is a blank

ch = getchar(); // get another character

as one line

while ((ch = getchar()) == ' '); // get a character and test it

ch = getchar() is an assignment expression whose value is the character assigned to ch, that is, the character read. This value is then tested to see if it is a blank. The brackets around ch = getchar() are required since == has higher precedence than =. Without them, the condition would be interpreted as ch = (getchar() == ' '). This would assign the value of a condition (which, in C, is 0 for false or 1 for true) to the variable ch; this is not what we want.

Now that we have moved the statement in the body into the condition, the body is empty; this is permitted in C. The condition would now be executed repeatedly until it becomes false.

To give another example, in Program P6.4, consider the following code:

char ch = getchar(); // get the first character
while (ch != ' ') { // as long as ch is NOT a blank
    printf("%c \n", ch); // print it
    ch = getchar(); // and get another character
}

This could be re-coded as follows (assuming ch is declared before the loop):

while ((ch = getchar()) != ' ') // get a character

printf("%c \n", ch); // print it if non-blank; repeat

Now that the body consists of just one statement, the braces are no longer required. Five lines have been reduced to two!

6.6 Count Characters

Program P6.3 prints the first non-blank character. Suppose we want to count how many blanks there were before the first non-blank. We could use an integer variable numBlanks to hold the count. Program P6.5 is the modified program for counting the leading blanks.

Program P6.5

//find and print the first non-blank character in the data;
//count the number of blanks before the first non-blank
#include <stdio.h>
int main() {
    char ch;
    int numBlanks = 0;
    printf("Type some data and press 'Enter' \n");
    while ((ch = getchar()) == ' ') // repeat as long as ch is blank
        numBlanks++; // add 1 to numBlanks
    printf("The number of leading blanks is %d \n", numBlanks);
    printf("The first non-blank is %c \n", ch);
}

The following is a sample run of the program (◊ denotes a space):

Type some data and press 'Enter'

◊◊◊◊Hello

The number of leading blanks is 4

The first non-blank is H

Comments on Program P6.5:

§ numBlanks is initialized to 0 before the while loop.

§ numBlanks is incremented by 1 inside the loop so that numBlanks is incremented each time the loop body is executed. Since the loop body is executed when ch contains a blank, the value of numBlanks is always the number of blanks read so far.

§ When we exit the while loop, the value in numBlanks will be the number of blanks read. This value is then printed.

§ Observe that if the first character in the data were non-blank, the while condition would be immediately false and control would go directly to the first printf statement with numBlanks having the value 0. The program would print, correctly:

The number of leading blanks is 0

6.6.1 Count Characters in a Line

Suppose we want to count the number of characters in a line of input. Now we must read characters until the end of the line. How does our program test for end-of-line? Recall that when the “Enter” or “Return” key is pressed by the user, the newline character, \n, is returned by getchar. The following while condition reads a character and tests for \n.

while ((ch = getchar()) != '\n')

Program P6.6 reads a line of input and counts the number of characters in it, not counting the “end-of-line” character.

Program P6.6

//count the number of characters in the input line
#include <stdio.h>
int main() {
    char ch;
    int numChars = 0;
    printf("Type some data and press 'Enter' \n");
    while ((ch = getchar()) != '\n') // repeat as long as ch is not \n
        numChars++; // add 1 to numChars
    printf("The number of characters is %d \n", numChars);
}

The main difference between this and Program P6.5 is that this one reads characters until the end of the line rather than until the first non-blank. A sample run is:

Type some data and press 'Enter'

One moment in time

The number of characters is 18

6.7 Count Blanks in a Line of Data

Suppose we want to count all the blanks in a line of data. We must still read characters until the end of the line is encountered. But now, for each character read, we must check whether it is a blank. If it is, the count is incremented. We would need two counters—one to count the number of characters in the line and the other to count the number of blanks. The logic could be expressed as:

set number of characters and number of blanks to 0
while we are not at the end-of-line
    read a character
    add 1 to number of characters
    if character is a blank then add 1 to number of blanks
endwhile

This logic is implemented as shown in Program P6.7.

Program P6.7

//count the number of characters and blanks in the input line
#include <stdio.h>
int main() {
    char ch;
    int numChars = 0;
    int numBlanks = 0;
    printf("Type some data and press 'Enter' \n");
    while ((ch = getchar()) != '\n') { // repeat as long as ch is not \n
        numChars++; // add 1 to numChars
        if (ch == ' ') numBlanks++; // add 1 if ch is blank
    }
    printf("\nThe number of characters is %d \n", numChars);
    printf("The number of blanks is %d \n", numBlanks);
}

Here is a sample run:

Type some data and press 'Enter'

One moment in time

The number of characters is 18

The number of blanks is 3

The if statement tests the condition ch == ' '; if it is true (that is, ch contains a blank), numBlanks is incremented by 1. If it is false, numBlanks is not incremented; control would normally go to the next statement within the loop but there is none (the if is the last statement). Therefore, control goes back to the top of the while loop, where another character is read and tested for \n.

6.8 Compare Characters

Characters can be compared using the relational operators ==, !=, <, <=, > and >=. We’ve compared the char variable ch with a blank using ch == ' ' and ch != ' '.

Let us now write a program to read a line of data and print the ‘largest’ character, that is, the character with the highest code. For instance, if the line consisted of English words, the letter which comes latest in the alphabet would be printed. (Recall, though, that lowercase letters have higher codes than uppercase letters so that, for instance, 'g' is greater than 'T'.)

‘Finding the largest character’ involves the following steps:

§ Choose a variable to hold the largest value; we choose bigChar.

§ Initialize bigChar to a very small value. The value chosen should be such that no matter what character is read, its value would be greater than this initial value. For characters, we normally use '\0'—the null character, the ‘character’ with a code of 0.

§ As each character (ch, say) is read, it is compared with bigChar; if ch is greater than bigChar, then we have a ‘larger’ character and bigChar is set to this new character.

§ When all the characters have been read and checked, bigChar will contain the largest one.

These ideas are expressed in Program P6.8.

Program P6.8

//read a line of data and find the 'largest' character
#include <stdio.h>
int main() {
    char ch, bigChar = '\0';
    printf("Type some data and press 'Enter' \n");
    while ((ch = getchar()) != '\n')
        if (ch > bigChar) bigChar = ch; //is this character bigger?
    printf("\nThe largest character is %c \n", bigChar);
}

The following is a sample run; u is printed since its code is the highest of all the characters typed.

Type some data and press 'Enter'

Where The Mind Is Without Fear

The largest character is u

6.9 Read Characters From a File

In our examples so far, we have read characters typed at the keyboard. If we want to read characters from a file (input.txt, say), we must declare a file pointer (in, say) and associate it with the file using

FILE * in = fopen("input.txt", "r");

Once this is done, we could read the next character from the file into a char variable (ch, say) with this statement:

fscanf(in, "%c", &ch);

However, C provides the more convenient function getc (get a character) for reading a character from a file. It is used as follows:

ch = getc(in);

getc takes one argument, the file pointer (not the name of the file). It reads and returns the next character in the file. If there are no more characters to read, getc returns EOF. Thus, getc works exactly like getchar except that getchar reads from the keyboard while getc reads from a file.

To illustrate, let us write a program which reads one line of data from a file, input.txt, and prints it on the screen. This is shown as Program P6.9.

Program P6.9

#include <stdio.h>
int main() {
    int ch; // int rather than char, so that EOF can be told apart from any character
    FILE *in = fopen("input.txt", "r");
    while ((ch = getc(in)) != '\n')
        putchar(ch);
    putchar('\n');
    fclose(in);
}

This program uses the standard function putchar to write a single character to the standard output. (Like getchar, putchar is a macro but the distinction is not important for our purposes.) It takes a character value as its only argument and writes the character in the next position in the output. However, if the character is a control character, the effect of the character is produced. For example,

putchar('\n');

will end the current output line, the same effect as if “Enter” or “Return” is pressed.

The program reads one character at a time from the file and prints it on the screen using putchar. It does this until \n is read, indicating that the entire line has been read. On exit from the while loop, it uses putchar('\n') to terminate the line on the screen.

Be careful, though. This program assumes that the line of data is terminated by an end-of-line character, \n (generated when you press “Enter” or “Return”). However, if the line is not terminated by \n, the program will ‘hang’—it will be caught in a loop from which it cannot get out (we say it will be caught in an infinite loop). Why?

Because the while condition ((ch = getc(in)) != '\n') will never become false (this happens when ch is '\n') since there is no \n to be read. But, as discussed before, when we reach the end-of-file, the value returned by getchar, and now also by getc, is the symbolic constant EOF defined in stdio.h. Knowing this, we could easily fix our problem by testing for \n and EOF in the while condition, thus:

while ((ch = getc(in)) != '\n' && ch != EOF)

Even if \n is not present, getc(in) will return EOF when the end of the file is reached, and the condition ch != EOF would be false, causing an exit from the loop.

6.10 Write Characters to a File

Suppose we want to write characters to a file (output.txt, say). As always, we must declare a file pointer (out, say) and associate it with the file using

FILE * out = fopen("output.txt", "w");

If ch is a char variable, we can write the value of ch to the file with

fprintf(out, "%c", ch);

C also provides the function putc (put a character) to do the same job. To write the value of ch to the file associated with out, we must write:

putc(ch, out);

Note that the file pointer is the second argument to putc.

6.10.1 Echo Input, Number Lines

Let us expand the earlier example of reading from a file: we will read data from a file and write back the same data (echo the data) to the screen with the lines numbered starting from 1.

The program would read the data from the file and write it to the screen, thus:

1. First line of data

2. Second line of data

etc.

This problem is a bit more difficult than those we have met so far. When faced with such a problem, it is best to tackle it a bit at a time, solving easier versions of the problem and working your way up to solving the complete problem.

For this problem, we can first write a program which simply echoes the input without numbering the lines. When we get this right, we can tackle the job of numbering the lines.

An outline of the algorithm for this first version is the following:

read a character, ch
while ch is not the end-of-file character
    print ch
    read a character, ch
endwhile

This will maintain the line structure of the data file since, for instance, when \n is read from the file, it is immediately printed to the screen, forcing the current line to end.

Program P6.10 implements the above algorithm for reading the data from a file and printing an exact copy on the screen.

Program P6.10

#include <stdio.h>
int main() {
    int ch; // int rather than char, so that the comparison with EOF is reliable
    FILE *in = fopen("input.txt", "r");
    while ((ch = getc(in)) != EOF)
        putchar(ch);
    fclose(in);
}

Now that we can echo the input, we need only figure out how to print the line numbers. A simplistic approach is based on the following outline:

set lineNo to 1
print lineNo
read a character, ch
while ch is not the end-of-file character
    print ch
    if ch is \n
        add 1 to lineNo
        print lineNo
    endif
    read a character, ch
endwhile

We have simply added the statements which deal with the line numbers to the algorithm above. We can easily add the code that deals with the line numbers to Program P6.10 to get Program P6.11. Note that when we print the line number, we do not terminate the line with \n since the data must be written on the same line as the line number.

Program P6.11

//This program prints the data from a file numbering the lines
#include <stdio.h>
int main() {
    int ch; // int rather than char, so that the comparison with EOF is reliable
    FILE *in = fopen("input.txt", "r");
    int lineNo = 1;
    printf("%2d. ", lineNo);
    while ((ch = getc(in)) != EOF) {
        putchar(ch);
        if (ch == '\n') {
            lineNo++;
            printf("%2d. ", lineNo);
        }
    }
    fclose(in);
}

Assume the input file contains the following:

There was a little girl

Who had a little curl

Right in the middle of her forehead

Program P6.11 will print this:

1. There was a little girl

2. Who had a little curl

3. Right in the middle of her forehead

4.

Almost, but not quite, correct! The little glitch is that we print an extra line number at the end. To see why, look at the if statement. When \n of the third data line is read, 1 would be added to lineNo, making it 4, which is printed by the next statement. This printing of an extra line number also holds if the input file is empty, since line number 1 would be printed in this case, but there is no such line.

To get around this problem, we must delay printing the line number until we are sure that there is at least one character on the line. We will use an int variable writeLineNo, initially set to 1. If we have a character to print and writeLineNo is 1, the line number is printed and writeLineNo is set to 0. When writeLineNo is 0, all that happens is that the character just read is printed.

When \n is printed to end a line of output, writeLineNo is set to 1. If it turns out that there is a character to print on the next line, the line number will be printed first since writeLineNo is 1. If there are no more characters to print, nothing further is printed; in particular, the line number is not printed.

Program P6.12 contains all the details. When run, it will number the lines without printing an extra line number.

Program P6.12

//This program prints the data from a file numbering the lines

#include <stdio.h>

int main() {

    int ch; // int, not char, so that EOF can be told apart from every character

    FILE *in = fopen("input.txt", "r");

    int lineNo = 0, writeLineNo = 1;

    while ((ch = getc(in)) != EOF) {

        if (writeLineNo) {

            printf("%2d. ", ++lineNo);

            writeLineNo = 0;

        }

        putchar(ch);

        if (ch == '\n') writeLineNo = 1;

    }

    fclose(in);

}

We wrote the if condition as follows:

if (writeLineNo)...

In C, a condition is true if its value is non-zero and false if its value is 0. So if writeLineNo is 1, the condition is true; if it is 0, the condition is false. Since writeLineNo is only ever 0 or 1, we could also have written the condition as

if (writeLineNo == 1)...

In the statement

printf("%2d. ", ++lineNo);

the expression ++lineNo means that lineNo is incremented before being printed. By comparison, if we had used lineNo++, then lineNo would be printed first and then incremented.

Exercise: Modify Program P6.12 to send the output to a file, linecopy.txt.

Exercise: Write a program to copy the contents of a file, input.txt, to a file, copy.txt. Hint: you just need to make minor changes to Program P6.10.

6.11 Convert Digit Characters to Integer

Let us consider how we can convert a sequence of digits into an integer. When we type the number 385, we are actually typing three individual characters – '3' then '8' then '5'. Inside the computer, the integer 385 is completely different from the three characters '3' '8' '5'. So when we type 385 and try to read it into an int variable, the computer has to convert this sequence of three characters into the integer 385.

To illustrate, the ASCII codes for the characters '3', '8' and '5', each stored in 8 bits, are 00110011, 00111000 and 00110101, respectively. When written to the screen or a file, the digits 385 are represented by this:

00110011 00111000 00110101

Assuming an integer is stored using 16 bits, the integer 385 is represented by its binary equivalent

0000000110000001

Observe that the character representation is quite different from the integer representation. When we ask scanf (or fscanf) to read an integer that we type, it must convert the character representation to the integer representation. We now show how this is done.

The basic step requires us to convert a digit character into its equivalent integer value. For example, we must convert the character '5' (represented by 00110101) into the integer 5 (represented by 0000000000000101).

Assuming that the codes for the digits 0 to 9 are consecutive (as they are in ASCII and other character sets), this can be done as follows:

integer value of digit = code for digit character - code for character '0'

For example, in ASCII, the code for '5' is 53 and the code for '0' is 48. Subtracting 48 from 53 gives us the integer value (5) of the character '5'. Once we can convert individual digits, we can construct the value of the number as we read it from left to right, using the following algorithm:

set num to 0

get a character, ch

while ch is a digit character

    convert ch to the digit value, d = ch - '0'

    set num to num*10 + d

    get a character, ch

endwhile

num now contains the integer value

The sequence of characters 385 is converted as follows:

num = 0

get '3'; convert to 3

num = num*10 + 3 = 0*10 + 3; num is now 3

get '8'; convert to 8

num = num*10 + 8 = 3*10 + 8; num is now 38

get '5'; convert to 5

num = num*10 + 5 = 38*10 + 5; num is now 385

There are no more digits and the final value of num is 385.

Let us use this idea to write a program which reads data character by character until it finds an integer. It constructs and prints the integer.

The program will have to read characters until it finds a digit, the first of the integer. Having found the first digit, it must construct the integer by reading characters as long as it keeps getting a digit. For example, suppose the data was this:

Number of items: 385, all in good condition

The program will read characters until it finds the first digit, 3. It will construct the integer from the 3 and the next two digits it reads, 8 and 5. When it reads the comma, it knows the integer has ended.

This outline can be expressed in pseudocode as follows:

read a character, ch

while ch is not a digit do

    read a character, ch

endwhile

//at this point, ch contains a digit

while ch is a digit do

    use ch to build the integer

    read a character, ch

endwhile

print the integer

How do we test if the character in ch is a digit? We must test if

ch >= '0' && ch <= '9'

If this is true, we know that the character is between '0' and '9', inclusive. Conversely, to test if ch is not a digit, we can test if

ch < '0' || ch > '9'

Putting all these ideas together gives us Program P6.13.

Program P6.13

#include <stdio.h>

int main() {

    printf("Type data including a number and press \"Enter\"\n");

    char ch = getchar();

    // as long as the character is not a digit, keep reading

    while (ch < '0' || ch > '9') ch = getchar();

    // at this point, ch contains the first digit of the number

    int num = 0;

    while (ch >= '0' && ch <= '9') { // as long as we get a digit

        num = num * 10 + ch - '0'; // update num

        ch = getchar();

    }

    printf("Number is %d\n", num);

}

A sample run is shown below:

Type data including a number and press "Enter"

hide the number &(%%)4719&*(&^ here

Number is 4719

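This program will find the number, no matter where it is hidden in the line, provided the data contains at least one digit. If it does not, nothing stops the first loop and the program keeps reading, looking for a digit.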

EXERCISES 6

1. Give the range of ASCII codes for (a) the digits (b) the uppercase letters (c) the lowercase letters.

2. How is the single quote represented as a character constant?

3. What is the character value of a character constant?

4. What is the numeric value of a character constant?

5. How is the expression 5 + 'T' evaluated? What is its value?

6. What value is assigned to n by n = 7 + 't'?

7. What character is stored in ch by ch = 4 + 'n'?

8. If ch = '8', what value is assigned to d by d = ch - '0'?

9. If ch contains any uppercase letter, explain how to change ch to the equivalent lowercase letter.

10. If ch contains any lowercase letter, explain how to change ch to the equivalent uppercase letter.

11. Write a program to request a line of data and print the first digit on the line.

12. Write a program to request a line of data and print the first letter on the line.

13. Write a program to request a line of data and print the number of digits and letters on the line.

14. Write a program to read a passage from a file and print how many times each vowel appears.

15. Modify Program P6.13 so that it will find negative integers as well.

16. Write a program which reads a file containing a C program and outputs the program to another file with all the // comments removed.

17. Write a program to read the data, character by character, and store the next number (with or without a decimal point) in a double variable (dv, say). For example, given the data

Mary works for $43.75 per hour

your program should store 43.75 in dv.

18. In the programming language Pascal, comments can be enclosed by { and } or by (* and *). Write a program which reads a data file input.pas containing Pascal code and writes the code to a file output.pas, replacing each { with (* and each } with *). For example, the statements

read(ch); {get the first character}

while ch = ' ' do {as long as ch is a blank}

    read(ch); {get another character}

writeln('The first non-blank is ', ch);

should be converted to

read(ch); (*get the first character*)

while ch = ' ' do (*as long as ch is a blank*)

    read(ch); (*get another character*)

writeln('The first non-blank is ', ch);

19. You are given the same data as in 18, but now remove the comments altogether.

20. Someone has typed a letter in a file letter.txt, but does not always start the word after a period with a capital letter. Write a program to copy the file to another file format.txt so that all words after a period now begin with a capital letter. Also ensure there is exactly one space after each period. For example, the text

Things are fine. we can see you now. let us know when is a good time. bye for now.

must be rewritten as

Things are fine. We can see you now. Let us know when is a good time. Bye for now.