C Preprocessor - Simple Programming - Practical C Programming, 3rd Edition (2011)


Part II. Simple Programming

Chapter 10. C Preprocessor

The speech of man is like embroidered tapestries, since like them this has to be extended in order to display its patterns, but when it is rolled up it conceals and distorts them.

—Themistocles

In the early days, when C was still being developed, it soon became apparent that C needed a facility for handling named constants, macros, and include files. The solution was to create a preprocessor that recognized these constructs in the programs before they were passed to the C compiler. The preprocessor is nothing more than a specialized text editor. Its syntax is completely different from that of C, and it has no understanding of C constructs.

The preprocessor was very useful, and soon it was merged into the main C compiler. On some systems, like UNIX, the preprocessor is still a separate program, automatically executed by the compiler wrapper cc. Some of the new compilers, like Turbo C++ and Microsoft Visual C++, have the preprocessor built in.

#define Statement

Example 10-1 initializes two arrays (data and twice). Each array contains 10 elements. Suppose we wanted to change the program to use 20 elements. Then we would have to change the array size (two places) and the index limit (one place). Aside from being a lot of work, multiple changes can lead to errors.

Example 10-1. init2a/init2a.c

int data[10];   /* some data */
int twice[10];  /* twice some data */

int main()
{
    int index;  /* index into the data */

    for (index = 0; index < 10; ++index) {
        data[index] = index;
        twice[index] = index * 2;
    }
    return (0);
}

We would like to be able to write a generic program in which we can define a constant for the size of the array, and then let C adjust the dimensions of our two arrays. By using the #define statement, we can do just that. Example 10-2 is a new version of Example 10-1.

Example 10-2. init2b/init2b.c

#define SIZE 20 /* work on 20 elements */

int data[SIZE];     /* some data */
int twice[SIZE];    /* twice some data */

int main()
{
    int index;  /* index into the data */

    for (index = 0; index < SIZE; ++index) {
        data[index] = index;
        twice[index] = index * 2;
    }
    return (0);
}

The line #define SIZE 20 acts as a command to a special text editor to globally change SIZE to 20. This line takes the drudgery and guesswork out of making changes.

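All preprocessor commands begin with a hash mark (#). Early preprocessors required the hash mark to be in column one; ANSI C allows spaces or tabs before it, but most programmers still put it in the first column. Although C is free format, the preprocessor is not: it is line oriented, and each directive must be on a line of its own. As we will see, the preprocessor knows nothing about C. It can be (and is) used to edit things other than C programs.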

NOTE

You can easily forget that the preprocessor and the C compiler use different syntaxes. One of the most common errors new programmers make is to try to use C constructs in a preprocessor directive.

A preprocessor directive terminates at the end-of-line. This format is different from that of C, where a semicolon (;) is used to end a statement. Putting a semicolon at the end of a preprocessor directive can lead to unexpected results. A line may be continued by putting a backslash (\) at the end.
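A minimal sketch of line continuation follows; the macro name SECONDS_PER_DAY and the function seconds_in_days are made up for illustration. The backslash joins the two physical lines into a single directive, so it must be the last character on the line.

```c
/*
 * A hypothetical constant whose definition is continued onto a
 * second line. The backslash must be the last character on the line.
 */
#define SECONDS_PER_DAY \
    (24 * 60 * 60)

/* Return the number of seconds in a given number of days. */
long seconds_in_days(long days)
{
    return days * SECONDS_PER_DAY;
}
```

Note that the definition has no semicolon at the end; the directive ends where the (continued) line ends.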

The simplest use of the preprocessor is to define a replacement macro. For example, the command:

#define FOO bar

causes the preprocessor to replace the word “FOO” with the word “bar” wherever “FOO” appears as a separate word. An occurrence inside a string literal, or inside a longer name such as FOODS, is left alone. It is common programming practice to use all uppercase letters for macro names. This practice makes telling the difference between a variable (all lowercase) and a macro (all uppercase) very easy.
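Because replacement works on whole identifiers rather than on arbitrary text, a small sketch can show what is and is not changed. The names bar, FOOD, label, and use_foo are hypothetical, chosen only for this illustration.

```c
#define FOO bar

int bar = 42;   /* the variable that FOO turns into */
int FOOD = 7;   /* a different identifier: not changed by the macro */

/* "FOO" inside a string literal is not replaced. */
const char *label = "FOO";

/* After preprocessing, this body reads: return bar; */
int use_foo(void)
{
    return FOO;
}
```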

The general form of a simple define statement is:

#define name substitute-text

where name can be any valid C identifier and substitute-text can be anything. You could use the following definition:

#define FOR_ALL for(i = 0; i < ARRAY_SIZE; i++)

and use it like this:

/*
 * Clear the array
 */
FOR_ALL {
    data[i] = 0;
}

However, defining macros in this manner is considered bad programming practice. Such definitions tend to obscure the basic control flow of the program. In this example, a programmer who wants to know what the loop does would have to search the beginning of the program for the definition of FOR_ALL.

An even worse practice is to define macros that do large-scale replacement of basic C programming constructs. For example, you can define the following:

#define BEGIN {
#define END }
. . .
if (index == 0)
BEGIN
    printf("Starting\n");
END

The problem is that you are no longer programming in C, but in a half-C/half-Pascal mongrel. You can find the extremes to which such mimicry can be taken in the Bourne shell, which uses preprocessor directives to define a language that looks a lot like Algol-68.

Here’s a sample section of code:

IF (x GREATER_THAN 37) OR (Y LESS_THAN 83) THEN
    CASE value OF
        SELECT 1:
            start();
        SELECT 3:
            backspace();
        OTHERWISE:
            error();
    ESAC
FI

Most programmers encountering this program curse at first, and then use the editor to turn the source back into a reasonable version of C.

The preprocessor can cause unexpected problems because it does not check for correct C syntax. For example, Example 10-3 generates an error on line 11.

Example 10-3. big/big.c

1  #define BIG_NUMBER 10 ** 10
2
3  int main()
4  {
5      /* index for our calculations */
6      int index;
7
8      index = 0;
9
10     /* syntax error on next line */
11     while (index < BIG_NUMBER) {
12         index = index * 8;
13     }
14     return (0);
15 }

The problem is in the #define statement on line 1, but the error message points to line 11. The definition in line 1 causes the preprocessor to expand line 11 to look like:

while (index < 10 ** 10)

C has no ** (exponentiation) operator, so the compiler reads 10 ** 10 as 10 * (*10), an attempt to use the integer 10 as a pointer, and this expansion generates an error.

Question 10-1: Example 10-4 generates the answer 47 instead of the expected answer 144. Why? (See the hint below.)

Example 10-4. first/first.c

#include <stdio.h>

#define FIRST_PART 7
#define LAST_PART 5
#define ALL_PARTS FIRST_PART + LAST_PART

int main() {
    printf("The square of all the parts is %d\n",
        ALL_PARTS * ALL_PARTS);
    return (0);
}

Hint: The answer may not be readily apparent. Luckily, C allows you to run your program through the preprocessor and view the output. In UNIX, the command:

% cc -E prog.c

will send the output of the preprocessor to the standard output.

In MS-DOS/Windows, the command:

C:\> cpp prog.c

will do the same thing.

Running this program through the preprocessor gives us:

# 1 "first.c"
# 1 "/usr/include/stdio.h" 1
... listing of data in include file <stdio.h> ...
# 2 "first.c" 2
int main() {
    printf("The square of all the parts is %d\n",
        7 + 5 * 7 + 5);
    return (0);
}

(See the answer in the Answers section at the end of this chapter.)

Question 10-2: Example 10-5 generates a warning that counter is used before it is set. This warning is a surprise to us because the for loop should set it. We also get a very strange warning, “null effect,” for line 11.

Example 10-5. max/max.c

1  /* warning, spacing is VERY important */
2
3  #include <stdio.h>
4
5  #define MAX =10
6
7  int main()
8  {
9      int counter;
10
11     for (counter =MAX; counter > 0; --counter)
12         printf("Hi there\n");
13
14     return (0);
15 }

Hint: Take a look at the preprocessor output. (See the answer in the Answers section at the end of this chapter.)

Question 10-3: Example 10-6 computes the wrong value for size. Why? (See the answer in the Answers section at the end of this chapter.)

Example 10-6. size/size.c

#include <stdio.h>

#define SIZE 10;
#define FUDGE SIZE -2;

int main()
{
    int size;   /* size to really use */

    size = FUDGE;
    printf("Size is %d\n", size);
    return (0);
}

Question 10-4: Example 10-7 is supposed to print the message “Fatal Error:Abort” and exit when it receives bad data. But it also exits when it gets good data. Why? (See the answer in the Answers section at the end of this chapter.)

Example 10-7. die/die.c

#include <stdio.h>
#include <stdlib.h>

#define DIE \
    fprintf(stderr, "Fatal Error:Abort\n");exit(8);

int main() {
    /* a random value for testing */
    int value;

    value = 1;
    if (value < 0)
        DIE;

    printf("We did not die\n");
    return (0);
}

#define vs. const

The const keyword is relatively new. Before const, #define was the only way to define constants, so most older code uses #define directives. However, the use of const is preferred over #define for several reasons. First of all, C checks the syntax of a const declaration immediately, while a #define directive is not checked until the macro is used. Also, const uses C syntax, while #define has a syntax all its own. Finally, const follows normal C scope rules, while a constant defined by a #define directive stays in effect until the end of the file (or until an #undef).

So, in most cases, a const statement is preferred over #define. Here are two ways of defining the same constant:

#define MAX 10          /* Define a value using the preprocessor */
                        /* (This definition can easily cause problems) */

const int MAX = 10;     /* Define a C constant integer */
                        /* (safer) */
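The scope difference between the two forms can be seen in a small sketch; the names LIMIT, inner_limit, and outer_limit are hypothetical. The const variable exists only inside the function that declares it, while the macro stays in effect from its #define to the end of the file.

```c
#define LIMIT 10            /* in effect from here to the end of the file */

int inner_limit(void)
{
    const int limit = 5;    /* visible only inside this function */
    return limit;
}

int outer_limit(void)
{
    /* limit is not visible here, but the macro LIMIT still is */
    return LIMIT;
}
```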

NOTE

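In C, unlike C++, a const variable is not a constant expression, so standard C does not let you use one as the size of an array declared outside a function. (Compilers that support C99 variable-length arrays accept it inside a function, but there it creates a variable-length array.) For array sizes, #define remains the usual choice in C.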

The #define directive can only define simple constants. The const qualifier can be applied to almost any type of C data, including structures. For example:

struct box {
    int width, height;  /* Dimensions of the box in pixels */
};

const struct box pink_box = {1, 4};     /* Size of a pink box to be used for input */

The #define directive is, however, essential for things like conditional compilation and other specialized uses.

Conditional Compilation

One of the problems programmers have is writing code that can work on many different machines. In theory, C code is portable; in actual practice, different operating systems have little quirks that must be accounted for. For example, this book covers both the MS-DOS/Windows compiler and UNIX C. Although they are almost the same, there are differences, especially when you must access the more advanced features of the operating system.

Another portability problem is caused by the fact that the standard leaves some of the features of the language up to the implementers. For example, the size of an integer is implementation dependent.

The preprocessor, through the use of conditional compilation, allows the programmer great flexibility in changing the code generated. Suppose we want to put debugging code in the program while we are working on it, and then remove it in the production version. We could do so by including the code in a #ifdef/#endif section:

#ifdef DEBUG

printf("In compute_hash, value %d hash %d\n", value, hash);

#endif /* DEBUG */

NOTE

You do not have to put the /* DEBUG */ after the #endif; however, the entry is very useful as a comment.

If the beginning of the program contains the directive:

#define DEBUG /* Turn debugging on */

the printf will be included. If the program contains the directive:

#undef DEBUG /* Turn debugging off */

the printf will be omitted.

Strictly speaking, the #undef DEBUG is unnecessary. If there is no #define DEBUG statement, then DEBUG is undefined. The #undef DEBUG statement is used to indicate explicitly that DEBUG is used for conditional compilation and is now turned off.
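Putting the pieces together, a sketch of a function with switchable debugging output might look like this. The body of compute_hash is made up for illustration; only the debugging printf comes from the fragment above.

```c
#include <stdio.h>

#define DEBUG   /* Turn debugging on; use #undef DEBUG to turn it off */

/* A hypothetical hash function, used only to show where debug output goes. */
int compute_hash(int value)
{
    int hash = (value * 31) % 101;

#ifdef DEBUG
    printf("In compute_hash, value %d hash %d\n", value, hash);
#endif /* DEBUG */

    return hash;
}
```

Deleting the #define DEBUG line removes the printf from the compiled program; the computation itself is unchanged.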

The directive #ifndef will cause the code to be compiled if the symbol is not defined:

#ifndef DEBUG

printf("Production code, no debugging enabled\n");

#endif /* DEBUG */

The #else directive reverses the sense of the conditional. For example:

#ifdef DEBUG

printf("Test version. Debugging is on\n");

#else /* DEBUG */

printf("Production version\n");

#endif /* DEBUG */
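As a sketch of #ifdef, #ifndef, and #else working together, the hypothetical helpers below report which version was compiled. DEBUG is explicitly undefined at the top, so the production branches are the ones selected.

```c
#undef DEBUG    /* debugging is off in this sketch */

/* Report which version was compiled (hypothetical helper). */
const char *version_name(void)
{
#ifdef DEBUG
    return "Test version. Debugging is on";
#else /* DEBUG */
    return "Production version";
#endif /* DEBUG */
}

/* #ifndef selects its code when the symbol is NOT defined. */
int is_production(void)
{
#ifndef DEBUG
    return 1;
#else /* DEBUG */
    return 0;
#endif /* DEBUG */
}
```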

A programmer may wish to remove a section of code temporarily. One common method is to comment out the code by enclosing it in /* */. This method can cause problems, as shown by the following example:

1: /***** Comment out this section

2: section_report();

3: /* Handle the end of section stuff */

4: dump_table();

5: **** end of commented out section */

This code generates a syntax error for the fifth line. Why?

A better method is to use the #ifdef construct to remove the code:

#ifdef UNDEF

section_report();

/* Handle the end of section stuff */

dump_table();

#endif /* UNDEF */

(Of course, the code will be included if anyone defines the symbol UNDEF; however, anyone who does so should be shot.)
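Because the preprocessor skips everything between #ifdef UNDEF and #endif, the comment inside the removed section causes no trouble. In this sketch, section_report and dump_table are hypothetical stubs that count how often they are called, so we can see that the removed calls never happen.

```c
int calls = 0;  /* counts calls to the stubs below */

void section_report(void) { ++calls; }  /* hypothetical stub */
void dump_table(void) { ++calls; }      /* hypothetical stub */

int end_section(void)
{
#ifdef UNDEF
    section_report();
    /* Handle the end of section stuff */
    dump_table();
#endif /* UNDEF */
    return calls;   /* the calls above were removed, so this is still 0 */
}
```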

The compiler switch -Dsymbol allows symbols to be defined on the command line. For example, the command:

% cc -DDEBUG -g -o prog prog.c

compiles the program prog.c and includes all the code between #ifdef DEBUG and #endif even though there is no #define DEBUG in the program.

The general form of the option is -Dsymbol or -Dsymbol=value. For example, the following sets MAX to 10:

% cc -DMAX=10 -o prog prog.c

Notice that the programmer can override the command-line options with directives in the program. For example, the directive:

#undef DEBUG

will result in DEBUG being undefined whether or not you use -DDEBUG.
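A common pattern, shown here as a sketch, combines -D with #ifndef so that a symbol has a default value that the command line can override. The array table and the helper table_size are hypothetical.

```c
/*
 * Use the value given on the command line (for example, cc -DMAX=20),
 * but fall back to 10 when no -DMAX option was supplied.
 */
#ifndef MAX
#define MAX 10
#endif /* MAX */

int table[MAX];     /* the array size follows MAX */

int table_size(void)
{
    return (int)(sizeof(table) / sizeof(table[0]));
}
```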

Most C compilers automatically define some system-dependent symbols. For example, Turbo C++ defines the symbols __TURBOC__ and __MSDOS__. An ANSI Standard compiler defines the symbol __STDC__. Most UNIX compilers define a name for the system (e.g., SUN, VAX, Celerity); however, these names are rarely documented. Most UNIX compilers also define the symbol __unix__ (or unix).
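The sketch below tests some of these predefined symbols with #ifdef. The helper names are hypothetical, and _WIN32 (predefined by Windows compilers) is an addition not mentioned above. A system that defines neither __unix__ nor _WIN32 falls through to "unknown".

```c
/*
 * Report the platform using symbols the compiler predefines.
 * __unix__ is defined by most UNIX compilers and _WIN32 by Windows
 * compilers; anything else falls through to "unknown".
 */
const char *platform_name(void)
{
#ifdef __unix__
    return "UNIX";
#else
#ifdef _WIN32
    return "Windows";
#else
    return "unknown";
#endif /* _WIN32 */
#endif /* __unix__ */
}

/* __STDC__ is defined (as 1) by compilers that conform to ANSI/ISO C. */
int is_standard_c(void)
{
#ifdef __STDC__
    return 1;
#else
    return 0;
#endif /* __STDC__ */
}
```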

include Files

The #include directive allows the program to use source code from another file.

For example, we have been using the directive:

#include <stdio.h>

in our programs. This directive tells the preprocessor to take the file stdio.h (Standard I/O) and insert it in the program. Files that are included in other programs are called header files. (Most #include directives come at the head of the program.) The angle brackets (<>) indicate that the file is a standard header file. On UNIX, these files are located in /usr/include. On MS-DOS/Windows, they are located in whatever directory was specified at the time the compiler was installed.

Standard include files define data structures and macros used by library routines. For example, printf is a library routine that prints data on the standard output. The FILE structure used by printf and its related routines is defined in stdio.h.

Sometimes the programmer may want to write her own set of include files. Local include files are particularly useful for storing constants and data structures when a program spans several files. They are especially useful for information passing when a team of programmers is working on a single project. (See Chapter 18.)

Local include files may be specified by using double quotes ("") around the file name, for example:

#include "defs.h"

The filename defs.h can be any valid filename. This specification can be a simple file, defs.h; a relative path, ../../data.h; or an absolute path, /root/include/const.h. (MS-DOS/Windows uses backslash (\) as its directory separator, but Windows compilers also accept slash (/) in #include directives, and slash is the more portable choice.)

NOTE

Absolute pathnames should be avoided in #include directives because they make the program very nonportable.

Include files may be nested, and this feature can cause problems. Suppose you define several useful constants in the file const.h. If the files data.h and io.h both include const.h and you put the following in your program:

#include "data.h"

#include "io.h"

you will generate errors because the preprocessor will process the definitions in const.h twice. Redefining a macro with exactly the same replacement text is not an error; however, defining a structure or union twice is a fatal error and must be avoided.

One way around this problem is to have const.h check whether it has already been included and, if so, skip its definitions. The directive #ifndef symbol is true if the symbol is not defined; it is the reverse of #ifdef.

Look at the following code:

#ifndef _CONST_H_INCLUDED_

/* define constants */

#define _CONST_H_INCLUDED_

#endif /* _CONST_H_INCLUDED_ */

When const.h is included, it defines the symbol _CONST_H_INCLUDED_. If that symbol is already defined (because the file was included earlier), the #ifndef conditional hides all defines so they don’t cause trouble.
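Since a single example cannot hold two files, the sketch below pastes the contents of a guarded const.h twice, which is what the preprocessor sees when data.h and io.h both include it. The structure point, the macro ORIGIN_X, and the helper origin_x are hypothetical. The guard name CONST_H_INCLUDED is our own choice: standard C reserves names that begin with an underscore followed by an uppercase letter (such as _CONST_H_INCLUDED_) for the compiler and library, so many programmers avoid them.

```c
/* ----- first copy of const.h ----- */
#ifndef CONST_H_INCLUDED
struct point {
    int x, y;
};
#define ORIGIN_X 0
#define CONST_H_INCLUDED
#endif /* CONST_H_INCLUDED */

/* ----- second copy of const.h: skipped, so point is not redefined ----- */
#ifndef CONST_H_INCLUDED
struct point {
    int x, y;
};
#define ORIGIN_X 0
#define CONST_H_INCLUDED
#endif /* CONST_H_INCLUDED */

int origin_x(void)
{
    struct point p = {ORIGIN_X, 0};
    return p.x;
}
```

Without the guard, the second definition of struct point would be rejected by the compiler.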

NOTE

Anything can be put in a header file. This includes not only definitions and types, but also code, initialization data, and yesterday’s lunch menu. However, good programming practices dictate that you should limit header files to types and definitions only.

Parameterized Macros

So far, we have discussed only simple #defines or macros. But macros can take parameters. The following macro will compute the square of a number:

#define SQR(x) ((x) * (x)) /* Square a number */

NOTE

There must be no space between the macro name (SQR) and the opening parenthesis.

When the macro is used, x is replaced by the text of the argument given in the call:

SQR(5) expands to ((5) * (5))
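The outer pair of parentheses in SQR matters too. To see why, compare it with a hypothetical SQR_NO_OUTER that omits them: 100 / SQR_NO_OUTER(5) expands to 100 / (5) * (5), which divides first and then multiplies, giving 100 instead of 4. The helper sqr_of_sum is also made up for this sketch.

```c
#define SQR(x) ((x) * (x))          /* Square a number */

/* A deliberately flawed version without the outer parentheses. */
#define SQR_NO_OUTER(x) (x) * (x)

int sqr_of_sum(int a, int b)
{
    return SQR(a + b);  /* expands to ((a + b) * (a + b)) */
}
```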

Always put parentheses, ( ), around the parameters of a macro. Example 10-8 illustrates the problems that can occur if this rule is not followed.

Example 10-8. sqr/sqr.c

#include <stdio.h>

#define SQR(x) (x * x)

int main()
{
    int counter;    /* counter for loop */

    for (counter = 0; counter < 5; ++counter) {
        printf("x %d, x squared %d\n",
            counter+1, SQR(counter+1));
    }
    return (0);
}

Question 10-5: What does Example 10-8 output? Try running it on your machine. Why did it output what it did? Try checking the output of the preprocessor. (See the answer in the Answers section at the end of this chapter.)

The keep-it-simple system of programming tells us to use the increment (++) and decrement (--) operators only in statements on lines by themselves. When used in a macro parameter, they can lead to unexpected results, as illustrated by Example 10-9.

Example 10-9. sqr-i/sqr-i.c

#include <stdio.h>

#define SQR(x) ((x) * (x))

int main()
{
    int counter;    /* counter for loop */

    counter = 0;
    while (counter < 5)
        printf("x %d square %d\n", counter, SQR(++counter));
    return (0);
}

Question 10-6: Why will Example 10-9 not produce the expected output? By how much will the counter go up each time? (See the answer in the Answers section at the end of this chapter.)

Question 10-7: Example 10-10 tells us that we have an undefined variable number, but our only variable name is counter. Why? (See the answer in the Answers section at the end of this chapter.)

Example 10-10. rec/rec.c

#include <stdio.h>

#define RECIPROCAL (number) (1.0 / (number))

int main()
{
    float counter;  /* Counter for our table */

    for (counter = 1.0; counter < 10.0;
         counter += 1.0) {
        printf("1/%f = %f\n",
            counter, RECIPROCAL(counter));
    }
    return (0);
}

Advanced Features

This book does not cover the complete list of C preprocessor directives. Among the more advanced features are an advanced form of the #if directive for conditional compilations, and the #pragma directive for inserting compiler-dependent commands into a file. See your C reference manual for more information about these features.

Summary

The C preprocessor is a very useful part of the C language. It has a completely different look and feel, though, and it must be treated apart from the main C compiler.

Problems in macro definitions often do not show up where the macro is defined, but result in errors much further down in the program. By following a few simple rules, you can decrease the chances of having problems:

§ Put parentheses ( ) around everything. In particular, they should enclose #define constants and macro parameters.

§ When defining a macro with more than one statement, enclose the code in curly braces ({}).

§ The preprocessor is not C. Don’t use = or ; in a #define. Finally, if you got this far, be glad that the worst is over.

Answers

Answer 10-1: After the program has been run through the preprocessor, the printf statement is expanded to look like:

printf("The square of all the parts is %d\n",

7 + 5 * 7 + 5);

Because multiplication is done before addition, the expression 7 + 5 * 7 + 5 evaluates to 7 + 35 + 5, or 47. Put parentheses around all expressions in macros. If you change the definition of ALL_PARTS to:

#define ALL_PARTS (FIRST_PART + LAST_PART)

the program will execute correctly.
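A quick check of the corrected definition, using a hypothetical helper square_of_parts in place of the printf call:

```c
#define FIRST_PART 7
#define LAST_PART 5
#define ALL_PARTS (FIRST_PART + LAST_PART)

int square_of_parts(void)
{
    /* expands to (7 + 5) * (7 + 5) */
    return ALL_PARTS * ALL_PARTS;
}
```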

Answer 10-2: The preprocessor is a very simple-minded program. When it defines a macro, everything past the identifier is part of the macro. In this case, the definition of MAX is literally =10. When the for statement is expanded, the result is:

for (counter==10; counter > 0; --counter)

C allows you to compute a result and throw it away. (This will generate a null-effect warning in some compilers.) For this statement, the program checks to see if counter is 10, and then discards the answer. Removing the = from the definition will correct the problem. (An ANSI C preprocessor works on tokens rather than raw text, so it keeps the two = signs apart and produces counter = =10 instead; a modern compiler reports that as a syntax error on line 11. The fix is the same.)

Answer 10-3: As with the previous problem, the preprocessor does not respect C syntax conventions. In this case, the programmer used a semicolon (;) to end the statement, but the preprocessor included it as part of the definition for SIZE. The assignment statement for size, when expanded, is:

size = 10; -2;;

The two semicolons at the end do not hurt us, but the one in the middle is the killer. This line tells C to do two things:

§ Assign 10 to size.

§ Compute the value -2 and throw it away (this code results in the null-effect warning). Removing the semicolons will fix the problem.
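A corrected version, sketched below with a hypothetical helper fudged_size, drops the semicolons and, following the parentheses rule from the Summary, also wraps FUDGE so that it behaves in larger expressions.

```c
#define SIZE 10             /* no semicolon */
#define FUDGE (SIZE - 2)    /* no semicolon, and parenthesized */

int fudged_size(void)
{
    int size;   /* size to really use */

    size = FUDGE;   /* expands to size = (10 - 2); */
    return size;
}
```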

Answer 10-4: The output of the preprocessor looks like:

... contents of <stdio.h> and <stdlib.h> ...
int main() {
    int value;
    value = 1;
    if (value < 0)
        fprintf(stderr, "Fatal Error:Abort\n");exit(8);
    printf("We did not die\n");
    return (0);
}

The problem is that two statements follow the if line. Normally, they would be put on two lines. Let’s look at this program properly indented:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int value;  /* a random value for testing */

    value = 1;
    if (value < 0)
        fprintf(stderr, "Fatal Error:Abort\n");
    exit(8);

    printf("We did not die\n");
    return (0);
}

With this new format, we can easily determine why we always exit. The fact that there were two statements after the if was hidden from us by using a single preprocessor macro.

The cure for this problem is to put curly braces ({}) around all multistatement macros; for example:

#define DIE {fprintf(stderr, "Fatal Error:Abort\n");exit(8);}
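The braces fix the problem shown here, but a braced macro followed by the usual semicolon can still trip up an if that has an else: DIE; becomes {...};, the extra empty statement ends the if, and the compiler reports an else without a matching if. A widely used idiom, not used elsewhere in this chapter, wraps the statements in do { ... } while (0). The sketch below shows it; check_value is a hypothetical helper.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * do { ... } while (0) turns the two statements into one statement
 * that still takes the usual trailing semicolon.
 */
#define DIE \
    do { fprintf(stderr, "Fatal Error:Abort\n"); exit(8); } while (0)

int check_value(int value)
{
    if (value < 0)
        DIE;
    else
        printf("We did not die\n");
    return value;
}
```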

Answer 10-5: The program prints:

x 1, x squared 1
x 2, x squared 3
x 3, x squared 5
x 4, x squared 7
x 5, x squared 9

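The problem is with the SQR(counter+1) expression. Expanding this expression we get:

SQR(counter+1)
(counter + 1 * counter + 1)

Because multiplication is done before addition, this is counter + (1 * counter) + 1, or 2 * counter + 1, rather than the square of counter + 1.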

So our SQR macro does not work. Putting parentheses around the parameters solves this problem:

#define SQR(x) ((x) * (x))

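Answer 10-6: The answer is that the counter is incremented by two each time through the loop. This incrementation occurs because the macro call:

SQR(++counter)

is expanded to:

((++counter) * (++counter))

Worse, counter is modified twice (and also read as another printf argument) with no sequence point in between, so the behavior is undefined in standard C, and different compilers may print different values.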
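Following the keep-it-simple rule, the increment can be moved onto a line of its own so that the macro argument has no side effects. In this sketch, print_squares and the total it returns are our own additions, included only so the result can be checked.

```c
#include <stdio.h>

#define SQR(x) ((x) * (x))

/* Print the squares of 1 through 5, incrementing outside the macro. */
int print_squares(void)
{
    int counter = 0;
    int total = 0;      /* sum of the squares, returned for checking */

    while (counter < 5) {
        ++counter;      /* increment on a line by itself */
        printf("x %d square %d\n", counter, SQR(counter));
        total += SQR(counter);
    }
    return total;
}
```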

Answer 10-7: The only difference between a parameterized macro and one without parameters is the parenthesis immediately following the macro name. In this case, a space follows the name RECIPROCAL, so the macro is not parameterized. Instead, it is a simple text-replacement macro that will replace RECIPROCAL with:

(number) (1.0 / (number))

so the call RECIPROCAL(counter) expands to (number) (1.0 / (number))(counter), which refers to a variable named number that does not exist.

Removing the space between RECIPROCAL and (number) will correct the problem.
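With the space removed, the macro works as intended. The sketch below wraps the loop from Example 10-10 in a hypothetical function print_table that returns the last reciprocal so the result can be checked.

```c
#include <stdio.h>

/* No space between the macro name and the opening parenthesis. */
#define RECIPROCAL(number) (1.0 / (number))

/* Print the table from Example 10-10 and return the last reciprocal. */
double print_table(void)
{
    float counter;      /* Counter for our table */
    double last = 0.0;  /* most recent reciprocal computed */

    for (counter = 1.0; counter < 10.0; counter += 1.0) {
        last = RECIPROCAL(counter);
        printf("1/%f = %f\n", counter, last);
    }
    return last;
}
```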

Programming Exercises

Exercise 10-1: Write a macro that returns TRUE if its parameter is divisible by 10 and FALSE otherwise.

Exercise 10-2: Write a macro is_digit that returns TRUE if its argument is a decimal digit.

Exercise 10-3: Write a second macro is_hex that returns true if its argument is a hex digit (0-9, A-F, a-f). The second macro should reference the first.

Exercise 10-4: Write a preprocessor macro that swaps two integers. (For the real hacker, write one that does not use a temporary variable declared outside the macro.)