C++ For Dummies (2014)

Part II

Becoming a Functional C++ Programmer

Chapter 10

The C++ Preprocessor

In This Chapter

· Including source files

· Defining constants and macros

· Enumerating alternatives to constants

· Inserting compile-time checks

· Simplifying declarations via typedef

You only thought that all you had to learn was C++. It turns out that C++ includes a preprocessor that works on your source files before the “real C++ compiler” ever gets to see them. Unfortunately, the syntax of the preprocessor is completely different from that of C++ itself.

Before you despair, however, let me hasten to add that the preprocessor is very basic and the C++ ’11 standard has added a number of features that make the preprocessor almost unnecessary. Nevertheless, if the conversation turns to C++ at your next Coffee Club meeting, you’ll be expected to understand the preprocessor.

What Is a Preprocessor?

Up until now, you may have thought of the C++ compiler as munching on your source code and spitting out an executable program in one step, but that isn’t quite true.

First, the preprocessor makes a pass through your program looking for preprocessor instructions. The output of this preprocessor step is an intermediate file that has all the preprocessor commands expanded. This intermediate file gets passed to the C++ compiler for processing. The output from the C++ compiler is an object file that contains the machine instruction equivalent to your C++ source code. During the final step, a separate program known as the linker combines a set of standard libraries with your object file (or files, as we’ll see in Chapter 21) to create an executable program. (More on the standard library in the next section of this chapter.)

Object files normally carry the extension .o. Executable programs always carry the extension .exe in Windows and have no extension under Linux or Mac OS X. Code::Blocks stores the object and executable files in their own folders. For example, if you’ve already built the IntAverage program from Chapter 2, you will have on your hard disk a folder C:\CPP_Programs_from_book\IntAverage\obj\Debug containing main.o and a folder C:\CPP_Programs_from_book\IntAverage\bin\Debug that contains the executable program.

All preprocessor commands start with a # symbol as the first non-blank character on the line (by convention, in column 1) and end with the newline.

Like almost all rules in C++, this rule has an exception. You can spread a preprocessor command across multiple lines by ending the line with a backslash character: \. We won’t have any preprocessor commands that are that complicated, however.

In this book, we’ll be working with three preprocessor commands:

· #include includes the contents of the specified file in place of the #include statement.

· #define defines a constant or macro.

· #if includes a section of code in the intermediary file if the following condition is true.

Each of these preprocessor commands is covered in the following sections.

Including Files

The C++ standard library consists of functions that are basic enough that almost everyone needs them. It would be silly to force every programmer to have to write them for herself. For example, the I/O functions, which we have been using to read input from the keyboard and write out to the console, are contained in the standard library.

However, C++ requires a prototype declaration for any function you call, whether it’s in a library or not (see Chapter 6 if that doesn’t make sense to you). Rather than force the programmer to type all these declarations by hand, the library authors created include files that contain little more than prototype declarations. All you have to do is #include the source file that contains the prototypes for the library routines you intend to use.

Take the following simple example. Suppose I had created a library that contains the trigonometric functions sin(), cosin(), tan(), and a whole lot more. I would likely create an include file mytrig with the following contents to go along with my standard library:

// include prototype declarations for my library
double sin(double x);
double cosin(double x);
double tan(double x);
// ...more prototype declarations...

Any program that wanted to make use of one of these math functions would #include that file, enclosing the name of the include file either in angle brackets or in quotes, as in

#include <mytrig>

or

#include "mytrig"

The difference between the two forms of #include is a matter of where the preprocessor goes to look for the mytrig file. When the file is enclosed in quotes, the preprocessor assumes that the include file is locally grown, so it starts looking for the file in the same directory in which it found the source file. If it doesn’t find the file there, it starts looking in its own include file directories. The preprocessor assumes that include files in angle brackets are from the C++ library, so it skips looking in the source file directory and goes straight to the standard include file folders. Use quotes for any include file that you create and angle brackets for C++ library include files.

Thus, you might write a source file like the following:

// MyProgram - is very intelligent
#include "mytrig"

int main(int nArgc, char* pArguments[])
{
    cout << "The sin of .5 is " << sin(0.5) << endl;
    return 0;
}

The C++ compiler sees the following intermediary file after the preprocessor gets finished expanding the #include:

// MyProgram - is very intelligent
// include prototype declarations for my library
double sin(double x);
double cosin(double x);
double tan(double x);
// ...more prototype declarations...

int main(int nArgc, char* pArguments[])
{
    cout << "The sin of .5 is " << sin(0.5) << endl;
    return 0;
}

Historically, the convention was to end include files with .h. C still uses that convention. However, C++ dropped the extension when it revamped the include file structure. Now, C++ standard include files have no extension.


Playing in your own name sandbox

(This is truly technical, so feel free to skip this sidebar and come back to it later.) The authors of the C++ standard worry a lot about name collisions. For example, besides my mathematical function log(x) that returns the logarithm of x, suppose in another context I had written a function log(x) that writes status information to a system log. Clearly, two different functions with the same name and the same arguments can’t coexist in one program. This is known as a name collision.

To avoid this, C++ allows the programmer to bundle declarations into a namespace using the keyword of the same name:

namespace Mathematics
{
    double log(double x)
    {
        // ...the definition of the function...
    }
}
namespace SystemLog
{
    int log(double x)
    {
        // ...log the value to file...
    }
}

The namespace becomes part of the extended name of the function. Thus, the following code snippet actually logs the logarithm of a value:

void myFunc(double x)
{
    // invoke the logarithm function...
    double dl = Mathematics::log(x);

    // ...now log it to disk
    SystemLog::log(dl);
}

Fortunately, you don’t have to specify the namespace every single time. The keyword using allows the programmer to specify a default namespace for a given function:

using Mathematics::log;
void myFunc(double x)
{
    // the default is the mathematics version...
    double dl = log(x);

    // ...however, the other version is still accessible by
    // explicitly specifying the namespace
    SystemLog::log(dl);
}

You can automatically default every declaration within a namespace:

using namespace Mathematics;
void myFunc(double x)
{
    // look in the Mathematics namespace first...
    double dl = log(x);

    // ...however, the other version is still accessible by
    // explicitly specifying the namespace
    SystemLog::log(dl);
}

See the program NamespaceExample in the extras at www.dummies.com/extras/cplusplus for an example of the use of namespaces.

The standard library functions reside in the std namespace; the statement using namespace std; included at the beginning of each of the programs in this book gives the programs access to the standard library functions without the need to specify the namespace explicitly.


#Defining Things

The preprocessor also allows the programmer to #define expressions that get expanded during the preprocessor step. For example, you can #define a constant to be used throughout the program.

In usage, you pronounce the # sign as “pound,” so you say “pound-define a constant” to distinguish from defining a constant in some other way.

#define TWO_PI 6.2831853

This makes the following statement much easier to understand:

double circumference = TWO_PI * radius;

than the equivalent expression, which is actually what the C++ compiler sees after the preprocessor has replaced TWO_PI with its definition:

double circumference = 6.2831853 * radius;

Another advantage is the ability to #define a constant in one place and use it everywhere. For example, I might include the following #define in an include file:

#define MAX_NAME_LENGTH 512

Throughout the program, I can truncate the names that I read from the keyboard to a common and consistent MAX_NAME_LENGTH. Not only is this easier to read, but it also provides a single place in the program to change should I want to increase or decrease the maximum name length that I choose to process.

The preprocessor also allows the program to #define function-like macros with arguments that are expanded when the definition is used:

#define SQUARE(X) X * X

In use, such macro definitions look a lot like functions:

// calculate the area of a circle
// (assumes that PI has been #defined as well)
double dArea = PI * SQUARE(dRadius);

Remember that the C++ compiler actually sees the file generated from the expansion of all macros. This can lead to some unexpected results. Consider the following code snippets (these are all taken from the program MacroConfusion, which is included among the extra programs at www.dummies.com/extras/cplusplus):

int nSQ = SQUARE(2);
cout << "SQUARE(2) = " << nSQ << endl;

Reassuringly, this generates the expected output:

SQUARE(2) = 4

However, the following lines

int nSQ = SQUARE(1 + 2);
cout << "SQUARE(1 + 2) = " << nSQ << endl;

generate the surprising result

SQUARE(1 + 2) = 5

The preprocessor simply replaced X in the macro definition with 1 + 2. What the C++ compiler actually sees is

int nSQ = 1 + 2 * 1 + 2;

Since multiplication has higher precedence than addition, this is turned into 1 + 2 + 2 which, of course, is 5. This confusion could be solved by liberal use of parentheses in the macro definition:

#define SQUARE(X) ((X) * (X))

This version generates the expected

SQUARE(1 + 2) → ((1 + 2) * (1 + 2)) → 9

However, some unexpected results cannot be fixed no matter how hard you try. Consider the following snippet:

int i = 3;
cout << "i = " << i << endl;
int nSQ = SQUARE(i++);
cout << "SQUARE(i++) = " << nSQ << endl;
cout << "now i = " << i << endl;

This generates the following:

i = 3
SQUARE(i++) = 9
now i = 5

The value generated by SQUARE is correct, but the variable i has been incremented twice. The reason is obvious when you consider the expanded macro:

int i = 3;
nSQ = i++ * i++;

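Because the macro argument appears twice in the expansion, the i++ operation is performed twice. With the compiler used here, both operations returned the current value of i, which is 3. These two values were then multiplied together to return the expected value of 9, and i ended up incremented twice to generate a resulting value of 5. Be warned, however, that C++ does not define the result of modifying the same variable twice within a single expression like this one, so another compiler is free to generate different values.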

Okay, how about not #defining things?

The sometimes unexpected results from the preprocessor have created heartburn for the fathers (and mothers) of C++ almost from the beginning. C++ has included features over the years to make most uses of #define unnecessary.

For example, C++ defines the inline function to replace the macro. This looks just like any other function declaration with the addition of the keyword inline tacked to the front:

inline int SQUARE(int x) { return x * x; }

This inline function definition looks very much like the previous macro definition for SQUARE() (I have written this definition on one line to highlight the similarities). However, an inline function is processed by the C++ compiler rather than by the preprocessor. This definition of SQUARE() does not suffer from any of the strange effects noted previously.

The inline keyword is supposed to suggest to the compiler that it “expand the function inline” rather than generate a call to some code somewhere to perform the operation. This was to satisfy the speed freaks, who wanted to avoid the overhead of performing a function call compared to a macro definition that generates no such call. The best that can be said is that inline functions may be expanded in place, but then again, they may not. There’s no way to be sure without performing detailed timing analysis or examining the machine code output by the compiler.

C++ allows programmers to use a variable declared const to take the place of a #define constant so long as the value of the constant is spelled out at compile time:

const int MAX_NAME_LENGTH = 512;
char szName[MAX_NAME_LENGTH];

The ’11 standard goes so far as to allow you to declare a function to be a constexpr:

constexpr int square(int n)
{return n * n;}

This makes a declaration like the following legal:

int matrix[square(5)];

However, '11 puts a lot of significant restrictions on what can go into a constexpr function. For example, the body of such a function is pretty much limited to a single return statement.

The '14 standard loosens the rules concerning constexpr functions quite a bit. In general, a function can be declared a constexpr if all of the sub-expressions can be calculated at compile time.

Enumerating other options

C++ provides a mechanism for defining constants of a separate, user-defined type. Suppose, for example, that I were writing a program that manipulated States of the Union. I could refer to the states by their name, such as “Texas” or “North Dakota.” In practice, this is not convenient since repetitive string comparisons are computationally intensive and subject to error.

I could define a unique value for each state as follows:

#define DC_OR_TERRITORY 0
#define ALABAMA 1
#define ALASKA 2
#define ARKANSAS 3
// ...and so on...

Not only does this avoid the clumsiness of comparing strings; it allows me to use the name of the state as an index into an array of properties such as population:

// increment the population of ALASKA (they need it)
population[ALASKA]++;

A statement such as this is much easier to understand than the semantically identical population[2]++. This is such a common thing to do that C++ allows the programmer to define what’s known as an enumeration:

enum STATE {DC_OR_TERRITORY, // gets 0
            ALABAMA,         // gets 1
            ALASKA,          // gets 2
            ARKANSAS,        // gets 3
            // ...and so on...
           };

Each element of this enumeration is assigned a value starting at 0, so DC_OR_TERRITORY is defined as 0, ALABAMA is defined as 1, and so on. You can override this incremental sequencing by assigning a value explicitly, as follows:

enum STATE {DC,
            TERRITORIES = 0,
            ALABAMA,
            ALASKA,
            // ...and so on...
           };

This version of STATE defines an element DC, which is given the value 0. It then defines a new element TERRITORIES, which is also assigned the value 0. ALABAMA picks up with 1 just as before.

The ’11 standard extends enumerations by allowing the programmer to create a user-defined enumerated type as follows (note the addition of the keyword class in the snippet):

enum class STATE {DC,
                  TERRITORIES = 0,
                  ALABAMA,
                  ALASKA,
                  // ...and so on...
                 };

This declaration creates a new type STATE and assigns it 52 members (ALABAMA through WYOMING plus DC and TERRITORIES). The programmer can now use STATE as she would any other variable type. A variable can be declared to be of type STATE:

STATE s = STATE::ALASKA;

Function calls can be differentiated by this new type:

int getPop(STATE s); // return population
int setPop(STATE s, int pop); // set the population

The type STATE is not just another word for int: Arithmetic is not defined for members of type STATE. The following attempt to use STATE as an index into an array is not legal:

int getPop(STATE s)
{
    return population[s]; // not legal
}

However, the members of STATE can be converted to their integer equivalent (0 for DC and TERRITORIES, 1 for ALABAMA, 2 for ALASKA, and so on) through the application of a cast:

int getPop(STATE s)
{
    return population[(int)s]; // is legal
}

Including Things #if I Say So

The third major class of preprocessor statement is the #if, which is a preprocessor version of the C++ if statement:

#if constexpression
// included if constexpression evaluates to other than 0
#else
// included if constexpression evaluates to 0
#endif

This is known as conditional compilation because the set of statements between the #if and the #else or #endif are included in the compilation only if a condition is true. The constexpression phrase is limited to simple arithmetic and comparison operators. That’s okay because anything more than an equality comparison and the occasional addition is rare.

For example, the following is a common use for #if. I can include the following definition within an include file with a name such as LogMessage:

#if DEBUG == 1
inline void logMessage(const char *pMessage)
{ cout << pMessage << endl; }
#else
#define logMessage(X) (0)
#endif

I can now sprinkle error messages throughout my program wherever I need them:

#define DEBUG 1
#include "LogMessage"
void testFunction(char *pArg)
{
    logMessage(pArg);
    // ...function continues...
}
With DEBUG set to 1, each reference to logMessage() is converted into a call to an inline function that outputs the argument to the display. Once the program is working properly, I can remove the definition of DEBUG. (The preprocessor treats an undefined name in an #if expression as 0, so the comparison simply fails.) Now the references to logMessage() invoke a macro that does nothing.

A second version of the conditional compilation is the #ifdef (which is pronounced “if def”):

#ifdef DEBUG
// included if DEBUG has been #defined
#else
// included if DEBUG has not been #defined
#endif

There is also an #ifndef (pronounced “if not def”), which is the logical reverse of #ifdef.

Intrinsically Defined Objects

C++ defines a set of intrinsic constants, which are shown in Table 10-1. These are constants that C++ thinks are just too cool to be without — and that you would have trouble defining for yourself anyway.

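Table 10-1 Predefined Preprocessor Constants

| Constant | Type | Meaning |
| --- | --- | --- |
| __FILE__ | const char* | The name of the source file. |
| __LINE__ | int | The current line number. |
| __func__ | const char* | The name of the current function (C++ ’11 and later). Strictly speaking, this one is a variable supplied by the compiler rather than a preprocessor constant. |
| __DATE__ | const char* | The date on which the file was compiled. |
| __TIME__ | const char* | The time at which the file was compiled. |
| __TIMESTAMP__ | const char* | The date and time at which the source file was last modified. (This is a common compiler extension rather than part of the standard.) |
| __STDC__ | int | Set to 1 by a compiler that claims compliance with the standard. Whether a C++ compiler defines it at all is left up to the compiler. |
| __cplusplus | long | Defined only if the compiler is a C++ compiler (as opposed to a C compiler); its value identifies the version of the standard (201103L for C++ ’11, for example). This allows include files to be shared across environments. |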

These internal macros are particularly useful when generating error messages. You would think that C++ generates plenty of error messages on its own and doesn’t need any more help, but sometimes you want to create your own compiler errors. For you, C++ offers not one, not two, but three options: #error, assert(), and static_assert(). Each of these three mechanisms works slightly differently.

The #error command is a preprocessor directive (as you can tell by the fact that it starts with the # sign). It causes the preprocessor to stop and output a message. Suppose that your program just won’t work with anything but standard C++. You could add the following to the beginning of your program:

#if !__cplusplus || !__STDC__
#error This is a standard C++ program.
#endif

Now if someone tries to compile your program with anything other than a C++ compiler that strictly adheres to the standards, she will get a single neat error message rather than a raft of potentially meaningless error messages from a confused non-standard compiler.

Unlike #error, assert() performs its test when the resulting program is executed. For example, suppose that I had written a factorial program that calculates N * (N - 1) * (N - 2) and so on down to 1 for whatever N I pass it. Factorial is only defined for positive integers; passing a negative number to a factorial is always a mistake. To be careful, I should add a test for a non-positive value at the beginning of the function:

int factorial(int N)
{
assert(N > 0);
// ...program continues...

The program now checks the argument to factorial() each time it is called. At the first sign of negativity, assert() halts the program with a message to the operator that the assertion failed, along with the file and line number.

Liberal use of assert() throughout your program is a good way to detect problems early during development, but constantly testing for errors that have already been found and removed during testing slows the program needlessly. To avoid this, C++ allows the programmer to “remove” the tests when creating the version of the program to be shipped to users: #define the constant NDEBUG (for “not debug mode”) ahead of the #include of cassert, the include file that defines assert(). This causes the preprocessor to convert all the calls to assert() in your module to “do nothings” (universally known as NO-OPs).

The preprocessor cannot perform certain compile-time tests. For example, suppose that your program works properly only if the default integer size is 32 bits. The preprocessor is of no help since it knows nothing about C++ types such as int and cannot evaluate sizeof. To address this situation, C++ ’11 introduced the keyword static_assert, which is interpreted by the compiler (rather than the preprocessor). It accepts two arguments: a const expression and a string, as in the following example:

static_assert(sizeof(int) == 4, "int is not 32-bits.");

If the const expression evaluates to 0 or false during compilation, the compiler outputs the string and stops. The static_assert() does not generate any run-time code. Remember, however, that the expression is evaluated at compile time, so it cannot contain calls to ordinary (non-constexpr) functions or references to things that are known only when the program executes.

Typedef

The typedef keyword allows the programmer to create a shorthand name for a declaration. The careful application of typedef can make the resulting program easier to read. (Note that typedef is not actually a preprocessor command, but it’s largely associated with include files and the preprocessor.)

typedef int* IntPtr;
typedef const IntPtr IntConstPtr;

int i;
int *const ptr1 = &i;
IntConstPtr ptr2 = ptr1; // ptr1 and ptr2 are the same type

The first two declarations in this snippet give a new name to existing types. Thus, the second declaration declares IntConstPtr to be another name for int* const, a constant pointer to a (non-constant) int. When this new type is used in the declaration of ptr2, it has the same effect as the more complicated declaration of ptr1.

Although typedef does not introduce any new capability, it can make some complicated declarations a lot easier to read.