C++ For Dummies (2014)

Part II

Becoming a Functional C++ Programmer

Chapter 8

Taking a First Look at C++ Pointers

In This Chapter

Addressing variables in memory

Declaring and using pointer variables

Recognizing the inherent dangers of pointers

Passing pointers to functions

Allocating objects off the heap (whatever that is)

So far, the C++ language has been fairly conventional compared with other programming languages. Sure, some computer languages lack (il-)logical operators like those in Chapter 4, and C++ has its own unique symbols for things, but there’s been nothing new in the way of concepts. C++ really separates itself from the crowd in its use of pointer variables. A pointer is a variable that “points at” other variables. I realize that’s a circular argument, but suspend your disbelief at least until you can get into the chapter.

This chapter introduces the pointer variable type. It begins with some concept definitions, flows through pointer syntax, and then introduces some of the reasons for the pointer mania that grips the C++ programming world.

Variable Size

My weight goes up and down all the time, but here I’m really referring to the size of a variable, not my own variable size. Memory is measured in bytes or bits. The keyword sizeof returns the size of its argument in bytes. The following program uses this to determine the size of the different variable types:

// VariableSize - output the size of each type of variable
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int nNumberofArgs, char* pszArgs[])
{
    bool b; char c; int n; long l;
    long long ll; float f; double d; long double ld;

    cout << "sizeof a bool = " << sizeof b << endl;
    cout << "sizeof a char = " << sizeof c << endl;
    cout << "sizeof an int = " << sizeof n << endl;
    cout << "sizeof a long = " << sizeof l << endl;
    cout << "sizeof a long long = " << sizeof ll << endl;
    cout << "sizeof a float = " << sizeof f << endl;
    cout << "sizeof a double = " << sizeof d << endl;
    cout << "sizeof a long double = " << sizeof ld << endl;

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

The VariableSize program generates the following output:

sizeof a bool = 1
sizeof a char = 1
sizeof an int = 4
sizeof a long = 4
sizeof a long long = 8
sizeof a float = 4
sizeof a double = 8
sizeof a long double = 12
Press Enter to continue...

As they say, “Your results may vary.” You may get different results if using a compiler other than gcc for Windows. For example, you may find that an int is smaller than a long. C++ doesn’t say exactly how big a variable type must be; it just says that a long is the same size as or larger than an int and that a double is the same size as or larger than a float. The sizes shown here are typical for a 32-bit 80x86 processor.
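The ordering rules just described can be checked by the compiler itself. The following is only a sketch, assuming a C++11 (or later) compiler so that static_assert is available; the helper name sizesAreOrdered is made up for illustration and is not part of the VariableSize program.

```cpp
// These checks are evaluated at compile time; if one of them
// fails, the program refuses to build (requires C++11).
static_assert(sizeof(char) == 1, "a char is always 1 byte");
static_assert(sizeof(long) >= sizeof(int),
              "a long is at least as big as an int");
static_assert(sizeof(double) >= sizeof(float),
              "a double is at least as big as a float");

// sizesAreOrdered - make the same comparisons at run time
//                   (hypothetical helper for illustration)
bool sizesAreOrdered()
{
    return sizeof(int)   <= sizeof(long)
        && sizeof(long)  <= sizeof(long long)
        && sizeof(float) <= sizeof(double);
}
```

Notice that nothing here mentions 4 or 8. The code relies only on the relative sizes, so it should behave the same way whether a long turns out to be 4 bytes or 8 bytes on your machine.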

What’s in an Address?

Like the saying goes, “Everyone has to be somewhere.” Every C++ variable is stored somewhere in the computer’s memory. Memory is broken into individual bytes, with each byte carrying its own address numbered 0, 1, 2, and so on.

A variable intReader might be at address 0x100, whereas floatReader might be over at location 0x180. (By convention, memory addresses are expressed in hexadecimal.) Of course, intReader and floatReader might be somewhere else in memory entirely — only the computer knows for sure and only at the time that the program is executed.

This is somewhat analogous to a hotel. When you make your reservation, you may be assigned room 0x100. (I know that suite numbers are normally not expressed in hexadecimal, but bear with me.) Your buddy may be assigned 80 doors down in room 0x180. Each variable is assigned an address when it is created (more on that later in this chapter, when we talk about scope).

Address Operators

The two pointer-related operators are shown in Table 8-1. The & operator says “tell me your address,” and * says “the value at the following address.”

Table 8-1 Pointer Operators

Operator     Meaning

& (unary)    (In an expression) the address of

& (unary)    (In a declaration) reference to

* (unary)    (In an expression) the thing pointed at by

* (unary)    (In a declaration) pointer to

These are not to be confused with the binary & and * operators discussed in Chapters 3 and 4.

The following Layout program demonstrates how the & operator can be used to display the layout of variables in memory:

// Layout - this program tries to give the
// reader an idea of the layout of
// local memory in her compiler
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int nNumberofArgs, char* pszArgs[])
{
    int start;
    int n; long l; long long ll;
    float f; double d; long double ld;
    int end;

    // set output to hex mode
    cout.setf(ios::hex);
    cout.unsetf(ios::dec);

    // output the address of each variable
    // in order to get an idea of how variables are
    // laid out in memory
    cout << "--- = " << &start << endl;
    cout << "&n = " << &n << endl;
    cout << "&l = " << &l << endl;
    cout << "&ll = " << &ll << endl;
    cout << "&f = " << &f << endl;
    cout << "&d = " << &d << endl;
    cout << "&ld = " << &ld << endl;
    cout << "--- = " << &end << endl;

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

The program declares a set of variables of different types. It then applies the & operator to each one to find out its address. The results of one execution of this program with Code::Blocks appear as follows:

--- = 0x28fefc
&n = 0x28fef8
&l = 0x28fef4
&ll = 0x28fee8
&f = 0x28fee4
&d = 0x28fed8
&ld = 0x28fec0
--- = 0x28febc
Press Enter to continue...

Your results may vary. The absolute address of program variables depends on a lot of factors. The C++ standard certainly doesn’t specify how variables are to be laid out in memory.

Notice how the variable n is exactly 4 bytes from the first variable declared (start), which corresponds to the size of an int (4 bytes). Similarly, the variable l appears 4 bytes down from that, which is also the size of a long. However, the double variable d is a full 12 bytes from its neighboring variable f (0x28fee4 - 0x28fed8 = 0x000c). That’s more than the 8 bytes required for a double.

There is no requirement that the C++ compiler pack variables in memory with no spaces between them. In fact, you often see these gaps in memory when mixing variables of different size.

The Code::Blocks/gcc compiler could be storing variables for its own use in between our variables. Or, more likely, a peculiarity in the way the variables are being laid out in memory is causing the compiler to waste a small amount of space.
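One plausible explanation for the gaps is alignment: many processors prefer (or require) that a variable start at an address that is a multiple of some small number, often its own size. I can't say for certain that this is what produced the output above, but the following sketch lets you test the idea on your own compiler. It assumes C++11 for the alignof keyword and for the uintptr_t type; the helper names isAligned and localsAreAligned are made up for illustration.

```cpp
#include <cstdint>

// isAligned - return true if the address in p is a multiple
//             of the alignment the compiler wants for type T
//             (hypothetical helper for illustration)
template <typename T>
bool isAligned(const T* p)
{
    // convert the address into a plain integer
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    return address % alignof(T) == 0;
}

// localsAreAligned - declare locals of different sizes and
//                    check that each sits on a suitable boundary
bool localsAreAligned()
{
    int n = 0; long long ll = 0; float f = 0; double d = 0;
    return isAligned(&n) && isAligned(&ll)
        && isAligned(&f) && isAligned(&d);
}
```

If the compiler has to skip a few bytes to reach the next suitable boundary, those skipped bytes show up as exactly the kind of gap that the Layout program reveals.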

Using Pointer Variables

A pointer variable is a variable that contains an address, usually the address of another variable. Returning to the analogy of hotel room numbers, I might tell my son that I will be in room 0x100 on my trip. My son can act as a pointer variable of sorts. Anyone can ask him at any time, “Where’s your father staying?” Include $5 with that question, and he’ll spill his guts without hesitation.

By the way, notice something about pointer variables: No matter where my son is, and no matter how many other people he tells of my whereabouts, I’m still in room 0x100.

The following pseudo-C++ demonstrates how the two address operators shown in Table 8-1 are used:

mySon = &DadsRoom; // tell mySon the address of Dad's Room
room = *mySon; // "Dad's room number is"

The following C++ code snippet shows these operators used correctly:

void fn()
{
    int nVar;
    int* pnVar;

    pnVar = &nVar; // pnVar now points to nVar
    *pnVar = 10;   // stores 10 into the int location
                   // pointed at by pnVar
}

The function fn() begins with the declaration of nVar. The next statement declares the variable pnVar to be a variable of type pointer to an int.

Pointer variables are declared like normal variables except for the addition of the unary * character. This * character can appear anywhere between the base type name and the variable name; the following two declarations are equivalent:

int* pnVar1;
int *pnVar2;

Which you use is a matter of personal preference.

The * character is called the asterisk character (that’s logical enough), but because asterisk is hard to say, many programmers have come to call it the star or, less commonly, the splat character. Thus, they would say “star pnVar” or “splat pnVar.”

In an expression, the unary operator & means “the address of.” Thus, we would read the assignment pnVar = &nVar; as “pnVar gets the address of nVar.”

Using different types of pointers

Every expression has a type as well as a value. The type of the expression nVar is int; the type of &nVar is “pointer to an integer,” written int*. Comparing this with the declaration of pnVar, you see that the types match exactly:

int* pnVar = &nVar; // both sides of the assignment
// are of type int*

Similarly, because pnVar is of type int*, the type of *pnVar is int:

*pnVar = 10; // both sides of the assignment are
// of type int

The type of the thing pointed to by pnVar is int. This is equivalent to saying that if houseAddress is the address of a house, the thing pointed at by houseAddress must be a house. Amazing, but true.

Pointers to other types of variables are expressed the same way:

double doubleVar;
double* pdoubleVar = &doubleVar;
*pdoubleVar = 10.0;

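A pointer in a 32-bit program, such as one built for a Pentium-class machine, takes 4 bytes no matter what it points to. That is, an address in such a program is 4 bytes long, period. (A program built for a 64-bit processor typically uses 8-byte pointers instead.)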
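You can ask sizeof about pointers just as you can about any other variable type. The following is a small sketch rather than a guarantee: the C++ standard doesn't actually promise that every pointer type is the same size, but on typical desktop compilers they are. The function names pointerSize and samePointerSizes are made up for illustration.

```cpp
#include <cstddef>

// pointerSize - the number of bytes in a pointer to an int
//               (hypothetical helper for illustration)
std::size_t pointerSize()
{
    return sizeof(int*);
}

// samePointerSizes - true if pointers to several different
//                    types all occupy the same number of bytes
bool samePointerSizes()
{
    return sizeof(char*)   == sizeof(int*)
        && sizeof(int*)    == sizeof(double*)
        && sizeof(double*) == sizeof(long double*);
}
```

The thing pointed at may be 1 byte or 12 bytes, but the pointer itself holds only an address, and an address is the same size regardless of what lives there.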

Passing Pointers to Functions

One of the uses of pointer variables is in passing arguments to functions. To understand why this is important, you need to understand how arguments are passed to a function. (I touched on this in Chapter 6, but you're now in a much better place to understand this armed with your new understanding of pointers.)

Passing by value

By default, arguments are passed to functions by value. This has the somewhat surprising result that changing the value of a variable in a function does not normally change its value in the calling function. Consider the following example code segment:

void fn(int nArg)
{
    nArg = 10;
    // value of nArg at this point is 10
}

void parent(void)
{
    int n1 = 0;
    fn(n1);
    // value of n1 at this point is still 0
}

Here the parent() function initializes the integer variable n1 to 0. The value of n1 is then passed to fn(). Upon entering the function, nArg is equal to 0, the value passed. fn() changes the value of nArg to 10 before returning to parent(). Upon returning to parent(), the value of n1 is still 0.

The reason for this behavior is that C++ doesn’t pass a variable to a function. (I’m not even sure what that would mean.) Instead, C++ passes the value contained in the variable at the time of the call. That is, the expression is evaluated, even if it is just a variable name, and the result is passed.

In the example, the value of n1, which is 0, was passed to fn(). What the function does with that value has no effect on n1.

Passing pointer values

Like any other intrinsic type, a pointer may be passed as an argument to a function:

void fn(int* pnArg)
{
    *pnArg = 10;
}

void parent(void)
{
    int n = 0;

    fn(&n); // this passes the address of n
            // now the value of n is 10
}

In this case, the address of n is passed to the function fn() rather than the value of n. The significance of this difference is apparent when you consider the assignment within fn().

Suppose n is located at address 0x100. Rather than the value 0, the call fn(&n) passes the value 0x100. Within fn(), the assignment *pnArg = 10 stores the value 10 in the int variable located at location 0x100, thereby overwriting the value 0. Upon returning to parent(), the value of n is 10 because n is just another name for 0x100.
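A classic use of this technique is a function that exchanges the values of two variables, something a function that receives its arguments by value simply can't do. The following is a minimal sketch; the function name swapInts and its argument names are made up for illustration.

```cpp
// swapInts - exchange the values of the two ints whose
//            addresses are passed
//            (hypothetical function for illustration)
void swapInts(int* pnFirst, int* pnSecond)
{
    int nTemp = *pnFirst;   // save the first caller's value
    *pnFirst  = *pnSecond;  // overwrite the first with the second
    *pnSecond = nTemp;      // store the saved value in the second
}
```

A caller with two int variables a and b would write swapInts(&a, &b); upon return, a holds what b used to hold and vice versa.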

Passing by reference

C++ provides a shorthand for passing arguments by address — a shorthand that enables you to avoid having to hassle with pointers. The following declaration creates a variable n1 and a second reference to the same n1 but with a new name, nRef:

int n1; // declare an int variable
int& nRef = n1; // declare a second reference to n1

nRef = 1; // now accessing the reference
// has the same effect as accessing n1;
// n1 is now equal to 1

A reference variable like nRef must be initialized when it is declared because every subsequent time that its name is used, C++ will assume that you mean the variable that nRef refers to.

Reference variables find their primary application in function calls:

void fn(int& rnArg) // declare reference argument
{
    rnArg = 10;     // change the value of the variable...
                    // ...that rnArg refers to
}

void parent(void)
{
    int n1 = 0;
    fn(n1); // pass a reference to n1
            // here the value of n1 is 10
}

This is called passing by reference. The declaration int& rnArg declares rnArg to be a reference to an integer argument. The fn() function stores the value 10 into the int location referenced by rnArg.

Passing by reference is the same as passing the address of a variable. The reference syntax puts the onus on C++ to apply the “address of” operator to the reference rather than requiring the programmer to do so.
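You can convince yourself of this by taking the address of a reference argument: you get the address of the caller's variable, not the address of some copy. The following is only a sketch; the function names sameLocation and addOne are made up for illustration.

```cpp
// sameLocation - return true if the reference argument refers
//                to the variable stored at the address passed
//                (hypothetical helper for illustration)
bool sameLocation(int& rnArg, int* pnArg)
{
    // &rnArg is the address of whatever rnArg refers to
    return &rnArg == pnArg;
}

// addOne - modify the caller's variable through a reference;
//          no * or & is needed inside the function or at the call
void addOne(int& rnArg)
{
    rnArg = rnArg + 1;
}
```

Given an int variable n, the call sameLocation(n, &n) returns true, whereas passing the address of some other variable returns false.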

You should not overload a pass-by-value function with its pass-by-reference equivalent. If you define the two functions fn(int) and fn(int&) in the same program, a call such as fn(n) with an int variable n is ambiguous: C++ would not know which one to call.

Constant const Irritation

The keyword const means that a variable cannot be changed once it has been declared and initialized.

const double PI = 3.1415926535;

Arguments to functions can also be declared const, meaning that the argument cannot be changed within the function. However, this introduces an interesting dichotomy in the case of pointer variables. Consider the following declaration:

const int* pInt;

Exactly what is the constant here? What can we not change? Is it the variable pInt or the integer pointed at by pInt? It turns out that both are possible, but this declaration declares a variable pointer to a constant memory location. Thus the following:

const int* pInt; // declare a pointer to a const int
int nVar;
pInt = &nVar; // this is allowed
*pInt = 10; // but this is not

We can change the value of pInt, for example, assigning it the address of nVar. But the final assignment in the example snippet generates a compiler error since we cannot change the const int pointed at by pInt.

What if I had intended to create a pointer variable with a constant value? The following snippet shows this in action:

int nVar;
int * const cpInt = &nVar; // declare a constant pointer
// to a variable integer
*cpInt = 10; // now this is legal...
cpInt++; // ...but this is not

The variable cpInt is a constant pointer to a variable int. The programmer cannot change the value of the pointer, but she can change the value of the integer pointed at.
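The two forms can also be combined: const int* const declares a constant pointer to a constant int, in which case neither the pointer nor the thing pointed at can be changed. The following sketch shows both kinds of argument side by side; the function name larger and its argument names are made up for illustration, and the commented-out lines are the ones the compiler would reject.

```cpp
// larger - return the larger of the two ints pointed at
//          (hypothetical function for illustration)
//   pnFirst   - variable pointer to a constant int
//   cpnSecond - constant pointer to a constant int
int larger(const int* pnFirst, const int* const cpnSecond)
{
    if (*cpnSecond > *pnFirst)
    {
        // changing where pnFirst points is allowed
        pnFirst = cpnSecond;
    }

    // *pnFirst = 0;         // not allowed: the int is const
    // cpnSecond = pnFirst;  // not allowed: the pointer is const
    // *cpnSecond = 0;       // not allowed: the int is const

    return *pnFirst;
}
```

One way to keep these straight is to read the declaration from right to left: const int* const cpnSecond reads as "cpnSecond is a const pointer to an int that is const."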

The const-ness can be added via an assignment or initialization but cannot be (readily) cast away. Thus, the following:

int nVar = 10;
int* pVar = &nVar;
const int* pcVar = pVar; // this is legal
int* pVar2 = pcVar; // this is not

The assignment pcVar = pVar; is okay — this is adding the const restriction. The final assignment in the snippet is not allowed since it attempts to remove the const-ness restriction of pcVar.

A variable can be implicitly recast as part of a function call, as in the following example:

void fn(const int& nVar);

void mainFn()
{
    int n;

    fn(10); // calls fn(const int&)
    fn(n);  // calls the same function by treating n
            // as if it were const
}

The declaration fn(const int&) says that the function fn() does not modify the value of its argument. That’s important when passing a reference to the constant 10. It isn’t important when passing a reference to the variable n, but it doesn’t hurt anything either.

Finally, const can be used as a discriminator between functions of the same name:

void fn(const int& nVar);
void fn(int& nVar);

void mainFn()
{
    int n;

    fn(10); // calls the first function
    fn(n);  // calls the second function
}
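If you want to see for yourself which version gets called, have each one return a different number. The following is a small sketch; the function name whichFn is made up for illustration, and the arguments are left unnamed because the functions never use them.

```cpp
// whichFn - two functions that differ only in the const-ness
//           of their argument; each returns a different number
//           so that the caller can tell which one ran
//           (hypothetical functions for illustration)
int whichFn(const int&)
{
    return 1;   // the const version was chosen
}

int whichFn(int&)
{
    return 2;   // the non-const version was chosen
}
```

Passing the constant 10 or a variable declared const int selects the first version; passing a plain int variable selects the second.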

Making Use of a Block of Memory Called the Heap

The heap is an amorphous block of memory that your program can access as necessary. This section describes why it exists and how to use it.

Just as it is possible to pass a pointer to a function, it is possible for a function to return a pointer. A function that returns the address of a double is declared as follows:

double* fn(void);

However, you must be very careful when returning a pointer. To understand the dangers, you must know something about variable scope. (No, I don’t mean a variable zoom rifle scope.)

Limited scope

Besides being a mouthwash, scope is the range over which a variable is defined. Consider the following code snippet:

// the following variable is accessible to
// all functions and defined as long as the
// program is running (global scope)
int intGlobal;

// the following variable intChild is accessible
// only to the function and is defined only
// as long as C++ is executing child() or a
// function which child() calls (function scope)
void child(void)
{
    int intChild;
}

// the following variable intParent has function
// scope
void parent(void)
{
    int intParent = 0;
    child();

    int intLater = 0;
    intParent = intLater;
}

int main(int nArgs, char* pArgs[])
{
    parent();
}

This program fragment starts with the declaration of a variable intGlobal. This variable exists from the time the program begins executing until it terminates. We say that intGlobal “has program scope.” We also say that the variable “goes into scope” even before the function main() is called.

The function main() immediately invokes parent(). The first thing that the processor sees in parent() is the declaration of intParent. At that point, intParent goes into scope — that is, intParent is defined and available for the remainder of the function parent().

The second statement in parent() is the call to child(). Once again, the function child() declares a local variable, this time intChild. The scope of the variable intChild is limited to the function child(). Technically, intParent is not defined within the scope of child() because child() doesn’t have access to intParent; however, the variable intParent continues to exist while child() is executing.

When child() exits, the variable intChild goes out of scope. Not only is intChild no longer accessible, it no longer exists. (The memory occupied by intChild is returned to the general pool to be used for other things.)

As parent() continues executing, the variable intLater goes into scope at the declaration. At the point that parent() returns to main(), both intParent and intLater go out of scope.

Because intGlobal is declared globally in this example, it is available to all three functions and remains available for the life of the program.

Examining the scope problem

The following code segment compiles without error but doesn’t work (don’t you just hate that?):

double* child(void)
{
    double dLocalVariable;
    return &dLocalVariable;
}

void parent(void)
{
    double* pdLocal;
    pdLocal = child();
    *pdLocal = 1.0;
}

The problem with this function is that dLocalVariable is defined only within the scope of the function child(). Thus, by the time the memory address of dLocalVariable is returned from child(), it refers to a variable that no longer exists. The memory that dLocalVariable formerly occupied is probably being used for something else.

This error is very common because it can crop up in a number of ways. Unfortunately, this error does not cause the program to instantly stop. In fact, the program may work fine most of the time; that is, the program continues to work as long as the memory formerly occupied by dLocalVariable is not reused immediately. Such intermittent problems are the most difficult ones to solve.

Providing a solution using the heap

The scope problem originated because C++ took back the locally defined memory before the programmer was ready. What is needed is a block of memory controlled by the programmer. She can allocate the memory and put it back when she wants to — not because C++ thinks it’s a good idea. Such a block of memory is called the heap.

Heap memory is allocated using the new keyword followed by the type of object to allocate. The new command breaks a chunk of memory off the heap big enough to hold the specified type of object and returns its address. For example, the following allocates a double variable off the heap:

double* child(void)
{
    double* pdLocalVariable = new double;
    return pdLocalVariable;
}

This function now works properly. Although the variable pdLocalVariable goes out of scope when the function child() returns, the memory to which pdLocalVariable refers does not. A memory location returned by new does not go out of scope until it is explicitly returned to the heap using the keyword delete, which is specifically designed for that purpose:

void parent(void)
{
    // child() returns the address of a block
    // of heap memory
    double* pdMyDouble = child();

    // store a value there
    *pdMyDouble = 1.1;

    // ...

    // now return the memory to the heap
    delete pdMyDouble;
    pdMyDouble = 0;

    // ...
}

Here the pointer returned by child() is used to store a double value. After the function is finished with the memory location, it is returned to the heap. The function parent() sets the pointer to 0 after the heap memory has been returned. This is not a requirement, but it is a very good idea. If the programmer mistakenly attempts to store something in *pdMyDouble after the delete, the program will most likely crash immediately with (I hope) a meaningful error message.
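Zeroing the pointer has a second benefit: handing a zero pointer to delete is defined to do nothing, so returning the same memory twice by accident becomes harmless. The following sketch wraps both steps in one function. The function name release is made up for illustration, and the argument is a reference to a pointer so that it is the caller's pointer that gets zeroed rather than a copy.

```cpp
// release - return a heap double to the heap and zero out
//           the caller's pointer so it can't be reused
//           (hypothetical function for illustration)
void release(double*& rpdValue)
{
    delete rpdValue;  // deleting a zero pointer does nothing
    rpdValue = 0;     // the caller's pointer no longer points
                      // at memory it doesn't own
}
```

A caller holding double* pdMyDouble would write release(pdMyDouble); calling it a second time on the same pointer is then safe.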

You can use new to allocate arrays from the heap as well, but you must return an array using the delete[] keyword:

int* nArray = new int[10];

nArray[0] = 0;

delete[] nArray;

Technically, new int[10] invokes the new[] operator, but it works the same as new.
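The advantage of a heap array is that its size can be decided while the program is running. The following is a minimal sketch of the whole life cycle: allocate, use, and return. The function name sumOfSquares is made up for illustration.

```cpp
// sumOfSquares - allocate nCount ints off the heap, fill them
//                with 0, 1, 4, 9, and so on, add them up, and
//                return the memory to the heap
//                (hypothetical function for illustration)
int sumOfSquares(int nCount)
{
    // the size of the array isn't known until run time
    int* pnArray = new int[nCount];

    for (int i = 0; i < nCount; i++)
    {
        pnArray[i] = i * i;
    }

    int nSum = 0;
    for (int i = 0; i < nCount; i++)
    {
        nSum += pnArray[i];
    }

    // memory allocated with new[] must go back with delete[]
    delete[] pnArray;
    return nSum;
}
```

For example, sumOfSquares(4) adds up 0, 1, 4, and 9 and returns 14.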

I have more to say about the relationship between pointers and arrays in Chapter 9.