C++ For Dummies (2014)

Part II

Becoming a Functional C++ Programmer

Chapter 9

Taking a Second Look at C++ Pointers

In This Chapter

· Performing arithmetic operations on character pointers

· Examining the relationship between pointers and arrays

· Increasing program performance

· Extending pointer operations to different pointer types

· Explaining the arguments to main() in our C++ program template

C++ allows the programmer to operate on pointer variables much as she would on simple types of variables. (The concept of pointer variables is introduced in Chapter 8.) How and why this is done, along with its implications, are the subjects of this chapter.

Defining Operations on Pointer Variables

Some of the same arithmetic operators I cover in Chapter 3 can be applied to pointer types. This section examines the implications of applying these operators both to pointers and to the array types (I discuss arrays in Chapter 7). Table 9-1 lists the three fundamental operations that are defined on pointers. In Table 9-1, pointer, pointer1, and pointer2 are all of some pointer type, say char*, and offset is an integer, for example, long. C++ also supports the other operators related to addition and subtraction, such as ++ and +=, although they are not listed in Table 9-1.

Table 9-1 The Three Basic Operations Defined on Pointer Types

Operation | Result | Meaning
--- | --- | ---
pointer + offset | pointer | Calculate the address of the object offset entries from pointer.
pointer - offset | pointer | The opposite of addition.
pointer2 - pointer1 | offset | Calculate the number of entries between pointer2 and pointer1.

The neighborhood memory model is useful to explain how pointer arithmetic works. Consider a city block in which all houses are numbered sequentially. The house at 123 Main Street has 122 Main Street on one side and 124 Main Street on the other.

Now it’s pretty clear that the house four houses down from 123 Main Street must be 127 Main Street; thus, you can say 123 Main + 4 = 127 Main. Similarly, if I were to ask how many houses there are from 123 Main to 127 Main, the answer would be four: 127 Main - 123 Main = 4. (Just as an aside, a house is zero houses from itself: 123 Main - 123 Main = 0.)

But it makes no sense to ask how far away from 123 Main Street is 4 or what the sum of 123 Main and 127 Main is. In similar fashion, you can’t add two addresses. Nor can you multiply an address, divide an address, square an address, or take the square root — you get the idea. You can perform any operation that can be converted to addition or subtraction. For example, if you increment a pointer to 123 Main Street, it now points to the house next door (at 124 Main, of course!).

Reexamining arrays in light of pointer variables

Now return to the wonderful array for just a moment. Consider the case of an array of 32 1-byte characters called charArray. If the first byte of this array is stored at address 0x100, the array will extend over the range 0x100 through 0x11f. charArray[0] is located at address 0x100, charArray[1] is at 0x101, charArray[2] at 0x102, and so on.

After executing the expression

char* ptr = &charArray[0];

the pointer ptr contains the address 0x100. The addition of an integer offset to a pointer is defined such that the relationships shown in Table 9-2 are true. Table 9-2 also demonstrates why adding an offset n to ptr calculates the address of the nth element in charArray.

Table 9-2 Adding Offsets

Offset | Result | Is the Address Of
--- | --- | ---
+ 0 | 0x100 | charArray[0]
+ 1 | 0x101 | charArray[1]
+ 2 | 0x102 | charArray[2]
+ n | 0x100 + n | charArray[n]

The addition of an offset to a pointer is identical to applying an index to an array.

Thus, if

char* ptr = &charArray[0];

then

*(ptr + n) ← corresponds with → charArray[n]

Because * has higher precedence than addition, *ptr + n adds n to the character that ptr points to. The parentheses are needed to force the addition to occur before the indirection. The expression *(ptr + n) retrieves the character pointed at by the pointer ptr plus the offset n.

In fact, the correspondence between the two forms of expression is so strong that C++ considers array[n] nothing more than a simplified version of *(ptr + n), where ptr points to the first element in array.

array[n] -- C++ interprets as → *(&array[0] + n)

To complete the association, C++ takes a second shortcut. If given

char charArray[20];

charArray is defined as &charArray[0]. That is, the name of an array without a subscript present is treated as the address of the first element of the array. Thus, you can further simplify the association to

array[n] -- C++ interprets as → *(array + n)

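Strictly speaking, the type of charArray is the array type char[20], but in most expressions it behaves like a char* const; that is, a “constant pointer to a character,” since its address cannot be changed. (Note that char const* means something different: a pointer to a constant character.)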

Applying operators to the address of an array

The correspondence between indexing an array and pointer arithmetic is useful. For example, a displayArray() function used to display the contents of an array of integers can be written as follows:

// displayArray - display the members of an
//                array of length nSize
void displayArray(int intArray[], int nSize)
{
    cout << "The value of the array is:\n";

    for(int n = 0; n < nSize; n++)
    {
        cout << n << ": " << intArray[n] << "\n";
    }
    cout << endl;
}

This version uses the array operations with which you are familiar. A pointer version of the same appears as follows:

// displayArray - display the members of an
//                array of length nSize
void displayArray(int intArray[], int nSize)
{
    cout << "The value of the array is:\n";

    // initialize the pointer pArray with
    // the address of the array intArray
    int* pArray = intArray;
    for(int n = 0; n < nSize; n++, pArray++)
    {
        cout << n << ": " << *pArray << "\n";
    }
    cout << endl;
}

The new displayArray() begins by creating a pointer to an integer pArray that points at the first element of intArray.

The name intArray by itself is of type int* and refers to the address of the array.

The function then loops through each element of the array. On each loop, displayArray() outputs the current integer (that is, the integer pointed at by pArray) before incrementing the pointer to the next entry in intArray. displayArray() can be tested using the following version of main():

int main(int nNumberofArgs, char* pszArgs[])
{
    int array[] = {4, 3, 2, 1};
    displayArray(array, 4);

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

The output from this program is

The value of the array is:
0: 4
1: 3
2: 2
3: 1

Press Enter to continue...

You may think this pointer conversion is silly; however, the pointer version of displayArray() is actually more common than the array version among C++ programmers in the know. For some reason, C++ programmers don’t seem to like arrays, but they love pointer manipulation.

The use of pointers to access arrays is nowhere more common than in the accessing of character arrays.

Expanding pointer operations to a string

A null-terminated string is simply a constant character array whose last character is a null. C++ uses the null character at the end to serve as a terminator. This null-terminated array serves as a quasi-variable type of its own. (See Chapter 7 for an explanation of null-terminated string arrays.) Often C++ programmers use character pointers to manipulate such strings. The following code examples compare this technique to the earlier technique of indexing in the array.

Character pointers enjoy the same relationship with a character array that any other pointer and array share. However, the fact that strings end in a terminating null makes them especially amenable to pointer-based manipulation, as shown in the following DisplayString program:

// DisplayString - display an array of characters both
//                 using a pointer and an array index
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int nNumberofArgs, char* pszArgs[])
{
    // declare a string
    const char* szString = "Randy";
    cout << "The array is '" << szString << "'" << endl;

    // display szString as an array
    cout << "Display the string as an array: ";
    for(int i = 0; i < 5; i++)
    {
        cout << szString[i];
    }
    cout << endl;

    // now using typical pointer arithmetic
    cout << "Display string using a pointer: ";
    const char* pszString = szString;
    while(*pszString)
    {
        cout << *pszString;
        pszString++;
    }
    cout << endl;

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

The program first makes its way through the array szString by indexing into the array of characters. The for loop chosen stops when the index reaches 5, the length of the string.

The second loop displays the same string using a pointer. The program sets the variable pszString equal to the address of the first character in the array. It then enters a loop that will continue until the char pointed at by pszString is equal to false; in other words, until the character is a null.

The integer value 0 is interpreted as false; all other values are true.

The program outputs the character pointed at by pszString and then increments the pointer so that it points to the next character in the string before being returned to the top of the loop.

The dereference and increment can be (and usually are) combined into a single expression as follows:

cout << *pszString++;

The output of the program appears as follows:

The array is 'Randy'
Display the string as an array: Randy
Display string using a pointer: Randy
Press Enter to continue...

Justifying pointer-based string manipulation

The sometimes-cryptic nature of pointer-based manipulation of character strings might lead the reader to wonder, “Why?” That is, what advantage does the char* pointer version have over the easier-to-read index version?

The answer is partially (pre-)historic and partially human nature. When C, the progenitor to C++, was invented, compilers were pretty simplistic. These compilers could not perform the complicated optimizations that modern compilers can. As complicated as it might appear to the human reader, a statement such as *pszString++ could be converted into an amazingly small number of machine-level instructions even by a stupid compiler.

Older computer processors were not very fast by today’s standards. In the early days of C, saving a few computer instructions was a big deal. This gave C a big advantage over other languages of the day, notably Fortran, which did not offer pointer arithmetic.

In addition to the efficiency factor, programmers like to generate clever program statements. After C++ programmers learn how to write compact and cryptic but efficient statements, there is no getting them back to accessing arrays with indices.

Do not generate complex C++ expressions to create a more efficient program. There is no obvious relationship between the number of C++ statements and the number of machine instructions generated.

Applying operators to pointer types other than char

It is not too hard to convince yourself that szTarget + n points to szTarget[n] when szTarget is an array of chars. After all, a char occupies a single byte. If szTarget is stored at 0x100, szTarget[5] is located at 0x105.

It is not so obvious that pointer addition works in exactly the same way for an int array because an int takes 4 bytes for each char’s 1 byte (at least it does on a 32-bit Intel processor). If the first element in intArray were located at 0x100, then intArray[5] would be located at 0x114 (0x100 + (5 * 4) = 0x114) and not 0x105.

Fortunately for us, array + n points at array[n] no matter how large a single element of array might be. C++ takes care of the element size for us — it’s clever that way.

Once again, the dusty old house analogy works here as well. (I mean dusty analogy, not dusty house.) The third house down from 123 Main is 126 Main, no matter how large the buildings might be, whether they're bungalows or mansions.


Strings have me constantly confused

You may have noticed that I slipped a const declaration into the earlier DisplayString example program. This was necessary to account for differences between an array and a pointer. A string such as “this is a string” is considered a constant address of a string of constant characters. In other words, neither the address of the string nor the characters themselves can be changed. Why is that?

One problem is that you don’t know where C++ stores its string literals, nor do you know how many times it reuses the same string. Often C++ stores constant strings in the same read-only memory as the program’s machine code, and it very often reuses the same string in several places in the program. For this reason, C++ often marks constant strings as unwritable.

The initialization of a pointer variable is similar to initializing any other simple variable:

int i = 1;
const char* pString = "this is a string";

Both declarations initialize the variable on the left with the constant value on the right. However, since pString points directly at the immutable string “this is a string” it’s important that pString be declared const char*, that is, a pointer to constant characters.

The equivalent array is more complicated than it first appears:

char sChars[] = "this is a string"; // declare and init array

This declares and allocates memory for an array sChars[] and then copies the initialization string into it. Thus, the letter t that is the first character in sChars is not the same letter t that makes up the immutable initialization string.

In fact, the preceding is shorthand for the more long-winded but descriptive

char sChars[17]; // declare the array and...
strcpy(sChars, "this is a string"); // ...then initialize it

Remember that strcpy() copies the string of characters represented by the second argument into the array pointed at by the first argument. And also remember to allocate space for the terminating null.


Contrasting a pointer with an array

There are some differences between an array and a pointer. For one, the array allocates space for the data, whereas the pointer does not, as shown here:

void arrayVsPointer()
{
    // allocate storage for 128 characters
    char charArray[128];

    // allocate space for a pointer but not for
    // the thing pointed at
    char* pArray;
}

Here charArray allocates room for 128 characters. pArray allocates only the storage required by the pointer itself: 4 bytes on a 32-bit system and typically 8 bytes on a 64-bit system.

Consider the following example:

char charArray[128];
charArray[10] = '0'; // this works fine

char* pArray;
pArray[10] = '0'; // this writes into random location

The expression pArray[10] is syntactically equivalent to charArray[10], but pArray has not been initialized so pArray[10] references some random (garbage) location in memory.

The mistake of referencing memory with an uninitialized pointer variable is generally caught by the CPU when the program executes, resulting in the dreaded segment violation error that from time to time issues from your favorite applications under your favorite, or not-so-favorite, operating system. This problem is not generally the fault of the processor or the operating system, but of the application.

Another implication of this difference is that you can use a range-based for loop on an array where the size of the array is known but not on a pointer where the number of elements is not known:

char charArray[128];
for(char& c : charArray) { c = '\0';} // initialize array

char* pArray = charArray;
for(char& c : pArray) {c = '\0';} //not legal

The first range-based for loop can be used to initialize the charArray to null characters. The second for loop does not compile, however. Even though pArray is assigned the address of the character array with its 128 characters, C++ doesn't keep that size information with the pointer, so it doesn't know how far to iterate in the range-based for loop. (See Chapter 5 for a description of the range-based for loop.)

A second difference between a pointer and the address of an array is that charArray is a constant, whereas pArray is not. Thus, the following for loop used to initialize the array charArray does not work:

char charArray[10];
for (int i = 0; i < 10; i++)
{
    *charArray = '\0'; // this makes sense...
    charArray++;       // ...this does not
}

The expression charArray++ makes no more sense than 10++. The following version is correct:

char charArray[10];
char* pArray = charArray;
for (int i = 0; i < 10; i++)
{
    *pArray = '\0'; // this works great
    pArray++;       // this is ok - not a const pointer
}

When Is a Pointer Not?

C++ is completely quiet about what is and isn’t a legal address, with one exception. C++ predefines the constant nullptr with the following properties:

· It is a constant value.

· It can be assigned to any pointer type.

· It evaluates to false.

· It is never a legal address.

The constant nullptr is used to indicate when a pointer has not been initialized. It is also often used to indicate the last element in an array of pointers in much the same way that a null character is used to terminate a character string.

Actually the keyword nullptr was introduced in the 2011 standard. Before that, the constant 0 was used to indicate a null pointer.

It is a safe practice to initialize pointers to the nullptr (or 0 if your compiler doesn’t support nullptr yet). You should also clear out the contents of a pointer to heap memory after you invoke delete to avoid deleting the same memory block twice:

delete pHeap; // return memory to the heap
pHeap = nullptr; // now clear out the pointer

Passing the same address to delete twice results in undefined behavior, which frequently causes your program to crash. Passing a nullptr (or 0) to delete has no effect.

Declaring and Using Arrays of Pointers

If pointers can point to arrays, it seems only fitting that the reverse should be true. Arrays of pointers are a type of array of particular interest.

Just as arrays may contain other data types, an array may contain pointers. The following declares an array of pointers to ints:

int* pInts[10];

Given the preceding declaration, pInts[0] is a pointer to an int value. Thus, the following is true:

void fn()
{
    int n1;
    int* pInts[3];
    pInts[0] = &n1;
    *pInts[0] = 1;
}

or

void fn()
{
    int n1, n2, n3;
    int* pInts[3] = {&n1, &n2, &n3};
    for (int i = 0; i < 3; i++)
    {
        *pInts[i] = 0;
    }
}

or even

void fn()
{
    int* pInts[3] = {(new int),
                     (new int),
                     (new int)};
    for (int i = 0; i < 3; i++)
    {
        *pInts[i] = 0;
    }
}

The latter allocates three int objects off the heap. This type of declaration isn’t used very often except in the case of an array of pointers to character strings. The following two examples show why arrays of character strings are useful.

Utilizing arrays of character strings

Suppose I need a function that returns the name of the month corresponding to an integer argument passed to it. For example, if the program is passed a 1, it returns a pointer to the string “January”; if 2, it reports “February”, and so on. The month 0 and any numbers greater than 12 are assumed to be invalid. I could write the function as follows:

// int2month() - return the name of the month
const char* int2month(int nMonth)
{
    const char* pszReturnValue;

    switch(nMonth)
    {
        case 1: pszReturnValue = "January";
                break;
        case 2: pszReturnValue = "February";
                break;
        case 3: pszReturnValue = "March";
                break;
        // ...and so forth...
        default: pszReturnValue = "invalid";
    }
    return pszReturnValue;
}

The switch() control command is like a sequence of if statements.

A more elegant solution uses the integer value for the month as an index into an array of pointers to the names of the months. In use, this appears as follows:

// define an array containing the names of the months
const char *const pszMonths[] = {"invalid",
                                 "January",
                                 "February",
                                 "March",
                                 "April",
                                 "May",
                                 "June",
                                 "July",
                                 "August",
                                 "September",
                                 "October",
                                 "November",
                                 "December"};

// int2month() - return the name of the month
const char* int2month(int nMonth)
{
    // first check for a value out of range
    if (nMonth < 1 || nMonth > 12)
    {
        return "invalid";
    }

    // nMonth is valid - return the name of the month
    return pszMonths[nMonth];
}

Here int2month() first checks to make sure that nMonth is a number between 1 and 12, inclusive (the default clause of the switch statement handled that in the previous example). If nMonth is valid, the function uses it as an offset into an array containing the names of the months.

This technique of referring to character strings by index is especially useful when writing your program to work in different languages. For example, a program may declare an array ptrMonths of pointers to the names of the Julian months in different languages. The program would initialize ptrMonths to the proper names, be they in English, French, or German (for example), at execution time. In that way, ptrMonths[1] points to the correct name of the first Julian month, irrespective of the language.

A program that demonstrates int2month() is included in the extras at www.dummies.com/extras/cplusplus as DisplayMonths.

Accessing the arguments to main()

Now the truth can be told — what are all those funny argument declarations to main() in our program template? The second argument to main() is an array of pointers to null-terminated character strings. These strings contain the arguments to the program. The arguments to a program are the strings that appear with the program name when you launch it. These arguments are also known as parameters. The first argument to main() is the number of parameters passed to the program. For example, suppose that I entered the following command at the command prompt:

MyProgram file.txt /w

The operating system executes the program contained in the file MyProgram (or MyProgram.exe on a Windows machine), passing it the arguments file.txt and /w.

Consider the following simple program:

// PrintArgs - write the arguments to the program
//             to the standard output
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int nNumberofArgs, char* pszArgs[])
{
    // print a warning banner
    cout << "The arguments to "
         << pszArgs[0] << " are:\n";

    // now write out the remaining arguments
    for (int i = 1; i < nNumberofArgs; i++)
    {
        cout << i << ":" << pszArgs[i] << "\n";
    }

    // that's it
    cout << "That's it" << endl;

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

As always, the function main() accepts two arguments. The first argument is an int that I have been calling (quite descriptively, as it turns out) nNumberofArgs. This variable is the number of arguments passed to the program. The second argument is an array of pointers of type char* that I have been calling pszArgs.

Accessing program arguments DOS-style

If I were to execute the PrintArgs program from the command prompt window as

PrintArgs arg1 arg2 arg3 /w

nNumberofArgs would be 5 (one for the program name plus one for each of the four arguments that follow it). The first argument is the name of the program itself. This could be anywhere from the simple “PrintArgs” to the slightly more complicated “PrintArgs.exe” to the full path; the C++ standard doesn’t specify. The environment can even supply an empty string “” if it doesn’t have access to the name of the program.

The remaining elements in pszArgs point to the program arguments. For example, the element pszArgs[1] points to “arg1” and pszArgs[2] to “arg2”. Because Windows does not place any significance on “/w”, this string is also passed as an argument to be processed by the program.

Actually, C++ includes one final value. The last value in the array, the one after the pointer to the last argument of the program, contains nullptr.

To demonstrate how argument passing works, you need to build the program from within Code::Blocks and then execute the program directly from a command prompt. First ensure that Code::Blocks has built an executable by opening the PrintArgs project and choosing Build⇒Rebuild.

Next, open a command prompt window. If you are running Unix or Linux, you’re already there. If you are running Windows, choose Programs⇒Accessories⇒Command Prompt to open an 80-character-wide window with a command prompt.

Now you need to use the CD command to navigate to the directory where Code::Blocks placed the PrintArgs program. If you used the default settings when installing Code::Blocks, that directory will be C:\CPP_Programs_from_Book\Chap09\PrintArgs\bin\Debug.

You can now execute the program by typing its name followed by your arguments. The following shows what happened when I did it in Windows 7:

C:\Users\Randy>cd \cpp_programs_from_book\chap09\printargs\bin\debug

C:\CPP_Programs_from_book\Chap09\PrintArgs\bin\Debug>PrintArgs arg1 arg2 arg3 /n
The arguments to PrintArgs are:
1:arg1
2:arg2
3:arg3
4:/n
That's it
Press Enter to continue...

Wild cards such as *.* may or may not be expanded before being passed to the program — the standard is silent on this point. The Code::Blocks/gcc compiler does perform such expansion on Windows, as the following example shows:

C:\CPP_Programs_from_book\Chap09\PrintArgs>bin\debug\PrintArgs *.*
The arguments to bin\debug\PrintArgs are:
1:bin
2:main.cpp
3:obj
4:PrintArgs.cbp
That's it
Press Enter to continue...

Here you see the names of the files in the current directory in place of the *.* that I entered.

Wild-card expansion is performed by the command shell under virtually all forms of Linux, as well as on the Macintosh, before the program ever starts.

Accessing program arguments Code::Blocks-style

You can add arguments to your program when you execute it from Code::Blocks as well. Choose Project⇒Set programs' arguments from within Code::Blocks. Enter the command line you would like in the Program arguments window.

Accessing program arguments Windows-style

Windows passes arguments as a means of communicating with your program as well. Try the following experiment: Build your program as you would normally. Find the executable file using Windows Explorer. (As noted earlier, the default location for the PrintArgs program is C:\CPP_Programs_from_book\Chap09\PrintArgs\bin\Debug.) Now grab a file and drop it onto the filename. (It doesn’t matter what file you choose because the program won’t hurt it anyway.) Bam! The PrintArgs program starts right up, and the name of the file that you dropped on the program appears.

Now try again, but drop several files at once. Select multiple filenames while pressing the Ctrl key or by using the Shift key to select a group. Now drag the lot of them onto PrintArgs.exe and let go. The name of each file appears as output.

I dropped a few of the files that appear in my \Program Files\WinZip folder onto PrintArgs as an example:

The arguments to C:\CPP_Programs_from_book\Chap09\PrintArgs\bin\Debug\PrintArgs.exe are:
1:C:\Program Files\WinZip\VENDOR.TXT
2:C:\Program Files\WinZip\WHATSNEW.TXT
3:C:\Program Files\WinZip\WINZIP.CHM
4:C:\Program Files\WinZip\WINZIP.TXT
5:C:\Program Files\WinZip\WINZIP32.EXE
6:C:\Program Files\WinZip\WZ.COM
That's it
Press Enter to continue...

Notice that the name of each file appears as a single argument, even though the filename may include spaces. Also note that Windows passes the full pathname of the file.