Understanding and Using C Pointers (2013)

Chapter 7. Security Issues and the Improper Use of Pointers

Few applications exist where security and reliability are not significant concerns. This concern is reinforced by frequent reports of security breaches and application failures. The responsibility of securing an application largely falls on the developer. In this chapter, we will examine practices to make applications more secure and reliable.

Writing secure applications in C can be difficult because of several inherent aspects of the language. For example, C does not prevent the programmer from writing outside an array’s bounds. This can result in corrupted memory and introduce potential security risks. In addition, the improper use of pointers is often at the root of many security problems.

When an application behaves in unpredictable ways, it may not seem to be a security issue, at least in terms of unauthorized access. However, it is sometimes possible to take advantage of this behavior, which can result in a denial of service and thus compromise the application. Unpredictable behavior that results from improper use of pointers has been illustrated elsewhere in this book. In this chapter, we will identify additional improper usages of pointers.

The CERT organization is a good source for a more comprehensive treatment of security issues in C and other languages. This organization studies Internet security vulnerabilities. We will focus on those security issues related to the use of pointers. Many of the CERT organization’s security concerns can be traced back to the improper use of pointers. Understanding pointers and the proper ways to use them is an important tool for developing secure and reliable applications. Some of these topics have been addressed in earlier chapters, not necessarily from a security standpoint but rather from a programming practice standpoint.

There have been improvements in security introduced by operating systems (OS). Some of these improvements are reflected in how memory is used. Although improvements are typically beyond the control of developers, they will affect the program. Understanding these issues will help explain an application’s behavior. We will focus on Address Space Layout Randomization and Data Execution Prevention.

The Address Space Layout Randomization (ASLR) process arranges an application’s memory regions randomly in memory. These regions include the code, stack, and heap. Randomizing the placement of these regions makes it more difficult for attackers to predict where memory will be placed and thus more difficult to exploit it. Certain types of attacks, such as the return-to-libc attack, overwrite portions of the stack and transfer control to another region. This region is frequently the shared C library, libc. If the locations of the stack and libc are not known, then such attacks will be less likely to succeed.

The Data Execution Prevention (DEP) technique prevents the execution of code if it is in a nonexecutable region of memory. In some types of attacks, a region of memory is overwritten with malicious code and then control is transferred to it. If this region of memory is nonexecutable, such as the stack or heap, then the code is prevented from executing. This technique can be implemented either in hardware or in software.

In this chapter, we will examine security issues from several perspectives:

§ Declaration and initialization of pointers

§ Improper pointer usage

§ Deallocation problems

Pointer Declaration and Initialization

Problems can arise with the declaration and initialization of pointers or, more correctly, the failure to initialize pointers. In this section, we will examine situations where these types of problems can occur.

Improper Pointer Declaration

Consider the following declaration:

int* ptr1, ptr2;

There is nothing necessarily wrong with the declaration; however, it may not be what was intended. This declaration declared ptr1 as a pointer to an integer and ptr2 as an integer. The asterisk was purposely placed next to the data type, and a space was placed before ptr1. This placement makes no difference to the compiler, but to the reader, it may imply that both ptr1 and ptr2 are declared as pointers to integers. However, only ptr1 is a pointer.

The correct approach is to declare them both as pointers using a single line, as shown below:

int *ptr1, *ptr2;

NOTE

It is an even better practice to declare each variable on its own line.

Another good practice involves using type definitions instead of macro definitions. These definitions allow the compiler to check scoping rules, which is not always true with macro definitions.

Variables may be declared with the assistance of a directive, as shown below. Here, a pointer to an integer is wrapped in a define directive and then used to declare variables:

#define PINT int*

PINT ptr1, ptr2;

However, the result is the same problem as described above. A better approach is shown below using a type definition:

typedef int* PINT;

PINT ptr1, ptr2;

Both variables are declared as pointers to integers.
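The difference between the two approaches can be checked at compile time. The following is a minimal sketch, assuming a C11 compiler (it relies on _Generic); the names PINT_MACRO and PintType are hypothetical and used only for illustration:

```c
#include <stddef.h>

#define PINT_MACRO int*      /* hypothetical name: plain text substitution */
typedef int *PintType;       /* hypothetical name: a real type alias */

/* Returns 1 if the SECOND variable declared through the macro is a pointer. */
int macro_second_is_pointer(void) {
    PINT_MACRO first = NULL, second = 0;   /* expands to: int* first, second; */
    (void)first;
    return _Generic(second, int *: 1, default: 0);
}

/* Returns 1 if the SECOND variable declared through the typedef is a pointer. */
int typedef_second_is_pointer(void) {
    PintType first = NULL, second = NULL;  /* both are pointers to int */
    (void)first;
    return _Generic(second, int *: 1, default: 0);
}
```

With the macro, the second variable is a plain int, so the first function returns 0; with the type definition, the second function returns 1.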

Failure to Initialize a Pointer Before It Is Used

Using a pointer before it is initialized can result in a run-time error. This is sometimes referred to as a wild pointer. A simple example follows where a pointer to an integer is declared but is never assigned a value before it is used:

int *pi;

...

printf("%d\n",*pi);

Figure 7-1 illustrates how memory is allocated at this point.

Figure 7-1. Wild pointer

The variable pi has not been initialized and will contain garbage, indicated by the ellipses. Most likely this sequence will terminate during execution if the memory address stored in pi is outside the valid address space for the application. Otherwise, the value displayed will be whatever happens to be at that address and will be presented as an integer. If we use a pointer to a string instead, we will frequently see a series of strange characters displayed until the terminating zero is reached.

Dealing with Uninitialized Pointers

Nothing inherent in a pointer tells us whether it is valid. Thus, we cannot simply examine its contents to determine whether it is valid. However, three approaches are used to deal with uninitialized pointers:

§ Always initialize a pointer with NULL

§ Use the assert function

§ Use third-party tools

Initializing a pointer to NULL will make it easier to check for proper usage. Even then, checking for a null value can be tedious, as shown below:

int *pi = NULL;

...

if(pi == NULL) {

// pi should not be dereferenced

} else {

// pi can be used

}

The assert macro can also be used to test for null pointer values. In the following example, the pi variable is tested for a null value. If the expression is true, then nothing happens. If the expression is false, then a diagnostic message is written and the program terminates. Thus, the program will terminate if the pointer is null.

assert(pi != NULL);

For debug versions of the application, this approach may be acceptable. If the pointer is null, then the output will appear similar to the following:

Assertion failed: pi != NULL

The assert macro is defined in the assert.h header file. Defining NDEBUG before including this header disables the check.

Third-party tools can also be used to help identify these types of problems. In addition, certain compiler options can be useful, as addressed in the section Using Static Analysis Tools.
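The first two approaches can be combined in small helper functions. The sketch below is illustrative only; the function names read_or_default and read_checked are hypothetical and do not appear elsewhere in this book:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dereferences pi only when it is non-null,
   otherwise returns the caller-supplied fallback value. */
int read_or_default(const int *pi, int fallback) {
    if (pi == NULL) {
        return fallback;
    }
    return *pi;
}

/* Hypothetical helper: treats a null pointer as a programming error.
   The assert is compiled out when NDEBUG is defined. */
int read_checked(const int *pi) {
    assert(pi != NULL);
    return *pi;
}
```

The explicit test suits conditions that can legitimately occur at run time, while the assertion is better suited to catching mistakes during development.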

Pointer Usage Issues

In this section, we will examine misuse of the dereference operator and array subscripts. We will also examine problems related to strings, structures, and function pointers.

Many security issues revolve around the concept of a buffer overflow. Buffer overflow occurs when memory outside the object’s bounds is overwritten. This memory may be part of the program’s address space or that of another process. When the memory is outside of the program address space, most operating systems will issue a segmentation fault and terminate the program. Termination for this reason constitutes a denial of service attack when done maliciously. This type of attack does not attempt to gain unauthorized access but tries to take down the application and potentially a server.

If the buffer overflow occurs within the application’s address space, then it can result in unauthorized access to data and/or the transfer of control to another segment of code, thereby potentially compromising the system. This is of particular concern if the application is executing with supervisor privileges.

Buffer overflow can happen by:

§ Not checking the index values used when accessing an array’s elements

§ Not being careful when performing pointer arithmetic with array pointers

§ Using functions such as gets to read in a string from standard input

§ Using functions such as strcpy and strcat improperly

When buffer overflow occurs with a stack frame element, it is possible to overwrite the return address portion of the stack frame with a call to malicious code created at the same time. See Program Stack and Heap for more detail about the stack frame. When the function returns, it will transfer control to the malicious function. This function can then perform any operation, restrained only by the current user’s privilege level.

Test for NULL

Always check the return value when using a malloc type function. Failure to do so can result in abnormal termination of the program. The following illustrates the general approach:

float *vector = malloc(20 * sizeof(float));

if(vector == NULL) {

// malloc failed to allocate memory

} else {

// Process vector

}
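The check can be wrapped in an allocation helper so that callers only have one place to test. This is a sketch under stated assumptions: the name make_vector is hypothetical, and the extra size check guards against the multiplication overflowing before malloc is called:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a vector of count floats set to 0.0f.
   Returns NULL if count is zero, if the byte count would overflow,
   or if malloc fails. The caller must free the result. */
float *make_vector(size_t count) {
    if (count == 0 || count > SIZE_MAX / sizeof(float)) {
        return NULL;
    }
    float *vector = malloc(count * sizeof(float));
    if (vector == NULL) {
        return NULL;               /* malloc failed to allocate memory */
    }
    for (size_t i = 0; i < count; i++) {
        vector[i] = 0.0f;
    }
    return vector;
}
```

A caller would still compare the result with NULL before processing the vector.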

Misuse of the Dereference Operator

A common approach for declaring and initializing a pointer is shown below:

int num;

int *pi = &num;

Another seemingly equivalent declaration sequence follows:

int num;

int *pi;

*pi = &num;

However, this is not correct. Notice the use of the dereference operator on the last line. We are attempting to assign the address of num not to pi but rather to the memory location specified by the contents of pi. The pointer, pi, has not been initialized yet. We have made a simple mistake of misusing the dereference operator. The correct sequence follows:

int num;

int *pi;

pi = &num;

In the original declaration, int *pi = &num, the asterisk declared the variable to be a pointer. It was not used as the dereference operator.

Dangling Pointers

A dangling pointer occurs when the memory a pointer references is freed but the pointer still holds that address. This problem is described in detail in Dangling Pointers. If an attempt is made to access this memory later, then its contents may well have changed. A write operation against this memory may corrupt memory, and a read operation may return invalid data. Either could potentially result in the termination of the program.

This has not been considered a security concern until recently. As explained in Dangling Pointer, there exists a potential for exploiting a dangling pointer. However, this approach is based on the exploitation of the VTable (Virtual Table) in C++. A VTable is an array of function pointers used to support virtual methods in C++. Unless you are using a similar approach involving function pointers, this particular exploit is unlikely to apply in C. Even so, accessing freed memory is undefined behavior and should always be treated as a defect.

Accessing Memory Outside the Bounds of an Array

Nothing can prevent a program from accessing memory outside of the space allocated for an array. In this example, we declare and initialize three arrays to demonstrate this behavior. The arrays are assumed to be allocated in consecutive memory locations.

char firstName[8] = "1234567";

char middleName[8] = "1234567";

char lastName[8] = "1234567";

middleName[-2] = 'X';

middleName[0] = 'X';

middleName[10] = 'X';

printf("%p %s\n",firstName,firstName);

printf("%p %s\n",middleName,middleName);

printf("%p %s\n",lastName,lastName);

To illustrate how memory is overwritten, three arrays are initialized to a simple sequence of numbers. While the behavior of the program will vary by compiler and machine, this will normally execute and overwrite characters in firstName and lastName. The output is shown below. Figure 7-2 illustrates how memory is allocated:

116 12X4567

108 X234567

100 123456X

Figure 7-2. Using invalid array indexes

As explained in Chapter 4, the address calculation performed for a subscript does not check the index value. This is a simple case of buffer overflow.
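Since the language does not check subscripts, the check has to be written by hand. The following sketch shows one way to do so; set_char_checked is a hypothetical helper, not a standard library function:

```c
#include <stddef.h>

/* Hypothetical helper: stores c at buffer[index] only when index lies
   inside an array of size elements. Returns 1 on success, 0 if rejected. */
int set_char_checked(char *buffer, size_t size, long index, char c) {
    if (buffer == NULL || index < 0 || (size_t)index >= size) {
        return 0;                  /* out of bounds: nothing is written */
    }
    buffer[index] = c;
    return 1;
}
```

Applied to the example above, the writes at index -2 and index 10 would be rejected, and only middleName[0] would be changed.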

Calculating the Array Size Incorrectly

When passing an array to a function, always pass the size of the array at the same time. This information will help the function avoid exceeding the bounds of the array. In the replace function shown below, the string’s address is passed along with a replacement character and the buffer’s size. The function’s purpose is to replace all of the characters in the string up to the NUL character with the replacement character. The size argument prevents the function from writing past the end of the buffer:

void replace(char buffer[], char replacement, size_t size) {

size_t count = 0;

while(*buffer != '\0' && count++<size) {

*buffer = replacement;

buffer++;

}

}

In the following sequence, the name array can only hold up to seven characters plus the NUL termination character. However, we purposely write past the end of the array to demonstrate the replace function. The replace function is then passed the name array, a replacement character of +, and the size of the array:

char name[8];

strcpy(name,"Alexander");

replace(name,'+',sizeof(name));

printf("%s\n", name);

When this code is executed, we get the following output:

++++++++r

Only eight plus-sign characters were added to the array. While the strcpy function permitted buffer overflow, the replace function did not. This assumes that the size passed is valid. Functions like strcpy that do not accept the buffer’s size should be used with caution. Passing the buffer’s size provides an additional layer of protection.

Misusing the sizeof Operator

An example of misusing the sizeof operator occurs when we attempt to check our pointer bounds but do it incorrectly. In the following example, we allocate memory for an integer array and then initialize each element to 0.

int buffer[20];

int *pbuffer = buffer;

for(int i=0; i<sizeof(buffer); i++) {

*(pbuffer++) = 0;

}

However, the sizeof(buffer) expression returns 80, since the size of the buffer in bytes is 80 (20 elements of 4 bytes each, assuming a 4-byte int). The for loop is executed 80 times instead of 20 and will frequently result in a memory access exception terminating the application. Avoid this by using the expression sizeof(buffer)/sizeof(int) in the test condition of the for statement.
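The element count can also be computed without naming the element type. The sketch below assumes a hypothetical ARRAY_LENGTH macro; it only works when applied to a true array, not to a pointer or to an array received as a function parameter:

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in a true array. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Zeroes a local 20-element array and returns how many elements were written. */
size_t zero_buffer(void) {
    int buffer[20];
    int *pbuffer = buffer;
    size_t written = 0;
    for (size_t i = 0; i < ARRAY_LENGTH(buffer); i++) {
        *(pbuffer++) = 0;
        written++;
    }
    return written;
}
```

Dividing by sizeof((a)[0]) rather than sizeof(int) keeps the loop bound correct if the element type of the array is later changed.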

Always Match Pointer Types

It is a good idea to always use the appropriate pointer type for the data. To demonstrate one possible pitfall, consider the following sequence. A pointer to an integer is assigned to a pointer to a short:

int num = 2147483647;

int *pi = &num;

short *ps = (short*)pi;

printf("pi: %p Value(16): %x Value(10): %d\n", pi, *pi, *pi);

printf("ps: %p Value(16): %hx Value(10): %hd\n",

ps, (unsigned short)*ps, (unsigned short)*ps);

The output of the snippet follows:

pi: 100 Value(16): 7fffffff Value(10): 2147483647

ps: 100 Value(16): ffff Value(10): -1

Notice that it appears that the first hexadecimal digit stored at address 100 is 7 or f, depending on whether it is displayed as an integer or as a short. This apparent contradiction is an artifact of executing this sequence on a little endian machine. The layout of memory for the constant at address 100 is illustrated in Figure 7-3.

Figure 7-3. Mismatched pointer types

If we treat this as a short number and only use the first two bytes, then we get the short value of –1. If we treat this as an integer and use all four bytes, then we get 2,147,483,647. These types of subtle problems are what make C and pointers such a challenging subject.
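When part of an object really does need to be examined, copying the bytes is a safer alternative to casting the pointer. The sketch below swaps in memcpy and fixed-width types in place of the cast used above; first_two_bytes is a hypothetical name, and the result still depends on the machine's byte order:

```c
#include <stdint.h>
#include <string.h>

/* Copies the first two bytes of a 32-bit value into a 16-bit value.
   On a little endian machine these are the low-order bytes; on a
   big endian machine they are the high-order bytes. */
uint16_t first_two_bytes(int32_t value) {
    uint16_t half;
    memcpy(&half, &value, sizeof(half));
    return half;
}
```

For the value 2147483647 (0x7fffffff), this yields 0xffff on a little endian machine and 0x7fff on a big endian machine.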

Bounded Pointers

The term bounded pointers describes pointers whose use is restricted to only valid regions. For example, with an array declared with 32 elements, a pointer used with this array would be restricted from accessing any memory before or after the array.

C does not provide any direct support for this approach. However, it can be enforced explicitly by the programmer, as shown below:

#define SIZE 32

char name[SIZE];

char *p = name;

if(name != NULL) {

if(p >= name && p < name+SIZE) {

// Valid pointer - continue

} else {

// Invalid pointer - error condition

}

}

This approach can get tedious. Instead, static analysis as discussed in the section Using Static Analysis Tools can be helpful.

An interesting variation is to create a pointer validation function. For this to happen, the initial location and range must be known.
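Such a validation function might look like the following sketch. The name is_within is hypothetical. The addresses are compared as integers because relational comparison of pointers into different objects is undefined in C; this assumes a platform where converting a pointer to uintptr_t preserves address order, which is common but not guaranteed:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validation function: returns 1 when p points at one of
   the count bytes starting at base, and 0 otherwise. */
int is_within(const char *p, const char *base, size_t count) {
    if (p == NULL || base == NULL) {
        return 0;
    }
    uintptr_t address = (uintptr_t)p;
    uintptr_t start = (uintptr_t)base;
    return address >= start && address - start < count;
}
```

For the name array above, is_within(p, name, SIZE) would replace the explicit range test.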

Another approach is to use CBMC, a bounded model checker for ANSI-C and C++ programs. This application checks for various safety and security issues within C programs and finds array bounds and buffer overflow problems.

NOTE

Smart pointers, available in C++, provide a way of simulating a pointer and support bounds checking. Unfortunately, they are not available in C.

String Security Issues

Security issues related to a string generally occur when we write past the end of a string. In this section, we will focus on the “standard” functions that contribute to this problem.

The use of string functions such as strcpy and strcat can result in buffer overflow if they are not used carefully. Several approaches have been suggested to replace these functions, but none have become widely accepted. The strncpy and strncat functions can provide some support for this operation where a size_t parameter specifies the maximum number of characters to copy. However, they can also be error prone if the number of characters is not calculated correctly.

In C11 (Annex K), the strcat_s and strcpy_s functions have been added. They return an error if buffer overflow occurs. Currently, they are only supported by Microsoft Visual C++. The following example illustrates the use of the strcpy_s function. It takes three parameters: a destination buffer, the size of the destination buffer, and a source buffer. If the return value is zero, then no errors occurred. However, in this example, an error will result since the source is too large to fit into the destination buffer:

char firstName[8];

int result;

result = strcpy_s(firstName,sizeof(firstName),"Alexander");

The scanf_s and wscanf_s functions are also available to protect against buffer overflow.

The gets function reads a string from standard input and stores the characters in a designated buffer. It has no way of knowing the buffer’s declared length, so if the string is too long, then buffer overflow will occur. For this reason, gets was removed from the language in C11.

Also, the strlcpy and strlcat functions are supported on some Unix-like systems but historically not by the GNU C library. They are thought by some to create more problems than they solve and are not well documented.
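Where none of these functions is available, a bounded copy can be built on the standard snprintf function. This is a minimal sketch rather than a recommended replacement; copy_string is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dest, always NUL-terminating.
   Returns 0 when the whole string fit, and -1 when it was truncated
   or the arguments were invalid. */
int copy_string(char *dest, size_t size, const char *src) {
    if (dest == NULL || src == NULL || size == 0) {
        return -1;
    }
    int needed = snprintf(dest, size, "%s", src);
    if (needed < 0 || (size_t)needed >= size) {
        return -1;                 /* source did not fit in dest */
    }
    return 0;
}
```

Copying "Alexander" into an eight-character buffer with this function leaves the buffer holding "Alexand" and reports the truncation through the return value.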

The use of some functions can result in an attacker accessing memory using a technique known as format string attacks. In these attacks, a user-supplied format string, illustrated below, is crafted to enable access to memory and potentially the ability to inject code. In this simple program, the first command-line argument, argv[1], is used as the first parameter of the printf function:

int main(int argc, char** argv) {

printf(argv[1]);

...

}

This program can be executed using a command similar to the following:

main.exe "User Supplied Input"

Its output will appear as:

User Supplied Input

Although this program is innocuous, a more sophisticated attack can do real damage. Comprehensive coverage of this topic is not provided here; however, more detail on how to effect such an attack can be found at hackerproof.org.

Functions such as printf, fprintf, snprintf, and syslog all have a format string as an argument. The simplest defense against this type of attack is to never use a user-supplied format string with these functions.
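In practice, this means passing untrusted text as an argument to a fixed "%s" format rather than as the format itself. The following sketch uses snprintf so that the result can be inspected; format_user_text is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: formats untrusted text as data. Conversion
   specifiers inside userText are copied literally, never interpreted. */
int format_user_text(char *dest, size_t size, const char *userText) {
    return snprintf(dest, size, "%s", userText);
}
```

The same idea applies to the earlier program: printf("%s", argv[1]) prints the argument as data, whereas printf(argv[1]) lets the user control the format.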

Pointer Arithmetic and Structures

Pointer arithmetic should only be used with arrays. Because arrays are guaranteed to be allocated in a contiguous block of memory, pointer arithmetic will result in a valid offset. However, it should not be used within structures, as the structure’s fields may not be allocated in consecutive regions of memory.

This is illustrated with the following structure. The name field is allocated 10 bytes, and is followed by an integer. However, since the integer will be aligned on a four-byte boundary, there will be a gap between the two fields. Gaps of this type are explained in the section How Memory Is Allocated for a Structure.

typedef struct _employee {

char name[10];

int age;

} Employee;

The following sequence attempts to use a pointer to access the age field of the structure:

Employee employee;

// Initialize employee

char *ptr = employee.name;

ptr += sizeof(employee.name);

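Assuming the structure is located at address 100, the pointer will contain the address 110, which is the address of the two padding bytes found between the two fields. Casting the pointer to a pointer to an integer and dereferencing it would interpret the four bytes at address 110 as an integer rather than reading the age field. This is illustrated in Figure 7-4.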

Figure 7-4. Structure padding example

WARNING

Improperly aligned pointers can result in an abnormal program termination or retrieval of bad data. In addition, slower pointer access is possible if the compiler is required to generate additional machine code to compensate for the improper alignment.
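If a field must be located from the start of a structure, the standard offsetof macro accounts for any padding the compiler inserted. The following is a sketch only; read_age is a hypothetical helper, and memcpy is used so that the read does not depend on the alignment of the computed address:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char name[10];
    int age;
} Employee;

/* Hypothetical helper: reads the age field through a byte pointer,
   using offsetof instead of sizeof(name) to find the field. */
int read_age(const Employee *employee) {
    const unsigned char *bytes = (const unsigned char *)employee;
    int age;
    memcpy(&age, bytes + offsetof(Employee, age), sizeof(age));
    return age;
}
```

On a machine that aligns integers on four-byte boundaries, offsetof(Employee, age) is 12 rather than 10. Accessing employee->age directly remains the simplest and safest choice.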

Even if the memory within a structure is contiguous, it is not a good practice to use pointer arithmetic with the structure’s fields. The following structure defines an Item consisting of three integers. While the three integer fields will normally be allocated in consecutive memory locations, there is no guarantee that they will be:

typedef struct _item {

int partNumber;

int quantity;

int binNumber;

} Item;

The following code sequence declares a part and then uses pointer arithmetic to access each field:

Item part = {12345, 35, 107};

int *pi = &part.partNumber;

printf("Part number: %d\n",*pi);

pi++;

printf("Quantity: %d\n",*pi);

pi++;

printf("Bin number: %d\n",*pi);

Normally, the output will be as expected, but it is not guaranteed to work. A better approach is to assign each field to pi:

int *pi = &part.partNumber;

printf("Part number: %d\n",*pi);

pi = &part.quantity;

printf("Quantity: %d\n",*pi);

pi = &part.binNumber;

printf("Bin number: %d\n",*pi);

Even better, do not use pointers at all, as shown below:

printf("Part number: %d\n",part.partNumber);

printf("Quantity: %d\n",part.quantity);

printf("Bin number: %d\n",part.binNumber);

Function Pointer Issues

Functions and function pointers are used to control a program’s execution sequence, but they can be misused, resulting in unpredictable behavior. Consider the use of the function getSystemStatus. This function returns an integer value that reflects the system’s status:

int getSystemStatus() {

int status;

...

return status;

}

The best way to determine whether the system status is zero follows:

if(getSystemStatus() == 0) {

printf("Status is 0\n");

} else {

printf("Status is not 0\n");

}

In the next example, we forget to use the open and close parentheses. The code will not execute properly:

if(getSystemStatus == 0) {

printf("Status is 0\n");

} else {

printf("Status is not 0\n");

}

The else clause will always be executed. In the logical expression, we compared the address of the function with 0 instead of calling the function and comparing its return value to 0. Remember, when a function name is used by itself, it evaluates to the address of the function.

A similar mistake is using the function name directly as a condition without calling it. The address of the function is simply evaluated as true or false. The address of the function is not likely to be zero. As a result, the condition will be evaluated as true since C treats any nonzero value as true:

if(getSystemStatus) {

// Will always be true

}

We should have written the function call as follows to determine whether the status is nonzero:

if(getSystemStatus()) {

Do not assign a function to a function pointer when their signatures differ. This can result in undefined behavior. An example of this misuse is shown below:

int (*fptrCompute)(int,int);

int add(int n1, int n2, int n3) {

return n1+n2+n3;

}

fptrCompute = add;

fptrCompute(2,5);

We attempted to invoke the add function with only two arguments when it expected three arguments. This will compile, usually with a warning, but the behavior is undefined.

A function pointer can execute different functions, depending on the address assigned to it. For example, we may want to use the printf function for normal operations but change it to a different function for specialized logging purposes. Declaring and using such a function pointer is shown below:

int (*fptrIndirect)(const char *, ...) = printf;

fptrIndirect("Executing printf indirectly");

It may be possible for an attacker to use buffer overflow to overwrite the function pointer’s address. When this happens, control can be transferred to an arbitrary location in memory.
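Using a type definition for the function pointer lets the compiler flag a mismatched signature at the point of assignment, and a null check avoids calling through an unset pointer. The sketch below is illustrative; the names BinaryOperation and compute are hypothetical, and it does not by itself defend against an overwritten pointer:

```c
#include <stddef.h>

/* Hypothetical type name for functions taking two ints and returning int. */
typedef int (*BinaryOperation)(int, int);

int add(int n1, int n2) {
    return n1 + n2;
}

/* Calls operation only when it has been set. Returns 1 on success and
   stores the value in *result; returns 0 if either pointer is null. */
int compute(BinaryOperation operation, int n1, int n2, int *result) {
    if (operation == NULL || result == NULL) {
        return 0;
    }
    *result = operation(n1, n2);
    return 1;
}
```

Here compute(add, 2, 5, &sum) stores 7 in sum, while passing a three-argument function would draw an incompatible pointer type diagnostic from the compiler.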

Memory Deallocation Issues

Even when memory has been deallocated, we are not necessarily through with the pointer or the deallocated memory. One concern deals with what happens when we try to free the same memory twice. In addition, once memory is freed, we may need to be concerned with protecting any residual data. We will examine these issues in this section.

Double Free

Freeing a block of memory twice is referred to as double free, as explained in Double Free. The following illustrates how this can be done:

char *name = (char*)malloc(...);

...

free(name); // First free

...

free(name); // Double free

In an earlier version of the zlib compression library, it was possible for a double-free operation to result in a denial of service attack or possibly to insert code into the program. However, this is extremely unlikely and the vulnerability has been addressed in newer releases of the library. More information about this vulnerability can be found at cert.org.

A simple technique to avoid this type of vulnerability is to always assign NULL to a pointer after it has been freed. Subsequent attempts to free a null pointer have no effect, since the C standard specifies that free performs no action when passed a null pointer.

char *name = (char*)malloc(...);

...

free(name);

name = NULL;

In the section Writing your own free function, we developed a function to achieve this effect.
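The two steps can also be combined in a macro so that they are never separated. This is a minimal sketch and not the function developed in that section; FREE_AND_NULL and release_twice are hypothetical names:

```c
#include <stdlib.h>

/* Hypothetical macro: frees p and sets it to NULL in one step. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Releases the same pointer twice and returns 1 if it ends up NULL. */
int release_twice(void) {
    char *name = malloc(16);
    FREE_AND_NULL(name);   /* first release; harmless even if malloc failed */
    FREE_AND_NULL(name);   /* second release calls free(NULL), which does nothing */
    return name == NULL;
}
```

Note that this only protects the pointer passed to the macro. Any other copies of the original address remain dangling.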

Clearing Sensitive Data

It is a good idea to overwrite sensitive data in memory once it is no longer needed. When your application terminates, the operating system is not guaranteed to zero out or otherwise manipulate the memory used by your application. Your old space may be allocated to another program, which could then have access to its contents. Overwriting sensitive data will make it more difficult for another program to extract useful information from program address space previously used to hold sensitive data. The following sequence illustrates zeroing out of sensitive data in a program:

char name[32];

int userID;

char *securityQuestion;

// assign values

...

// Delete sensitive information

memset(name,0,sizeof(name));

userID = 0;

memset(securityQuestion,0,strlen(securityQuestion));

If name has been declared as a pointer, then we should clear its memory before we deallocate it, as shown below:

char *name = (char*)malloc(...);

...

memset(name,0,strlen(name)); // sizeof(name) would only give the size of the pointer

free(name);
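One caveat is that a compiler may remove a memset call whose result is never read, such as one placed immediately before free. The following sketch writes through a volatile pointer to make that less likely; secure_zero is a hypothetical name, and where the optional C11 Annex K function memset_s is available, it is intended for this purpose:

```c
#include <stddef.h>

/* Hypothetical helper: overwrites n bytes with zeros through a volatile
   pointer so the compiler is less likely to optimize the stores away. */
void secure_zero(void *memory, size_t n) {
    volatile unsigned char *p = memory;
    while (n--) {
        *p++ = 0;
    }
}
```

It would be called in place of memset, for example secure_zero(name, strlen(name)) before free(name).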

Using Static Analysis Tools

Numerous static analysis tools are available to detect improper use of pointers. In addition, most compilers possess options to detect many of the issues addressed in this chapter. For example, the GCC compiler’s -Wall option enables a large set of commonly useful warnings. Despite its name, it does not enable every warning; the -Wextra option enables additional ones.

The following illustrates the warnings produced by some of the examples included in this chapter. Here we forget to use open and close parentheses for a function call:

if(getSystemStatus == 0) {

The result is the following warning:

warning: the address of 'getSystemStatus' will never be NULL

We make essentially the same mistake here:

if(getSystemStatus) {

However, the warning is different:

warning: the address of 'getSystemStatus' will always evaluate as 'true'

Using incompatible pointer types will result in a warning:

int (*fptrCompute)(int,int);

int addNumbers(int n1, int n2, int n3) {

return n1+n2+n3;

}

...

fptrCompute = addNumbers;

The warning follows:

warning: assignment from incompatible pointer type

Failure to initialize a pointer is usually a problem:

char *securityQuestion;

strcpy(securityQuestion,"Name of your home town");

The warning generated is surprisingly lucid:

warning: 'securityQuestion' is used uninitialized in this function

Numerous static analysis tools are also available. Some are free, and others are available for a fee. They generally provide enhanced diagnostic capabilities beyond those provided by most compilers. Because of their complex nature, examples are beyond the scope of this book.

Summary

In this chapter, we investigated how pointers can affect an application’s security and reliability. These issues were organized around the declaration and initialization of pointers, the use of pointers, and memory deallocation problems. For example, it is important to initialize a pointer before it is used and to potentially clean up the memory used by a string once the memory is no longer needed. Setting a pointer to NULL can be an effective technique in many of these situations.

Pointers can be misused in several ways. Many of these involve overwriting memory outside the string, a form of buffer overflow. The misuse of pointers can cause undefined behavior in several areas, including mismatching pointer types and incorrect pointer arithmetic.

We illustrated various techniques to avoid these types of problems. Many involved simply understanding how pointers and strings are supposed to be used. We also touched on how compilers and static analysis tools can be used to identify potential problem areas.