Miscellaneous and Advanced Features - Programming in C (Fourth Edition) (2015)

16. Miscellaneous and Advanced Features

This chapter discusses some miscellaneous features of the C language that have not yet been covered and provides a discussion of some more advanced topics, such as command-line arguments and dynamic memory allocation. The topics presented in this chapter are varied, but they are important to know because you will see many of these concepts in the C programs you encounter. Topics covered include the following:

• Understanding the goto statement, and why you should avoid it
• Maximizing space by using unions
• Adding the null statement to your programs
• Implementing statements that include the comma operator
• Using command-line arguments with your programs
• Dynamically allocating memory with malloc() and calloc(), and cleaning it up with free()

Miscellaneous Language Statements

This section discusses two statements you haven’t encountered to this point: the goto statement and the null statement.

The goto Statement

Anyone who has learned about structured programming knows of the bad reputation afforded to the goto statement. Many computer languages, C among them, provide such a statement.

Execution of a goto statement causes a direct branch to be made to a specified point in the program. This branch is made immediately and unconditionally upon execution of the goto. To identify where in the program the branch is to be made, a label is needed. A label is a name that is formed with the same rules as variable names and must be immediately followed by a colon. The label is placed directly before the statement to which the branch is to be made and must appear in the same function as the goto.

So, for example, the statement

goto out_of_data;

causes the program to branch immediately to the statement that is preceded by the label out_of_data:. This label can be located anywhere in the function, before or after the goto, and might be used as shown:

out_of_data: printf ("Unexpected end of data.\n");
...
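To see a goto and its label working together, here is a minimal sketch. The function sumValues() and the convention that a negative value signals unexpected end of data are hypothetical, introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical example: add up n values, treating a negative value as
   "unexpected end of data" and branching to the out_of_data label. */
int sumValues (const int data[], int n, int *sum)
{
    int i;

    *sum = 0;

    for ( i = 0; i < n; ++i ) {
        if ( data[i] < 0 )
            goto out_of_data;      /* immediate, unconditional branch */
        *sum += data[i];
    }

    return 0;                      /* all n values were processed */

out_of_data:                       /* label in the same function as the goto */
    printf ("Unexpected end of data.\n");
    return -1;
}
```

In this small case, the same effect could be had by printing the message and returning directly inside the loop, which is usually the clearer choice.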

Programmers who are lazy frequently abuse the goto statement to branch to other portions of their code. The goto statement interrupts the normal sequential flow of a program. As a result, programs are harder to follow. Using many gotos in a program can make it impossible to decipher. This style of programming is often derisively referred to as “spaghetti code.” For this reason, goto statements are not considered part of good programming style.

The null Statement

C permits a solitary semicolon to be placed wherever a normal program statement can appear. The effect of such a statement, known as the null statement, is that nothing is done. Although this might seem useless, it is often used by C programmers in while, for, and do loops. For example, the purpose of the following statement is to store all the characters read in from the standard input into the character array pointed to by text until a newline character is encountered. (The newline itself is also stored, and no terminating null character is added.)

while ( (*text++ = getchar ()) != '\n' )
    ;

All of the operations are performed inside the looping-condition part of the while statement. The null statement is needed because the compiler takes the statement that follows the looping expression as the body of the loop. Without the null statement, whatever statement follows in the program would be treated by the compiler as the body of the loop.
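The same idea works on a string that is already in memory. The following sketch counts the characters in a null-terminated string with a for loop whose body is the null statement. The name stringLength() is hypothetical; the standard library already provides strlen() for this job.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters in a null-terminated string.
   All the work happens in the for statement itself, so the body is the
   null statement. */
size_t stringLength (const char *s)
{
    size_t count;

    for ( count = 0; s[count] != '\0'; ++count )
        ;   /* null statement: the loop body does nothing */

    return count;
}
```

If the lone semicolon were deleted, the return statement would become the body of the loop, and the function would return 0 for any nonempty string.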

The following for statement copies characters from the standard input to the standard output until the end of file is encountered:

for ( ; (c = getchar ()) != EOF; putchar (c) )
    ;

The next for statement counts the number of characters that appear in the standard input:

for ( count = 0; getchar () != EOF; ++count )
    ;

As a final example illustrating the null statement, the following loop copies the character string pointed to by from to the one pointed to by to.

while ( (*to++ = *from++) != '\0' )
    ;

The reader is advised that there is a tendency among certain programmers to try to squeeze as much as possible into the condition part of the while or into the condition or looping part of the for. Try not to become one of those programmers. In general, only those expressions involved with testing the condition of a loop should be included inside the condition part. Everything else should form the body of the loop. The only case to be made for forming such complex expressions might be one of execution efficiency. Unless execution speed is that critical, you should avoid using these types of expressions.

The preceding while statement is easier to read when written like this:

while ( *from != '\0' )
    *to++ = *from++;

*to = '\0';
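For comparison, here are both forms of the copy loop wrapped in functions. The function names are hypothetical, and in both cases the caller is assumed to supply a destination array large enough to hold the string and its terminating null character.

```c
/* Hypothetical version 1: the copy happens in the loop condition,
   so the body is the null statement. */
void copyStringTerse (char *to, const char *from)
{
    while ( (*to++ = *from++) != '\0' )
        ;
}

/* Hypothetical version 2: the condition only tests; the body does the
   copying, and the terminating null character is stored afterward. */
void copyStringClear (char *to, const char *from)
{
    while ( *from != '\0' )
        *to++ = *from++;

    *to = '\0';
}
```

Both functions produce the same result; the second simply keeps the test and the work in separate places.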

Working with Unions

One of the more unusual constructs in the C programming language is the union. This construct is used mainly in more advanced programming applications in which it is necessary to store different types of data in the same storage area. For example, if you want to define a single variable called x, which could be used to store a single character, a floating-point number, or an integer, you could first define a union called, perhaps, mixed:

union mixed
{
    char  c;
    float f;
    int   i;
};

The declaration for a union is identical to that of a structure, except the keyword union is used where the keyword struct is otherwise specified. The real difference between structures and unions has to do with the way memory is allocated. Declaring a variable to be of type union mixed, as in

union mixed x;

does not define x to contain three distinct members called c, f, and i; rather, it defines x to contain a single member that is called either c, f, or i. In this way, the variable x can be used to store either a char or a float or an int, but not all three (or not even two of the three). You can store a character in the variable x with the following statement:

x.c = 'K';

The character stored in x can subsequently be retrieved in the same manner. So, to display its value at the terminal, for example, the following could be used:

printf ("Character = %c\n", x.c);

To store a floating-point value in x, the notation x.f is used:

x.f = 786.3869;

Finally, to store the result of dividing an integer count by 2 in x, the following statement can be used:

x.i = count / 2;

Because the float, char, and int members of x all exist in the same place in memory, only one value can be stored in x at a time. Furthermore, it is your responsibility to ensure that the value retrieved from a union is consistent with the way it was last stored in the union.

A union member follows the same rules of arithmetic as the type of the member that is used in the expression. So in

x.i / 2

the expression is evaluated according to the rules of integer arithmetic because x.i and 2 are both integers.

A union can be defined to contain as many members as desired. The C compiler ensures that enough storage is allocated to accommodate the largest member of the union. Structures can be defined that contain unions, as can arrays. When defining a union, the name of the union is not required, and variables can be declared at the same time that the union is defined. Pointers to unions can also be declared, and their syntax and rules for performing operations are the same as for structures.
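The following sketch pulls the preceding statements together. Each assignment overwrites whatever was stored before, and the value is always read back through the member that was last stored. The function demoUnion() is hypothetical, and the exact size printed depends on your machine and compiler.

```c
#include <stdio.h>

union mixed
{
    char  c;
    float f;
    int   i;
};

/* Hypothetical demonstration: store through one member, then read back
   through the same member. The union is large enough for its largest
   member (plus any padding the compiler adds). */
int demoUnion (void)
{
    union mixed x;

    x.c = 'K';
    printf ("Character = %c\n", x.c);

    x.f = 786.3869f;              /* overwrites the character */
    printf ("Float     = %f\n", x.f);

    x.i = 17 / 2;                 /* integer arithmetic: 8 */
    printf ("Integer   = %i\n", x.i);

    printf ("sizeof (union mixed) = %zu bytes\n", sizeof (union mixed));

    return x.i;
}
```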

Only one member of a union variable can be initialized. If no member name is specified, the first member of the union is set to the specified value, as in:

union mixed x = { '#' };

This sets the first member of x, which is c, to the character #.

By specifying the member name, any member of the union can be initialized like so:

union mixed x = { .f = 123.456 };

This sets the floating-point member f of the union mixed variable x to the value 123.456.

An automatic union variable can also be initialized to another union variable of the same type:

void foo (union mixed x)
{
    union mixed y = x;
    ...
}

Here, the function foo assigns to the automatic union variable y the value of the argument x.
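The three forms of initialization can be tried side by side in a small sketch. The function names firstMember(), namedMember(), and copyOf() are hypothetical; each simply returns the initialized union so that its contents can be checked.

```c
union mixed
{
    char  c;
    float f;
    int   i;
};

/* Hypothetical: no member name given, so the first member (c) is set. */
union mixed firstMember (void)
{
    union mixed x = { '#' };
    return x;
}

/* Hypothetical: a designated initializer selects member f. */
union mixed namedMember (void)
{
    union mixed x = { .f = 123.456f };
    return x;
}

/* Hypothetical: an automatic union initialized from another of the same type. */
union mixed copyOf (union mixed x)
{
    union mixed y = x;
    return y;
}
```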

The use of a union enables you to define arrays that can be used to store elements of different data types. For example, the statement

struct
{
    char *name;
    enum symbolType type;
    union
    {
        int i;
        float f;
        char c;
    } data;
} table [kTableEntries];

sets up an array called table, consisting of kTableEntries elements. Each element of the array contains a structure consisting of a character pointer called name, an enumeration member called type, and a union member called data. Each data member of the array can contain either an int, a float, or a char. The member type might be used to keep track of the type of value stored in the member data. For example, you could assign it the value INTEGER if it contained an int, FLOATING if it contained a float, and CHARACTER if it contained a char. This information would enable you to know how to reference the particular data member of a particular array element.

To store the character '#' in table[5], and subsequently set the type field to indicate that a character is stored in that location, the following two statements could be used:

table[5].data.c = '#';
table[5].type = CHARACTER;

When sequencing through the elements of table, you could determine the type of data value stored in each element by setting up an appropriate series of test statements. For example, the following loop would display each name and its associated value from table at the terminal. (The symbolType enumeration is shown here for reference; in an actual program, it must be defined before the structure declaration that uses it.)

enum symbolType { INTEGER, FLOATING, CHARACTER };

...

for ( j = 0; j < kTableEntries; ++j ) {
    printf ("%s ", table[j].name);

    switch ( table[j].type ) {
        case INTEGER:
            printf ("%i\n", table[j].data.i);
            break;
        case FLOATING:
            printf ("%f\n", table[j].data.f);
            break;
        case CHARACTER:
            printf ("%c\n", table[j].data.c);
            break;
        default:
            printf ("Unknown type (%i), element %i\n", table[j].type, j );
            break;
    }
}

The type of application illustrated might be practical for storage of a symbol table, for example, which might contain the name of each symbol, its type, and its value (and perhaps other information about the symbol as well).
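A minimal symbol-table sketch along these lines follows. The structure tag symbol and the helpers findSymbol() and formatValue() are hypothetical names chosen for illustration; the type member acts as a tag recording which union member is currently valid.

```c
#include <stdio.h>
#include <string.h>

/* The enumeration must be defined before the structure that uses it. */
enum symbolType { INTEGER, FLOATING, CHARACTER };

struct symbol                  /* hypothetical tag name */
{
    const char      *name;
    enum symbolType  type;     /* records which union member is in use */
    union
    {
        int   i;
        float f;
        char  c;
    } data;
};

/* Hypothetical helper: linear search of the table by name.
   Returns the matching entry, or NULL if the name is not present. */
const struct symbol *findSymbol (const struct symbol table[], int entries,
                                 const char *name)
{
    int j;

    for ( j = 0; j < entries; ++j )
        if ( strcmp (table[j].name, name) == 0 )
            return &table[j];

    return NULL;
}

/* Hypothetical helper: format an entry's value into buf, reading the
   union member that matches the entry's type tag. */
void formatValue (const struct symbol *s, char *buf, size_t size)
{
    switch ( s->type ) {
        case INTEGER:
            snprintf (buf, size, "%i", s->data.i);
            break;
        case FLOATING:
            snprintf (buf, size, "%f", s->data.f);
            break;
        case CHARACTER:
            snprintf (buf, size, "%c", s->data.c);
            break;
        default:
            snprintf (buf, size, "unknown type (%i)", (int) s->type);
            break;
    }
}
```

A table entry can then be written as { "count", INTEGER, { .i = 42 } }, using a designated initializer for the union member.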

The Comma Operator

At first glance, you might not realize that a comma can be used in expressions as an operator. The comma operator is at the bottom of the precedence totem pole, so to speak. In Chapter 4, “Program Looping,” you learned that inside a for statement you could include more than one expression in any of the fields by separating each expression with a comma. For example, the for statement that begins

for ( i = 0, j = 100; i != 10; ++i, j -= 10 )
...

initializes the value of i to 0 and j to 100 before the loop begins, and increments the value of i and subtracts 10 from the value of j each time after the body of the loop is executed.

The comma operator can be used to separate multiple expressions anywhere that a valid C expression can be used. The expressions are evaluated from left to right. So, in the statement

while ( i < 100 )
    sum += data[i], ++i;

the value of data[i] is added into sum and then i is incremented. Note that you don’t need braces here because just one statement follows the while statement. (It consists of two expressions separated by the comma operator.)

Like every operator in C, the comma operator produces a value: the value of a comma expression is the value of its rightmost expression.

Note that a comma, used to separate arguments in a function call, or variable names in a list of declarations, for example, is a separate syntactic entity and is not an example of the use of the comma operator.
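The following sketch shows both uses of the comma side by side. The function names sumArray() and lastValue() are hypothetical. Note the parentheses in lastValue(): because assignment binds more tightly than the comma operator, writing b = a = 2, a + 3 without them would assign 2 to b and then discard the value of a + 3.

```c
/* Hypothetical: sum the first n elements of data. The loop body is one
   expression statement built from two expressions joined by the comma
   operator, so no braces are needed. */
int sumArray (const int data[], int n)
{
    int i = 0, sum = 0;           /* these commas separate declarations */

    while ( i < n )
        sum += data[i], ++i;      /* comma operator: add, then increment */

    return sum;
}

/* Hypothetical: the value of a comma expression is its rightmost operand. */
int lastValue (void)
{
    int a, b;

    b = (a = 2, a + 3);           /* a is set to 2 first; b gets 2 + 3 */

    return b;
}
```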

Type Qualifiers

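The following qualifiers can be used in front of variables to give the compiler more information about the intended use of the variable and, in some cases, to help it generate better code. (Strictly speaking, register is a storage-class specifier rather than a type qualifier, but it serves a similar purpose and is covered here as well.)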

The register Qualifier

If a function uses a particular variable heavily, you can request that access to the variable be made as fast as possible by the compiler. Typically, this means requesting that it be stored in one of the machine’s registers when the function is executed. This is done by prefixing the declaration of the variable with the keyword register, as follows:

register int index;
register char *textPtr;

Both local variables and formal parameters can be declared as register variables. The types of variables that can be assigned to registers vary among machines. The basic data types can usually be assigned to registers, as well as pointers to any data type.

Even if your compiler enables you to declare a variable as a register variable, it is still not guaranteed that it will do anything with that declaration. It is up to the compiler.

You might want to also note that you cannot apply the address operator to a register variable. Other than that, register variables behave just as ordinary automatic variables.
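Here is a small sketch using the two declarations shown earlier. The function countChar() is hypothetical, and because register is only a request, the compiler may keep these variables in memory anyway; the program behaves the same either way.

```c
/* Hypothetical: count occurrences of ch in a string, asking the compiler
   to keep the loop variables in registers. The request may be ignored. */
int countChar (const char *text, char ch)
{
    register const char *textPtr;
    register int count = 0;

    for ( textPtr = text; *textPtr != '\0'; ++textPtr )
        if ( *textPtr == ch )
            ++count;

    /* Writing &count or &textPtr here would be a compile-time error,
       because the address of a register variable cannot be taken. */
    return count;
}
```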

The volatile Qualifier

The volatile qualifier is sort of the inverse of const. It tells the compiler explicitly that the specified variable can change its value in ways the compiler cannot see. It’s included in the language to prevent the compiler from optimizing away seemingly redundant assignments to a variable, or repeated examination of a variable without its value seemingly changing. A good example is an I/O port. Suppose you have an output port that’s pointed to by a variable in your program called outPort. If you want to write two characters to the port, for example an O followed by an N, you might have the following code:

*outPort = 'O';
*outPort = 'N';

A smart compiler might notice two successive assignments to the same location and, because the first value stored there is never read before it is overwritten, simply remove the first assignment from the program. To prevent this from happening, you declare outPort to be a pointer to volatile char, as follows:

volatile char *outPort;
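The two assignments can be packaged in a small routine. The function writeOn() is hypothetical; on a real system the pointer would hold a device address taken from the hardware documentation, whereas here an ordinary char variable stands in for the port. Note where volatile appears: volatile char * makes the characters being pointed to volatile, while char * volatile would instead make the pointer variable itself volatile.

```c
/* Hypothetical output routine: because port points to volatile char,
   the compiler must perform both stores, in order, even though the
   first value is immediately overwritten by the second. */
void writeOn (volatile char *port)
{
    *port = 'O';
    *port = 'N';
}
```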

The restrict Qualifier

Like the register qualifier, restrict is an optimization hint for the compiler. As such, the compiler can choose to ignore it. It is used to tell the compiler that a particular pointer is the only reference (either indirect or direct) to the value it points to throughout its scope. That is, the same value is not referenced by any other pointer or variable within that scope.

The lines

int * restrict intPtrA;
int * restrict intPtrB;

tell the compiler that, for the duration of the scope in which intPtrA and intPtrB are defined, they will never access the same value. If both point into the same array, for example, they must never be used to refer to the same element.
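The qualifier is most often seen on function parameters. In the following sketch (addArrays() is a hypothetical name), the restrict qualifiers promise that the three arrays never overlap during the call, which can let the compiler optimize the loop more aggressively. The standard library uses the same technique: since C99, memcpy() declares its pointer parameters with restrict.

```c
#include <stddef.h>

/* Hypothetical: add a and b element by element into result.
   Calling this with overlapping arrays would break the restrict promise
   and result in undefined behavior. */
void addArrays (size_t n, int * restrict result,
                const int * restrict a, const int * restrict b)
{
    size_t i;

    for ( i = 0; i < n; ++i )
        result[i] = a[i] + b[i];
}
```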

Command-line Arguments

Many times, a program is developed that requires the user to enter a small amount of information at the terminal. This information might consist of a number indicating the triangular number that you want to have calculated or a word that you want to have looked up in a dictionary.

Rather than having the program request this type of information from the user, you can supply the information to the program at the time the program is executed. This capability is provided by what is known as command-line arguments.

As pointed out previously, the only distinguishing quality of the function main() is that its name is special; it specifies where program execution is to begin. In fact, the function main() is actually called upon at the start of program execution by the C system (known more formally as the runtime system), just as you call a function from within your own C program. When main() completes execution, control is returned to the runtime system, which then knows that your program has completed execution.

When main() is called by the runtime system, two arguments are actually passed to the function. The first argument, which is called argc by convention (for argument count), is an integer value that specifies the number of arguments typed on the command line, normally including the program name itself. The second argument to main() is an array of character pointers, which is called argv by convention (for argument vector). There are argc + 1 character pointers contained in this array, where argc always has a minimum value of 0. The first entry in this array is a pointer to the name of the program that is executing, or a pointer to an empty string if the program name is not available on your system. Subsequent entries in the array point to the values that were specified on the same line as the command that initiated execution of the program. The last pointer in the argv array, argv[argc], is guaranteed to be a null pointer.

To access the command-line arguments, the main() function must be appropriately declared as taking two arguments. The conventional declaration that is used appears as follows:

int main (int argc, char *argv[])
{
    ...
}

Remember, the declaration of argv defines an array that contains elements of type “pointer to char.” As a practical use of command-line arguments, recall Program 9.10, which looked up a word inside a dictionary and printed its meaning. You can make use of command-line arguments so that the word whose meaning you want to find can be specified at the same time that the program is executed, as in the following command:

lookup aerie

This eliminates the need for the program to prompt the user to enter a word because it is entered on the command line.

If the previous command is executed, the system automatically passes to main() a pointer to the character string "aerie" in argv[1]. Recall that argv[0] contains a pointer to the name of the program, which in this case is "lookup".

The main() routine might appear as follows:

#include <stdlib.h>
#include <stdio.h>

// struct entry and the lookup() function are assumed to be
// defined as in Program 9.10

int main (int argc, char *argv[])
{
    const struct entry dictionary[100] =
      { { "aardvark", "a burrowing African mammal" },
        { "abyss", "a bottomless pit" },
        { "acumen", "mentally sharp; keen" },
        { "addle", "to become confused" },
        { "aerie", "a high nest" },
        { "affix", "to append; attach" },
        { "agar", "a jelly made from seaweed" },
        { "ahoy", "a nautical call of greeting" },
        { "aigrette", "an ornamental cluster of feathers" },
        { "ajar", "partially opened" } };

    int entries = 10;
    int entryNumber;
    int lookup (const struct entry dictionary [], const char search[],
                const int entries);

    if ( argc != 2 )
    {
        fprintf (stderr, "No word typed on the command line.\n");
        return EXIT_FAILURE;
    }

    entryNumber = lookup (dictionary, argv[1], entries);

    if ( entryNumber != -1 )
        printf ("%s\n", dictionary[entryNumber].definition);
    else
        printf ("Sorry, %s is not in my dictionary.\n", argv[1]);

    return EXIT_SUCCESS;
}

The main() routine tests to make certain that a word was typed after the program name when the program was executed. If it wasn’t, or if more than one word was typed, the value of argc is not equal to 2. In this case, the program writes an error message to standard error and terminates, returning an exit status of EXIT_FAILURE.

If argc is equal to 2, the lookup function is called to find the word pointed to by argv[1] in the dictionary. If the word is found, its definition is displayed.

As another example of command-line arguments, Program 15.3 was a file-copy program. Program 16.1, which follows, takes the two filenames from the command line rather than prompting the user to type them in.

Program 16.1 File Copy Program Using Command-line Arguments


// Program to copy one file to another — version 2


#include <stdio.h>

int main (int argc, char *argv[])
{
    FILE *in, *out;
    int c;

    if ( argc != 3 ) {
        fprintf (stderr, "Need two file names\n");
        return 1;
    }

    if ( (in = fopen (argv[1], "r")) == NULL ) {
        fprintf (stderr, "Can't read %s.\n", argv[1]);
        return 2;
    }

    if ( (out = fopen (argv[2], "w")) == NULL ) {
        fprintf (stderr, "Can't write %s.\n", argv[2]);
        fclose (in);    // don't leave the input file open
        return 3;
    }

    while ( (c = getc (in)) != EOF )
        putc (c, out);

    printf ("File has been copied.\n");

    fclose (in);
    fclose (out);

    return 0;
}


The program first checks to make certain that two arguments were typed after the program name. If so, the name of the input file is pointed to by argv[1], and the name of the output file by argv[2]. After opening the first file for reading and the second file for writing, and after checking both opens to make certain they succeeded, the program copies the file character by character as before.

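Note that there are four different ways for the program to terminate: an incorrect number of command-line arguments, the file to be copied can’t be opened for reading, the output file can’t be opened for writing, and successful completion. Each path returns a different exit status, so a shell script or calling program can tell them apart. Remember, if you’re going to use the exit status, you should always terminate the program with one. Under the original C89 standard, a program that terminated by falling through the bottom of main() returned an undefined exit status; since C99, reaching the closing brace of main() is equivalent to returning 0, but an explicit return statement still makes your intent clear.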

If Program 16.1 were called copyf and the program was executed with the following command line:

copyf foo foo1

then the argv array would look like Figure 16.1 when main is entered.

Figure 16.1 argv array on startup of copyf.
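Because argv[argc] is always a null pointer, a program can walk the argv array without consulting argc at all. The function listArguments() below is a hypothetical sketch; main() would call it as listArguments (argv).

```c
#include <stdio.h>

/* Hypothetical: print each command-line argument on its own line,
   stopping at the null pointer stored in argv[argc]. Returns the number
   of arguments found, which should equal argc. */
int listArguments (char *argv[])
{
    int i;

    for ( i = 0; argv[i] != NULL; ++i )
        printf ("argv[%i] = \"%s\"\n", i, argv[i]);

    return i;
}
```

For the copyf foo foo1 command line, this would print three lines, for "copyf", "foo", and "foo1".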

Remember that command-line arguments are always stored as character strings. Execution of the program power with the command-line arguments 2 and 16, as in

power 2 16

stores a pointer to the character string "2" inside argv[1], and a pointer to the string "16" inside argv[2]. If the arguments are to be interpreted as numbers by the program (as you might suspect is the case in the power program), they must be converted by the program itself. Several routines in the standard library can do such conversions, such as sscanf(), atof(), atoi(), strtod(), and strtol(). These are described in Appendix B, “The Standard C Library.”
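As a sketch of such a conversion, the hypothetical helper toLong() below uses strtol() and rejects strings that are empty, contain trailing characters, or are out of range. Unlike atoi(), which has no way to report an error, strtol() tells you where conversion stopped, which is what makes this checking possible. In the power program, main() might call toLong (argv[1], &base) and toLong (argv[2], &exponent), where base and exponent are assumed long variables.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command-line string such as "16" into
   a long. Returns 1 on success, or 0 if the string is empty, has
   trailing characters, or is out of range for a long. */
int toLong (const char *text, long *value)
{
    char *end;
    long result;

    errno = 0;
    result = strtol (text, &end, 10);   /* base 10 */

    if ( end == text || *end != '\0' || errno == ERANGE )
        return 0;

    *value = result;
    return 1;
}
```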

Dynamic Memory Allocation

Whenever you define a variable in C—whether it is a simple data type, an array, or a structure—you are effectively reserving one or more locations in the computer’s memory to contain the values that will be stored in that variable. The C compiler automatically allocates the correct amount of storage for you.

It is frequently desirable, if not necessary, to be able to dynamically allocate storage while a program is running. Suppose you have a program that is designed to read in a set of data from a file into an array in memory. Suppose, however, that you don’t know how much data is in the file until the program starts execution. You have three choices:

• Define the array to contain the maximum number of possible elements at compile time.
• Use a variable-length array to dimension the size of the array at runtime.
• Allocate the array dynamically using one of C’s memory allocation routines.

Using the first approach, you have to define your array to contain the maximum number of elements that would be read into the array, as in the following:

#define kMaxElements 1000

struct dataEntry dataArray [kMaxElements];

Now, as long as the data file contains 1,000 elements or fewer, you’re in business. But if the number of elements exceeds this amount, you must go back to the program, change the value of kMaxElements, and recompile it. Of course, no matter what value you select, you always have the chance of running into the same problem again in the future.

With the second approach, if you can determine the number of elements you need before you start reading in the data (perhaps from the size of the file, for example), you can then define a variable-length array as follows:

struct dataEntry dataArray [dataItems];

Here, it is assumed that the variable dataItems contains the aforementioned number of data items to read in.

Using the dynamic memory allocation functions, you can get storage as you need it. That is, this approach enables you to allocate memory as the program is executing. To use dynamic memory allocation, you must first learn about three functions and one new operator.

The calloc() and malloc() Functions

In the standard C library, two functions, called calloc() and malloc(), can be used to allocate memory at runtime. The calloc() function takes two arguments that specify the number of elements to be reserved and the size of each element in bytes. The function returns a pointer to the beginning of the allocated storage area in memory. The storage area is also automatically set to 0.

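The calloc() function returns a pointer to void, which is C’s generic pointer type. In C, a pointer to void is converted automatically when it is assigned to a pointer variable of another type, so converting it with the type cast operator is optional. Many programmers add the cast anyway to make the conversion explicit (and it is required if the code must also compile as C++).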

The malloc() function works similarly, except that it only takes a single argument—the total number of bytes of storage to allocate—and also doesn’t automatically set the storage area to 0.

The dynamic memory allocation functions are declared in the standard header file <stdlib.h>, which should be included in your program whenever you want to use these routines.

The sizeof Operator

To determine the size of data elements to be reserved by calloc() or malloc() in a machine-independent way, the C sizeof operator should be used. The sizeof operator returns the size of the specified item in bytes. The argument to the sizeof operator can be a variable, an array name, the name of a basic data type, the name of a derived data type, or an expression. For example, writing

sizeof (int)

gives the number of bytes needed to store an integer. On a Pentium 4 machine, this has the value 4 because an integer occupies 32 bits on that machine. If x is defined to be an array of 100 integers, the expression

sizeof (x)

gives the amount of storage required for the 100 integers of x (or the value 400 on a Pentium 4). The expression

sizeof (struct dataEntry)

has as its value the amount of storage required to store one dataEntry structure. Finally, if data is defined as an array of struct dataEntry elements, the expression

sizeof (data) / sizeof (struct dataEntry)

gives the number of elements contained in data (data must be a previously defined array, and not a formal parameter or externally referenced array). The expression

sizeof (data) / sizeof (data[0])

also produces the same result. The macro

#define ELEMENTS(x) (sizeof(x) / sizeof((x)[0]))

simply generalizes this technique. It enables you to write code like

if ( i >= ELEMENTS (data) )
...

and

for ( i = 0; i < ELEMENTS (data); ++i )
...

You should remember that sizeof is actually an operator, and not a function, even though it looks like a function. This operator is evaluated at compile time and not at runtime, unless a variable-length array is used in its argument. If such an array is not used, the compiler evaluates the value of the sizeof expression and replaces it with the result of the calculation, which is treated as a constant.

Use the sizeof operator wherever possible to avoid having to calculate and hard-code sizes into your program.
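The following sketch applies the ELEMENTS macro to a local array. The layout of struct dataEntry and the function sumOfIndexes() are hypothetical, since the text does not define the structure's members.

```c
#include <stddef.h>

#define ELEMENTS(x) (sizeof (x) / sizeof ((x)[0]))

struct dataEntry       /* hypothetical layout, for illustration only */
{
    int    index;
    double value;
};

/* Hypothetical helper: number the entries of a local array and return
   the sum of the index members. ELEMENTS works here because data is a
   true array in this scope, not a pointer parameter. */
int sumOfIndexes (void)
{
    struct dataEntry data[25];
    size_t i;
    int sum = 0;

    for ( i = 0; i < ELEMENTS (data); ++i ) {
        data[i].index = (int) i;
        sum += data[i].index;
    }

    return sum;    /* 0 + 1 + ... + 24 = 300 */
}
```

If data were instead a function parameter declared as struct dataEntry data[], it would actually be a pointer, and ELEMENTS (data) would silently compute sizeof (pointer) / sizeof (struct dataEntry) rather than the number of elements.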

Getting back to dynamic memory allocation, if you want to allocate enough storage in your program to store 1,000 integers, you can call calloc() as follows:

#include <stdlib.h>
...
int *intPtr;
...
intPtr = (int *) calloc (1000, sizeof (int));

Using malloc(), the function call looks like this:

intPtr = (int *) malloc (1000 * sizeof (int));

Remember that both malloc() and calloc() are defined to return a pointer to void. In C, that pointer is converted automatically when it is assigned to intPtr, so the type cast shown is optional; it simply makes the conversion to an integer pointer explicit before the result is assigned to intPtr.

If you ask for more memory than the system has available, calloc() (or malloc()) returns a null pointer. Whether you use calloc() or malloc(), be certain to test the pointer that is returned to ensure that the allocation succeeded.

The following code segment allocates space for 1,000 integers and tests the pointer that is returned. If the allocation fails, the program writes an error message to standard error and then exits.

#include <stdlib.h>
#include <stdio.h>
...
int *intPtr;
...
intPtr = (int *) calloc (1000, sizeof (int));

if ( intPtr == NULL )
{
    fprintf (stderr, "calloc failed\n");
    exit (EXIT_FAILURE);
}

If the allocation succeeds, the integer pointer variable intPtr can be used as if it were pointing to an array of 1,000 integers. So, to set all 1,000 elements to −1, you could write

for ( p = intPtr; p < intPtr + 1000; ++p )
    *p = -1;

assuming p is declared to be an integer pointer.
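The allocate, test, and fill steps can be combined in one function. In this sketch, makeFilledArray() is a hypothetical name; it returns a null pointer on failure instead of calling exit(), leaving that decision to the caller, and it omits the optional cast. The caller is responsible for calling free() when finished.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n ints, set each one to value, and
   return the pointer (or NULL if the allocation fails).
   The caller must free() the result. */
int *makeFilledArray (size_t n, int value)
{
    int *intPtr, *p;

    intPtr = calloc (n, sizeof (int));   /* n elements of sizeof (int) bytes */

    if ( intPtr == NULL ) {
        fprintf (stderr, "calloc failed\n");
        return NULL;
    }

    for ( p = intPtr; p < intPtr + n; ++p )
        *p = value;

    return intPtr;
}
```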

To reserve storage for n elements of type struct dataEntry, you first need to define a pointer of the appropriate type

struct dataEntry *dataPtr;

and could then proceed to call the calloc() function to reserve the appropriate number of elements

dataPtr = (struct dataEntry *) calloc (n, sizeof (struct dataEntry));

Execution of the preceding statement proceeds as follows:

1. The calloc() function is called with two arguments, the first specifying that storage for n elements is to be dynamically allocated and the second specifying the size of each element.

2. The calloc() function returns a pointer in memory to the allocated storage area. If the storage cannot be allocated, the null pointer is returned.

3. The pointer is type cast into a pointer of type “pointer to struct dataEntry” and is then assigned to the pointer variable dataPtr.

Once again, the value of dataPtr should be subsequently tested to ensure that the allocation succeeded. If it did, its value is nonnull. This pointer can then be used in the normal fashion, as if it were pointing to an array of n dataEntry elements. For example, if dataEntry contains an integer member called index, you can assign 100 to this member as pointed to by dataPtr with the following statement:

dataPtr->index = 100;
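Because dataPtr behaves like an array, dataPtr->index refers to the first element only; the other elements are reached with subscripts or pointer arithmetic. The sketch below assumes a hypothetical layout for struct dataEntry (the text mentions only its index member), and makeEntries() is likewise a hypothetical name.

```c
#include <stdlib.h>

struct dataEntry               /* hypothetical layout, for illustration only */
{
    int    index;
    double value;
};

/* Hypothetical helper: allocate n dataEntry structures, number their
   index members 100, 101, ..., and return the pointer (NULL on failure).
   calloc() sets the whole area to zero bits before numbering begins. */
struct dataEntry *makeEntries (size_t n)
{
    struct dataEntry *dataPtr;
    size_t i;

    dataPtr = calloc (n, sizeof (struct dataEntry));
    if ( dataPtr == NULL )
        return NULL;

    dataPtr->index = 100;                   /* same as dataPtr[0].index */

    for ( i = 1; i < n; ++i )
        dataPtr[i].index = 100 + (int) i;   /* later elements by subscript */

    return dataPtr;
}
```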

The free Function

When you have finished working with the memory that has been dynamically allocated by calloc() or malloc(), you should give it back to the system by calling the free() function. The single argument to the function is a pointer to the beginning of the allocated memory, as returned by a calloc() or malloc() call. So, the call

free (dataPtr);

returns the memory allocated by the calloc() call shown previously, provided that the value of dataPtr still points to the beginning of the allocated memory.

The free() function does not return a value.

The memory that is released by free() can be reused by a later call to calloc() or malloc(). This is worth remembering in programs whose total memory needs over time would exceed what is available if nothing were ever released. Make certain you give the free() function a valid pointer to the beginning of some previously allocated space.

Dynamic memory allocation is invaluable when dealing with linked structures, such as linked lists. When you need to add a new entry to the list, you can dynamically allocate storage for one entry in the list and link it into the list with the pointer returned by calloc() or malloc(). For example, consider a singly linked list whose entries are of type struct entry, defined as follows:

struct entry
{
    int value;
    struct entry *next;
};

Here is a function called addEntry() that takes as its argument a pointer to the start of the linked list and that adds a new entry to the end of the list.

#include <stdlib.h>
#include <stddef.h>

// add new entry to end of linked list
// (listPtr must point to an existing entry; it must not be NULL)

struct entry *addEntry (struct entry *listPtr)
{
    // find the end of the list

    while ( listPtr->next != NULL )
        listPtr = listPtr->next;

    // get storage for new entry

    listPtr->next = (struct entry *) malloc (sizeof (struct entry));

    // add null to the new end of the list

    if ( listPtr->next != NULL )
        (listPtr->next)->next = (struct entry *) NULL;

    return listPtr->next;
}

If the allocation succeeds, a null pointer is placed in the next member of the newly allocated linked-list entry (pointed to by listPtr->next).

The function returns a pointer to the new list entry, or the null pointer if the allocation fails (verify that this is, in fact, what happens). If you draw a picture of a linked list and trace through the execution of addEntry(), it will help you to understand how the function works.
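To trace addEntry() and also give the memory back, the following sketch repeats the function (without the optional casts) and adds a hypothetical freeList() helper. Each entry's next pointer is saved before the entry is freed, because an entry must not be used after it has been passed to free().

```c
#include <stdlib.h>

struct entry
{
    int value;
    struct entry *next;
};

/* addEntry() as in the text: listPtr must point to an existing entry. */
struct entry *addEntry (struct entry *listPtr)
{
    while ( listPtr->next != NULL )
        listPtr = listPtr->next;

    listPtr->next = malloc (sizeof (struct entry));

    if ( listPtr->next != NULL )
        listPtr->next->next = NULL;

    return listPtr->next;
}

/* Hypothetical helper: free every entry, starting from the head. */
void freeList (struct entry *listPtr)
{
    struct entry *next;

    while ( listPtr != NULL ) {
        next = listPtr->next;     /* save before freeing the entry */
        free (listPtr);
        listPtr = next;
    }
}
```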

Another function, called realloc(), is associated with dynamic memory allocation. It can be used to shrink or expand the size of some previously allocated storage. For more details, consult Appendix B.
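A common way to use realloc() is to grow an array as more data arrives. In this sketch, growArray() is a hypothetical helper that doubles an array's capacity. It relies on two documented properties of realloc(): called with a null pointer, it behaves like malloc(); and when it fails, it returns a null pointer but leaves the original block allocated.

```c
#include <stdlib.h>

/* Hypothetical helper: double the capacity of a dynamically allocated
   int array. Returns 1 on success; on failure returns 0 and leaves the
   original array (and *capacity) untouched. */
int growArray (int **arrayPtr, size_t *capacity)
{
    size_t newCapacity = (*capacity == 0) ? 1 : *capacity * 2;
    int *newArray;

    /* Keep the result in a temporary: if realloc() fails, the old block
       is still allocated, and overwriting *arrayPtr with NULL would lose
       the only pointer to it. */
    newArray = realloc (*arrayPtr, newCapacity * sizeof (int));
    if ( newArray == NULL )
        return 0;

    *arrayPtr = newArray;      /* the block may have moved */
    *capacity = newCapacity;
    return 1;
}
```

The existing contents, up to the old size, are preserved even if the block is moved to a new location.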

This chapter concludes coverage of the features of the C language. In Chapter 17, “Debugging Programs,” you learn some techniques that will help you to debug your C programs. One involves using the preprocessor. The other involves the use of a special tool, called an interactive debugger.

Exercises

1. Type in and run Program 16.1. Check the program’s results by comparing the original file you chose to copy with the copy the program created, and make sure the two are the same.

2. Finish the program that takes a word as a command-line argument and looks it up in the array of terms and definitions, displaying the definition if the word is found or informing the user that the term is not in the program’s glossary if it isn’t.