C 101 - 21st Century C (2015)

Appendix A. C 101

This appendix covers the basics of the language. It’s not for everyone.

§ If you already have experience writing code in a common scripting language, like Python, Ruby, or Visual Basic, this appendix will be at your level. I don’t have to explain to you what variables, functions, loops, or other basic building blocks are, so the main headings of this appendix are about the big differences between C and typical scripting languages.

§ If you learned C a long time ago and are feeling rusty, skimming this tutorial should remind you of the quirks that make C different and unique.

§ If you already work with C on a regular basis, don’t bother reading this appendix. You may also want to skip or skim the early parts of Part II, which are aimed at common errors and misunderstandings about the core of the language.

Don’t expect to be an expert in C by the end of this tutorial—there’s no substitute for real experience with the language. But you will be in a position to get started with Part II of this book and find out about the nuances and useful customs of the language.

The Structure

I’ll kick off the tutorial the way Kernighan & Ritchie did in their 1978 blockbuster book: with a program to say hello.

//tutorial/hello.c

#include <stdio.h>

int main(){

printf("Hello, world.\n");

}

The double slashes on the first line indicate a comment that the compiler will ignore. All of the code samples in this appendix marked with a file name like this are available online at: https://github.com/b-k/21st-Century-Examples.

Even this much reveals a few key points about C. Structurally, almost everything in a C program is:

§ A preprocessor directive, like #include <stdio.h>

§ A declaration of a variable or a type (though this program has none)

§ A function block, like main, containing expressions to evaluate (like printf)

But before going into detail about the definition and use of preprocessor directives, declarations, blocks, and expressions, we have to work out how to run this program so the computer can greet us.

C requires a compilation step, which consists of running a single command

A scripting language comes with a program that parses the text of your scripts; C has a compiler that takes in your program text and produces a program directly executed by the operating system. Using the compiler is something of a pain, so there are programs to run the compiler for you. Integrated development environments (IDEs) typically have a compile-and-run button, and on the command line, a POSIX-standard program named make will run the compiler for you.

If you don’t have a compiler and make, then go to “Use a Package Manager” and read about how to obtain them. The short version: ask your package manager to install gcc or clang, and make.

With a compiler and make installed, if you saved the above program as hello.c then you can use make to run the compiler via this command:

make hello

This produces the hello program, which you can run from the command line or click on in your file browser to verify that it prints what we expect it to.

The sample code repository includes a makefile, which instructs make to send some compilation flags to the compiler. The workings of make and the contents of the makefile are discussed at length in “Using Makefiles”. For now, I’ll mention one flag: -Wall. This flag asks the compiler to list all warnings about parts of your program that are technically correct, but may not be what you meant. This is known as static analysis, and modern C compilers are very good at it. You can thus think of the compilation step not as a useless formality, but as a chance to submit your code to a team of the world’s foremost experts in C before running the program.

If you have a Mac that doesn’t like the -Wall flag, see the warning in “A Few of My Favorite Flags” on how to re-alias gcc.

A lot of bloggers see the compilation step as a big deal. On the command line, if typing make yourprogram before running via ./yourprogram is just too much effort, you can write an alias or shell script to do it. In the POSIX shell, you could define:

crun() { make "$1" && ./"$1"; }

and then use

crun hello

to compile and, if the compilation worked, run.

There’s a standard library, and it’s part of your operating system

Programs in the present day are typically not completely standalone, but link to libraries of common functions possibly used by more than one program. The library path is a list of directories on your hard drive that the compiler searches for such libraries; see “Paths” for details. Key among these libraries is the C standard library, defined in the ISO C standard and about as close to universally available as computer code can be. This is where the printf function is defined.

There’s a preprocessor

The libraries are in binary format, executable by the computer but illegible to humans. Unless you have binary-reading superpowers, you can’t look at the compiled library to verify that you are using printf correctly. So there are companion files to a library, header files, that list plain-text declarations for the utilities in the library, giving the inputs that each function expects and the outputs they produce. If you include the appropriate header in your program, then the compiler can do consistency checks to verify that your use of a function, variable, or type is consistent with what the binary code in the library expects.

The primary activity of the preprocessor is to substitute the text of preprocessor directives (which all begin with a #) with other text. There are many other uses (see “The Preprocessor”), but the only use I’ll cover in this appendix is including other files. When the preprocessor sees

#include <stdio.h>

it will substitute the full text of stdio.h at this point. The angle brackets in <stdio.h> indicate that the header is on the include path, which is distinct from the library path (and is also discussed in detail in “Paths”). If a file is in the working directory for the project, use #include "myfile.h".

The .h ending indicates that the file is a header file. Header files are plain code, and the compiler doesn’t know a header from other code files, but the custom is to put only declarations in header files.

After the preprocessor has done its work, almost everything in the file will either be a declaration of a variable or type, or the definition of a function.

There are two types of comment

/* Multiline comments run between a slash-star

and a star-slash. */

//Single-line comments run from a double-slash to the end of the line.

There is no print keyword

The printf function from the standard library prints text to the screen. It has its own sublanguage for precisely expressing how variables are printed. I won’t give you a detailed explanation of its working because there are comprehensive descriptions of the printf sublanguage everywhere (try man 3 printf from your command line), and because you’ll see examples throughout this tutorial and throughout the book. The sublanguage consists of plain text interspersed with insert variable here markers and codes for invisible characters like tabs and newlines. Here are the six elements that will get you by as you read examples of printf-family functions in the rest of the tutorial:

\n

A newline

\t

A tab

%i

Insert an integer value here

%g

Insert a real number in general format here

%s

Insert a string of text here

%%

Insert a plain percent sign here

Variable Declarations

Declarations are a big difference between C and a lot of scripting languages that infer the type of a variable—and even its existence—via the first use. Above, I suggested that the compilation step is really a chance to do prerun checks to verify that your code has some chance of doing what you promised it does; declaring the type of each variable gives the compiler much more of an opportunity to check that your writing is coherent. There is also a declaration syntax for functions and new types.

Variables have to be declared

The hello program didn’t have any variables, but here is a program that declares a few variables and demonstrates the use of printf. Notice how the first argument to printf (the format specifier) has three insert variable here markers, so it is followed by the three variables to insert.

//tutorial/ten_pi.c

#include <stdio.h>

int main(){

double pi= 3.14159265; //POSIX defines the constant M_PI in math.h, by the way.

int count= 10;

printf("%g times %i = %g.\n", pi, count, pi*count);

}

This program outputs:

3.14159 times 10 = 31.4159.

There are three basic types that I use throughout the book: int, double, and char, which are short for integer, double-precision floating-point real number, and character.

There are bloggers who characterize the work of declaring a variable as a fate worse than death, but as in the example above, the only work required is often just putting a type name before the first use of the variable. And when reading unfamiliar code, having every variable’s type and having a marker for its first use are nice guideposts.

If you have multiple variables of the same type, you can even declare them all on one line, like replacing the above declaration with:

int count=10, count2, count3=30; //count2 is uninitialized.

Even functions have to be declared or defined

The definition of a function describes the full working of the function, like this trivial function:

int add_two_ints(int a, int b){

return a+b;

}

This function takes in two integers, which the function will refer to as a and b, and returns a single integer, which is the sum of a and b.

We can also split off the declaration as its own statement, which gives the name, the input types (in parens) and the output type (in front):

int add_two_ints(int a, int b);

This doesn’t tell us what add_two_ints actually does, but it is sufficient for the compiler to consistency-check every use of the function, verifying that every use sends in two integers, and uses the result as an integer. As with all declarations, this might be in a code file as-is, or it might be in a header file inserted via a line like #include "mydeclarations.h".

A block is a unit of code to be treated as a unit, surrounded by curly braces. Thus, a function definition is a declaration immediately followed by a single block of code to be executed when the function runs.

If the full definition of the function is in your code before the use of the function, then the compiler has what it needs to do consistency checks, and you don’t need to bother with a separate declaration. Because of this, a lot of C code is written and read in a bottom-up style, with main as the last thing in the file, and above that the definition of functions called by main, and above those the definitions of functions called by those functions, and so on up to the headers at the top of the file declaring all the library functions used.

By the way, your functions can have void type, meaning that they return nothing. This is useful for functions that don’t output or change variables but have other effects. For example, here is a program largely consisting of a function to write error messages to a file (which will be created on your hard drive) in a fixed format, using the FILE type and related functions all declared in stdio.h. You’ll see why char* is the type that specifies a string of text below:

//tutorial/error_print.c

#include <stdio.h>

void error_print(FILE *ef, int error_code, char *msg){

fprintf(ef, "Error #%i occurred: %s.\n", error_code, msg);

}

int main(){

FILE *error_file = fopen("example_error_file", "w"); //open for writing

error_print(error_file, 37, "Out of karma");

}

Basic types can be aggregated into arrays and structs

How can one get any work done with only three basic types? By compounding them into arrays of homogeneous types, and structures of heterogeneous types.

An array is a list of identically typed elements. Here is a program that declares a list of 10 integers and a 20-character string, and uses part of both:

//tutorial/item_seven.c

#include <stdio.h>

int intlist[10];

int main(){

int len=20;

char string[len];

intlist[7] = 7;

snprintf(string, len, "Item seven is %i.", intlist[7]);

printf("string says: <<%s>>\n", string);

}

The snprintf function prints to a string whose maximum length you provide, using the same syntax that plain printf used to write to the screen. More on handling strings of characters, and why intlist could be declared outside of a function but string had to be declared inside one, below.

The index is an offset from the first element. The first element is zero steps from the head of the array, so it is intlist[0]; the last element of the 10-item array is intlist[9]. This is another cause of panic and flame wars, but it has its own sense.

You can find a zeroth symphony from various composers (Bruckner, Schnittke). But in most situations, we use counting words like first, second, seventh that clash with offset numbering: the seventh item in the array is intlist[6]. I try to stick with language like element 6 of the array.

For reasons that will become clear, the type of an array can also be written with a star, like:

int *intlist;

You saw an example above, where a sequence of characters was declared via char *msg.

New structure types can be defined

Heterogeneous types can be combined into a structured list (herein a struct) that can then be treated as a unit. Here is an example which declares and makes use of a ratio_s type, describing a fraction with a numerator, denominator, and decimal value. The type definition is basically a list of declarations inside curly braces.

When using the defined struct, you’ll see that there are a lot of dots: given a ratio_s struct r, r.numerator is the numerator element of that struct. The expression (double)den is a type cast, converting the integer den to a double (for reasons explained below). The means of setting up a new struct outside a declaration line looks like a type cast, with a type name in parens, followed by the dotted elements in curly braces. There are other more terse (i.e., less legible) ways to initialize a struct.

//tutorial/ratio_s.c

#include <stdio.h>

typedef struct {

int numerator, denominator;

double value;

} ratio_s;

ratio_s new_ratio(int num, int den){

return (ratio_s){.numerator=num, .denominator=den, .value=num/(double)den};

}

void print_ratio(ratio_s r){

printf("%i/%i = %g\n", r.numerator, r.denominator, r.value);

}

ratio_s ratio_add(ratio_s left, ratio_s right){

return (ratio_s){

.numerator=left.numerator*right.denominator

+ right.numerator*left.denominator,

.denominator=left.denominator * right.denominator,

.value=left.value + right.value

};

}

int main(){

ratio_s twothirds= new_ratio(2, 3);

ratio_s aquarter= new_ratio(1, 4);

print_ratio(twothirds);

print_ratio(aquarter);

print_ratio(ratio_add(twothirds, aquarter));

}

You can find out how much space a type takes

The sizeof operator can take a type name, and will tell you how much memory is required to write down an instance of that type. This is sometimes handy.

This short program compares the size of two ints and a double to the size of the ratio_s defined above. The %zu format specifier for printf exists solely for the type of output produced by sizeof.

//tutorial/sizeof.c

#include <stdio.h>

typedef struct {

int numerator, denominator;

double value;

} ratio_s;

int main(){

printf("size of two ints: %zu\n", 2*sizeof(int));

printf("size of two ints: %zu\n", sizeof(int[2]));

printf("size of a double: %zu\n", sizeof(double));

printf("size of a ratio_s struct: %zu\n", sizeof(ratio_s));

}

There is no special string type

Both the integer 5100 and the integer 51 take up sizeof(int) space. But "Hi" and "Hello" are strings of different numbers of characters. Scripting languages typically have a dedicated string type, which manages lists of an indeterminate number of characters for you. A string in C is an array of chars, pure and simple.

The end of a string is marked with a NUL character, '\0', though it is never printed and is usually taken care of for you. (Note that single characters are given single-ticks, like 'x', while strings are given double-ticks, like "xx" or the one-character string "x".) The function strlen(mystring) will count the number of characters up to (but not including) that NUL character. How much space was allocated for the string is another matter entirely: you could easily declare char pants[1000] = "trousers", though you are wasting 991 bytes after the NUL character.

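Some things are surprisingly easy thanks to the array nature of strings. Given a writable array of characters (rather than a pointer to a string literal, which must not be modified), such as

char str[]="Hello";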

you can turn a Hello into Hell by inserting a NUL character:

str[4]='\0';

But most of what you want to do with a string involves calling a library function to do the byte-twiddling for you. Here are a few favorites:

#include <string.h>

char *str1 = "hello", str2[100];

strlen(str1); //get the length up to but excluding the '\0'

strncpy(str2, str1, 100); //copy at most 100 bytes from str1 to str2

strncat(str2, str1, 100); //append at most 100 characters from str1 onto str2

strcmp(str1, str2); //are str1 and str2 different? zero=no, nonzero=yes

snprintf(str2, 100, "str1 says: %s", str1); //write to a string, as above.

In Chapter 9, I discuss a few other functions for making life easier with strings, because with enough intelligent functions, string handling can be pleasant again.

Expressions

A program that does nothing but declare types, functions, and variables would just be a list of nouns, so it is time to move on to some verbs making use of our nouns. C’s mechanism for executing any sort of action is evaluation of an expression, and expressions are always grouped into functions.

The scoping rules for C are very simple

The scope of a variable is the range of the program over which it can be used.

If a variable is declared outside of a function, then it can be used by any expression from the declaration until the end of the file. Any function in that range can make use of that variable. Such variables are initialized at the start of the program and persist until the program terminates. They are referred to as static variables, perhaps because they sit in one place for the entire program.

If a variable is declared inside a block (including the block defining a function), then the variable is created at the declaration line and destroyed at the closing curly brace of the block.

See “Persistent State Variables” for further notes on static variables, including how we can have long-lived variables inside a function.

The main function is special

When a program runs, the first thing that happens is the setup of the file-global variables as above. No math happens yet, so they can be assigned either a given constant value (if declared like int gv=24;), or the default value of zero (if declared like int gv;).

Scripting languages usually allow some instructions to be inside functions, and some loose in the main body of the script. Any C expression that needs to be evaluated is in the body of a function, and the evaluations start with the main function. In the snprintf example above, the array with length len had to be inside of main, because getting the value of len is already too much math for the startup phase of the program.

Because the main function is effectively called by the operating system, its declaration must have one of two forms that the OS knows how to use: either

int main(void);

//which can be written as

int main();

or

int main(int, char**)

//where the two inputs are customarily named:

int main(int argc, char** argv)

You have already seen examples of the first version, where nothing comes in but a single integer comes out. That integer is generally treated as an error code, interpreted to indicate trouble if it is nonzero, and OK execution (reaching the end of main and exiting normally) if it is zero. This is such an ingrained custom that the C standard specifies that there is an implied return 0; at the end of main (see “Don’t Bother Explicitly Returning from main” for discussion). For a simple example of the second form, see Example 8-6.

Most of what a C program actually does is evaluate expressions

So the global variables have been set up, the operating system has prepared the inputs to main, and the program is starting to actually execute code in the main function block.

From here on out, everything will be the declaration of a local variable, flow control (branching on an if-else, looping through a while loop), or the evaluation of an expression.

To borrow from an earlier example, consider what the system has to do to evaluate this sequence:

int a1, two_times;

a1 = (2+3)*7;

two_times = add_two_ints(a1, a1);

After the declarations, the line a1=(2+3)*7 requires first evaluating the expression (2+3), which can be replaced with 5, then evaluating the expression 5*7, which can be replaced with 35. This is exactly how we humans do it when facing an expression like this, but C carries this evaluate-and-substitute principle further.

In the evaluation of the expression a1=35, two things occur. The first is the replacement of the expression with its value: 35. The second is a side effect that a state has changed: the value of the variable a1 is changed to 35. There are languages that strive to be more pure in evaluation, but C allows evaluations to have side effects that change state. You saw another example several times above: in the evaluation of printf("hello\n"), the expression is replaced by the number of characters printed (or a negative value on error), but the evaluation is useful for the side effect of changing the state of the screen.

After all those substitutions, we’d be left with only 35; on the line. With nothing left to evaluate, the system moves on to the next line.

Functions are evaluated using copies of the inputs

That line of the above snippet, two_times = add_two_ints(a1, a1) first requires evaluating a1 twice, then evaluating add_two_ints with the evaluated inputs, 35 and 35. So a copy of the value of a1 is handed to the function, not a1 itself. That means that the function has no way to modify the value of a1 itself. If you have function code that looks like it is modifying an input, it is really modifying a copy of the input’s value. A workaround for when we want to modify the variables sent to a function call will be presented below.

Expressions are delimited by semicolons

Yes, C uses semicolons to delimit expressions. This is a contentious stylistic choice, but it does allow you to put newlines, extra spaces, and tabs anywhere they would improve readability.

There are many shortcuts for incrementing or scaling a variable

C has a few pleasant shorthand expressions for arithmetic to modify a variable. We can shorten x=x+3 to x+=3 and x=x/3 to x/=3, respectively. Incrementing a variable by one is so common that there are two ways of doing it. Both x++ and ++x have the side effect of incrementing x, but the evaluation of x++ replaces the expression with the preincrement value of x, while the evaluation of ++x replaces the expression with the postincrement value of x+1.

x++; //increment x. Evaluates to x.

++x; //increment x. Evaluates to x+1.

x--; //decrement x. Evaluates to x.

--x; //decrement x. Evaluates to x-1.

x+=3; //add three to x.

x-=7; //subtract seven from x.

x*=2; //multiply x by two.

x/=2; //divide x by two.

x%=2; //replace x with x modulo 2 (the remainder of x/2).

C has an expansive definition of truth

We will sometimes need to know whether an expression is true or false, such as deciding which branch to choose in an if-else construction. There are no true and false keywords in C, though they are commonly defined as in “True and False”. Instead, if the expression is zero (or the NUL character '\0', or a NULL pointer), then the expression is taken to be false; if it is anything else at all, it is taken to be true.

Conversely, all of these expressions evaluate to either zero or one:

!x //not x

x==y //x equals y

x != y //x is not equal to y

x < y //x is less than y

x <= y //x is less than or equal to y

x || y //x or y

x && y //x and y

x > y || y >= z //x is greater than y or y is greater than or equal to z

For example, if x is any nonzero value, then !x evaluates to zero, and !!x evaluates to one.

The && and || are lazy, and will evaluate only as much of the expression as is necessary to establish the truth or falsehood of the whole. For example, consider the expression (a < 0 || sqrt(a) < 10). The square root of an int or double -1 is an error (but see “_Generic” for discussion of C support of imaginary numbers). But if a==-1, then we know that (a < 0 || sqrt(a) < 10) evaluates to true without even looking at the second half of the expression. So sqrt(a) < 10 is left ignored and unevaluated, and disaster is averted.

Dividing two integers always produces an integer

Many authors prefer to avoid floating-point real numbers to the greatest extent possible, because integers are processed faster and without roundoff errors. C facilitates this by having three distinct operators: real division, integer division, and modulo. The first two happen to look identical.

//tutorial/divisions.c

#include <stdio.h>

int main(){

printf("13./5=%g\n", 13./5);

printf("13/5=%i\n", 13/5);

printf("13%%5=%i\n", 13%5);

}

Here’s the output:

13./5=2.6

13/5=2

13%5=3

The expression 13. is a floating-point real number, and if there is a real number in the numerator or denominator, then floating-point division happens, producing a floating-point result. If both numerator and denominator are integers, then the result is the integer you would get from doing the division with real numbers and then rounding toward zero to an integer. The modulo operator, %, gives the remainder.

The difference between floating-point and integer division is why the new_ratio example above typecast the denominator via num/(double)den. For further discussion, see “Cast Less”.

C has a trinary conditional operator

The expression

x ? a : b

evaluates to a if x is true, and to b if x is false.

I used to think this was illegible, and few scripting languages have such an operator, but it has grown on me for its great utility. Because it is just another expression, we can put it anywhere; for example:

//tutorial/sqrt.c

#include <math.h> //The square root function is declared here.

#include <stdio.h>

int main(){

double x = 49;

printf("The truncated square root of x is %g.\n",

x > 0 ? sqrt(x) : 0);

}

The trinary conditional operator has the same short-circuit behavior as && and || above: if x<=0, then sqrt(x) is never evaluated.

Branching and looping expressions are not very different from any other language

Probably the only unique point about if-else statements in C is that there is no then keyword. Parens mark the condition to be evaluated, and then the following expression or block is run through if the condition is true. A few sample uses:

//tutorial/if_else.c

#include <stdio.h>

int main(){

if (6 == 9)

printf("Six is nine.\n");

int x=3;

if (x==1)

printf("I found x; it is one.\n");

else if (x==2)

printf("x is definitely two.\n");

else

printf("x is neither one nor two.\n");

}

The while loop repeats a block until the given condition is false. For example, this program regreets the user 10 times:

//tutorial/while.c

#include <stdio.h>

int main(){

int i=0;

while (i < 10){

printf("Hello #%i\n", i);

i++;

}

}

If the controlling condition in parens after the while keyword is false on the first try, then the body of the while loop will be skipped entirely. But the do-while loop is guaranteed to run at least once:

//tutorial/do_while.c

#include <stdio.h>

void loops(int max){

int i=0;

do {

printf("Hello #%i\n", i);

i++;

} while (i < max); //Note the semicolon.

}

int main(){

loops(3); //prints three greetings

loops(0); //prints one greeting

}

The for loop is just a compact version of the while loop

Traffic control for the while loop had three parts:

§ The initializer (int i=0);

§ The test condition (i < 10);

§ The stepper (i++).

The for loop encapsulates all of these into one place. This for loop is otherwise equivalent to the while loop above:

//tutorial/for_loop.c

#include <stdio.h>

int main(){

for (int i=0; i < 10; i++){

printf("Hello #%i\n", i);

}

}

Because this block is one line, even the curly braces are optional, and we could get away with:

//tutorial/for_loop2.c

#include <stdio.h>

int main(){

for (int i=0; i < 10; i++) printf("Hello #%i\n", i);

}

People often worry about fencepost errors, wherein they want 10 steps and get 9 or 11. The form above (start at i=0, test i< 10) correctly counts 10 steps, and is the standard boilerplate for stepping through an array. For example:

int len=10;

double array[len];

for (int i=0; i< len; i++) array[i] = 1./(i+1);

There is no additional special syntax for counting through a sequence or applying an operation to every element of an array (though such syntax would be easy to write via macros or functions), which means that you’ll be seeing this sort of (int i=0; i< len; i++) boilerplate a lot.

On the other hand, this form is easy to modify for different situations. If you need to step by two, you want for (int i=0; i< len; i+=2). If you need to step until you hit a zero array element, you want for (int i=0; array[i]!=0; i++). You can leave any of the elements blank, so if you are not initializing a new variable, you might wind up with something like for ( ; array[i]!=0; i++).

Pointers

Pointers to variables are sometimes called aliases, references, or labels (though C has unrelated things called labels, which are rarely used; I discuss them in “Labels, gotos, switches, and breaks”).

A pointer or alias to a double does not itself hold a double, but it points to some location that does. Now you have two names for the same thing. If the thing is changed, then both versions see the change. This is in contrast to a full copy of a thing, where a change to the original does not affect the copy.

You can directly request a block of memory

The malloc function allocates memory for use by the program. For example, we might allocate enough space for 3,000 integers via:

malloc(3000*sizeof(int));

This is the first mention of memory allocation in this tutorial because the declarations above, like int list[100], auto-allocate memory. When the scope in which the declaration was made comes to a close, auto-allocated memory is auto-deallocated. Conversely, memory you manually allocated via malloc exists until you manually free it (or the end of the program). This sort of longevity is sometimes desirable. Also, an array cannot be resized after it is initialized, whereas manually allocated memory can be. Other differences between manually and automatically allocated memory are discussed in “Automatic, Static, and Manual Memory”.

Now that we’ve allocated this space, how do we refer to it? This is where pointers come in, because we can assign an alias to the malloced space:

int *intspace = malloc(3000*sizeof(int));

The star on the declaration (int *) indicates that we are declaring a pointer to a location.

Memory is a finite resource, so indiscriminate use will eventually cause the sort of out-of-memory errors that have bothered us all at one time or another. Free memory back to the system via the free function; e.g., free(intspace). Or just wait until the end of the program, when the operating system deallocates all memory used by your program for you.

Arrays are just blocks of memory; any block of memory can be used like an array

In Chapter 6, I discuss exactly how arrays and pointers are and are not identical, but they certainly have a lot in common.

In memory, an array is a contiguous span set aside for one data type. If you request element 6 of an array declared as int list[100], the system would start at wherever the list is located, then step 6*sizeof(int) bytes down.

So the square-bracket notation like list[6] is really just a notation about offsetting from the position pointed to by the named variable, and this happens to be the operation we need to work with an array. If we have a pointer to any contiguous span of memory, the same operations of finding the location and stepping forward could be done with the pointer.

Here is an example that fills a manually allocated array and then prints it to a file. This example could more easily be done using an automatically allocated array, but for demonstration purposes, here it is:

//tutorial/manual_memory.c

#include <stdlib.h> //malloc and free

#include <stdio.h>

int main(){

    int *intspace = malloc(3000*sizeof(int));

    for (int i=0; i < 3000; i++)

        intspace[i] = i;

    FILE *cf = fopen("counter_file", "w");

    for (int i=0; i < 3000; i++)

        fprintf(cf, "%i\n", intspace[i]);

    free(intspace);

    fclose(cf);

}

Memory reserved via malloc can be reliably used by the program, but it is not initialized and so may contain any sort of unknown junk. Allocate and clear to all zeros with:

int *intspace = calloc(3000, sizeof(int));

Notice that this takes two numbers as input, while malloc takes one.

A pointer to a scalar is really just a one-item array

Say that we have a pointer named i to a single integer. It is an array of length 1, and if you request i[0], finding the location pointed to by i and stepping forward 0 steps works exactly as it did for longer arrays.

But we humans don’t really think of single values as arrays of length 1, so there is a notational convenience for the common case of a one-item array: outside of a declaration line, i[0] and *i are equivalent. This can be confusing, because on the declaration line, the star seems to mean something different. There are rationales for why this makes sense (see “The Fault Is in Our Stars”), but for now remember that a star on a declaration line indicates a new pointer; a star on any other line indicates the value being pointed to.

Here is a block of code that sets the first value of the list array to 7. The last line checks this, and halts the program with an error if I’m wrong.

//tutorial/assert.c

#include <assert.h>

int main(){

    int list[100];

    int *list2 = list; //Declares list2 as a pointer-to-int,

                       //pointing to the same block of memory list points to.

    *list2 = 7; //list2 is a pointer-to-int, so *list2 is an int.

    assert(list[0] == 7);

}

There is a special notation for elements of pointed-to structs

Given the declaration

ratio_s *pr;

we know that pr is a pointer to a ratio_s, not a ratio_s itself. The size of pr in memory is exactly as much as is required to hold a single pointer, not a full ratio_s structure.

One could get the numerator at the pointed-to struct via (*pr).numerator, because (*pr) is just a plain ratio_s, and the dot notation gets a subelement. There is an arrow notation that saves the trouble of the parens-and-star combination. For example:

ratio_s *pr = malloc(sizeof(ratio_s));

pr->numerator = 3;

The two forms pr->numerator and (*pr).numerator are exactly equivalent, but the first is generally preferred as more legible.

Pointers let you modify function inputs

Recall that copies of input variables are sent to a function, not the variables themselves. When the function exits, the copies are destroyed, and the original function inputs are entirely unmodified.

Now say that a pointer is sent in to a function. The copy of a pointer refers to the same space that the original pointer refers to. Here is a simple program using this strategy to modify what the input refers to:

//tutorial/pointer_in.c

#include <stdlib.h>

#include <stdio.h>

void double_in(int *in){

    *in *= 2;

}

int main(){

    int x[1]; // declare a one-item array, for demonstration purposes

    *x= 10;

    double_in(x);

    printf("x now points to %i.\n", *x);

}

The double_in function doesn’t change in, but it does double the value pointed to by in, *in. Therefore, the value x points to has been doubled by the double_in function.

This workaround is common, so you will find many functions that take in a pointer, not a plain value. But sometimes you will want to use those functions to operate on a plain value. In these cases, you can use the ampersand (&) to get the address of the variable. That is, if x is a variable, &x is a pointer to that variable. This simplifies the above sample code:

//tutorial/address_in.c

#include <stdlib.h>

#include <stdio.h>

void double_in(int *in){

    *in *= 2;

}

int main(){

    int x= 10;

    double_in(&x);

    printf("x is now %i.\n", x);

}

Everything is somewhere, so everything can be pointed to

You can’t send a function to another function, and you can’t have arrays of functions. But you can send a pointer to a function to a function, and you can have arrays of pointers to functions. I won’t go into details of the syntax here, but see “Typedef as a teaching tool”.

Functions that don’t really care what data is present, but only handle pointers to data, are surprisingly common. For example, a function that builds a linked list doesn’t care what data it is linking together, only where it is located. To give another example, we can pass pointers to functions, so you could have a function whose sole purpose is to run other functions, and the inputs to those called functions can be pointed to without regard to their content. In these cases, C provides an out from the type system, the void pointer. Given the declaration

void *x;

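the pointer x can be pointing to a struct, an integer, or any other kind of data. (The C standard only promises that a void pointer can hold a pointer to data; storing a pointer to a function in one works on POSIX systems but is not guaranteed everywhere.) See “The Void Pointer and the Structures It Points To” for examples of how void pointers can be used for all sorts of purposes.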