The C++ Programming Language (2013)

Part II: Basic Facilities

9. Statements

A programmer is a machine for turning caffeine into code.

– A programmer

• Introduction

• Statement Summary

• Declarations as Statements

• Selection Statements

if Statements; switch Statements; Declarations in Conditions

• Iteration Statements

Range-for Statements; for Statements; while Statements; do Statements; Loop Exit

• goto Statements

• Comments and Indentation

• Advice

9.1. Introduction

C++ offers a conventional and flexible set of statements. Basically all that is either interesting or complicated is found in expressions and declarations. Note that a declaration is a statement and that an expression becomes a statement when you add a semicolon at its end.

Unlike an expression, a statement does not have a value. Instead, statements are used to specify the order of execution. For example:

a = b+c; // expression statement
if (a==7) // if-statement
b = 9; // execute if and only if a==7

Logically, a=b+c is executed before the if, as everyone would expect. A compiler may reorder code to improve performance as long as the result is identical to that of the simple order of execution.

9.2. Statement Summary

Here is a summary of C++ statements:

statement:
declaration
expression_opt ;
{ statement-list_opt }
try { statement-list_opt } handler-list

case constant-expression : statement
default : statement
break ;
continue ;

return expression_opt ;
goto identifier ;
identifier : statement

selection-statement
iteration-statement

selection-statement:
if ( condition ) statement
if ( condition ) statement else statement
switch ( condition ) statement

iteration-statement:
while ( condition ) statement
do statement while ( expression ) ;
for ( for-init-statement condition_opt ; expression_opt ) statement
for ( for-init-declaration : expression ) statement

statement-list:
statement statement-list_opt

condition:
expression
type-specifier declarator = expression
type-specifier declarator { expression }

handler-list:
handler handler-list_opt

handler:
catch ( exception-declaration ) { statement-list_opt}

A semicolon is by itself a statement, the empty statement.

A (possibly empty) sequence of statements within “curly braces” (i.e., { and }) is called a block or a compound statement. A name declared in a block goes out of scope at the end of its block (§6.3.4).

A declaration is a statement and there is no assignment statement or procedure-call statement; assignments and function calls are expressions.

A for-init-statement must be either a declaration or an expression-statement. Note that both end with a semicolon.

A for-init-declaration must be the declaration of a single uninitialized variable. The statements for handling exceptions, try-blocks, are described in §13.5.

9.3. Declarations as Statements

A declaration is a statement. Unless a variable is declared static, its initializer is executed whenever the thread of control passes through the declaration (see also §6.4.2). The reason for allowing declarations wherever a statement can be used (and a few other places; §9.4.3, §9.5.2) is to enable the programmer to minimize the errors caused by uninitialized variables and to allow better locality in code. There is rarely a reason to introduce a variable before there is a value for it to hold. For example:

void f(vector<string>& v, int i, const char* p)
{
if (p==nullptr) return;
if (i<0 || v.size()<=i)
error("bad index");
string s = v[i];
if (s == p) {
// ...
}
// ...
}

The ability to place declarations after executable code is essential for many constants and for single-assignment styles of programming where a value of an object is not changed after initialization. For user-defined types, postponing the definition of a variable until a suitable initializer is available can also lead to better performance. For example:

void use()
{
string s1;
s1 = "The best is the enemy of the good.";
// ...
}

This requests a default initialization (to the empty string) followed by an assignment. This can be slower than a simple initialization to the desired value:

string s2 {"Voltaire"};

The most common reason to declare a variable without an initializer is that it requires a statement to give it its desired value. Input variables are among the few reasonable examples of that:

void input()
{
int buf[max];
int count = 0;
for (int i; cin>>i;) {
if (i<0) error("unexpected negative value");
if (count==max) error("buffer overflow");
buf[count++] = i;
}
// ...
}

I assume that error() does not return; if it does, this code may cause a buffer overflow. Often, push_back() (§3.2.1.3, §13.6, §31.3.6) provides a better solution to such examples.
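As a minimal sketch of that alternative, the following variant of input() stores the values in a vector using push_back(), so there is no fixed-size buffer to overflow. The name read_values, the istream parameter, the string overload, and the use of an exception in place of error() are assumptions made for illustration:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read non-negative integers from in until input fails.
// The vector grows as needed, so no element count has to be checked.
std::vector<int> read_values(std::istream& in)
{
    std::vector<int> buf;
    for (int i; in >> i;) {
        // an exception stands in for error(), which is assumed not to return
        if (i < 0) throw std::runtime_error("unexpected negative value");
        buf.push_back(i);
    }
    return buf;
}

// Convenience overload (hypothetical) for text that is already in memory.
std::vector<int> read_values(const std::string& text)
{
    std::istringstream in(text);
    return read_values(in);
}
```

Called as read_values(cin), the first version reads until input fails; the second makes the function easy to try on a string.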

9.4. Selection Statements

A value can be tested by either an if-statement or a switch-statement:

if ( condition ) statement
if ( condition ) statement else statement
switch ( condition ) statement

A condition is either an expression or a declaration (§9.4.3).

9.4.1. if Statements

In an if-statement, the first (or only) statement is executed if the condition is true and the second statement (if it is specified) is executed otherwise. If a condition evaluates to something different from a Boolean, it is – if possible – implicitly converted to a bool. This implies that any arithmetic or pointer expression can be used as a condition. For example, if x is an integer, then

if (x) // ...

means

if (x != 0) // ...

For a pointer p,

if (p) // ...

is a direct statement of the test “Does p point to a valid object (assuming proper initialization)?” and is equivalent to

if (p != nullptr) // ...

Note that a “plain” enum can be implicitly converted to an integer and then to a bool, whereas an enum class cannot (§8.4.1). For example:

enum E1 { a, b };
enum class E2 { a, b };

void f(E1 x, E2 y)
{
if (x) // OK
// ...
if (y) // error: no conversion to bool
// ...
if (y==E2::a) // OK
// ...
}

The logical operators

&& || !

are most commonly used in conditions. The operators && and || will not evaluate their second argument unless doing so is necessary. For example,

if (p && 1<p->count) // ...

This tests 1<p->count only if p is not nullptr.
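A minimal sketch of such a guarded test inside a complete function follows. Record and has_several() are hypothetical names introduced only for this example:

```cpp
// Hypothetical record type with a count member.
struct Record {
    int count;
};

// True if p points to a Record whose count is greater than 1.
// Because && evaluates left to right and stops at the first false operand,
// p->count is never evaluated when p is nullptr.
bool has_several(const Record* p)
{
    return p && 1 < p->count;
}
```

Reversing the operands would dereference p before testing it, which is exactly the error the guard is meant to prevent.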

For choosing between two alternatives each of which produces a value, a conditional expression (§11.1.3) is a more direct expression of intent than an if-statement. For example:

int max(int a, int b)
{
return (a>b)?a:b; // return the larger of a and b
}

A name can only be used within the scope in which it is declared. In particular, it cannot be used on another branch of an if-statement. For example:

void f2(int i)
{
if (i) {
int x = i+2;
++x;
// ...
}
else {
++x; // error: x is not in scope
}
++x; // error: x is not in scope
}

A branch of an if-statement cannot be just a declaration. If we need to introduce a name in a branch, it must be enclosed in a block (§9.2). For example:

void f1(int i)
{
if (i)
int x = i+2; // error: declaration of if-statement branch
}

9.4.2. switch Statements

A switch-statement selects among a set of alternatives (case-labels). The expression in the case labels must be a constant expression of integral or enumeration type. A value may not be used more than once for case-labels in a switch-statement. For example:

void f(int i)
{
switch (i) {
case 2.7: // error: floating-point value used for case
// ...
case 2:
// ...
case 4-2: // error: 2 used twice in case labels
// ...
}
}

A switch-statement can alternatively be written as a set of if-statements. For example:

switch (val) {
case 1:
f();
break;
case 2:
g();
break;
default:
h();
break;
}

This could be expressed as:

if (val == 1)
f();
else if (val == 2)
g();
else
h();

The meaning is the same, but the first (switch) version is preferred because the nature of the operation (testing a single value against a set of constants) is explicit. This makes the switch-statement easier to read for nontrivial examples. It typically also leads to the generation of better code because there is no reason to repeatedly check individual values. Instead, a jump table can be used.

Beware that a case of a switch must be terminated somehow unless you want to carry on executing the next case. Consider:

switch (val) { // beware
case 1:
cout << "case 1\n";
case 2:
cout << "case 2\n";
default:
cout << "default: case not found\n";
}

Invoked with val==1, the output will greatly surprise the uninitiated:

case 1
case 2
default: case not found

It is a good idea to comment the (rare) cases in which a fall-through is intentional so that an uncommented fall-through can be assumed to be an error. For example:

switch (action) { // handle (action,value) pair
case do_and_print:
act(value);
// no break: fall through to print
case print:
print(value);
break;
// ...
}

A break is the most common way of terminating a case, but a return is often useful (§10.2.1).

When should a switch-statement have a default? There is no single answer that covers all situations. One use is for the default to handle the most common case. Another common use is the exact opposite: the default: action is simply a way to catch errors; every valid alternative is covered by the cases. However, there is one case where a default should not be used: if a switch is intended to have one case for each enumerator of an enumeration. If so, leaving out the default gives the compiler a chance to warn against a set of cases that almost but not quite match the set of enumerators. For example, this is almost certainly an error:

enum class Vessel { cup, glass, goblet, chalice };

void problematic(Vessel v)
{
switch (v) {
case Vessel::cup: /* ... */ break;
case Vessel::glass: /* ... */ break;
case Vessel::goblet: /* ... */ break;
}
}

Such a mistake can easily occur when a new enumerator is added during maintenance.

Testing for an “impossible” enumerator value is best done separately.
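One way to do that, sketched here under the assumption that throwing an exception is an acceptable response to a corrupted value, is to return from every case and place the check after the switch. The function name_of() is hypothetical:

```cpp
#include <stdexcept>
#include <string>

enum class Vessel { cup, glass, goblet, chalice };

// Map each enumerator to its name.
std::string name_of(Vessel v)
{
    switch (v) {   // no default: a compiler can warn if an enumerator is missing
    case Vessel::cup:     return "cup";
    case Vessel::glass:   return "glass";
    case Vessel::goblet:  return "goblet";
    case Vessel::chalice: return "chalice";
    }
    // Reached only if v holds a value that is not one of the enumerators.
    throw std::invalid_argument("impossible Vessel value");
}
```

Each case is terminated by a return, so no break is needed, and adding a fifth enumerator without a matching case can still be diagnosed.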

9.4.2.1. Declarations in Cases

It is possible, and common, to declare variables within the block of a switch-statement. However, it is not possible to bypass an initialization. For example:

void f(int i)
{
switch (i) {
case 0:
int x; // uninitialized
int y = 3; // error: declaration can be bypassed (explicitly initialized)
string s; // error: declaration can be bypassed (implicitly initialized)
case 1:
++x; // error: use of uninitialized object
++y;
s = "nasty!";
}
}

Here, if i==1, the thread of execution would bypass the initializations of y and s, so f() will not compile. Unfortunately, because an int needn’t be initialized, the declaration of x is not an error. However, its use is an error: we read an uninitialized variable. Unfortunately, compilers often give just a warning for the use of an uninitialized variable and cannot reliably catch all such misuses. As usual, avoid uninitialized variables (§6.3.5.1).

If we need a variable within a switch-statement, we can limit its scope by enclosing its declaration and its use in a block. For an example, see prim() in §10.2.1.
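Independently of that example, a minimal sketch of the technique might look like this; describe() is a hypothetical function:

```cpp
#include <string>

// Each case that needs a local variable gets its own block,
// so no jump to a later case label can bypass an initialization.
std::string describe(int i)
{
    switch (i) {
    case 0: {
        std::string s = "zero";   // s exists only in this block
        return s;
    }
    case 1: {
        int y = 3;                // y exists only in this block
        return "one plus " + std::to_string(y);
    }
    default:
        return "other";
    }
}
```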

9.4.3. Declarations in Conditions

To avoid accidental misuse of a variable, it is usually a good idea to introduce the variable into the smallest scope possible. In particular, it is usually best to delay the definition of a local variable until one can give it an initial value. That way, one cannot get into trouble by using the variable before its initial value is assigned.

One of the most elegant applications of these two principles is to declare a variable in a condition. Consider:

if (double d = prim(true)) {
left /= d;
break;
}

Here, d is declared and initialized and the value of d after initialization is tested as the value of the condition. The scope of d extends from its point of declaration to the end of the statement that the condition controls. For example, had there been an else-branch to the if-statement, d would be in scope on both branches.

The obvious and traditional alternative is to declare d before the condition. However, this opens the scope (literally) for the use of d before its initialization or after its intended useful life:

double d;
// ...
d2 = d; // oops!
// ...
if (d = prim(true)) {
left /= d;
break;
}
// ...
d = 2.0; // two unrelated uses of d

In addition to the logical benefits of declaring variables in conditions, doing so also yields the most compact source code.

A declaration in a condition must declare and initialize a single variable or const.
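The same technique works for pointers. The following sketch assumes a hypothetical function chars_after() and uses the standard-library function strchr(), which returns a null pointer when the character is not found:

```cpp
#include <cstring>

// Number of characters following the first occurrence of ch in s,
// or -1 if ch does not occur in s.
int chars_after(const char* s, char ch)
{
    if (const char* p = std::strchr(s, ch)) {
        // p is non-null here and points at the occurrence of ch
        return static_cast<int>(std::strlen(p + 1));
    }
    else {
        // p is still in scope on this branch, and is known to be null
        return -1;
    }
}
```

After the if-statement, p no longer exists, so it cannot be used by mistake.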

9.5. Iteration Statements

A loop can be expressed as a for-, while-, or do-statement:

while ( condition ) statement
do statement while ( expression ) ;
for ( for-init-statement condition_opt ; expression_opt ) statement
for ( for-init-declaration : expression ) statement

A for-init-statement must be either a declaration or an expression-statement. Note that both end with a semicolon.

The statement of a for-statement (called the controlled statement or the loop body) is executed repeatedly until the condition becomes false or the programmer breaks out of the loop some other way (such as a break, a return, a throw, or a goto).

More complicated loops can be expressed as an algorithm plus a lambda expression (§11.4.2).

9.5.1. Range-for Statements

The simplest loop is a range-for-statement; it simply gives the programmer access to each element of a range. For example:

int sum(vector<int>& v)
{
int s = 0;
for (int x : v)
s+=x;
return s;
}

The for (int x : v) can be read as “for each element x in the range v” or just “for each x in v.” The elements of v are visited in order from the first to the last.

The scope of the variable naming the element (here, x) is the for-statement.

The expression after the colon must denote a sequence (a range); that is, it must yield a value for which we can call v.begin() and v.end() or begin(v) and end(v) to obtain iterators (§4.5):

[1] The compiler first looks for members begin and end and tries to use those. If a begin or an end is found that cannot be used as a range (e.g., because a member begin is a variable rather than a function), the range-for is an error.

[2] Otherwise, the compiler looks for a begin/end pair of nonmember functions in the enclosing scope. If none is found or if what is found cannot be used (e.g., because the begin did not take an argument of the sequence’s type), the range-for is an error.

The compiler uses v and v+N as begin(v) and end(v) for a built-in array T v[N]. The <iterator> header provides begin(c) and end(c) for built-in arrays and for all standard-library containers. For sequences of our own design, we can define begin() and end() in the same way as it is done for standard-library containers (§4.4.5).
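As an illustration, here is a sketch of a small user-defined sequence that supplies member begin() and end(), which is enough for a range-for to traverse it. Triple and sum_of() are hypothetical names, and plain pointers serve as the iterators:

```cpp
// A hypothetical sequence of exactly three ints.
class Triple {
public:
    Triple(int a, int b, int c) : elem{a, b, c} {}

    // Member begin() and end() are what a range-for looks for first.
    int* begin() { return elem; }
    int* end() { return elem + 3; }
    const int* begin() const { return elem; }
    const int* end() const { return elem + 3; }

private:
    int elem[3];
};

// Add up the elements of t using a range-for.
int sum_of(const Triple& t)
{
    int s = 0;
    for (int x : t)
        s += x;
    return s;
}
```

The const overloads allow traversal of a const Triple; the non-const ones allow a loop such as for (int& x : t) to modify the elements.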

The controlled variable, x in the example, that refers to the current element is equivalent to *p when using an equivalent for-statement:

int sum2(vector<int>& v)
{
int s = 0;
for (auto p = begin(v); p!=end(v); ++p)
s+=*p;
return s;
}

If you need to modify an element in a range-for loop, the element variable should be a reference. For example, we can increment each element of a vector like this:

void incr(vector<int>& v)
{
for (int& x : v)
++x;
}

References are also appropriate for elements that might be large, so that copying them to the element value could be costly. For example:

template<class T> T accum(vector<T>& v)
{
T sum = 0;
for (const T& x : v)
sum += x;
return sum;
}

Note that a range-for loop is a deliberately simple construct. For example, using it you can’t touch two elements at the same time and can’t effectively traverse two ranges simultaneously. For that we need a general for-statement.
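For instance, computing an inner product requires visiting corresponding elements of two sequences, which an index makes straightforward. This is a minimal sketch; dot() is a hypothetical function, and stopping at the shorter length is an arbitrary choice made for the example:

```cpp
#include <cstddef>
#include <vector>

// Inner product of a and b over their common length.
double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double sum = 0;
    for (std::size_t i = 0; i != n; ++i)   // one index serves both ranges
        sum += a[i] * b[i];
    return sum;
}
```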

9.5.2. for Statements

There is also a more general for-statement allowing greater control of the iteration. The loop variable, the termination condition, and the expression that updates the loop variable are explicitly presented “up front” on a single line. For example:

void f(int v[], int max)
{
for (int i = 0; i!=max; ++i)
v[i] = i*i;
}

This is equivalent to

void f(int v[], int max)
{
int i = 0; // introduce loop variable
while (i!=max) { // test termination condition
v[i] = i*i; // execute the loop body
++i; // increment loop variable
}
}

A variable can be declared in the initializer part of a for-statement. If that initializer is a declaration, the variable (or variables) it introduced is in scope until the end of the for-statement.

It is not always obvious what is the right type to use for a controlled variable in a for loop, so auto often comes in handy:

for (auto p = begin(c); p!=end(c); ++p) {
// ... use iterator p for elements in container c ...
}

If the final value of an index needs to be known after exit from a for-loop, the index variable must be declared outside the for-loop (e.g., see §9.6).

If no initialization is needed, the initializing statement can be empty.
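Both points can be seen in the following sketch, where the index is declared before the loop so that its final value survives, which leaves the initializing statement empty. The function first_negative() is a hypothetical example:

```cpp
#include <cstddef>
#include <vector>

// Index of the first negative element of v, or v.size() if there is none.
std::size_t first_negative(const std::vector<int>& v)
{
    std::size_t i = 0;              // declared outside: needed after the loop
    for (; i != v.size(); ++i)      // empty initializing statement
        if (v[i] < 0) break;
    return i;
}
```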

If the expression that is supposed to increment the loop variable is omitted, we must update some form of loop variable elsewhere, typically in the body of the loop. If the loop isn’t of the simple “introduce a loop variable, test the condition, update the loop variable” variety, it is often better expressed as a while-statement. However, consider this elegant variant:

for (string s; cin>>s;)
v.push_back(s);

Here, the reading and testing for termination are combined in cin>>s, so we don’t need an explicit loop variable. On the other hand, the use of for, rather than while, allows us to limit the scope of the “current element,” s, to the loop itself (the for-statement).

A for-statement is also useful for expressing a loop without an explicit termination condition:

for (;;) { // "forever"
// ...
}

However, many consider this idiom obscure and prefer to use:

while(true) { // "forever"
// ...
}

9.5.3. while Statements

A while-statement executes its controlled statement until its condition becomes false. For example:

template<class Iter, class Value>
Iter find(Iter first, Iter last, Value val)
{
while (first!=last && *first!=val)
++first;
return first;
}

I tend to prefer while-statements over for-statements when there isn’t an obvious loop variable or where the update of a loop variable naturally comes in the middle of the loop body.

A for-statement (§9.5.2) is easily rewritten into an equivalent while-statement and vice versa.

9.5.4. do Statements

A do-statement is similar to a while-statement except that the condition comes after the body. For example:

void print_backwards(char a[], int i) // i must be positive
{
cout << '{';
do {
cout << a[--i];
} while (i);
cout << '}';
}

This might be called like this: print_backwards(s,strlen(s)); but it is all too easy to make a horrible mistake. For example, what if s was the empty string?

In my experience, the do-statement is a source of errors and confusion. The reason is that its body is always executed once before the condition is evaluated. However, for the body to work correctly, something very much like the condition must hold even the first time through. More often than I would have guessed, I have found that condition not to hold as expected either when the program was first written and tested or later after the code preceding it has been modified. I also prefer the condition “up front where I can see it.” Consequently, I recommend avoiding do-statements.
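A sketch of the same reversal written with a while-statement shows the difference: the condition is tested before the first element access, so an empty sequence is harmless. The function backwards() is hypothetical, and it returns a string instead of writing to cout only to make the result easy to inspect:

```cpp
#include <cstddef>
#include <string>

// The first n characters of a in reverse order, enclosed in braces.
std::string backwards(const char* a, std::size_t n)
{
    std::string out = "{";
    while (n != 0)          // tested up front, so n == 0 never touches a
        out += a[--n];
    out += '}';
    return out;
}
```

With this version, a call with an empty string and n equal to 0 simply yields "{}".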

9.5.5. Loop Exit

If the condition of an iteration statement (a for-, while-, or do-statement) is omitted, the loop will not terminate unless the user explicitly exits it by a break, return (§12.1.4), goto (§9.6), throw (§13.5), or some less obvious way such as a call of exit() (§15.4.3). A break “breaks out of” the nearest enclosing switch-statement (§9.4.2) or iteration-statement. For example:

void f(vector<string>& v, string terminator)
{
char c;
string s;
while (cin>>c) {
// ...
if (c == '\n') break;
// ...
}
}

We use a break when we need to leave the loop body “in the middle.” Unless it warps the logic of a loop (e.g., requires the introduction of an extra variable), it is usually better to have the complete exit condition as the condition of a while-statement or a for-statement.

Sometimes, we don’t want to exit the loop completely, we just want to get to the end of the loop body. A continue skips the rest of the body of an iteration-statement. For example:

string find_prime(vector<string>& v)
{
for (int i = 0; i!=v.size(); ++i) {
if (!prime(v[i])) continue;
return v[i];
}
return string{}; // no prime found
}

After a continue, the increment part of the loop (if any) is executed, followed by the loop condition (if any). So find_prime() could equivalently have been written as:

string find_prime(vector<string>& v)
{
for (int i = 0; i!=v.size(); ++i) {
if (prime(v[i])) {
return v[i];
}
}
return string{}; // no prime found
}
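As a further, self-contained sketch, continue is often used to skip elements that need no processing while the loop carries on with the next one. The function sum_of_evens() is a hypothetical example:

```cpp
#include <vector>

// Sum of the even elements of v.
int sum_of_evens(const std::vector<int>& v)
{
    int sum = 0;
    for (int x : v) {
        if (x % 2 != 0) continue;   // odd: skip the rest of the body
        sum += x;
    }
    return sum;
}
```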

9.6. goto Statements

C++ possesses the infamous goto:

goto identifier ;
identifier : statement

The goto has few uses in general high-level programming, but it can be very useful when C++ code is generated by a program rather than written directly by a person; for example, gotos can be used in a parser generated from a grammar by a parser generator.

The scope of a label is the function it is in (§6.3.4). This implies that you can use goto to jump both into and out of blocks. The only restriction is that you cannot jump past an initializer or into an exception handler (§13.5).

One of the few sensible uses of goto in ordinary code is to break out from a nested loop or switch-statement (a break breaks out of only the innermost enclosing loop or switch-statement). For example:

void do_something(int i, int j)
// do something to a two-dimensional matrix called nm
{
for (i = 0; i!=n; ++i)
for (j = 0; j!=m; ++j)
if (nm[i][j] == a)
goto found;
// not found
// ...
found:
// nm[i][j] == a
}

Note that this goto just jumps forward to exit its loop. It does not introduce a new loop or enter a new scope. That makes it the least troublesome and least confusing use of a goto.
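A complete version of this pattern might look as follows. This is only a sketch: find_in() is a hypothetical function, the matrix is represented as a vector of vectors, and the position is reported through reference parameters, none of which is implied by the fragment above:

```cpp
#include <cstddef>
#include <vector>

// Search nm for a. On success, return true with row and col set to the
// position of the first match; otherwise return false.
bool find_in(const std::vector<std::vector<int>>& nm, int a,
             std::size_t& row, std::size_t& col)
{
    for (row = 0; row != nm.size(); ++row)
        for (col = 0; col != nm[row].size(); ++col)
            if (nm[row][col] == a)
                goto found;     // leave both loops at once
    return false;               // not found
found:
    return true;                // nm[row][col] == a
}
```

In a function this small, returning directly from the inner loop would do the same job without a goto; the label matters when more work follows the search.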

9.7. Comments and Indentation

Judicious use of comments and consistent use of indentation can make the task of reading and understanding a program much more pleasant. Several different consistent styles of indentation are in use. I see no fundamental reason to prefer one over another (although, like most programmers, I have my preferences, and this book reflects them). The same applies to styles of comments.

Comments can be misused in ways that seriously affect the readability of a program. The compiler does not understand the contents of a comment, so it has no way of ensuring that a comment

• is meaningful,

• describes the program, and

• is up to date.

Most programs contain comments that are incomprehensible, ambiguous, and just plain wrong. Bad comments can be worse than no comments.

If something can be stated in the language itself, it should be, and not just mentioned in a comment. This remark is aimed at comments such as these:

// variable "v" must be initialized

// variable "v" must be used only by function "f()"

// call function "init()" before calling any other function in this file

// call function "cleanup()" at the end of your program

// don't use function "weird()"

// function "f(int ...)" takes two or three arguments

Such comments can typically be rendered unnecessary by proper use of C++.
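As a sketch of what that can mean for three of the comments above, the following uses a constructor, a default argument, and a deleted function. Counter is a hypothetical class, and f() and weird() are given arbitrary meanings for the example:

```cpp
// "v must be initialized": a constructor makes an uninitialized object impossible.
class Counter {
public:
    explicit Counter(int start) : value{start} {}
    int next() { return value++; }   // return the current value, then advance
private:
    int value;
};

// "f takes two or three arguments": a default argument states this directly.
int f(int a, int b, int c = 0) { return a + b + c; }

// "don't use weird()": a deleted function turns any call into a compile-time error.
void weird() = delete;
```

In each case the compiler now enforces what the comment could only request.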

Once something has been stated clearly in the language, it should not be mentioned a second time in a comment. For example:

a = b+c; // a becomes b+c
count++; // increment the counter

Such comments are worse than simply redundant. They increase the amount of text the reader has to look at, they often obscure the structure of the program, and they may be wrong. Note, however, that such comments are used extensively for teaching purposes in programming language textbooks such as this. This is one of the many ways a program in a textbook differs from a real program.

A good comment states what a piece of code is supposed to do (the intent of the code), whereas the code (only) states what it does (in terms of how it does it). Preferably, a comment is expressed at a suitably high level of abstraction so that it is easy for a human to understand without delving into minute details.

My preference is for:

• A comment for each source file stating what the declarations in it have in common, references to manuals, the name of the programmer, general hints for maintenance, etc.

• A comment for each class, template, and namespace

• A comment for each nontrivial function stating its purpose, the algorithm used (unless it is obvious), and maybe something about the assumptions it makes about its environment

• A comment for each global and namespace variable and constant

• A few comments where the code is nonobvious and/or nonportable

• Very little else

For example:

// tbl.c: Implementation of the symbol table.

/*
Gaussian elimination with partial pivoting.
See Ralston: "A first course ..." pg 411.
*/

// scan(p,n,c) requires that p points to an array of at least n elements

// sort(p,q) sorts the elements of the sequence [p:q) using < for comparison.

// Revised to handle invalid dates. Bjarne Stroustrup, Feb 29 2013

A well-chosen and well-written set of comments is an essential part of a good program. Writing good comments can be as difficult as writing the program itself. It is an art well worth cultivating.

Note that /* */ style comments do not nest. For example:

/*
remove expensive check
if (check(p,q)) error("bad p q") /* should never happen */
*/

This nesting should give an error for an unmatched final */.

9.8. Advice

[1] Don’t declare a variable until you have a value to initialize it with; §9.3, §9.4.3, §9.5.2.

[2] Prefer a switch-statement to an if-statement when there is a choice; §9.4.2.

[3] Prefer a range-for-statement to a for-statement when there is a choice; §9.5.1.

[4] Prefer a for-statement to a while-statement when there is an obvious loop variable; §9.5.2.

[5] Prefer a while-statement to a for-statement when there is no obvious loop variable; §9.5.3.

[6] Avoid do-statements; §9.5.4.

[7] Avoid goto; §9.6.

[8] Keep comments crisp; §9.7.

[9] Don’t say in comments what can be clearly stated in code; §9.7.

[10] State intent in comments; §9.7.

[11] Maintain a consistent indentation style; §9.7.