The C++ Programming Language (2013)

Part II: Basic Facilities

This part describes C++’s built-in types and the basic facilities for constructing programs out of them. The C subset of C++ is presented together with C++’s additional support for traditional styles of programming. It also discusses the basic facilities for composing a C++ program out of logical and physical parts.

Chapters

6 Types and Declarations

7 Pointers, Arrays, and References

8 Structures, Unions, and Enumerations

9 Statements

10 Expressions

11 Select Operations

12 Functions

13 Exception Handling

14 Namespaces

15 Source Files and Programs

“... I have long entertained a suspicion, with regard to the decisions of philosophers upon all subjects, and found in myself a greater inclination to dispute, than assent to their conclusions. There is one mistake, to which they seem liable, almost without exception; they confine too much their principles, and make no account of that vast variety, which nature has so much affected in all her operations. When a philosopher has once laid hold of a favourite principle, which perhaps accounts for many natural effects, he extends the same principle over the whole creation, and reduces to it every phænomenon, though by the most violent and absurd reasoning. ...”

- David Hume,
Essays, Moral, Political, and Literary. PART I. (1752)

6. Types and Declarations

Perfection is achieved only on the point of collapse.

- C. N. Parkinson

• The ISO C++ Standard

Implementations; The Basic Source Character Set

• Types

Fundamental Types; Booleans; Character Types; Integer Types; Floating-Point Types; Prefixes and Suffixes; void; Sizes; Alignment

• Declarations

The Structure of Declarations; Declaring Multiple Names; Names; Scope; Initialization; Deducing a Type: auto and decltype()

• Objects and Values

Lvalues and Rvalues; Lifetimes of Objects

• Type Aliases

• Advice

6.1. The ISO C++ Standard

The C++ language and standard library are defined by their ISO standard: ISO/IEC 14882:2011. In this book, references to the standard are of the form §iso.23.3.6.1. In cases where the text of this book is considered imprecise, incomplete, or possibly wrong, consult the standard. But don’t expect the standard to be a tutorial or to be easily accessible by non-experts.

Strictly adhering to the C++ language and library standard doesn’t by itself guarantee good code or even portable code. The standard doesn’t say whether a piece of code is good or bad; it simply says what a programmer can and cannot rely on from an implementation. It is easy to write perfectly awful standard-conforming programs, and most real-world programs rely on features that the standard does not guarantee to be portable. They do so to access system interfaces and hardware features that cannot be expressed directly in C++ or require reliance on specific implementation details.

Many important things are deemed implementation-defined by the standard. This means that each implementation must provide a specific, well-defined behavior for a construct and that behavior must be documented. For example:

unsigned char c1 = 64; // well defined: a char has at least 8 bits and can always hold 64
unsigned char c2 = 1256; // implementation-defined: truncation if a char has only 8 bits

The initialization of c1 is well defined because a char must be at least 8 bits. However, the behavior of the initialization of c2 is implementation-defined because the number of bits in a char is implementation-defined. If the char has only 8 bits, the value 1256 will be truncated to 232 (§10.5.2.1). Most implementation-defined features relate to differences in the hardware used to run a program.
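The truncation described above can be checked at compile time. The following is a minimal sketch; the helper name truncate_to_uchar is an assumption made for illustration and is not part of the standard library.

```cpp
#include <climits>

// Hypothetical helper: the value an unsigned char holds after being
// initialized with v. Conversion to an unsigned type keeps only the
// low-order bits, that is, v modulo 2 to the power CHAR_BIT.
constexpr unsigned truncate_to_uchar(unsigned v)
{
    return static_cast<unsigned char>(v);
}

static_assert(truncate_to_uchar(64) == 64, "64 fits in any char");

// The second check is stated only for implementations with 8-bit chars.
static_assert(CHAR_BIT != 8 || truncate_to_uchar(1256) == 232,
              "with 8-bit chars, 1256 becomes 1256-4*256 == 232");
```

On an implementation with wider chars, the second assertion is vacuously true and c2 would simply hold 1256.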

Other behaviors are unspecified; that is, a range of possible behaviors are acceptable, but the implementer is not obliged to specify which actually occur. Usually, the reason for deeming something unspecified is that the exact behavior is unpredictable for fundamental reasons. For example, the exact value returned by new is unspecified. So is the value of a variable assigned to from two threads unless some synchronization mechanism has been employed to prevent a data race (§41.2).

When writing real-world programs, it is usually necessary to rely on implementation-defined behavior. Such behavior is the price we pay for the ability to operate effectively on a large range of systems. For example, C++ would have been much simpler if all characters had been 8 bits and all pointers 32 bits. However, 16-bit and 32-bit character sets are not uncommon, and machines with 16-bit and 64-bit pointers are in wide use.

To maximize portability, it is wise to be explicit about what implementation-defined features we rely on and to isolate the more subtle examples in clearly marked sections of a program. A typical example of this practice is to present all dependencies on hardware sizes in the form of constants and type definitions in some header file. To support such techniques, the standard library provides numeric_limits (§40.2). Many assumptions about implementation-defined features can be checked by stating them as static assertions (§2.4.3.3). For example:

static_assert(4<=sizeof(int),"sizeof(int) too small");
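As a sketch of the practice of collecting hardware dependencies in one place, a header might combine numeric_limits with static assertions. The alias Pixel and the constant max_pixel are hypothetical names, and the specific assumptions (8-bit chars, 32-bit ints) are examples only.

```cpp
#include <climits>
#include <limits>

// Hypothetical header: every size assumption of the program, stated once.
static_assert(CHAR_BIT == 8, "this program assumes 8-bit chars");
static_assert(std::numeric_limits<int>::digits >= 31,
              "this program assumes ints of at least 32 bits");

using Pixel = unsigned char;   // illustrative alias, not a standard name

// The largest Pixel value, taken from the implementation rather than
// written as a magic number.
constexpr int max_pixel = std::numeric_limits<Pixel>::max();
```

If an assumption does not hold on some implementation, the program fails to compile with a readable message instead of misbehaving at run time.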

Undefined behavior is nastier. A construct is deemed undefined by the standard if no reasonable behavior is required by an implementation. Typically, some obvious implementation technique will cause a program using an undefined feature to behave very badly. For example:

const int size = 4*1024;
char page[size];

void f()
{
page[size+size] = 7; // undefined
}

Plausible outcomes of this code fragment include overwriting unrelated data and triggering a hardware error/exception. An implementation is not required to choose among plausible outcomes. Where powerful optimizers are used, the actual effects of undefined behavior can become quite unpredictable. If a set of plausible and easily implementable alternatives exist, a feature is deemed unspecified or implementation-defined rather than undefined.
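One way to avoid the undefined access is to test the index before using it. This is a minimal sketch, not a general solution; the names page_size, in_page, and store are assumptions made for illustration.

```cpp
#include <cstddef>

constexpr std::size_t page_size = 4*1024;
char page[page_size];

// Hypothetical helper: true if index denotes an element of page.
constexpr bool in_page(std::size_t index)
{
    return index < page_size;
}

// Stores value and returns true, or leaves page untouched and returns
// false if index is out of range.
bool store(std::size_t index, char value)
{
    if (!in_page(index)) return false;
    page[index] = value;
    return true;
}
```

With this helper, store(page_size+page_size,7) reports failure instead of writing outside the array.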

It is worth spending considerable time and effort to ensure that a program does not use something deemed unspecified or undefined by the standard. In many cases, tools exist to help do this.

6.1.1. Implementations

A C++ implementation can be either hosted or freestanding (§iso.17.6.1.3). A hosted implementation includes all the standard-library facilities as described in the standard (§30.2) and in this book. A freestanding implementation may provide fewer standard-library facilities, as long as the following are provided:

Freestanding implementations are meant for code running with only the most minimal operating system support. Many implementations also provide a (non-standard) option for not using exceptions for really minimal, close-to-the-hardware, programs.

6.1.2. The Basic Source Character Set

The C++ standard and the examples in this book are written using the basic source character set consisting of the letters, digits, graphical characters, and whitespace characters from the U.S. variant of the international 7-bit character set ISO 646-1983 called ASCII (ANSI3.4-1968). This can cause problems for people who use C++ in an environment with a different character set:

• ASCII contains punctuation characters and operator symbols (such as ], {, and !) that are not available in some character sets.

• We need a notation for characters that do not have a convenient character representation (such as newline and “the character with value 17”).

• ASCII doesn’t contain characters (such as ñ, þ, and Æ) that are used for writing languages other than English.

To use an extended character set for source code, a programming environment can map the extended character set into the basic source character set in one of several ways, for example, by using universal character names (§6.2.3.2).

6.2. Types

Consider:

x = y+f(2);

For this to make sense in a C++ program, the names x, y, and f must be suitably declared. That is, the programmer must specify that entities named x, y, and f exist and that they are of types for which = (assignment), + (addition), and () (function call), respectively, are meaningful.

Every name (identifier) in a C++ program has a type associated with it. This type determines what operations can be applied to the name (that is, to the entity referred to by the name) and how such operations are interpreted. For example:

float x; // x is a floating-point variable
int y = 7; // y is an integer variable with the initial value 7
float f(int); // f is a function taking an argument of type int and returning a floating-point number

These declarations would make the example meaningful. Because y is declared to be an int, it can be assigned to, used as an operand for +, etc. On the other hand, f is declared to be a function that takes an int as its argument, so it can be called given the integer 2.

This chapter presents fundamental types (§6.2.1) and declarations (§6.3). Its examples just demonstrate language features; they are not intended to do anything useful. More extensive and realistic examples are saved for later chapters. This chapter simply provides the most basic elements from which C++ programs are constructed. You must know these elements, plus the terminology and simple syntax that go with them, in order to complete a real project in C++ and especially to read code written by others. However, a thorough understanding of every detail mentioned in this chapter is not a requirement for understanding the following chapters. Consequently, you may prefer to skim through this chapter, observing the major concepts, and return later as the need for understanding more details arises.

6.2.1. Fundamental Types

C++ has a set of fundamental types corresponding to the most common basic storage units of a computer and the most common ways of using them to hold data:

§6.2.2 A Boolean type (bool)

§6.2.3 Character types (such as char and wchar_t)

§6.2.4 Integer types (such as int and long long)

§6.2.5 Floating-point types (such as double and long double)

§6.2.7 A type, void, used to signify the absence of information

From these types, we can construct other types using declarator operators:

§7.2 Pointer types (such as int*)

§7.3 Array types (such as char[])

§7.7 Reference types (such as double& and vector<int>&&)

In addition, a user can define additional types:

§8.2 Data structures and classes (Chapter 16)

§8.4 Enumeration types for representing specific sets of values (enum and enum class)

The Boolean, character, and integer types are collectively called integral types. The integral and floating-point types are collectively called arithmetic types. Enumerations and classes (Chapter 16) are called user-defined types because they must be defined by users rather than being available for use without previous declaration, the way fundamental types are. In contrast, fundamental types, pointers, and references are collectively referred to as built-in types. The standard library provides many user-defined types (Chapter 4, Chapter 5).
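The classification can be explored with the standard-library type predicates from <type_traits>. This is only a sketch: the enumeration Color is a hypothetical example, and note that std::is_fundamental corresponds to the fundamental types, not to the wider notion of built-in types (it is false for pointers and references).

```cpp
#include <type_traits>

enum class Color { red, green, blue };   // illustrative user-defined type

static_assert(std::is_integral<bool>::value, "bool is an integral type");
static_assert(std::is_integral<char>::value, "char is an integral type");
static_assert(std::is_integral<long long>::value, "long long is an integral type");

static_assert(!std::is_integral<double>::value, "double is not integral");
static_assert(std::is_arithmetic<double>::value, "but double is arithmetic");

static_assert(std::is_fundamental<void>::value, "void is a fundamental type");
static_assert(std::is_enum<Color>::value && !std::is_fundamental<Color>::value,
              "an enumeration is user-defined, not fundamental");
```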

The integral and floating-point types are provided in a variety of sizes to give the programmer a choice of the amount of storage consumed, the precision, and the range available for computations (§6.2.8). The assumption is that a computer provides bytes for holding characters, words for holding and computing integer values, some entity most suitable for floating-point computation, and addresses for referring to those entities. The C++ fundamental types together with pointers and arrays present these machine-level notions to the programmer in a reasonably implementation-independent manner.

For most applications, we could use bool for logical values, char for characters, int for integer values, and double for floating-point values. The remaining fundamental types are variations for optimizations, special needs, and compatibility that are best ignored until such needs arise.

6.2.2. Booleans

A Boolean, bool, can have one of the two values true or false. A Boolean is used to express the results of logical operations. For example:

void f(int a, int b)
{
bool b1 {a==b};
// ...
}

If a and b have the same value, b1 becomes true; otherwise, b1 becomes false.

A common use of bool is as the type of the result of a function that tests some condition (a predicate). For example:

bool is_open(File*);

bool greater(int a, int b) { return a>b; }

By definition, true has the value 1 when converted to an integer and false has the value 0. Conversely, integers can be implicitly converted to bool values: nonzero integers convert to true and 0 converts to false. For example:

bool b1 = 7; // 7!=0, so b1 becomes true
bool b2 {7}; // error: narrowing (§2.2.2, §10.5)

int i1 = true; // i1 becomes 1
int i2 {true}; // i2 becomes 1

If you prefer to use the {}-initializer syntax to prevent narrowing, yet still want to convert an int to a bool, you can be explicit:

void f(int i)
{
bool b {i!=0};
// ...
}

In arithmetic and logical expressions, bools are converted to ints; integer arithmetic and logical operations are performed on the converted values. If the result needs to be converted back to bool, a 0 is converted to false and a nonzero value is converted to true. For example:

bool a = true;
bool b = true;

bool x = a+b; // a+b is 2, so x becomes true
bool y = a||b; // a||b is 1, so y becomes true ("||" means "or")
bool z = a-b; // a-b is 0, so z becomes false
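The same conversions can be verified by the compiler. The following sketch restates the example with constexpr values so that static_assert can check each intermediate int result.

```cpp
constexpr bool a = true;
constexpr bool b = true;

// In arithmetic, each bool is first converted to the int 1 or 0.
static_assert(a+b == 2, "1+1 is 2");
static_assert((a||b) == 1, "a||b is true, which converts to 1");
static_assert(a-b == 0, "1-1 is 0");

// Converting back: nonzero becomes true, zero becomes false.
constexpr bool x = a+b;   // 2 is nonzero, so x is true
constexpr bool z = a-b;   // 0 converts to false
```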

A pointer can be implicitly converted to a bool (§10.5.2.5). A non-null pointer converts to true; pointers with the value nullptr convert to false. For example:

void g(int* p)
{
bool b = p; // narrows to true or false
bool b2 {p!=nullptr}; // explicit test against nullptr

if (p) { // equivalent to p!=nullptr
// ...
}
}

I prefer if (p) over if (p!=nullptr) because it more directly expresses the notion “if p is valid” and also because it is shorter. The shorter form leaves fewer opportunities for mistakes.

6.2.3. Character Types

There are many character sets and character set encodings in use. C++ provides a variety of character types that reflect that - often bewildering - variety:

• char: The default character type, used for program text. A char is used for the implementation’s character set and is usually 8 bits.

• signed char: Like char, but guaranteed to be signed, that is, capable of holding both positive and negative values.

• unsigned char: Like char, but guaranteed to be unsigned.

• wchar_t: Provided to hold characters of a larger character set such as Unicode (see §7.3.2.2). The size of wchar_t is implementation-defined and large enough to hold the largest character set supported by the implementation’s locale (Chapter 39).

• char16_t: A type for holding 16-bit character sets, such as UTF-16.

• char32_t: A type for holding 32-bit character sets, such as UTF-32.

These are six distinct types (despite the fact that the _t suffix is often used to denote aliases; §6.5). On each implementation, the char type will be identical to that of either signed char or unsigned char, but these three names are still considered separate types.
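That the character types are distinct can be confirmed with std::is_same. This is a small sketch using only standard-library facilities; the minimum sizes asserted for char16_t and char32_t are those implied by their names.

```cpp
#include <climits>
#include <type_traits>

static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");
static_assert(!std::is_same<wchar_t, char16_t>::value
              && !std::is_same<wchar_t, char32_t>::value,
              "wchar_t is a type of its own, not an alias");

// Plain char behaves as exactly one of the two, but which one is
// implementation-defined.
static_assert(std::is_signed<char>::value != std::is_unsigned<char>::value,
              "char is either signed or unsigned");

static_assert(sizeof(char16_t)*CHAR_BIT >= 16, "room for a 16-bit code unit");
static_assert(sizeof(char32_t)*CHAR_BIT >= 32, "room for a 32-bit code unit");
```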

A char variable can hold a character of the implementation’s character set. For example:

char ch = 'a';

Almost universally, a char has 8 bits so that it can hold one of 256 different values. Typically, the character set is a variant of ISO-646, for example ASCII, thus providing the characters appearing on your keyboard. Many problems arise from the fact that this set of characters is only partially standardized.

Serious variations occur between character sets supporting different natural languages and between character sets supporting the same natural language in different ways. Here, we are interested only in how such differences affect the rules of C++. The larger and more interesting issue of how to program in a multilingual, multi-character-set environment is beyond the scope of this book, although it is alluded to in several places (§6.2.3, §36.2.1, Chapter 39).

It is safe to assume that the implementation character set includes the decimal digits, the 26 alphabetic characters of English, and some of the basic punctuation characters. It is not safe to assume that:

• There are no more than 127 characters in an 8-bit character set (e.g., some sets provide 255 characters).

• There are no more alphabetic characters than English provides (most European languages provide more, e.g., æ, þ, and ß).

• The alphabetic characters are contiguous (EBCDIC leaves a gap between 'i' and 'j').

• Every character used to write C++ is available (e.g., some national character sets do not provide {, }, [, ], |, and \).

• A char fits in 1 byte. There are embedded processors without byte accessing hardware for which a char is 4 bytes. Also, one could reasonably use a 16-bit Unicode encoding for the basic chars.

Whenever possible, we should avoid making assumptions about the representation of objects. This general rule applies even to characters.

Each character has an integer value in the character set used by the implementation. For example, the value of 'b' is 98 in the ASCII character set. Here is a loop that outputs the integer value of any character you care to input:

void intval()
{
for (char c; cin >> c; )
cout << "the value of '" << c << "' is " << int{c} << '\n';
}

The notation int{c} gives the integer value for a character c (“the int we can construct from c”). The possibility of converting a char to an integer raises the question: is a char signed or unsigned? The 256 values represented by an 8-bit byte can be interpreted as the values 0 to 255 or as the values -127 to 127. No, not -128 to 127 as one might expect: the C++ standard leaves open the possibility of one’s-complement hardware and that eliminates one value; thus, a use of -128 is nonportable. Unfortunately, the choice of signed or unsigned for a plain char is implementation-defined. C++ provides two types for which the answer is definite: signed char, which can hold at least the values -127 to 127, and unsigned char, which can hold at least the values 0 to 255. Fortunately, the difference matters only for values outside the 0 to 127 range, and the most common characters are within that range.

Values outside that range stored in a plain char can lead to subtle portability problems. See §6.2.3.1 if you need to use more than one type of char or if you store integers in char variables.

Note that the character types are integral types (§6.2.1) so that arithmetic and bitwise logical operations (§10.3) apply. For example:

void digits()
{
for (int i=0; i!=10; ++i)
cout << static_cast<char>('0'+i);
}

This is a way of writing the ten digits to cout. The character literal '0' is converted to its integer value and i is added. The resulting int is then converted to a char and written to cout. Plain '0'+i is an int, so if I had left out the static_cast<char>, the output would have been something like 48, 49, and so on, rather than 0, 1, and so on.
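The conversion used in the loop can be isolated in a small function. This is a sketch; digit_char is a hypothetical name. It relies on the standard's guarantee that the values of the decimal digit characters are contiguous, which does not hold for letters.

```cpp
// Hypothetical helper: the character for a decimal digit i in the range 0 to 9.
constexpr char digit_char(int i)
{
    return static_cast<char>('0'+i);   // '0'+i is an int; convert back to char
}

static_assert(digit_char(0) == '0', "zero maps to '0'");
static_assert(digit_char(9) == '9', "the digits are contiguous in every character set");
```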

6.2.3.1. Signed and Unsigned Characters

It is implementation-defined whether a plain char is considered signed or unsigned. This opens the possibility for some nasty surprises and implementation dependencies. For example:

char c = 255; // 255 is "all ones," hexadecimal 0xFF
int i = c;

What will be the value of i? Unfortunately, the answer is implementation-defined. On an implementation with 8-bit bytes, the answer depends on the meaning of the “all ones” char bit pattern when extended into an int. On a machine where a char is unsigned, the answer is 255. On a machine where a char is signed, the answer is -1. In this case, the compiler might warn about the conversion of the literal 255 to the char value -1. However, C++ does not offer a general mechanism for detecting this kind of problem. One solution is to avoid plain char and use the specific char types only. Unfortunately, some standard-library functions, such as strcmp(), take plain chars only (§43.4).
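A program can ask which choice its implementation made. The following sketch uses std::numeric_limits; the function widen and the constant char_is_signed are hypothetical names, and the last assertion assumes 8-bit chars and a two's complement representation.

```cpp
#include <limits>

// The value of i after:  char c = bits;  int i = c;
constexpr int widen(unsigned char bits)
{
    return static_cast<char>(bits);
}

constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

static_assert(widen(97) == 97, "values in the range 0 to 127 are unaffected");

// Assumption: 8-bit chars and two's complement, as on most current machines.
static_assert(widen(255) == (char_is_signed ? -1 : 255),
              "all ones is -1 for a signed char and 255 for an unsigned char");
```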

A char must behave identically to either a signed char or an unsigned char. However, the three char types are distinct, so you can’t mix pointers to different char types. For example:

void f(char c, signed char sc, unsigned char uc)
{
char* pc = &uc; // error: no pointer conversion
signed char* psc = pc; // error: no pointer conversion
unsigned char* puc = pc; // error: no pointer conversion
psc = puc; // error: no pointer conversion
}

Variables of the three char types can be freely assigned to each other. However, the result of assigning a too-large value to a signed char (§10.5.2.1) is still implementation-defined. For example:

void g(char c, signed char sc, unsigned char uc)
{
c = 255; // implementation-defined if plain chars are signed and have 8 bits
c = sc; // OK
c = uc; // implementation-defined if plain chars are signed and if uc's value is too large
sc = uc; // implementation-defined if uc's value is too large
uc = sc; // OK: conversion to unsigned
sc = c; // implementation-defined if plain chars are unsigned and if c's value is too large
uc = c; // OK: conversion to unsigned
}

To be concrete, assume that a char is 8 bits:

signed char sc = -140;
unsigned char uc = sc; // uc == 116 (because 256-140==116)
cout << uc; // print 't'

char count[256]; // assume 8-bit chars
++count[sc]; // likely disaster: out-of-range access
++count[uc]; // OK

None of these potential problems and confusions occur if you use plain char throughout and avoid negative character values.

6.2.3.2. Character Literals

A character literal is a single character enclosed in single quotes, for example, 'a' and '0'. The type of a character literal is char. A character literal can be implicitly converted to its integer value in the character set of the machine on which the C++ program is to run. For example, if you are running on a machine using the ASCII character set, the value of '0' is 48. The use of character literals rather than decimal notation makes programs more portable.

A few characters have standard names that use the backslash, \ , as an escape character:

Despite their appearance, these are single characters.
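A few of the escape sequences defined by the standard are shown in this sketch. The constant names are chosen for illustration only. Note that '\0' is strictly an octal escape (see below) rather than a named character.

```cpp
constexpr char newline   = '\n';   // newline
constexpr char tab       = '\t';   // horizontal tab
constexpr char backslash = '\\';   // the backslash itself
constexpr char quote     = '\'';   // single quote
constexpr char null_char = '\0';   // the character with value 0

// Each escape sequence denotes exactly one char.
static_assert(sizeof('\n') == 1, "an escape sequence is a single character");
static_assert(sizeof("\n\t\\") == 4, "three characters plus the terminating zero");
static_assert(null_char == 0, "the null character has the value 0");
```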

We can represent a character from the implementation character set as a one-, two-, or three-digit octal number (\ followed by octal digits) or as a hexadecimal number (\x followed by hexadecimal digits). There is no limit to the number of hexadecimal digits in the sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. For example:

This makes it possible to represent every character in the machine’s character set and, in particular, to embed such characters in character strings (see §7.3.2). Using any numeric notation for characters makes a program nonportable across machines with different character sets.
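As an illustration of the octal and hexadecimal notations, the following compile-time checks compare the two forms. The first group holds for any character set; the last two checks assume ASCII and are exactly the kind of dependency the paragraph above warns about.

```cpp
// The same character written in octal and in hexadecimal.
static_assert('\6' == '\x6', "value 6");
static_assert('\60' == '\x30' && '\60' == 48, "value 48");
static_assert('\137' == '\x05f' && '\137' == 95, "value 95");

// Assumption: the implementation character set is ASCII.
static_assert('\60' == '0', "in ASCII, 48 is the digit zero");
static_assert('\137' == '_', "in ASCII, 95 is the underscore");
```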

It is possible to enclose more than one character in a character literal, for example, 'ab'. Such uses are archaic, implementation-dependent, and best avoided. The type of such a multicharacter literal is int.

When embedding a numeric constant in a string using the octal notation, it is wise always to use three digits for the number. The notation is hard enough to read without having to worry about whether or not the character after a constant is a digit. For hexadecimal constants, use two digits. Consider these examples:

char v1[] = "a\xah\129"; // 6 chars: 'a' '\xa' 'h' '\12' '9' '\0'
char v2[] = "a\xah\127"; // 5 chars: 'a' '\xa' 'h' '\127' '\0'
char v3[] = "a\xad\127"; // 4 chars: 'a' '\xad' '\127' '\0'
char v4[] = "a\xad\0127"; // 5 chars: 'a' '\xad' '\012' '7' '\0'
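The character counts in the comments can be checked with sizeof, which includes the terminating zero of a string literal. The last line of this sketch shows one way to follow the advice: two hexadecimal digits, three octal digits, and a separate literal for a digit that must not be absorbed into the escape sequence.

```cpp
static_assert(sizeof("a\xah\129") == 6, "v1: the 9 is not an octal digit");
static_assert(sizeof("a\xah\127") == 5, "v2: three octal digits form one char");
static_assert(sizeof("a\xad\127") == 4, "v3: the d is absorbed as a hexadecimal digit");
static_assert(sizeof("a\xad\0127") == 5, "v4: only three octal digits are used");

// Adjacent string literals are concatenated after escape sequences have
// been interpreted, so the 9 stays a separate character here.
static_assert(sizeof("a\x0ah\012" "9") == 6, "same contents as v1, easier to read");
```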

Wide character literals are of the form L'ab' and are of type wchar_t. The number of characters between the quotes and their meanings are implementation-defined.

A C++ program can manipulate character sets that are much richer than the 127-character ASCII set, such as Unicode. Literals of such larger character sets are presented as sequences of four or eight hexadecimal digits preceded by a U or a u. For example:

U'\UFADEBEEF'
u'\uDEAD'
u'\xDEAD'

The shorter notation u'\uXXXX' is equivalent to U'\U0000XXXX' for any hexadecimal digit X. A number of hexadecimal digits different from four or eight is a lexical error. The meaning of the hexadecimal number is defined by the ISO/IEC 10646 standard and such values are called universal character names. In the C++ standard, universal character names are described in §iso.2.2, §iso.2.3, §iso.2.14.3, §iso.2.14.5, and §iso.E.
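The types and values of such literals can be checked at compile time. This sketch uses U+00F1 (ñ) and U+00C6 (Æ), two characters mentioned earlier in this chapter; the choice of code points is otherwise arbitrary.

```cpp
#include <type_traits>

// A u prefix gives a char16_t literal; a U prefix gives a char32_t literal.
static_assert(std::is_same<decltype(u'\u00F1'), char16_t>::value, "u'...' is char16_t");
static_assert(std::is_same<decltype(U'\U000000F1'), char32_t>::value, "U'...' is char32_t");

// The value of the literal is the ISO/IEC 10646 code point.
static_assert(u'\u00F1' == 0xF1, "code point of ñ");
static_assert(u'\u00C6' == 0xC6, "code point of Æ");

// The four-digit and eight-digit forms name the same character.
static_assert(U'\u00F1' == U'\U000000F1', "short and long notation agree");
```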

6.2.4. Integer Types

Like char, each integer type comes in three forms: “plain” int, signed int, and unsigned int. In addition, integers come in four sizes: short int, “plain” int, long int, and long long int. A long int can be referred to as plain long, and a long long int can be referred to as plain long long. Similarly, short is a synonym for short int, unsigned for unsigned int, and signed for signed int. No, there is no long short int equivalent to int.

The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules (§10.5.1, §10.5.2.1).

Unlike plain chars, plain ints are always signed. The signed int types are simply more explicit synonyms for their plain int counterparts, rather than different types.

If you need more detailed control over integer sizes, you can use aliases from <cstdint> (§43.7), such as int64_t (a signed integer with exactly 64 bits), uint_fast16_t (an unsigned integer with at least 16 bits, supposedly the fastest such integer), and int_least32_t (a signed integer with at least 32 bits, just like plain long). The plain integer types have well-defined minimal sizes (§6.2.8), so the <cstdint> aliases are sometimes redundant and can be overused.
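The guarantees behind these aliases can be stated as static assertions. This is a sketch; note that int64_t is optional in the standard, so the code assumes an implementation that provides it, as mainstream ones do.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Exactly 64 bits (assumes the implementation provides int64_t).
static_assert(sizeof(std::int64_t)*CHAR_BIT == 64, "int64_t has exactly 64 bits");

// At least 16 value bits; the actual width is whatever is fastest.
static_assert(std::numeric_limits<std::uint_fast16_t>::digits >= 16,
              "uint_fast16_t has at least 16 bits");

// At least 32 bits including the sign, which plain long also guarantees.
static_assert(std::numeric_limits<std::int_least32_t>::digits >= 31,
              "int_least32_t has at least 32 bits");
static_assert(std::numeric_limits<long>::digits >= 31,
              "long has at least 32 bits");
```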

In addition to the standard integer types, an implementation may provide extended integer types (signed and unsigned). These types must behave like integers and are considered integer types when considering conversions and integer literal values, but they usually have greater range (occupy more space).

6.2.4.1. Integer Literals

Integer literals come in three guises: decimal, octal, and hexadecimal. Decimal literals are the most commonly used and look as you would expect them to:

7 1234 976 12345678901234567890

The compiler ought to warn about literals that are too long to represent, but an error is only guaranteed for {} initializers (§6.3.5).

A literal starting with zero followed by x or X (0x or 0X) is a hexadecimal (base 16) number. A literal starting with zero but not followed by x or X is an octal (base 8) number. For example:

The letters a, b, c, d, e, and f, or their uppercase equivalents, are used to represent 10, 11, 12, 13, 14, and 15, respectively. Octal and hexadecimal notations are most useful for expressing bit patterns. Using these notations to express genuine numbers can lead to surprises. For example, on a machine on which an int is represented as a two’s complement 16-bit integer, 0xffff is the negative decimal number -1. Had more bits been used to represent an integer, it would have been the positive decimal number 65535.
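The three notations can be compared directly, because they are just different spellings of the same values. A small sketch:

```cpp
// Each line writes one value in decimal, octal, and hexadecimal.
static_assert(0 == 00 && 0 == 0x0, "zero");
static_assert(2 == 02 && 2 == 0x2, "two");
static_assert(63 == 077 && 63 == 0x3f, "sixty-three");
static_assert(83 == 0123 && 83 == 0X53, "eighty-three");

// A leading zero changes the meaning: 010 is eight, not ten.
static_assert(010 == 8, "octal 10");
```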

The suffix U can be used to write explicitly unsigned literals. Similarly, the suffix L can be used to write explicitly long literals. For example, 3 is an int, 3U is an unsigned int, and 3L is a long int.

Combinations of suffixes are allowed. For example:

cout << 0xF0UL << ' ' << 0LU << '\n';

If no suffix is provided, the compiler gives an integer literal a suitable type based on its value and the implementation’s integer sizes (§6.2.4.2).

It is a good idea to limit the use of nonobvious constants to a few well-commented const (§7.5), constexpr (§10.4), and enumerator (§8.4) initializers.

6.2.4.2. Types of Integer Literals

In general, the type of an integer literal depends on its form, value, and suffix:

• If it is decimal and has no suffix, it has the first of these types in which its value can be represented: int, long int, long long int.

• If it is octal or hexadecimal and has no suffix, it has the first of these types in which its value can be represented: int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.

• If it is suffixed by u or U, its type is the first of these types in which its value can be represented: unsigned int, unsigned long int, unsigned long long int.

• If it is decimal and suffixed by l or L, its type is the first of these types in which its value can be represented: long int, long long int.

• If it is octal or hexadecimal and suffixed by l or L, its type is the first of these types in which its value can be represented: long int, unsigned long int, long long int, unsigned long long int.

• If it is suffixed by ul, lu, uL, Lu, Ul, lU, UL, or LU, its type is the first of these types in which its value can be represented: unsigned long int, unsigned long long int.

• If it is decimal and is suffixed by ll or LL, its type is long long int.

• If it is octal or hexadecimal and is suffixed by ll or LL, its type is the first of these types in which its value can be represented: long long int, unsigned long long int.

• If it is suffixed by llu, llU, ull, Ull, LLu, LLU, uLL, or ULL, its type is unsigned long long int.

For example, 100000 is of type int on a machine with 32-bit ints but of type long int on a machine with 16-bit ints and 32-bit longs. Similarly, 0XA000 is of type int on a machine with 32-bit ints but of type unsigned int on a machine with 16-bit ints. These implementation dependencies can be avoided by using suffixes: 100000L is of type long int on all machines and 0XA000U is of type unsigned int on all machines.
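The effect of suffixes can be verified with decltype and std::is_same. This sketch checks only literals whose type is the same on every implementation; the type of an unsuffixed literal such as 100000 is deliberately not asserted, because it depends on the size of int.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype(3), int>::value, "3 fits in any int");
static_assert(std::is_same<decltype(3U), unsigned int>::value, "U: unsigned");
static_assert(std::is_same<decltype(3L), long>::value, "L: long");
static_assert(std::is_same<decltype(3LL), long long>::value, "LL: long long");
static_assert(std::is_same<decltype(3ULL), unsigned long long>::value, "ULL: unsigned long long");

// The suffixed literals from the text have the same type on all machines.
static_assert(std::is_same<decltype(100000L), long>::value, "100000L is a long");
static_assert(std::is_same<decltype(0XA000U), unsigned int>::value, "0XA000U is an unsigned int");
static_assert(std::is_same<decltype(0xF0UL), unsigned long>::value, "0xF0UL is an unsigned long");
```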

6.2.5. Floating-Point Types

The floating-point types represent floating-point numbers. A floating-point number is an approximation of a real number represented in a fixed amount of memory. There are three floating-point types: float (single-precision), double (double-precision), and long double (extended-precision).

The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don’t have that understanding, get advice, take the time to learn, or use double and hope for the best.

6.2.5.1. Floating-Point Literals

By default, a floating-point literal is of type double. Again, a compiler ought to warn about floating-point literals that are too large to be represented. Here are some floating-point literals:

1.23 .23 0.23 1. 1.0 1.2e10 1.23e-15

Note that a space cannot occur in the middle of a floating-point literal. For example, 65.43 e-21 is not a floating-point literal but rather four separate lexical tokens (causing a syntax error):

65.43 e - 21

If you want a floating-point literal of type float, you can define one using the suffix f or F:

3.14159265f 2.0f 2.997925F 2.9e-3f

If you want a floating-point literal of type long double, you can define one using the suffix l or L:

3.14159265L 2.0L 2.997925L 2.9e-3L
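The types of these literals can be confirmed at compile time. A brief sketch using decltype:

```cpp
#include <type_traits>

// No suffix: double.
static_assert(std::is_same<decltype(1.23), double>::value, "1.23 is a double");
static_assert(std::is_same<decltype(1.2e10), double>::value, "1.2e10 is a double");

// Suffix f or F: float.
static_assert(std::is_same<decltype(2.0f), float>::value, "2.0f is a float");
static_assert(std::is_same<decltype(2.9e-3F), float>::value, "2.9e-3F is a float");

// Suffix l or L: long double.
static_assert(std::is_same<decltype(2.0L), long double>::value, "2.0L is a long double");
```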

6.2.6. Prefixes and Suffixes

There is a minor zoo of suffixes indicating types of literals and also a few prefixes:

Note that “string” here means “string literal” (§7.3.2) rather than “of type std::string.”

Obviously, we could also consider . and e as infix and R" and u8" as the first part of a set of delimiters. However, I consider the nomenclature less important than giving an overview of the bewildering variety of literals.

The suffixes l and L can be combined with the suffixes u and U to express unsigned long types. For example:

1LU // unsigned long
2UL // unsigned long
3ULL // unsigned long long
4LLU // unsigned long long
5LUL // error

The suffixes l and L can be used for floating-point literals to express long double. For example:

1L // long int
1.0L // long double

Combinations of R, L, and u prefixes are allowed, for example, uR"**(foo\(bar))**". Note the dramatic difference in the meaning of U as a suffix for an integer literal (unsigned) and as a prefix for a character or string literal (UTF-32 encoding; §7.3.2.2).

In addition, a user can define new suffixes for user-defined types. For example, by defining a user-defined literal operator (§19.2.6), we can get

"foo bar"s // a literal of type std::string
123_km // a literal of type Distance

Suffixes not starting with _ are reserved for the standard library.
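As a rough sketch of how a suffix such as _km might be defined, consider the following. The Distance type, its meters member, and the choice to store meters are all assumptions made for this illustration; §19.2.6 describes the actual mechanism:

```cpp
// Hypothetical Distance type; it stores its value in meters.
struct Distance {
    long double meters;
};

// Literal operator used for integer literals, such as 123_km.
constexpr Distance operator""_km(unsigned long long km)
{
    return Distance{km * 1000.0L};
}

// Literal operator used for floating-point literals, such as 42.195_km.
constexpr Distance operator""_km(long double km)
{
    return Distance{km * 1000.0L};
}

constexpr Distance marathon = 42.195_km;   // evaluated at compile time
```

The suffix starts with an underscore because suffixes without one are reserved for the standard library.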

6.2.7. void

The type void is syntactically a fundamental type. It can, however, be used only as part of a more complicated type; there are no objects of type void. It is used either to specify that a function does not return a value or as the base type for pointers to objects of unknown type. For example:

void x; // error: there are no void objects
void& r; // error: there are no references to void
void f(); // function f does not return a value (§12.1.4)
void* pv; // pointer to object of unknown type (§7.2.1)

When declaring a function, you must specify the type of the value returned. Logically, you would expect to be able to indicate that a function didn’t return a value by omitting the return type. However, that would make a mess of the grammar (§iso.A). Consequently, void is used as a “pseudo return type” to indicate that a function doesn’t return a value.

6.2.8. Sizes

Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation-defined (§6.1). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that “the language is what my compiler implements.” This is a narrow and shortsighted view. If your program is a success, it will be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterward.

It is relatively easy to limit the impact of implementation-dependent language features. Limiting the impact of system-dependent library facilities is far harder. Using standard-library facilities wherever feasible is one approach.

The reason for providing more than one integer type, more than one unsigned type, and more than one floating-point type is to allow the programmer to take advantage of hardware characteristics. On many machines, there are significant differences in memory requirements, memory access times, and computation speed among the different varieties of fundamental types. If you know a machine, it is usually easy to choose, for example, the appropriate integer type for a particular variable. Writing truly portable low-level code is harder.

Here is a graphical representation of a plausible set of fundamental types and a sample string literal (§7.3.2):

On the same scale (.2 inch to a byte), a megabyte of memory would stretch about 3 miles (5 km) to the right.

Sizes of C++ objects are expressed in terms of multiples of the size of a char, so by definition the size of a char is 1. The size of an object or type can be obtained using the sizeof operator (§10.3). This is what is guaranteed about sizes of fundamental types:

• 1 ≡ sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)

• 1 ≤ sizeof(bool) ≤ sizeof(long)

• sizeof(char) ≤ sizeof(wchar_t) ≤ sizeof(long)

• sizeof(float) ≤ sizeof(double) ≤ sizeof(long double)

• sizeof(N) ≡ sizeof(signed N) ≡ sizeof(unsigned N)

In that last line, N can be char, short, int, long, or long long. In addition, it is guaranteed that a char has at least 8 bits, a short at least 16 bits, and a long at least 32 bits. A char can hold a character of the machine’s character set. The char type is supposed to be chosen by the implementation to be the most suitable type for holding and manipulating characters on a given computer; it is typically an 8-bit byte. Similarly, the int type is supposed to be chosen to be the most suitable for holding and manipulating integers on a given computer; it is typically a 4-byte (32-bit) word. It is unwise to assume more. For example, there are machines with 32-bit chars. It is extremely unwise to assume that the size of an int is the same as the size of a pointer; many machines (“64-bit architectures”) have pointers that are larger than integers. Note that it is not guaranteed that sizeof(long)<sizeof(long long) or that sizeof(double)<sizeof(long double).
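The guarantees listed above can be written down as compile-time checks. This is a sketch only; storage_bits is a hypothetical helper, and CHAR_BIT (from <climits>) is the number of bits in a char:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>

// Guaranteed orderings of sizes.
static_assert(sizeof(char) == 1, "char is the unit of size");
static_assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long)
              && sizeof(long) <= sizeof(long long), "integer sizes never decrease");
static_assert(sizeof(float) <= sizeof(double)
              && sizeof(double) <= sizeof(long double), "floating-point sizes never decrease");
static_assert(sizeof(int) == sizeof(unsigned int), "signed and unsigned variants have the same size");

// Guaranteed minimum widths.
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(sizeof(short)*CHAR_BIT >= 16, "a short has at least 16 bits");
static_assert(sizeof(long)*CHAR_BIT >= 32, "a long has at least 32 bits");

// Hypothetical helper: the number of bits of storage occupied by a T.
template<class T>
constexpr std::size_t storage_bits() { return sizeof(T)*CHAR_BIT; }
```

Code that relies on more than these guarantees (say, on an int being exactly 32 bits) could state that assumption with its own static_assert so that a port to an unusual machine fails at compile time rather than at run time.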

Some implementation-defined aspects of fundamental types can be found by a simple use of sizeof, and more can be found in <limits>. For example:

#include <limits> // §40.2
#include <iostream>

int main()
{
std::cout << "size of long " << sizeof(1L) << '\n';
std::cout << "size of long long " << sizeof(1LL) << '\n';

std::cout << "largest float == " << std::numeric_limits<float>::max() << '\n';
std::cout << "char is signed == " << std::numeric_limits<char>::is_signed << '\n';
}

The functions in <limits> (§40.2) are constexpr (§10.4) so that they can be used without run-time overhead and in contexts that require a constant expression.
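To illustrate use in a context that requires a constant expression, here is a small sketch. The names largest_int, histogram, and histogram_size are hypothetical:

```cpp
#include <cstddef>
#include <limits>

// max() is constexpr, so its result can initialize a constexpr variable.
constexpr int largest_int = std::numeric_limits<int>::max();

// An array bound must be a constant expression:
// one counter for each possible unsigned char value.
int histogram[std::numeric_limits<unsigned char>::max() + 1] {};

// A static_assert also requires a constant expression.
static_assert(std::numeric_limits<int>::max() >= 32767, "an int has at least 16 bits");

// Hypothetical helper: the number of elements in histogram.
constexpr std::size_t histogram_size() { return sizeof(histogram)/sizeof(histogram[0]); }
```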

The fundamental types can be mixed freely in assignments and expressions. Wherever possible, values are converted so as not to lose information (§10.5).

If a value v can be represented exactly in a variable of type T, a conversion of v to T is value-preserving. Conversions that are not value-preserving are best avoided (§2.2.2, §10.5.2.6).

If you need a specific size of integer, say, a 16-bit integer, you can #include the standard header <cstdint> that defines a variety of types (or rather type aliases; §6.5). For example:

int16_t x {0x2abb}; // 2 bytes
int64_t xxxx {0x2aaabbbbccccdddd}; // 8 bytes
int_least16_t y; // at least 2 bytes (just like int)
int_least32_t yy; // at least 4 bytes (just like long)
int_fast32_t z; // the fastest int type with at least 4 bytes
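The sizes promised by these aliases can be checked at compile time. In this sketch, largest16 is a hypothetical name; note that the exact-width types, such as int16_t, are provided only on implementations that have a type of exactly that width:

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>

// Exact-width types occupy exactly the stated number of bits.
static_assert(sizeof(std::int16_t)*CHAR_BIT == 16, "int16_t is exactly 16 bits");
static_assert(sizeof(std::int64_t)*CHAR_BIT == 64, "int64_t is exactly 64 bits");

// The "least" and "fast" variants promise only a minimum.
static_assert(sizeof(std::int_least16_t)*CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(std::int_fast32_t)*CHAR_BIT >= 32, "at least 32 bits");

std::int16_t largest16 {0x7fff};     // OK: 0x7fff is the largest value an int16_t can hold
// std::int16_t too_big {0x8000};    // error: narrowing; 0x8000 does not fit in an int16_t
```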

The standard header <cstddef> defines an alias that is very widely used in both standard-library declarations and user code: size_t is an implementation-defined unsigned integer type that can hold the size in bytes of every object. Consequently, it is used where we need to hold an object size. For example:

void* allocate(size_t n); // get n bytes

Similarly, <cstddef> defines the signed integer type ptrdiff_t for holding the result of subtracting two pointers to get a number of elements.
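A brief sketch of both aliases in use follows. The function names are hypothetical and chosen only for this illustration:

```cpp
#include <cstddef>

// The number of elements from first up to (but not including) last.
// Both pointers must point into the same array.
std::ptrdiff_t elements_between(const int* first, const int* last)
{
    return last - first;   // pointer subtraction yields a ptrdiff_t
}

// The number of bytes needed to hold n ints.
std::size_t bytes_for_ints(std::size_t n)
{
    return n*sizeof(int);  // sizeof yields a size_t
}

// Hypothetical demonstration: the distance between a[2] and a[7].
std::ptrdiff_t demo_distance()
{
    int a[10] {};
    return elements_between(&a[2], &a[7]);
}
```

Because ptrdiff_t is signed, subtracting in the other order (first - last) gives a negative element count rather than a huge unsigned value.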

6.2.9. Alignment

An object doesn’t just need enough storage to hold its representation. In addition, on some machine architectures, the bytes used to hold it must have proper alignment for the hardware to access it efficiently (or in extreme cases to access it at all). For example, a 4-byte int often has to be aligned on a word (4-byte) boundary, and sometimes an 8-byte double has to be aligned on a word (8-byte) boundary. Of course, this is all very implementation specific, and for most programmers completely implicit. You can write good C++ code for decades without needing to be explicit about alignment. Where alignment most often becomes visible is in object layouts: sometimes structs contain “holes” to improve alignment (§8.2.1).

The alignof() operator returns the alignment of its argument, which in standard C++ must be a type. For example:

auto ac = alignof(char); // the alignment of a char
auto ai = alignof(int); // the alignment of an int
auto ad = alignof(double); // the alignment of a double

int a[20];
auto aa = alignof(decltype(a)); // the alignment of an int
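The following sketch applies alignof to types, which is the form the standard specifies (some compilers accept an expression as an extension), and shows where a "hole" can appear in a struct. Mixed and padding_in_mixed are hypothetical names:

```cpp
#include <cstddef>

// Hypothetical struct: the compiler may insert unused bytes (a "hole")
// after c so that i is properly aligned.
struct Mixed {
    char c;
    int i;
};

static_assert(alignof(char) == 1, "a char can be placed at any address");
static_assert(alignof(int[20]) == alignof(int), "an array is aligned like its elements");
static_assert(alignof(Mixed) >= alignof(int), "a struct is at least as strictly aligned as its members");
static_assert(sizeof(Mixed) >= sizeof(char) + sizeof(int), "any holes only add to the size");
static_assert(sizeof(Mixed) % alignof(Mixed) == 0, "the size is a multiple of the alignment");

// Hypothetical helper: the number of bytes in Mixed not used by its members.
constexpr std::size_t padding_in_mixed()
{
    return sizeof(Mixed) - (sizeof(char) + sizeof(int));
}
```

On a typical machine with 4-byte ints, padding_in_mixed() would be 3, but that value is implementation-specific, so the sketch does not assert it.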

Sometimes, we have to use alignment in a declaration, where an expression, such as alignof(x+y), is not allowed. Instead, we can use the type specifier alignas: alignas(T) means “align just like a T.” For example, we can set aside uninitialized storage for some type X like this:

void user(const vector<X>& vx)
{
constexpr int bufmax = 1024;
alignas(X) char buffer[bufmax]; // uninitialized

const int max = min(vx.size(),bufmax/sizeof(X));
uninitialized_copy(vx.begin(),vx.begin()+max,reinterpret_cast<X*>(buffer));
// ...
}
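A self-contained variant of this idea might look as follows. It is a sketch under stated assumptions: Sample is a hypothetical stand-in for X, the buffer is given the element type unsigned char, and the function returns a sum only so that there is something to observe:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type standing in for X.
struct Sample {
    double value;
    int id;
};

// Copy as many elements of vx as fit into a local buffer that is aligned
// for Sample, and return the sum of the ids of the copied elements.
int sum_ids_via_buffer(const std::vector<Sample>& vx)
{
    constexpr std::size_t bufmax = 1024;
    alignas(Sample) unsigned char buffer[bufmax];   // raw, uninitialized storage

    const std::size_t max = std::min(vx.size(), bufmax/sizeof(Sample));
    Sample* first = reinterpret_cast<Sample*>(buffer);
    Sample* last = std::uninitialized_copy(vx.begin(),
                                           vx.begin() + static_cast<std::ptrdiff_t>(max),
                                           first);

    int sum = 0;
    for (Sample* p = first; p != last; ++p)
        sum += p->id;
    return sum;   // Sample is trivially destructible, so nothing needs to be destroyed
}
```

For an element type with a nontrivial destructor, the objects constructed in the buffer would have to be destroyed explicitly before the buffer goes out of scope.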

6.3. Declarations

Before a name (identifier) can be used in a C++ program, it must be declared. That is, its type must be specified to inform the compiler what kind of entity the name refers to. For example:

char ch;
string s;
auto count = 1;
const double pi {3.1415926535897};
extern int error_number;

const char* name = "Njal";
const char* season[] = { "spring", "summer", "fall", "winter" };
vector<string> people { name, "Skarphedin", "Gunnar" };

struct Date { int d, m, y; };
int day(Date* p) { return p->d; }
double sqrt(double);
template<class T> T abs(T a) { return a<0 ? -a: a; }

constexpr int fac(int n) { return (n<2)?1:n* fac(n-1); } // possible compile-time evaluation (§2.2.3)
constexpr double zz { ii*fac(7) }; // compile-time initialization

using Cmplx = std::complex<double>; // type alias (§3.4.5, §6.5)
struct User; // type name
enum class Beer { Carlsberg, Tuborg, Thor };
namespace NS { int a; }

As can be seen from these examples, a declaration can do more than simply associate a type with a name. Most of these declarations are also definitions. A definition is a declaration that supplies all that is needed in a program for the use of an entity. In particular, if it takes memory to represent something, that memory is set aside by its definition. A different terminology deems declarations parts of an interface and definitions parts of an implementation. When taking that view, we try to compose interfaces out of declarations that can be replicated in separate files (§15.2.2); definitions that set aside memory do not belong in interfaces.

Assuming that these declarations are in the global scope (§6.3.4), we have:

char ch; // set aside memory for a char and initialize it to 0
auto count = 1; // set aside memory for an int initialized to 1
const char* name = "Njal"; // set aside memory for a pointer to char
// set aside memory for a string literal "Njal"
// initialize the pointer with the address of that string literal

struct Date { int d, m, y; }; // Date is a struct with three members
int day(Date* p) { return p->d; } // day is a function that executes the specified code

using Point = std::complex<short>;// Point is a name for std::complex<short>

Of the declarations above, only three are not also definitions:

double sqrt(double); // function declaration
extern int error_number; // variable declaration
struct User; // type name declaration

That is, if used, the entity they refer to must be defined elsewhere. For example:

double sqrt(double d) { /* ... */ }
int error_number = 1;
struct User { /* ... */ };

There must always be exactly one definition for each name in a C++ program (for the effects of #include, see §15.2.3). However, there can be many declarations.

All declarations of an entity must agree on its type. So, this fragment has two errors:

int count;
int count; // error: redefinition
extern int error_number;
extern short error_number; // error: type mismatch

This has no errors (for the use of extern, see §15.2):

extern int error_number;
extern int error_number; // OK: redeclaration
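Put together in one translation unit, the rule "many declarations, exactly one definition" can be sketched like this. The function half is a hypothetical example; error_number is the name used above:

```cpp
// Declarations: they may be repeated, as long as they agree on the type.
extern int error_number;
extern int error_number;     // OK: redeclaration
double half(double);         // hypothetical function: declared here, defined below
double half(double);         // OK: redeclaration

// Definitions: exactly one for each entity.
int error_number = 1;
double half(double d) { return d/2; }

// int error_number = 2;     // error: redefinition
// extern short error_number; // error: type mismatch
```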

Some definitions explicitly specify a “value” for the entities they define. For example:

struct Date { int d, m, y; };
using Point = std::complex<short>; // Point is a name for std::complex<short>
int day(Date* p) { return p->d; }
const double pi {3.1415926535897};

For types, aliases, templates, functions, and constants, the “value” is permanent. For non-const data types, the initial value may be changed later. For example:

void f()
{
int count {1}; // initialize count to 1
const char* name {"Bjarne"}; // name is a variable that points to a constant (§7.5)
count = 2; // assign 2 to count
name = "Marian";
}

Of the definitions, only two do not specify values:

char ch;
string s;

See §6.3.5 and §17.3.3 for explanations of how and when a variable is assigned a default value. Any declaration that specifies a value is a definition.

6.3.1. The Structure of Declarations

The structure of a declaration is defined by the C++ grammar (§iso.A). This grammar evolved over four decades, starting with the early C grammars, and is quite complicated. However, without too many radical simplifications, we can consider a declaration as having five parts (in order):

• Optional prefix specifiers (e.g., static or virtual)

• A base type (e.g., vector<double> or const int)

• A declarator optionally including a name (e.g., p[7], n, or *(*)[])

• Optional suffix function specifiers (e.g., const or noexcept)

• An optional initializer or function body (e.g., ={7,5,3} or {return x;})

Except for function and namespace definitions, a declaration is terminated by a semicolon. Consider a definition of an array of C-style strings:

const char* kings[] = { "Antigonus", "Seleucus", "Ptolemy" };

Here, the base type is const char, the declarator is * kings[], and the initializer is the = followed by the {}-list.

A specifier is an initial keyword, such as virtual (§3.2.3, §20.3.2), extern (§15.2), or constexpr (§2.2.3), that specifies some non-type attribute of what is being declared.

A declarator is composed of a name and optionally some declarator operators. The most common declarator operators are:

Their use would be simple if they were all either prefix or postfix. However, *, [], and () were designed to mirror their use in expressions (§10.3). Thus, * is prefix and [] and () are postfix. The postfix declarator operators bind tighter than the prefix ones. Consequently, char*kings[] is an array of pointers to char, whereas char(*kings)[] is a pointer to an array of char. We have to use parentheses to express types such as “pointer to array” and “pointer to function”; see the examples in §7.2.
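The binding rules can be verified with type traits. In this sketch, greeting, pg, twice, and pf are hypothetical names added for illustration:

```cpp
#include <type_traits>

const char* kings[] = { "Antigonus", "Seleucus", "Ptolemy" };  // array of 3 pointers to const char
const char greeting[] = "hi";                                  // array of 3 const char ('h', 'i', and the terminating 0)
const char (*pg)[3] = &greeting;                               // pointer to an array of 3 const char

int twice(int x) { return 2*x; }
int (*pf)(int) = &twice;                                       // pointer to function taking an int and returning an int

// [] binds tighter than *, so kings is an array; the parentheses make pg a pointer.
static_assert(std::is_array<decltype(kings)>::value, "kings is an array");
static_assert(std::is_same<decltype(kings), const char*[3]>::value, "of 3 pointers to const char");
static_assert(std::is_pointer<decltype(pg)>::value, "pg is a pointer");
static_assert(std::is_same<decltype(pg), const char(*)[3]>::value, "to an array of 3 const char");
static_assert(std::is_pointer<decltype(pf)>::value, "pf is a pointer");
```

Without the parentheses, int *pf(int) would declare a function returning an int* rather than a pointer to a function.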

Note that the type cannot be left out of a declaration. For example:

const c = 7; // error: no type

gt(int a, int b) // error: no return type
{
return (a>b) ? a : b;
}

unsigned ui; // OK: "unsigned" means "unsigned int"
long li; // OK: "long" means "long int"

In this, standard C++ differs from early versions of C and C++ that allowed the first two examples by considering int to be the type when none was specified (§44.3). This “implicit int” rule was a source of subtle errors and much confusion.

Some types have names composed out of multiple keywords, such as long long and volatile int. Some type names don’t even look much like names, such as decltype(f(x)) (the return type of a call f(x); §6.3.6.3).

The volatile specifier is described in §41.4.

The alignas() specifier is described in §6.2.9.

6.3.2. Declaring Multiple Names

It is possible to declare several names in a single declaration. The declaration simply contains a list of comma-separated declarators. For example, we can declare two integers like this:

int x, y; // int x; int y;

Operators apply to individual names only - and not to any subsequent names in the same declaration. For example:

int* p, y; // int* p; int y; NOT int* y;
int x, *q; // int x; int* q;
int v[10], *pv; // int v[10]; int* pv;
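That declarator operators apply to a single name can be confirmed at compile time. This sketch repeats two of the declarations and then shows a one-name-per-declaration alternative; first_pointer and counter are hypothetical names:

```cpp
#include <type_traits>

int* p, y;        // p is an int*; y is a plain int
int v[10], *pv;   // v is an array of 10 ints; pv is an int*

static_assert(std::is_same<decltype(p), int*>::value, "the * applies to p");
static_assert(std::is_same<decltype(y), int>::value, "but not to y");
static_assert(std::is_same<decltype(v), int[10]>::value, "the [10] applies to v");
static_assert(std::is_same<decltype(pv), int*>::value, "but not to pv");

// Arguably clearer: one name per declaration.
int* first_pointer = nullptr;
int counter = 0;
```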

Such declarations with multiple names and nontrivial declarators make a program harder to read and should be avoided.

6.3.3. Names

A name (identifier) consists of a sequence of letters and digits. The first character must be a letter. The underscore character, _, is considered a letter. C++ imposes no limit on the number of characters in a name. However, some parts of an implementation are not under the control of the compiler writer (in particular, the linker), and those parts, unfortunately, sometimes do impose limits. Some run-time environments also make it necessary to extend or restrict the set of characters accepted in an identifier. Extensions (e.g., allowing the character $ in a name) yield nonportable programs. A C++ keyword (§6.3.3.1), such as new or int, cannot be used as a name of a user-defined entity. Examples of names are:

hello this_is_a_most_unusually_long_identifier_that_is_better_avoided
DEFINED foO bAr u_name HorseSense
var0 var1 CLASS _class ___

Examples of character sequences that cannot be used as identifiers are:

012 a fool $sys class 3var
pay.due foo~bar .name if

Nonlocal names starting with an underscore are reserved for special facilities in the implementation and the run-time environment, so such names should not be used in application programs. Similarly, names starting with a double underscore (__) or an underscore followed by an uppercase letter (e.g., _Foo) are reserved (§iso.17.6.4.3).

When reading a program, the compiler always looks for the longest string of characters that could make up a name. Hence, var10 is a single name, not the name var followed by the number 10. Also, elseif is a single name, not the keyword else followed by the keyword if.

Uppercase and lowercase letters are distinct, so Count and count are different names, but it is often unwise to choose names that differ only by capitalization. In general, it is best to avoid names that differ only in subtle ways. For example, in some fonts, the uppercase “o” (O) and zero (0) can be hard to tell apart, as can the lowercase “L” (l), uppercase “i” (I), and one (1). Consequently, l0, lO, l1, ll, and I1l are poor choices for identifier names. Not all fonts have the same problems, but most have some.

Names from a large scope ought to have relatively long and reasonably obvious names, such as vector, Window_with_border, and Department_number. However, code is clearer if names used only in a small scope have short, conventional names such as x, i, and p. Functions (Chapter 12), classes (Chapter 16), and namespaces (§14.3.1) can be used to keep scopes small. It is often useful to keep frequently used names relatively short and reserve really long names for infrequently used entities.

Choose names to reflect the meaning of an entity rather than its implementation. For example, phone_book is better than number_vector even if the phone numbers happen to be stored in a vector (§4.4). Do not encode type information in a name (e.g., pcname for a name that’s a char* or icount for a count that’s an int) as is sometimes done in languages with dynamic or weak type systems:

• Encoding types in names lowers the abstraction level of the program; in particular, it prevents generic programming (which relies on a name being able to refer to entities of different types).

• The compiler is better at keeping track of types than you are.

• If you want to change the type of a name (e.g., use a std::string to hold the name), you’ll have to change every use of the name (or the type encoding becomes a lie).

• Any system of type abbreviations you can come up with will become overelaborate and cryptic as the variety of types you use increases.

Choosing good names is an art.

Try to maintain a consistent naming style. For example, capitalize names of user-defined types and start names of non-type entities with a lowercase letter (for example, Shape and current_token). Also, use all capitals for macros (if you must use macros (§12.6); for example, HACK) and never for non-macros (not even for non-macro constants). Use underscores to separate words in an identifier; number_of_elements is more readable than numberOfElements. However, consistency is hard to achieve because programs are typically composed of fragments from different sources and several different reasonable styles are in use. Be consistent in your use of abbreviations and acronyms. Note that the language and the standard library use lowercase for types; this can be seen as a hint that they are part of the standard.

6.3.3.1. Keywords

The C++ keywords are:

In addition, the word export is reserved for future use.

6.3.4. Scope

A declaration introduces a name into a scope; that is, a name can be used only in a specific part of the program text.

• Local scope: A name declared in a function (Chapter 12) or lambda (§11.4) is called a local name. Its scope extends from its point of declaration to the end of the block in which its declaration occurs. A block is a section of code delimited by a {} pair. Function and lambda parameter names are considered local names in the outermost block of their function or lambda.

• Class scope: A name is called a member name (or a class member name) if it is defined in a class outside any function, class (Chapter 16), enum class (§8.4.1), or other namespace. Its scope extends from the opening { of the class declaration to the end of the class declaration.

• Namespace scope: A name is called a namespace member name if it is defined in a namespace (§14.3.1) outside any function, lambda (§11.4), class (Chapter 16), enum class (§8.4.1), or other namespace. Its scope extends from the point of declaration to the end of its namespace. A namespace name may also be accessible from other translation units (§15.2).

• Global scope: A name is called a global name if it is defined outside any function, class (Chapter 16), enum class (§8.4.1), or namespace (§14.3.1). The scope of a global name extends from the point of declaration to the end of the file in which its declaration occurs. A global name may also be accessible from other translation units (§15.2). Technically, the global namespace is considered a namespace, so a global name is an example of a namespace member name.

• Statement scope: A name is in a statement scope if it is defined within the () part of a for-, while-, if-, or switch-statement. Its scope extends from its point of declaration to the end of its statement. All names in statement scope are local names.

• Function scope: A label (§9.6) is in scope from its point of declaration until the end of the function.

A declaration of a name in a block can hide a declaration in an enclosing block or a global name. That is, a name can be redefined to refer to a different entity within a block. After exit from the block, the name resumes its previous meaning. For example:

int x; // global x

void f()
{
int x; // local x hides global x
x = 1; // assign to local x
{
int x; // hides first local x
x = 2; // assign to second local x
}
x = 3; // assign to first local x
}

int* p = & x; // take address of global x

Hiding names is unavoidable when writing large programs. However, a human reader can easily fail to notice that a name has been hidden (also known as shadowed). Because such errors are relatively rare, they can be very difficult to find. Consequently, name hiding should be minimized. Using names such as i and x for global variables or for local variables in a large function is asking for trouble.

A hidden global name can be referred to using the scope resolution operator, ::. For example:

int x;

void f2()
{
int x = 1; // hide global x
::x = 2; // assign to global x
x = 2; // assign to local x
// ...
}

There is no way to use a hidden local name.
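The following sketch combines hiding and the :: operator in one function. The function name hide_and_reach is hypothetical:

```cpp
int x = 0;            // global x

int hide_and_reach()
{
    int x = 1;        // hides global x
    ::x = 2;          // assign to global x
    x = 3;            // assign to local x
    {
        int x = 4;    // hides the first local x, which cannot be named in this block
        ::x += x;     // global x becomes 2+4
    }
    return x;         // the first local x: 3
}
```

After a call of hide_and_reach(), the global x is 6 and the value returned is 3.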

The scope of a name that is not a class member starts at its point of declaration, that is, after the complete declarator and before the initializer. This implies that a name can be used even to specify its own initial value. For example:

int x = 97;

void f3()
{
int x = x; // perverse: initialize x with its own (uninitialized) value
}

A good compiler warns if a variable is used before it has been initialized.

It is possible to use a single name to refer to two different objects in a block without using the :: operator. For example:

int x = 11;

void f4() // perverse: use of two different objects both called x in a single scope
{
int y = x; // use global x: y = 11
int x = 22;
y = x; // use local x: y = 22
}

Again, such subtleties are best avoided.

The names of function arguments are considered declared in the outermost block of a function. For example:

void f5(int x)
{
int x; // error
}

This is an error because x is defined twice in the same scope.

Names introduced in a for-statement are local to that statement (in statement scope). This allows us to use conventional names for loop variables repeatedly in a function. For example:

void f(vector<string>& v, list<int>& lst)
{
for (const auto& x : v) cout << x << '\n';
for (auto x : lst) cout << x << '\n';
for (int i = 0; i!=v.size(); ++i) cout << v[i] << '\n';
for (auto i : {1, 2, 3, 4, 5, 6, 7}) cout << i << '\n';
}

This contains no name clashes.

A declaration is not allowed as the only statement on the branch of an if-statement (§9.4.1).

6.3.5. Initialization

If an initializer is specified for an object, that initializer determines the initial value of an object. An initializer can use one of four syntactic styles:

X a1 {v};
X a2 = {v};
X a3 = v;
X a4(v);

Of these, only the first can be used in every context, and I strongly recommend its use. It is clearer and less error-prone than the alternatives. However, the first form (used for a1) is new in C++11, so the other three forms are what you find in older code. The two forms using = are what you use in C. Old habits die hard, so I sometimes (inconsistently) use = when initializing a simple variable with a simple value. For example:

int x1 = 0;
char c1 = 'z';

However, anything much more complicated than that is better done using {}. Initialization using {}, list initialization, does not allow narrowing (§iso.8.5.4). That is:

• An integer cannot be converted to another integer that cannot hold its value. For example, char to int is allowed, but not int to char.

• A floating-point value cannot be converted to another floating-point type that cannot hold its value. For example, float to double is allowed, but not double to float.

• A floating-point value cannot be converted to an integer type.

• An integer value cannot be converted to a floating-point type.

For example:

void f(double val, int val2)
{
int x2 = val; // if val==7.9, x2 becomes 7
char c2 = val2; // if val2==1025, c2 becomes 1

int x3 {val}; // error: possible truncation
char c3 {val2}; // error: possible narrowing

char c4 {24}; // OK: 24 can be represented exactly as a char
char c5 {264}; // error (assuming 8-bit chars): 264 cannot be represented as a char

int x4 {2.0}; // error: no double to int value conversion

// ...
}

See §10.5 for the conversion rules for built-in types.
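The contrast between = and {} can be tried out with a sketch like the one below. The function names are hypothetical, unsigned char is used so that the result of the truncation is well defined, and the expected value for 1025 assumes 8-bit chars:

```cpp
// Initialization with = silently discards the fractional part.
int truncate_with_equals(double val)
{
    int x2 = val;              // if val==7.9, x2 becomes 7
    return x2;
}

// Initialization with = silently keeps only the low-order bits.
unsigned char low_bits_with_equals(int val2)
{
    unsigned char c2 = val2;   // if val2==1025, c2 becomes 1 (assuming 8-bit chars)
    return c2;
}

// Initialization with {} accepts only conversions that cannot lose information.
void list_init_examples()
{
    char c4 {24};              // OK: 24 can be represented exactly as a char
    int i1 {'a'};              // OK: char to int
    double d1 {1.5f};          // OK: float to double
    // int x3 {7.9};           // error: floating-point to integer
    // char c5 {264};          // error (assuming 8-bit chars): 264 does not fit
    (void)c4; (void)i1; (void)d1;   // silence "unused variable" warnings
}
```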

There is no advantage to using {} initialization, and one trap, when using auto to get the type determined by the initializer. The trap is that if the initializer is a {}-list, we may not want its type deduced (§6.3.6.2). For example:

auto z1 {99}; // z1 is an initializer_list<int>
auto z2 = 99; // z2 is an int

So prefer = when using auto.
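A sketch of the deduction rules follows. It deliberately uses only the = forms, because compilers that implement later revisions of the standard (C++17) deduce int, not initializer_list<int>, for auto z1 {99}; the = forms shown here behave the same way under both sets of rules. The helper sum is hypothetical:

```cpp
#include <initializer_list>
#include <type_traits>

auto z2 = 99;          // z2 is an int
auto z3 = {99};        // z3 is a std::initializer_list<int>
auto z4 = {1, 2, 3};   // z4 is a std::initializer_list<int>

static_assert(std::is_same<decltype(z2), int>::value, "a plain initializer gives the initializer's type");
static_assert(std::is_same<decltype(z3), std::initializer_list<int>>::value, "= {} gives an initializer_list");

// Hypothetical helper: add up the elements of a list.
int sum(std::initializer_list<int> lst)
{
    int s = 0;
    for (int v : lst) s += v;
    return s;
}
```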

It is possible to define a class so that an object can be initialized by a list of values and alternatively be constructed given a couple of arguments that are not simply values to be stored. The classical example is a vector of integers:

vector<int> v1 {99}; // v1 is a vector of 1 element with the value 99
vector<int> v2(99); // v2 is a vector of 99 elements each with the default value 0

I use the explicit invocation of a constructor, (99), to get the second meaning. Most types do not offer such confusing alternatives - even most vectors do not; for example:

vector<string> v1{"hello!"}; // v1 is a vector of 1 element with the value "hello!"
vector<string> v2("hello!"); // error: no vector constructor takes a string literal

So, prefer {} initialization over alternatives unless you have a strong reason not to.
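The difference between the two vector initializations shows up in the resulting sizes. A small sketch, with hypothetical function names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::size_t size_with_braces()
{
    std::vector<int> v1 {99};   // list initialization: one element with the value 99
    return v1.size();
}

std::size_t size_with_parens()
{
    std::vector<int> v2(99);    // constructor call: 99 elements, each with the value 0
    return v2.size();
}

std::size_t string_vector_size()
{
    std::vector<std::string> v {"hello!"};   // one element with the value "hello!"
    return v.size();
}
```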

The empty initializer list, {}, is used to indicate that a default value is desired. For example:

int x4 {}; // x4 becomes 0
double d4 {}; // d4 becomes 0.0
char* p {}; // p becomes nullptr
vector<int> v4{}; // v4 becomes the empty vector
string s4 {}; // s4 becomes ""

Most types have a default value. For integral types, the default value is a suitable representation of zero. For pointers, the default value is nullptr (§7.2.2). For user-defined types, the default value (if any) is determined by the type’s constructors (§17.3.3).
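These default values can be checked directly. In the sketch below, Counter is a hypothetical user-defined type whose default constructor picks 42, simply to show that a constructor, not the language, decides the default for such a type:

```cpp
#include <string>
#include <vector>

// Hypothetical user-defined type: its default value is chosen by its constructor.
struct Counter {
    int n;
    Counter() : n{42} {}
};

bool defaults_are_as_described()
{
    int x4 {};                  // 0
    double d4 {};               // 0.0
    char* p {};                 // nullptr
    std::vector<int> v4 {};     // the empty vector
    std::string s4 {};          // ""
    Counter c {};               // whatever Counter's default constructor says: n==42

    return x4 == 0 && d4 == 0.0 && p == nullptr
        && v4.empty() && s4 == "" && c.n == 42;
}
```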

For user-defined types, there can be a distinction between direct initialization (where implicit conversions are allowed) and copy initialization (where they are not); see §16.2.6.

Initialization of particular kinds of objects is discussed where appropriate:

• Pointers: §7.2.2, §7.3.2, §7.4

• References: §7.7.1 (lvalues), §7.7.2 (rvalues)

• Arrays: §7.3.1, §7.3.2

• Constants: §10.4

• Classes: §17.3.1 (not using constructors), §17.3.2 (using constructors), §17.3.3 (default), §17.4 (member and base), §17.5 (copy and move)

• User-defined containers: §17.3.4

6.3.5.1. Missing Initializers

For many types, including all built-in types, it is possible to leave out the initializer. If you do that - and that has unfortunately been common - the situation is more complicated. If you don’t like the complications, just initialize consistently. The only really good case for an uninitialized variable is a large input buffer. For example:

constexpr int max = 1024*1024;
char buf[max];
some_stream.get(buf,max); // read at most max characters into buf

We could easily have initialized buf:

char buf[max] {}; // initialize every char to 0

By redundantly initializing, we would have suffered a performance hit which just might have been significant. Avoid such low-level use of buffers where you can, and don’t leave such buffers uninitialized unless you know (e.g., from measurement) that the optimization compared to using an initialized array is significant.

If no initializer is specified, a global (§6.3.4), namespace (§14.3.1), local static (§12.1.8), or static member (§16.2.12) (collectively called static objects) is initialized to {} of the appropriate type. For example:

int a; // means "int a{};" so that a becomes 0
double d; // means "double d{};" so that d becomes 0.0

Local variables and objects created on the free store (sometimes called dynamic objects or heap objects; §11.2) are not initialized by default unless they are of user-defined types with a default constructor (§17.3.3). For example:

void f()
{
int x; // x does not have a well-defined value
char buf[1024]; // buf[i] does not have a well-defined value

int* p {new int}; // *p does not have a well-defined value
char* q {new char[1024]}; // q[i] does not have a well-defined value

string s; // s=="" because of string's default constructor
vector<char> v; // v=={} because of vector's default constructor

string* ps {new string}; // *ps is "" because of string's default constructor
// ...
}

If you want initialization of local variables of built-in type or objects of built-in type created with new, use {}. For example:

void ff()
{
int x {}; // x becomes 0
char buf[1024]{}; // buf[i] becomes 0 for all i

int* p {new int{10}}; // *p becomes 10
char* q {new char[1024]{}}; // q[i] becomes 0 for all i

// ...
}
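The rules for static objects and for {} can be exercised as follows. This is a sketch with hypothetical names; it uses std::unique_ptr rather than a plain pointer only so that the object created with new is deleted again:

```cpp
#include <memory>

int global_counter;        // static object with no initializer: becomes 0

int next_id()
{
    static int id;         // local static with no initializer: starts at 0
    return ++id;
}

int heap_value()
{
    std::unique_ptr<int> p {new int{}};   // the {} makes *p 0
    return *p;
}

int sum_of_buffer()
{
    char buf[16] {};       // every buf[i] becomes 0
    int sum = 0;
    for (char c : buf) sum += c;
    return sum;
}
```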

A member of an array or a class is default initialized if the array or structure is.

6.3.5.2. Initializer Lists

So far, we have considered the cases of no initializer and one initializer value. More complicated objects can require more than one value as an initializer. This is primarily handled by initializer lists delimited by { and }. For example:

int a[] = { 1, 2 }; // array initializer
struct S { int x; string s; };
S s = { 1, "Helios" }; // struct initializer
complex<double> z = { 0, pi }; // use constructor
vector<double> v = { 0.0, 1.1, 2.2, 3.3 }; // use list constructor

For C-style initialization of arrays, see §7.3.1. For C-style structures, see §8.2. For user-defined types with constructors, see §2.3.2 or §16.2.5. For initializer-list constructors, see §17.3.4.

In the cases above, the = is redundant. However, some prefer to add it to emphasize that a set of values are used to initialize a set of member variables.

In some cases, function-style argument lists can also be used (§2.3, §16.2.5). For example:

complex<double> z(0,pi); // use constructor
vector<double> v(10,3.3); // use constructor: v gets 10 elements initialized to 3.3
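The two notations do not always select the same constructor. As a hedged illustration (the function names are hypothetical), for a vector of int an argument list in () selects the (count, value) constructor, whereas a {}-list prefers the initializer-list constructor:

```cpp
#include <cstddef>
#include <vector>

std::size_t size_with_parens()
{
    std::vector<int> v(10, 3);   // 10 elements, each with the value 3
    return v.size();
}

std::size_t size_with_braces()
{
    std::vector<int> v {10, 3};  // 2 elements: 10 and 3
    return v.size();
}
```

This difference is one reason to reserve the function-style notation for cases where a specific non-list constructor is intended.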

In a declaration, an empty pair of parentheses, (), always means “function” (§12.1). So, if you want to be explicit about “use default initialization” you need {}. For example:

complex<double> z1(1,2); // function-style initializer (initialization by constructor)
complex<double> f1(); // function declaration

complex<double> z2 {1,2}; // initialization by constructor to {1,2}
complex<double> f2 {}; // initialization by constructor to the default value {0,0}

Note that initialization using the {} notation does not narrow (§6.3.5).
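The following sketch puts the two points side by side; default_complex and truncating_copy are hypothetical names chosen for illustration. The lines that would not do what a reader might expect are left as comments so that the example compiles:

```cpp
#include <complex>

std::complex<double> default_complex()
{
    std::complex<double> z {};    // an object with the default value {0,0}
    // std::complex<double> f();  // would declare a function named f, not an object
    return z;
}

int truncating_copy(double d)
{
    int i = d;      // the = syntax allows narrowing: the fractional part is discarded
    // int j {d};   // error: narrowing conversion from double to int
    return i;
}
```

For example, truncating_copy(7.9) yields 7, whereas the commented-out {} form would be rejected at compile time.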

When using auto, a {}-list has its type deduced to std::initializer_list<T>. For example:

auto x1 {1,2,3,4}; // x1 is an initializer_list<int>
auto x2 {1.0, 2.25, 3.5 }; // x2 is an initializer_list<double>
auto x3 {1.0,2}; // error: cannot deduce the type of {1.0,2} (§6.3.6.2)

6.3.6. Deducing a Type: auto and decltype()

The language provides two mechanisms for deducing a type from an expression:

• auto for deducing a type of an object from its initializer; the type can be the type of a variable, a const, or a constexpr.

• decltype(expr) for deducing the type of something that is not a simple initializer, such as the return type for a function or the type of a class member.

The deduction done here is very simple: auto and decltype() simply report the type of an expression already known to the compiler.

6.3.6.1. The auto Type Specifier

When a declaration of a variable has an initializer, we don’t need to explicitly specify a type. Instead, we can let the variable have the type of its initializer. Consider:

int a1 = 123;
char a2 = 123;
auto a3 = 123; // the type of a3 is "int"

The type of the integer literal 123 is int, so a3 is an int. That is, auto is a placeholder for the type of the initializer.

There is not much advantage in using auto instead of int for an expression as simple as 123. The harder the type is to write and the harder the type is to know, the more useful auto becomes. For example:

template<class T> void f1(vector<T>& arg)
{
for (typename vector<T>::iterator p = arg.begin(); p!=arg.end(); ++p)
*p = 7;

for (auto p = arg.begin(); p!=arg.end(); ++p)
*p = 7;
}

The loop using auto is the more convenient to write and the easier to read. Also, it is more resilient to code changes. For example, if I changed arg to be a list, the loop using auto would still work correctly whereas the first loop would need to be rewritten. So, unless there is a good reason not to, use auto in small scopes.
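To illustrate the resilience claim, here is a sketch in which the container type is itself a template parameter (set_all_to_seven and the two checking functions are hypothetical names). The loop written with auto compiles unchanged for both vector and list:

```cpp
#include <list>
#include <vector>

template<class Container>
void set_all_to_seven(Container& arg)
{
    // auto deduces vector<int>::iterator or list<int>::iterator as appropriate
    for (auto p = arg.begin(); p != arg.end(); ++p)
        *p = 7;
}

bool all_seven_in_vector()
{
    std::vector<int> v {1, 2, 3};
    set_all_to_seven(v);
    return v == std::vector<int>{7, 7, 7};
}

bool all_seven_in_list()
{
    std::list<int> l {1, 2, 3};
    set_all_to_seven(l);
    return l == std::list<int>{7, 7, 7};
}
```

A loop that named vector<T>::iterator explicitly would have to be edited for the list case.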

If a scope is large, mentioning a type explicitly can help localize errors. That is, compared to using a specific type, using auto can delay the detection of type errors. For example:

void f()
{
constexpr double d = 2.5;
constexpr auto max = d+7; // max is a double
int a[max]; // error: array bound not an integer
// ...
}

If auto causes surprises, the best cure is typically to make functions smaller, which most often is a good idea anyway (§12.1).

We can decorate a deduced type with specifiers and modifiers (§6.3.1), such as const and & (reference; §7.7). For example:

void f(vector<int>& v)
{
for (const auto& x : v) { // x is a const int&
// ...
}
}

Here, auto is determined by the element type of v, that is, int.

Note that the type of an expression is never a reference because references are implicitly dereferenced in expressions (§7.7). For example:

void g(int& v)
{
auto x = v; // x is an int (not an int&)
auto& y = v; // y is an int&
}
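The difference between the two declarations can be observed directly. In this minimal sketch (copy_vs_reference is a hypothetical name), the static_asserts check the deduced types, and the returned value shows that only the reference affects the original object:

```cpp
#include <type_traits>

int copy_vs_reference()
{
    int v = 1;
    int& r = v;

    auto x = r;    // x is an int: a copy of v
    auto& y = r;   // y is an int&: another name for v

    static_assert(std::is_same<decltype(x), int>::value, "auto drops the reference");
    static_assert(std::is_same<decltype(y), int&>::value, "auto& keeps the reference");

    x = 10;        // changes the copy only
    y += 1;        // changes v through the reference
    return v + x;  // v is 2 and x is 10
}
```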

6.3.6.2. auto and {}-lists

When we explicitly mention the type of an object we are initializing, we have two types to consider: the type of the object and the type of the initializer. For example:

char v1 = 12345; // 12345 is an int
int v2 = 'c'; // 'c' is a char
T v3 = f();

By using the {}-initializer syntax for such definitions, we minimize the chances for unfortunate conversions:

char v1 {12345}; // error: narrowing
int v2 {'c'}; // fine: implicit char->int conversion
T v3 {f()}; // works if and only if the type of f() can be implicitly converted to a T

When we use auto, there is only one type involved, the type of the initializer, and we can safely use the = syntax:

auto v1 = 12345; // v1 is an int
auto v2 = 'c'; // v2 is a char
auto v3 = f(); // v3 is of some appropriate type

In fact, it can be an advantage to use the = syntax with auto, because the {}-list syntax might surprise someone:

auto v1 {12345}; // v1 is a list of int
auto v2 {'c'}; // v2 is a list of char
auto v3 {f()}; // v3 is a list of some appropriate type

This is logical. Consider:

auto x0 {}; // error: cannot deduce a type
auto x1 {1}; // list of int with one element
auto x2 {1,2}; // list of int with two elements
auto x3 {1,2,3}; // list of int with three elements

The type of a homogeneous list of elements of type T is taken to be of type initializer_list<T> (§3.2.1.3, §11.3.3). In particular, the type of x1 is not deduced to be int. Had it been, what would be the types of x2 and x3?

Consequently, I recommend using = rather than {} for objects specified auto whenever we don’t mean “list.”
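The deduced types can be checked with a small sketch (list_size and plain_int are hypothetical names). It uses the = {...} form for the list because the deduction rule for the form without =, such as auto x {1};, was changed in later revisions of the standard (C++17 deduces int there), so a check on that form would depend on the compiler's language mode:

```cpp
#include <cstddef>
#include <initializer_list>
#include <type_traits>

std::size_t list_size()
{
    auto xs = {1, 2, 3};   // xs is a std::initializer_list<int>
    static_assert(std::is_same<decltype(xs), std::initializer_list<int>>::value,
                  "a {}-list deduces an initializer_list");
    return xs.size();
}

int plain_int()
{
    auto n = 1;            // n is an int
    static_assert(std::is_same<decltype(n), int>::value,
                  "a single value deduces the type of that value");
    return n;
}
```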

6.3.6.3. The decltype() Specifier

We can use auto when we have a suitable initializer. But sometimes, we want to have a type deduced without defining an initialized variable. Then, we can use a declaration type specifier: decltype(expr) is the declared type of expr. This is mostly useful in generic programming. Consider writing a function that adds two matrices with potentially different element types. What should be the type of the result of the addition? A matrix, of course, but what might its element type be? The obvious answer is that the element type of the sum is the type of the sum of the elements. So, I can declare:

template<class T, class U>
auto operator+(const Matrix<T>& a, const Matrix<U>& b) -> Matrix<decltype(T{}+U{})>;

I use the suffix return type syntax (§12.1) to be able to express the return type in terms of the arguments: Matrix<decltype(T{}+U{})>. That is, the result is a Matrix with the element type being what you get from adding a pair of elements from the argument Matrixes: T{}+U{}.

In the definition, I again need decltype() to express Matrix’s element type:

template<class T, class U>
auto operator+(const Matrix<T>& a, const Matrix<U>& b) -> Matrix<decltype(T{}+U{})>
{
Matrix<decltype(T{}+U{})> res;
for (int i=0; i!=a.rows(); ++i)
for (int j=0; j!=a.cols(); ++j)
res(i,j) += a(i,j) + b(i,j);
return res;
}
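Matrix is not defined in this section, so the same technique can be tried out with std::vector standing in for it. This is only a sketch under that substitution: add_elements and int_plus_double are hypothetical names, and the function assumes that both arguments have the same number of elements:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Element-wise sum; assumes a.size() == b.size().
template<class T, class U>
auto add_elements(const std::vector<T>& a, const std::vector<U>& b)
    -> std::vector<decltype(T{}+U{})>
{
    std::vector<decltype(T{}+U{})> res(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        res[i] = a[i] + b[i];
    return res;
}

double int_plus_double()
{
    std::vector<int> a {1, 2};
    std::vector<double> b {0.5, 0.25};
    auto sum = add_elements(a, b);
    static_assert(std::is_same<decltype(sum), std::vector<double>>::value,
                  "the element type of int+double is double");
    return sum[0] + sum[1];   // 1.5 + 2.25
}
```

Here decltype(T{}+U{}) is double because adding an int and a double yields a double.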

6.4. Objects and Values

We can allocate and use objects that do not have names (e.g., created using new), and it is possible to assign to strange-looking expressions (e.g., *p[a+10]=7). Consequently, we need a name for “something in memory.” This is the simplest and most fundamental notion of an object. That is, an object is a contiguous region of storage; an lvalue is an expression that refers to an object. The word “lvalue” was originally coined to mean “something that can be on the left-hand side of an assignment.” However, not every lvalue may be used on the left-hand side of an assignment; an lvalue can refer to a constant (§7.7). An lvalue that has not been declared const is often called a modifiable lvalue. This simple and low-level notion of an object should not be confused with the notions of class object and object of polymorphic type (§3.2.2, §20.3.2).

6.4.1. Lvalues and Rvalues

To complement the notion of an lvalue, we have the notion of an rvalue. Roughly, rvalue means “a value that is not an lvalue,” such as a temporary value (e.g., the value returned by a function).

If you need to be more technical (say, because you want to read the ISO C++ standard), you need a more refined view of lvalue and rvalue. There are two properties that matter for an object when it comes to addressing, copying, and moving:

• Has identity: The program has the name of, pointer to, or reference to the object so that it is possible to determine if two objects are the same, whether the value of the object has changed, etc.

• Movable: The object may be moved from (i.e., we are allowed to move its value to another location and leave the object in a valid but unspecified state, rather than copying; §17.5).

It turns out that three of the four possible combinations of those two properties are needed to precisely describe the C++ language rules (we have no need for objects that do not have identity and cannot be moved). Using “m for movable” and “i for has identity,” we can summarize this classification of expressions:

• lvalue: i and not m

• xvalue: i and m

• prvalue: m and not i

• glvalue: i (an lvalue or an xvalue)

• rvalue: m (an xvalue or a prvalue)

So, a classical lvalue is something that has identity and cannot be moved (because we could examine it after a move), and a classical rvalue is anything that we are allowed to move from. The other alternatives are prvalue (“pure rvalue”), glvalue (“generalized lvalue”), and xvalue (“x” for “extraordinary” or “expert only”; the suggestions for the meaning of this “x” have been quite imaginative). For example:

void f(vector<string>& vs)
{
vector<string> v2 = std::move(vs); // move vs to v2
// ...
}

Here, std::move(vs) is an xvalue: it clearly has identity (we can refer to it as vs), but we have explicitly given permission for it to be moved from by calling std::move() (§3.3.2, §35.5.1).

For practical programming, thinking in terms of rvalue and lvalue is usually sufficient. Note that every expression is either an lvalue or an rvalue, but not both.
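One practical way to see the lvalue/rvalue distinction is through overload resolution. In this sketch (category and categories are hypothetical names), an overload taking const std::string& is selected for lvalue arguments and one taking std::string&& for rvalue arguments:

```cpp
#include <string>
#include <utility>
#include <vector>

// Overload resolution distinguishes lvalue arguments from rvalue arguments.
const char* category(const std::string&) { return "lvalue"; }
const char* category(std::string&&)      { return "rvalue"; }

std::vector<std::string> categories()
{
    std::string s {"Helios"};
    std::vector<std::string> result;
    result.push_back(category(s));                    // named object: lvalue
    result.push_back(category(std::string{"temp"}));  // temporary: prvalue, so an rvalue
    result.push_back(category(std::move(s)));         // xvalue, so also an rvalue
    return result;
}
```

Note that std::move(s) by itself moves nothing; it only makes s eligible to be moved from, and category() here does not take advantage of that.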

6.4.2. Lifetimes of Objects

The lifetime of an object starts when its constructor completes and ends when its destructor starts executing. Objects of types without a declared constructor, such as an int, can be considered to have default constructors and destructors that do nothing.

We can classify objects based on their lifetimes:

• Automatic: Unless the programmer specifies otherwise (§12.1.8, §16.2.12), an object declared in a function is created when its definition is encountered and destroyed when its name goes out of scope. Such objects are sometimes called automatic objects. In a typical implementation, automatic objects are allocated on the stack; each call of the function gets its own stack frame to hold its automatic objects.

• Static: Objects declared in global or namespace scope (§6.3.4) and statics declared in functions (§12.1.8) or classes (§16.2.12) are created and initialized once (only) and “live” until the program terminates (§15.4.3). Such objects are called static objects. A static object has the same address throughout the life of a program execution. Static objects can cause serious problems in a multi-threaded program because they are shared among all threads and typically require locking to avoid data races (§5.3.1, §42.3).

• Free store: Using the new and delete operators, we can create objects whose lifetimes are controlled directly (§11.2).

• Temporary objects (e.g., intermediate results in a computation or an object used to hold a value for a reference to const argument): their lifetime is determined by their use. If they are bound to a reference, their lifetime is that of the reference; otherwise, they “live” until the end of the full expression of which they are part. A full expression is an expression that is not part of another expression. Typically, temporary objects are automatic.

• Thread-local objects; that is, objects declared thread_local (§42.2.8): such objects are created when their thread is and destroyed when their thread is.

Static and automatic are traditionally referred to as storage classes.

Array elements and nonstatic class members have their lifetimes determined by the object of which they are part.
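The different lifetimes can be made visible by logging construction and destruction. The following is a sketch only: Tracer, event_log, name_length, and scope_demo are hypothetical names introduced for this illustration:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> event_log;   // static object: lives until the program terminates

struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) { event_log.push_back("ctor " + name); }
    ~Tracer() { event_log.push_back("dtor " + name); }
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;
};

std::size_t name_length(const Tracer& t) { return t.name.size(); }

void scope_demo()
{
    Tracer a {"automatic"};                 // destroyed when scope_demo() returns
    Tracer* p = new Tracer{"free store"};   // destroyed only by delete
    {
        Tracer inner {"inner"};
    }                                       // inner is destroyed here
    // The temporary bound to name_length()'s parameter is destroyed
    // at the end of this full expression, before the next statement runs.
    std::size_t n = name_length(Tracer{"temporary"});
    event_log.push_back("length " + std::to_string(n));
    delete p;
}
```

After one call of scope_demo(), event_log records that the inner object and the temporary were destroyed before the statements following them, the free-store object at the delete, and the automatic object last.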

6.5. Type Aliases

Sometimes, we need a new name for a type. Possible reasons include:

• The original name is too long, complicated, or ugly (in some programmer’s eyes).

• A programming technique requires different types to have the same name in a context.

• A specific type is mentioned in one place only to simplify maintenance.

For example:

using Pchar = char*; // pointer to character
using PF = int(*)(double); // pointer to function taking a double and returning an int

Similar types can define the same name as a member alias:

template<class T>
class vector {
using value_type = T; // every container has a value_type
// ...
};

template<class T>
class list {
using value_type = T; // every container has a value_type
// ...
};

For good and bad, type aliases are synonyms for other types rather than distinct types. That is, an alias refers to the type for which it is an alias. For example:

Pchar p1 = nullptr; // p1 is a char*
char* p3 = p1; // fine

People who would like to have distinct types with identical semantics or identical representation should look at enumerations (§8.4) and classes (Chapter 16).

An older syntax using the keyword typedef and placing the name being declared where it would have been in a declaration of a variable can equivalently be used in many contexts. For example:

typedef int int32_t; // equivalent to "using int32_t = int;"
typedef short int16_t; // equivalent to "using int16_t = short;"
typedef void(*PtoF)(int); // equivalent to "using PtoF = void(*)(int);"

Aliases are used when we want to insulate our code from details of the underlying machine. The name int32_t indicates that we want it to represent a 32-bit integer. Having written our code in terms of int32_t, rather than “plain int,” we can port our code to a machine with sizeof(int)==2 by redefining the single occurrence of int32_t in our code to use a longer integer:

using int32_t = long;

The _t suffix is conventional for aliases (“typedefs”). The int16_t, int32_t, and other such aliases can be found in <cstdint> (§43.7). Note that naming a type after its representation rather than its purpose is not necessarily a good idea (§6.3.3).

The using keyword can also be used to introduce a template alias (§23.6). For example:

template<typename T>
using Vector = std::vector<T, My_allocator<T>>;

We cannot apply type specifiers, such as unsigned, to an alias. For example:

using Char = char;
using Uchar = unsigned Char; // error
using Uchar = unsigned char; // OK
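That aliases are synonyms rather than new types can be checked at compile time. In this sketch, Pchar_old, Vec, and sum are hypothetical names, and std::int32_t is assumed to be provided by the implementation (it is on common platforms, but the standard makes it optional):

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

using Pchar = char*;          // alias declaration
typedef char* Pchar_old;      // the older typedef syntax

// Both names are synonyms for char*, not distinct types.
static_assert(std::is_same<Pchar, char*>::value, "an alias is a synonym");
static_assert(std::is_same<Pchar, Pchar_old>::value, "using and typedef are equivalent here");

// A template alias.
template<typename T>
using Vec = std::vector<T>;

std::int32_t sum(const Vec<std::int32_t>& v)
{
    std::int32_t total = 0;
    for (auto x : v)
        total += x;
    return total;
}
```

Because Vec<std::int32_t> is just another name for std::vector<std::int32_t>, sum() accepts either spelling of the argument type.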

6.6. Advice

[1] For the final word on language definition issues, see the ISO C++ standard; §6.1.

[2] Avoid unspecified and undefined behavior; §6.1.

[3] Isolate code that must depend on implementation-defined behavior; §6.1.

[4] Avoid unnecessary assumptions about the numeric value of characters; §6.2.3.2, §10.5.2.1.

[5] Remember that an integer starting with a 0 is octal; §6.2.4.1.

[6] Avoid “magic constants”; §6.2.4.1.

[7] Avoid unnecessary assumptions about the size of integers; §6.2.8.

[8] Avoid unnecessary assumptions about the range and precision of floating-point types; §6.2.8.

[9] Prefer plain char over signed char and unsigned char; §6.2.3.1.

[10] Beware of conversions between signed and unsigned types; §6.2.3.1.

[11] Declare one name (only) per declaration; §6.3.2.

[12] Keep common and local names short, and keep uncommon and nonlocal names longer; §6.3.3.

[13] Avoid similar-looking names; §6.3.3.

[14] Name an object to reflect its meaning rather than its type; §6.3.3.

[15] Maintain a consistent naming style; §6.3.3.

[16] Avoid ALL_CAPS names; §6.3.3.

[17] Keep scopes small; §6.3.4.

[18] Don’t use the same name in both a scope and an enclosing scope; §6.3.4.

[19] Prefer the {}-initializer syntax for declarations with a named type; §6.3.5.

[20] Prefer the = syntax for the initialization in declarations using auto; §6.3.5.

[21] Avoid uninitialized variables; §6.3.5.1.

[22] Use an alias to define a meaningful name for a built-in type in cases in which the built-in type used to represent a value might change; §6.5.

[23] Use an alias to define synonyms for types; use enumerations and classes to define new types; §6.5.