# Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers (2016)

### Chapter 3. Generic Programming

*Templates* are a feature of C++ to create functions and classes that operate on parametric (generic) types. As a result, a function or class can work with many different data types without being manually rewritten for each one.

*Generic Programming* is sometimes considered synonymous with template programming. But this is not correct. Generic programming is a programming paradigm aiming for maximal applicability while providing correctness. Its main tools are templates. Mathematically it is founded on *Formal Concept Analysis* [*15*]. In generic programming, the template programs are completed with documentation of sufficient conditions for correct usage. One could say that generic programming is the responsible fashion of template programming.

**3.1 Function Templates**

A *Function Template*—also called a generic function—is a blueprint to generate a potentially infinite number of function overloads. In everyday conversation, the term *Template Function* is more often used than function template, whereas the latter is the correct term from the standard. Within this book, we use both terms and they have the exact same meaning.

Suppose we want to write a function max(x, y) where x and y are variables or expressions of some type. Using function overloading, we can easily do this as follows:

```cpp
int max (int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

double max (double a, double b)
{
    if (a > b)
        return a;
    else
        return b;
}
```

Note that the function body is exactly the same for both int and double. With the template mechanism, we can write just one generic implementation:

```cpp
template <typename T>
T max (T a, T b)
{
    if (a > b)
        return a;
    else
        return b;
}
```

This function template replaces the non-template overloads and we keep the name max. It can be used in the same fashion as the overloaded functions:

```cpp
std::cout << "The maximum of 3 and 5 is " << max(3, 5) << '\n';
std::cout << "The maximum of 3l and 5l is " << max(3l, 5l) << '\n';
std::cout << "The maximum of 3.0 and 5.0 is " << max(3.0, 5.0) << '\n';
```

In the first case, 3 and 5 are literals of type int and the max function is *Instantiated* to

```cpp
int max (int, int);
```

Likewise the second and third calls of max instantiate:

```cpp
long max (long, long);
double max (double, double);
```

since the literals are interpreted as long and double. In the same way, the template function can be called with variables and expressions:

```cpp
unsigned u1= 2, u2= 8;

std::cout << "The maximum of u1 and u2 is " << max(u1, u2) << '\n';
std::cout << "The maximum of u1*u2 and u1+u2 is "
          << max(u1*u2, u1+u2) << '\n';
```

Here the function is instantiated for unsigned.

Instead of typename, one can also write class in this context, but we do not recommend this because typename expresses the intention of generic functions better.

**3.1.1 Instantiation**

What does *Instantiation* mean? For a non-generic function, the compiler reads its definition, checks for errors, and generates executable code. When the compiler processes a generic function’s definition, it can only detect errors that are independent of the template parameters like parsing errors. For instance:

```cpp
template <typename T>
inline T max (T a, T b)
{
    if a > b        // Error!
        return a;
    else
        return b;
}
```

would not compile because the if-statement without the parentheses is not a legal expression of the C++ grammar.

However, most errors we encounter depend on the substituted types. For instance, the following implementation would compile:

```cpp
template <typename T>
inline T max(T x, T y)
{
    return x < y ? y.value : x.value;
}
```

We could not call it with any intrinsic type like int or double, but the function template might not be intended for intrinsic types and may work with the actual argument types.

The compilation of the function template itself does not generate any code in the binary. This only happens when we call it. In this case, we instantiate the function template. Only then does the compiler perform a complete check of whether the generic function is correct for the given argument type(s). In our previous examples, we saw that max can be instantiated with int and double.

So far, we have seen the most implicit form: the template is instantiated when a call exists and the type parameter is deduced from the arguments. To be more explicit, we can declare the type that substitutes the template parameter, e.g.:

```cpp
std::cout << max<float>(8.1, 9.3) << '\n';
```

Here, the template is explicitly instantiated with a given type. In the most explicit form, we force an instantiation without a function call:

```cpp
template short max<short>(short, short);
```

This can be useful when we generate object files (§*7.2.1.3*) and must guarantee that certain instances are present, regardless of the function calls in the compile unit.

**Definition 3–1.** For conciseness, we call an instantiation with type deduction *Implicit Instantiation* and an instantiation with an explicit type declaration *Explicit Instantiation*.

In our experience, implicit instantiation works in most cases as expected. The explicit nomination of the instantiation type is mostly needed for disambiguation and special usages like std::forward (§*3.1.2.4*). For a deeper understanding of templates, it is very helpful to know how the compiler substitutes the types.

**3.1.2 Parameter Type Deduction**

c++11/template_type_deduction.cpp

In this section, we will have a closer look at how template parameters are substituted depending on whether the arguments are passed by value, lvalue, or rvalue reference. This knowledge is even more important when variables are declared with an automatic type via auto as shown in *Section 3.4.1*. However, the substitution rules are more intuitive with function parameters than with auto variables and we therefore discuss them here.

**3.1.2.1 Value Parameters**

In the previous example, we used the type parameter T directly as a function parameter in max:

```cpp
template <typename T>
T max (T a, T b);
```

Like any other function parameter, those of function templates can be const- and reference-qualified as well:

```cpp
template <typename T>
T max (const T& a, const T& b);
```

Let us denote this (without loss of generality) for a unary void function f:

```cpp
template <typename TPara>
void f(FPara p);
```

where FPara contains TPara. When we call f(arg), the compiler has to *Deduce* the type TPara such that parameter p can be initialized with arg. This is the whole story in a nutshell. But let us look at some cases to get a feeling for it. The easiest syntactical case is the equality of TPara and FPara:

```cpp
template <typename TPara>
void f1(TPara p);
```

This means the function parameter is a local copy of the argument. We call f1 with an int literal, an int variable, and a mutable and constant int reference:

```cpp
template <typename TPara>
void f1(TPara p) {}

int main ()
{
    int i= 0;
    int& j= i;
    const int& k= i;

    f1(3);
    f1(i);
    f1(j);
    f1(k);
    ...
}
```

In all four instantiations, TPara is substituted with int so that the type of the function parameter p is int as well. If TPara were substituted with int& or const int&, the arguments could be passed in as well. But then we would have no value semantics since a modification of p would affect the function argument (e.g., j). Thus, when the function parameter is a type parameter without qualification, TPara becomes the argument type with all qualifiers removed. This template function accepts all arguments as long as their types are copyable.

For instance, unique_ptr has a deleted copy constructor and can only be passed to this function as an rvalue:

```cpp
unique_ptr<int> up;

// f1(up);      // Error: no copy constructor
f1(move(up));   // Okay: use move constructor
```

**3.1.2.2 Lvalue-Reference Parameters**

To really accept every argument, we can use a constant reference as a parameter:

```cpp
template <typename TPara>
void f2(const TPara& p) {}
```

TPara is again the argument type with all qualifiers stripped off. Thus, p is a constant reference of the unqualified argument type so we cannot modify p.

A more interesting case is the mutable reference as a parameter:

```cpp
template <typename TPara>
void f3(TPara& p) {}
```

This function rejects all literals and temporaries as they are not referable.^{1} We can phrase this also in terms of type substitution: the temporaries are refused because there exists no type for TPara such that TPara& becomes int&& (we will come back to this when we talk about reference collapsing in *Section 3.1.2.3*).

*1*. Formally, neither literals nor temporaries would be acceptable as const reference parameters for the same reason, but the language makes an exception here for the sake of programmers’ convenience.

When we pass in an ordinary int variable like i, TPara is substituted by int so p has the type int& and refers to i. The same substitution can be observed when a mutable reference variable like j is passed. What happens when we pass a const int or const int& like k? Can this be matched with TPara&? Yes, it can, when TPara is substituted with const int. Accordingly, the type of p is const int&. Thus, the type pattern TPara& does not limit the arguments to mutable references. The pattern can match constant references. However, if the function modifies p, the instantiation would fail later.

**3.1.2.3 Forward References**

In *Section 2.3.5.1*, we introduced rvalue references that only accept rvalues as arguments. Rvalue references with a type parameter of the form T&& accept lvalues as well. For this reason, Scott Meyers coined the term *Universal Reference* for them. Here we stick with the standard term*Forward Reference*. We will show why they can accept both rvalues and lvalues. To this end, we look at the type substitution of the following unary function:

```cpp
template <typename TPara>
void f4(TPara&& p) {}
```

When we pass an rvalue to this function, e.g.:

```cpp
f4(3);
f4(move(i));
f4(move(up));
```

TPara is substituted by the unqualified argument type—here int and unique_ptr<int>—and the type of p is the corresponding rvalue reference.

When we call f4 with an lvalue like i or j, the compiler accepts these arguments as template rvalue-reference parameters. The type parameter TPara is substituted by int& and this is also the type of p. How is this possible? The explanation is found in *Table 3–1*, which shows how references of references are collapsed.

**Table 3–1: Reference Collapsing**

|            | outer `&` | outer `&&` |
|------------|-----------|------------|
| inner `&`  | `&`       | `&`        |
| inner `&&` | `&`       | `&&`       |

The résumé of *Table 3–1* is that references are collapsed to an lvalue reference when at least one of them is an lvalue reference (loosely said, very loosely, we can take the minimal number of ampersands). This explains the handling of lvalues in f4. TPara is substituted by int& and an rvalue reference thereof is int&, too.

The lack of type substitution is the reason why the non-template rvalue references do not accept lvalues. The only reason why the function parameter can be an lvalue is that an lvalue reference is introduced by the substitution. Without this substitution, no lvalue reference is involved and references are not collapsed.

A more detailed and more dramatic telling of the whole type deduction story is found in [*32*, pages 9–35 and 157–214].

**3.1.2.4 Perfect Forwarding**

We have already seen that lvalues can be turned into rvalues with move (§*2.3.5.4*). This time we want to cast them conditionally. A forward reference parameter accepts both rvalue and lvalue arguments that are held by rvalue and lvalue references respectively. When we pass such a reference parameter to another function, we want the lvalue reference to be passed as an lvalue and the rvalue reference as an rvalue. However, the references themselves are lvalues in both cases (since they have names). We could cast a reference to an rvalue with move but this would also apply to an lvalue reference.

Here, we need a conditional cast. This is achieved by std::forward. It casts an rvalue reference into an rvalue and leaves an lvalue as it is. forward must be instantiated with the (unqualified) type parameter, e.g.:

```cpp
template <typename TPara>
void f5(TPara&& p)
{
    f4(forward<TPara>(p));
}
```

The argument of f5 is passed with the same value category to f4. Whatever was passed as an lvalue to f5 is passed as an lvalue to f4; likewise for every rvalue. Like move, forward is a pure cast and does not generate a single machine operation. People have phrased this as: move does not move and forward does not forward. They rather cast their arguments to be moved or forwarded.

**3.1.3 Dealing with Errors in Templates**

Back to our max example that works for all numeric types. But what happens with types that provide no operator>, for instance, std::complex<T>? Let us try to compile the following snippet:^{2}

*2*. The double colons in front of max avoid ambiguities with the standard library’s max which some compilers may include implicitly (e.g., g++).

```cpp
std::complex<float> z(3, 2), c(4, 8);

std::cout << "The maximum of c and z is " << ::max(c, z) << '\n';
```

Our compilation attempt will end in an error like this:

Error: no match for operator> in a > b

What happens when our template function calls another template function which in turn calls another one which . . . and so on? Likewise, these functions are only parsed and the complete check is delayed until instantiation. Let us look at the following program:

```cpp
int main ()
{
    vector<complex<float> > v;
    sort(v.begin(), v.end());
}
```

Without going into detail, the problem is the same as before: we cannot compare complex numbers and thus we are unable to sort arrays of them. This time the missing comparison is discovered in an indirectly called function, and the compiler provides us the entire call and include stack so that we can trace back the error. Please try to compile this example on different compilers and see if you can make any sense out of the error messages.

If you run into such a lengthy error message,^{3} don’t panic! First, look at the error itself and take out what is useful for you: e.g., missing operator>, or something not assignable, or something that is const that should not be. Then find in the call stack the innermost code that is part of your program, i.e., the location where you call a template function from the standard or a third-party library. Stare for a while at this code and its preceding lines because this is the most likely place where the error occurred (in our experience). Then ask yourself: Does a type of the function’s template arguments miss an operator or a function according to the error message?

*3*. The longest message we have heard of was 18MB which corresponds to about 9000 pages of text.

Do not get scared to the point that you decide not to use templates for the rest of your life. In most cases, the problem is much simpler than the never-ending error message makes us believe. In our experience, most errors in template functions can be found faster than run-time errors—with some training.

**3.1.4 Mixing Types**

Another question that we have not answered so far: What happens with our function max when we use two different types as arguments?

```cpp
unsigned u1= 2;
int i= 3;

std::cout << "The maximum of u1 and i is " << max(u1, i) << '\n';
```

The compiler tells us—this time surprisingly briefly—something like

Error: no match for function call max(unsigned int&, int)

Indeed, we assumed that both types are equal. But wait, does not C++ convert arguments implicitly when no exact match exists? Yes, it does, but not for template arguments. The template mechanism is supposed to provide enough flexibility on the type level. In addition, combining template instantiation with implicit conversion would have a high potential for ambiguities.

So far so bad. Can we write a function template with two template parameters? Of course we can. But this creates a new problem: What should be the return type of this template? There are different options. First, we could add a non-templated function overload like

```cpp
int inline max (int a, int b) { return a > b ? a : b; }
```

This can be called with mixed types and the unsigned argument would be implicitly converted into an int. But what would happen if we also added another function overload for unsigned?

```cpp
int max(unsigned a, unsigned b) { return a > b ? a : b; }
```

Will the int be converted into an unsigned or vice versa? The compiler does not know and will complain about this ambiguity.

At any rate, adding non-templated overloads to the template implementation is far from being elegant or productive. So, we remove all non-template overloads and look first at what we can do in the function call. We can explicitly convert one argument to the type of the other argument:

```cpp
unsigned u1= 2;
int i= 3;

std::cout << "max of u1 and i is " << max(int(u1), i) << '\n';
```

Now, max is called with two ints. Yet another option is specifying the template type explicitly in the function call:

```cpp
std::cout << "max of u1 and i is " << max<int>(u1, i) << '\n';
```

Then both parameters are int and the function template’s instance can be called when both arguments are either int or *implicitly* convertible to int.

After these less pleasant details on templates, some really good news: template functions perform as efficiently as their non-template counterparts! The reason is that C++ generates new code for every type or type combination that the function is called with. Java, in contrast, compiles generic code only once and executes it for different types by casting them to the corresponding types. This results in faster compilation and shorter executables but takes more run time.

Another price we have to pay for the fast templates is that we have longer executables because of the multiple instantiations for each type (combination). In extreme (and rare) cases, larger binaries can lead to slower execution when the faster memory^{4} is filled with assembly instructions and the data must be loaded from and stored to slower memory instead.

*4*. L2 and L3 caches are usually shared between data and instructions.

However, in practice the number of a function’s instances will not be that large, and it only matters for large functions not inlined. For inlined functions, the binary code is at any rate inserted directly in the executable at the location of the function call so the impact on the executable length is the same for template and non-template functions.

**3.1.5 Uniform Initialization**

Uniform initialization (from §*2.3.4*) works with templates as well. However, in extremely rare cases, the brace elimination can cause some astonishing behavior. If you are curious or have already experienced some surprises, please read *Appendix A*, *Section A.6.1*.

**3.1.6 Automatic return Type**

C++11 introduced lambdas with automatic **return** types whereas the **return** type of functions is still mandatory. In C++14, we can let the compiler deduce the **return** type:

```cpp
template <typename T, typename U>
inline auto max (T a, U b)
{
    return a > b ? a : b;
}
```

The **return** type is deduced from the expression in the return statement in the same fashion as parameters of function templates are deduced from the arguments. If the function contains multiple return statements, their deduced types must all be equal. In template libraries, sometimes simple functions have rather lengthy **return** type declarations—possibly even longer than the function body—and it is then a great relief for the programmer not to have to spell them out.

**3.2 Namespaces and Function Lookup**

Namespaces are not a sub-topic of generic programming (in fact they are orthogonal to it). However, they become more important in the presence of function templates, so this is a good place in the book to talk about them.

**3.2.1 Namespaces**

The motivation for namespaces is that common names like min, max, or abs can be defined in different contexts so the names are ambiguous. Even names that are unique when the function or class is implemented can collide later when more libraries are included or an included library evolves. For instance, there is typically a class named window in a GUI implementation, and there might be one in a statistics library. We can distinguish them by namespaces:

```cpp
namespace GUI {
    class window;
}

namespace statistics {
    class window;
}
```

One possibility to deal with name conflicts is using different names like max, my_abs, or library_name_abs. This is in fact what is done in C. Main libraries normally use short function names, user libraries longer names, and OS-related internals typically start with _. This decreases the probability of conflicts, but not sufficiently. Namespaces are very important when we write our own classes and even more so when they are used in function templates. They allow us to hierarchically structure the names in our software. This avoids name clashes and provides sophisticated access control of function and class names.

Namespaces are similar to scopes; i.e., we can see the names in the surrounding namespaces:

```cpp
struct global {};

namespace c1 {
    struct c1c {};
    namespace c2 {
        struct c2c {};
        struct cc
        {
            global x;
            c1c    y;
            c2c    z;
        };
    } // namespace c2
} // namespace c1
```

Names that are redefined in an inner namespace hide those of the outer ones. In contrast to blocks, we can still refer to these names by *Namespace Qualification*:

```cpp
struct same {};

namespace c1 {
    struct same {};
    namespace c2 {
        struct same {};
        struct csame
        {
            ::same   x;
            c1::same y;
            same     z;
        };
    } // namespace c2
} // namespace c1
```

As you have guessed, ::same refers to the type from the global namespace and c1::same to the name in c1. The member variable z has type c1::c2::same since the inner name hides the outer ones. Namespaces are sought from inside out. If we add a namespace c1 in c2, this will hide the outer one with the same name and the type of y is incorrect:

```cpp
struct same {};

namespace c1 {
    struct same {};
    namespace c2 {
        struct same {};
        namespace c1 {} // hides ::c1
        struct csame
        {
            ::same   x;
            c1::same y; // Error: c1::c2::c1::same not defined
            same     z;
        };
    } // namespace c2
} // namespace c1
```

Here, c1::same exists in the global namespace, but since c1 is hidden by c1::c2::c1, we cannot access it. We would observe similar hiding if we defined a class named c1 in namespace c2. We can avoid the hiding and be more explicit about the type of y by placing double colons in front of the namespace:

```cpp
struct csame
{
    ::c1::same y; // this is unique
};
```

This makes clear that we mean the name c1 in the global namespace and not any other name c1. Names of functions or classes that are often needed can be imported with a using declaration:

```cpp
void fun( ... )
{
    using c1::c2::cc;
    cc x;
    ...
    cc y;
}
```

This declaration works in functions and namespaces but not in classes (where it would collide with other **using** declarations). Importing a name into a namespace within header files considerably increases the danger of name conflicts because the name remains visible in all subsequent files of a compile unit. using within a function (even in header files) is less critical because the imported name is only visible till the end of the function.

Similarly, we can import an entire namespace with a using directive:

```cpp
void fun( ... )
{
    using namespace c1::c2;
    cc x;
    ...
    cc y;
}
```

As before, it can be done within a function or another namespace but not in a class scope. The statement

```cpp
using namespace std;
```

is often the first one in a main function or even the first after the includes. Importing std in the global namespace has good potential for name conflicts, for instance, when we also define a class named vector (in the global namespace). Really problematic is the using directive in header files.

When namespace names are too long for us, especially for nested namespaces, we can rename them with a *Namespace Alias*:

```cpp
namespace lname= long_namespace_name;
namespace nested= long_namespace_name::yet_another_name::nested;
```

As before, this should be done in an appropriate scope.

**3.2.2 Argument-Dependent Lookup**

*Argument-Dependent Lookup*, or ADL, expands the search of function names to the namespaces of their arguments—but not to their respective parent namespaces. This saves us from verbose namespace qualification for functions. Say we write the ultimate scientific library within the modest namespace rocketscience:

```cpp
namespace rocketscience {

    struct matrix {};

    void initialize(matrix& A) { /* ... */ }

    matrix operator+(const matrix& A, const matrix& B)
    {
        matrix C;
        initialize(C); // not qualified, same namespace
        add(A, B, C);
        return C;
    }
}
```

Every time we use the function initialize, we can omit the qualification for all classes in namespace rocketscience:

```cpp
int main ()
{
    rocketscience::matrix A, B, C, D;

    rocketscience::initialize(B); // qualified
    initialize(C);                // rely on ADL

    chez_herbert::matrix E, F, G;

    rocketscience::initialize(E); // qualification needed
    initialize(E);                // Error: initialize not found
}
```

Operators are also subject to ADL:

```cpp
A= B + C + D;
```

Imagine the previous expression without ADL:

```cpp
A= rocketscience::operator+(rocketscience::operator+(B, C), D);
```

Similarly ugly and even more cumbersome is streaming I/O when the namespace must be qualified. Since user code should not be in namespace std::, the operator for a class is preferably defined in that class’s namespace. This allows ADL to find the right overload for each type, e.g.:

```cpp
std::cout << A << E << B << F << std::endl;
```

Without ADL we would need to qualify the namespace of each operator in its verbose notation. This would turn the previous expression into

```cpp
std::operator<<(chez_herbert::operator<<(
    rocketscience::operator<<(chez_herbert::operator<<(
        rocketscience::operator<<(std::cout, A), E), B),
    F), std::endl);
```

The ADL mechanism can also be used to select the right function template overload when the classes are distributed over multiple namespaces. The *L*_{1} norm from linear algebra is defined for both matrices and vectors, and we want to provide a template implementation for both:

```cpp
template <typename Matrix>
double one_norm(const Matrix& A) { ... }

template <typename Vector>
double one_norm(const Vector& x) { ... }
```

How can the compiler know which overload we want? One possible solution is to introduce a namespace for matrices and one for vectors so that the correct overload can be selected by ADL:

```cpp
namespace rocketscience {
    namespace mat {
        struct sparse_matrix {};
        struct dense_matrix {};
        struct uber_matrix {}; // Sadly, ü is not allowed in C++ ^{5}

        template <typename Matrix>
        double one_norm(const Matrix& A) { ... }
    }
    namespace vec {
        struct sparse_vector {};
        struct dense_vector {};
        struct uber_vector {};

        template <typename Vector>
        double one_norm(const Vector& x) { ... }
    }
}
```

*5*. Of course, we refer to the original German spelling of *über*—the word is sometimes even seen in American papers. Please note that special characters like *ü* are not allowed in names.

The ADL mechanism searches functions only in the namespaces of the arguments’ type declarations but not in their respective parent namespaces:

```cpp
namespace rocketscience {
    ...
    namespace vec {
        struct sparse_vector {};
        struct dense_vector {};
        struct uber_vector {};
    }

    template <typename Vector>
    double one_norm(const Vector& x) { ... }
}
```

```cpp
int main ()
{
    rocketscience::vec::uber_vector x;
    double norm_x= one_norm(x); // Error: not found by ADL
}
```

When we import a name within another namespace, the functions in that namespace are not subject to ADL either:

```cpp
namespace rocketscience {
    ...
    using vec::uber_vector;

    template <typename Vector>
    double one_norm(const Vector& x) { ... }
}

int main ()
{
    rocketscience::uber_vector x;
    double norm_x= one_norm(x); // Error: not found by ADL
}
```

Relying on ADL only for selecting the right overload has its limitations. When we use a third-party library, we may find functions and operators that we also implemented in our namespace. Such ambiguities can be reduced (but not entirely avoided) by using only single functions instead of entire namespaces.

The probability of ambiguities rises further with multi-argument functions, especially when parameter types come from different namespaces, e.g.:

```cpp
namespace rocketscience {
    namespace mat {
        ...
        template <typename Scalar, typename Matrix>
        Matrix operator*(const Scalar& a, const Matrix& A) { ... }
    }
    namespace vec {
        ...
        template <typename Scalar, typename Vector>
        Vector operator*(const Scalar& a, const Vector& x) { ... }

        template <typename Matrix, typename Vector>
        Vector operator*(const Matrix& A, const Vector& x) { ... }
    }
}

int main (int argc, char* argv[])
{
    rocketscience::mat::uber_matrix A;
    rocketscience::vec::uber_vector x, y;

    y= A * x; // which overload is selected?
}
```

Here the intention is clear. Well, to human readers. For the compiler it is less so. The type of A is defined in rocketscience::mat and that of x in rocketscience::vec so that operator* is sought in both namespaces. Thus, all three template overloads are available and none of them is a better match than the others (although probably only one would compile).

Unfortunately, explicit template instantiation does not work with ADL. Whenever template arguments are explicitly declared in the function call, the function name is not sought in the namespaces of the arguments.^{6}

*6*. The problem is that ADL is performed too late in the compilation and the opening angle bracket is already misinterpreted as less-than. To overcome this issue, the function must be made visible by namespace qualification or import via using (more details in §14.8.1.8 of the standard).

Which function overload is called depends on the so-far discussed rules on

• Namespace nesting and qualification,

• Name hiding,

• ADL, and

• Overload resolution.

This non-trivial interplay must be understood for frequently overloaded functions to ascertain that no ambiguity occurs and the right overload is selected. Therefore, we give some examples in *Appendix A.6.2*. Feel free to postpone this discussion until you get baffled with unexpected overload resolutions or ambiguities when dealing with a larger code base.

**3.2.3 Namespace Qualification or ADL**

Many programmers do not want to get into the complicated rules of how a compiler picks an overload or runs into ambiguities. They qualify the namespace of the called function and know exactly which function overload is selected (assuming the overloads in that namespace are not ambiguous within the overload resolution). We do not blame them; the name lookup is anything but trivial.

When we plan to write good generic software containing function and class templates instantiatable with many types, we should consider ADL. We will demonstrate this with a very popular performance bug (especially in C++03) that many programmers have run into. The standard library contains a function template called swap. It swaps the content of two objects of the same type. The old default implementation used copies and a temporary:

```cpp
template <typename T>
inline void swap(T& x, T& y)
{
    T tmp(x); x= y; y= tmp;
}
```

It works for all types with copy constructor and assignment. So far, so good. Say we have two vectors, each containing 1GB of data. Then we have to copy 3GB and also need a spare gigabyte of memory when we use the default implementation. Or we do something smarter: we switch the pointers referring to the data and the size information:

```cpp
template <typename Value>
class vector
{
    ...
    friend inline void swap(vector& x, vector& y)
    { std::swap(x.my_size, y.my_size); std::swap(x.data, y.data); }
  private:
    unsigned my_size;
    Value    *data;
};
```

Note that this example contains an inline-friend function. This declares a free function which is a friend of the containing class. Obviously, this is shorter than separate friend and function declarations.
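To see such an inline friend in action, here is a compilable sketch with a hypothetical minimal container (the names demo and buffer are ours, not from the book): an unqualified call swap(a, b) on two buffers finds the friend via ADL and exchanges only the bookkeeping, not the data.

```cpp
#include <cassert>
#include <utility>

namespace demo {

// Hypothetical minimal container; the inline friend below is a free
// function in namespace demo, found by ADL for unqualified swap calls.
class buffer
{
  public:
    explicit buffer(int s) : my_size(s), data(new double[s]) {}
    buffer(const buffer&) = delete;            // keep the sketch safe
    buffer& operator=(const buffer&) = delete;
    ~buffer() { delete[] data; }
    int size() const { return my_size; }

    friend inline void swap(buffer& x, buffer& y)
    {
        std::swap(x.my_size, y.my_size); // O(1): exchange the sizes
        std::swap(x.data, y.data);       // O(1): exchange the pointers
    }
  private:
    int my_size;
    double* data;
};

} // namespace demo
```

Because the friend lives in the same namespace as buffer, `demo::buffer a(3), b(7); swap(a, b);` selects this O(1) overload without any qualification.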

Assume we have to swap data of a parametric type in some generic function:

template <typename T, typename U>

inline void some_function(T& x, T& y, const U& z, int i)

{

...

**std::swap**(x, y); // can be expensive

...

}

We played it safe and used the standard swap function which works with all copyable types. But we copied 3GB of data. It would be much faster and memory-efficient to use our implementation that only switches the pointers. This can be achieved with a small change in a generic manner:

template <typename T, typename U>

inline void some_function(T& x, T& y, const U& z, int i)

{

**using std::swap**;

...

**swap**(x, y); // involves ADL

...

}

With this implementation, both swap overloads are candidates but the one in our class is prioritized by overload resolution as its argument type is more specific than that of the standard implementation. More generally, any implementation for a user type is more specific than std::swap. In fact, std::swap is already overloaded for standard containers for the same reason. This is a general pattern:

**Use using**

Do not qualify namespaces of function templates for which user-type overloads might exist. Make the name visible instead and call the function unqualified.

As an addendum to the default swap implementation: Since C++11, the default is to move the values between the two arguments and the temporary:

template <typename T>

inline void swap(T& x, T& y)

{

T tmp(**move(x)**);

x= **move(y)**;

y= **move(tmp)**;

}

As a result, types without user-defined swap can be swapped efficiently when they provide a fast move constructor and assignment. Only types without user implementation and move support are finally copied.
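The effect can be made visible with a small instrumented type (our own illustration, not from the book) that counts copy and move operations: since C++11, std::swap of two such objects performs three moves and no copy.

```cpp
#include <cassert>
#include <utility>

// Instrumented type: counts how often it is copied or moved so that we
// can observe that std::swap is move-based for movable types.
struct tracer
{
    static int copies, moves;
    tracer() = default;
    tracer(const tracer&) { ++copies; }
    tracer(tracer&&) noexcept { ++moves; }
    tracer& operator=(const tracer&) { ++copies; return *this; }
    tracer& operator=(tracer&&) noexcept { ++moves; return *this; }
};
int tracer::copies = 0;
int tracer::moves = 0;
```

Swapping two tracer objects triggers one move construction (the temporary) and two move assignments, matching the three-line implementation above.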

**3.3 Class Templates**

In the previous section, we described the use of templates to create generic functions. Templates can also be used to create generic classes. Analogous to generic functions, class template is the correct term from the standard whereas template class (or templated class) is more frequently used in daily life. In these classes, the types of data members can be parameterized.

This is in particular useful for general-purpose container classes like vectors, matrices, and lists. We could also extend the complex class with a parametric value type. However, we have already spent so much time with this class that it seems more entertaining to look at something else.

**3.3.1 A Container Example**

c++11/vector_template.cpp

Let us, for example, write a generic vector class, in the sense of linear algebra not like an STL vector. First, we implement a class with the most fundamental operators only:

**Listing 3–1: Template vector class**

template <typename **T**>

class vector

{

public:

explicit vector(int size)

: my_size(size), data( new **T**[my_size] )

{}

vector(const vector& that)

: my_size(that.my_size), data(new **T**[my_size])

{

std::copy(&that.data[0], &that.data[that.my_size], &data[0]);

}

int size() const { return my_size; }

const **T**& operator[](int i) const

{

check_index(i);

return data[i];

}

// ...

private:

int my_size;

std::unique_ptr<**T[]**> data;

};

The template class is not essentially different from a non-template class. There is only the extra parameter T as a placeholder for the type of its elements. We have member variables like my_size and member functions like size() that are not affected by the template parameter. Other functions like the bracket operator or the copy constructor are parameterized but still resemble their non-template equivalents: wherever we had double before, we now put the type parameter T, e.g., for **return** types or in allocations. Likewise, the member variable data is just parameterized by T.
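As a quick check of these statements, here is a condensed, compilable rendition of Listing 3–1 (with a mutable bracket operator added for the demonstration, an addition of ours): every choice of T generates a distinct class.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Condensed version of the chapter's template vector class.
template <typename T>
class vector
{
  public:
    explicit vector(int size) : my_size(size), data(new T[size]) {}
    int size() const { return my_size; }
    const T& operator[](int i) const { return data[i]; }
    T& operator[](int i) { return data[i]; }  // mutable access, for the demo
  private:
    int my_size;
    std::unique_ptr<T[]> data;
};
```

Instantiating vector<int> and vector<double> produces two unrelated classes stamped out from the same blueprint.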

Template parameters can be defaulted. Assume that our vector class parameterizes not only the value type, but also orientation and location:

struct row_major {}; // just for tagging

struct col_major {}; // ditto

struct heap {};

struct stack {};

template <typename T**= double**, typename Orientation**= col_major**,

typename Where**= heap**>

class vector;

The arguments of a vector can be fully declared:

vector<**float, row_major, heap**> v;

The last argument is equal to the default value and can be omitted:

vector<**float, row_major**> v;

As for functions, only the final arguments can be omitted. For instance, if the second argument is the default and the last one is not, we must write them all:

vector<float, col_major, **stack**> w;

When all template parameters are set to default values, we can of course omit them all. However, for grammar reasons not discussed here, the angle brackets still need to be written:

vector x; // Error: it is considered a non-template class

vector<> y; // looks a bit strange but is correct

Unlike the defaults of function arguments, template defaults can refer to preceding parameters:

template <typename **T**, typename U**= T**>

class pair;

This is a class for two values that might have different types. If they have the same type, we only need to declare it once:

pair<**int**, **float**> p1; // object with an int and float value

pair<**int**> p2; // object with two int values

The default can even be expressions of preceding parameters as we will see in *Chapter 5*.
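A small sketch of both forms, using our own illustrative names: the type parameter U defaults to the preceding parameter T, and a non-type parameter may even default to an expression such as sizeof(T).

```cpp
#include <type_traits>

// Second type parameter defaults to the first, like the pair above.
template <typename T, typename U = T>
struct pair_like
{
    T first;
    U second;
};

// A default that is an expression of a preceding parameter
// (purely for illustration; more on non-type parameters in §3.7).
template <typename T, unsigned Size = sizeof(T)>
struct sized
{
    static constexpr unsigned size = Size;
};
```

Thus pair_like<int> is exactly the type pair_like<int, int>, and sized<double> picks up sizeof(double) automatically.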

**3.3.2 Designing Uniform Class and Function Interfaces**

c++03/accumulate_example.cpp

When we write generic classes and functions, we can ask ourselves the chicken-and-egg question: what comes first? We have the choice to write function templates first and adapt our classes to them by realizing the corresponding methods. Alternatively, we can develop the interface of our classes first and implement generic functions against this interface.

The situation changes a little bit when our generic functions should be able to handle intrinsic types or classes from the standard library. These classes cannot be changed, and we should adapt our functions to their interface. There are other options that we will introduce later: specialization and meta-programming, which allow for type-dependent behavior.

As a case study, we use the function accumulate from the Standard Template Library, *Section 4.1*. It was developed at a time when programmers used pointers and ordinary arrays even more frequently than today. Thus, the STL creators Alex Stepanov and David Musser established an extremely versatile interface that works for pointers and arrays as well as on all containers of their library.

**3.3.2.1 Genuine Array Summation**

In order to sum the entries of an array generically, the first thing that comes to mind is probably a function taking the address and size of the array:

template <typename T>

T sum(const T* **array**, int **n**)

{

T sum(0);

for (int i= 0; i < n; ++i)

sum+= array[i];

return sum;

}

This function can be called as expected:

int **ai[]**= {2, 4, 7};

double **ad[]**= {2., 4.5, 7.};

cout << "sum ai is " << sum(**ai, 3**) << '\n';

cout << "sum ad is " << sum(**ad, 3**) << '\n';

However, we might wonder why we need to pass the size of the array. Could not the compiler deduce it for us? After all, it is known during compilation. In order to use compiler deduction, we introduce a template parameter for the size and pass the array by reference:

template <typename T, unsigned **N**> // more about non-type templates in §3.7

T sum(const T (&array)**[N]**)

{

T sum(0);

for (int i= 0; i < N; ++i)

sum+= array[i];

return sum;

}

The syntax looks admittedly a bit strange: we need the parentheses to declare a reference of an array as opposed to an array of references. This function can be called with a single argument:

cout << "sum ai is " << sum(**ai**) << '\n';

cout << "sum ad is " << sum(**ad**) << '\n';

Now, the type and the size are deduced. This in turn means that if we sum over two arrays of the same type and different size, the function will be instantiated twice. Nonetheless, it should not affect the executable size since such small functions are usually inlined anyway.

**3.3.2.2 Summing List Entries**

A list is a simple data structure whose elements contain a value and a reference to the next element (and sometimes to the previous one, too). In the C++ standard library, the class template std::list is a double-linked list (§*4.1.3.3*), and a list without back-references was introduced in C++11 as std::forward_list. Here, we only consider forward references:

template <typename T>

struct list_entry

{

list_entry(const T& value) : value(value), next(0) {}

T value;

list_entry<T>* next;

};

template <typename T>

struct list

{

list() : first(0), last(0) {}

~list()

{

while (first) {

list_entry<T> *tmp= first->next;

delete first;

first= tmp;

}

}

void append(const T& x)

{

last= (first? last->next : first)= new list_entry<T>(x);

}

list_entry<T> *first, *last;

};

This list implementation is actually really minimalistic and a bit terse. With the interface at hand, we can set up a small list:

list<float> l;

l.append(2.0f); l.append(4.0f); l.append(7.0f);

Please feel free to enrich our code with useful methods like the initializer_list construction.
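As a hedged sketch of the suggested enrichment, an initializer_list constructor could look as follows (we spell the entry type list_entry and reuse append; nothing here is prescribed by the book):

```cpp
#include <initializer_list>

// Minimal singly linked list, close to the chapter's implementation,
// extended with an initializer_list constructor.
template <typename T>
struct list_entry
{
    list_entry(const T& value) : value(value), next(nullptr) {}
    T value;
    list_entry<T>* next;
};

template <typename T>
struct list
{
    list() : first(nullptr), last(nullptr) {}
    list(std::initializer_list<T> values) : first(nullptr), last(nullptr)
    {
        for (const T& v : values)  // append in the given order
            append(v);
    }
    ~list()
    {
        while (first) {
            list_entry<T>* tmp = first->next;
            delete first;
            first = tmp;
        }
    }
    void append(const T& x)
    {
        last = (first ? last->next : first) = new list_entry<T>(x);
    }
    list_entry<T> *first, *last;
};
```

With this addition, the three append calls above shrink to `list<float> l = {2.0f, 4.0f, 7.0f};`.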

A summation function for this list is straightforward:

**Listing 3–2: Sum of list entries**

template <typename T>

T sum(const list<T>& l)

{

T sum= 0;

for (auto entry= l**.first**; entry **!= nullptr**; entry= entry**->next**)

sum+= entry**->value**;

return sum;

}

and can be called in the obvious way. We highlighted the details that differ from the array implementation.

**3.3.2.3 Commonalities**

When we are aiming for a common interface, we first have to ask ourselves: How similar are these two implementations of sum? At first glance not very:

• The values are accessed differently;

• The traversal of the entries is realized differently; and

• The termination criterion is different.

However, on a more abstract level, both functions perform the same tasks:

• Access of data

• Progress to the next entry

• Check for the end

The difference between the two implementations is how these tasks are realized with the given interfaces of the types. Thus, in order to provide a single generic function for both types, we have to establish a common interface.

**3.3.2.4 Alternative Array Summation**

In *Section 3.3.2.1*, we accessed the array in an index-oriented style that cannot be applied on lists arbitrarily dispersed in memory—at least not efficiently. Therefore, we reimplement the array summation here in a more sequential style with a stepwise traversal. We can achieve this by incrementing pointers until we pass the end of the array. The first address beyond the array is &a[n] or, more concisely with pointer arithmetic, a + n. *Figure 3–1* illustrates that we start our traversal at the address of a and stop when we reach a+n. Thus, we specify the range of entries by a right-open interval of addresses.

**Figure 3–1: An array of length** *n* **with begin and end pointers**

When software is written for maximal applicability, right-open intervals turn out to be more versatile than closed intervals, especially for types like lists where positions are represented by memory addresses randomly allocated. The summation over a right-open interval can be implemented as shown in *Listing 3–3*.

**Listing 3–3: Sum of array entries**

template <typename T>

inline T accumulate_array(T* **a**, T* **a_end**)

{

T sum(0);

for (; **a != a_end**; **++a**)

sum+= ***a**;

return sum;

}

and used as follows:

int main (int argc, char* argv[])

{

int ai[]= {2, 4, 7};

double ad[]= {2., 4.5, 7.};

cout << "sum ai is " << accumulate_array(**ai, &ai[3]**) << '\n';

cout << "sum ad is " << accumulate_array(**ad, ad+3**) << '\n';

}

A pair of pointers representing a right-open interval as above is a *Range*: a very important concept in C++. Many algorithms in the standard library are implemented for ranges of pointer-like objects in a similar style to accumulate_array. To use such functions for new containers, we only need to provide this pointer-like interface. As an example, we will now demonstrate for our list how we can adapt its interface.

**3.3.2.5 Generic Summation**

The two summation functions in *Listing 3–2* and *Listing 3–3* look quite different because they are written for different interfaces. Functionally, they are not so different.

In *Section 3.3.2.3*, we stated about the sum implementations from *Section 3.3.2.1* and *Section 3.3.2.2*:

• They both traverse the sequence from one element to the next.

• They both access the value of the current element and add it to sum.

• They both test whether the end of the sequence is reached.

The same holds for our revised array implementation in *Section 3.3.2.4*. However, the latter uses an interface with a more abstract notion of incrementally traversing a sequence. As a consequence, it is possible to apply it to another sequence like a list when it provides this sequential interface.

The ingenious idea of Alex Stepanov and David Musser in STL was to introduce a common interface for all container types and traditional arrays. This interface consisted of generalized pointers called *Iterators*. Then all algorithms were implemented for those iterators. We will discuss this more extensively in *Section 4.1.2* and give only a little foretaste here.

c++03/accumulate_example.cpp

What we need now is an iterator for our list that provides the necessary functionality in a pointer-like syntax, namely:

• Traverse the sequence with ++it;

• Access a value with *it; and

• Compare iterators with == or !=.

The implementation is straightforward:

template <typename T>

struct list_iterator

{

using value_type= T;

list_iterator(list_entry<T>* entry) : entry(entry) {}

T& **operator***() { return entry->value; }

const T& **operator***() const

{ return entry->value; }

list_iterator<T> **operator++**()

{ entry= entry->next; return *this; }

bool **operator!=**(const list_iterator<T>& other) const

{ return entry != other.entry; }

list_entry<T>* entry;

};

and, for convenience, we add begin and end methods to our list:

template <typename T>

struct list

{

list_iterator<T> **begin**() { return list_iterator<T>(first); }

list_iterator<T> **end**() { return list_iterator<T>(0); }

};

The list_iterator allows us to merge *Listing 3–2* and *Listing 3–3* together to accumulate:

**Listing 3–4: Generic summation**

template <typename Iter, typename T>

inline T accumulate(Iter **it**, Iter **end**, T init)

{

for (; **it != end**; **++it**)

init+= ***it**;

return init;

}

This generic sum can be used in the following form for both arrays and lists:

cout << "array sum = " << accumulate(**a, a+10**, 0.0) << '\n';

cout << "list sum = " << accumulate(**l.begin(), l.end()**, 0) << '\n';

As before, the key to success was finding the right abstraction: the iterator.

The list_iterator implementation is also a good opportunity to finally answer the question why iterators should be pre- and not post-incremented. We have already seen that the pre-increment updates the entry member and returns a reference to the iterator. The post-increment must return the old value and increment its internal state such that the following list entry is referred to when the iterator is used next time. Unfortunately, this can only be achieved when the post-increment operation copies the entire iterator before changing member data and returns this copy:

template <typename T>

struct list_iterator

{

list_iterator<T> operator++(**int**)

{

list_iterator<T> **tmp**(*this);

entry= entry->next;

return **tmp**;

}

};

Often we call the increment operation only to pass to the next entry and don’t care about the value returned by the operation. Then it is just a waste of resources to create an iterator copy that is never used. A good compiler might optimize away the surplus operations but there is no point in taking chances. A funny detail of the post-increment definition is the fake int parameter that is only present for distinction from the pre-increment definition.

**3.4 Type Deduction and Definition**

C++ compilers could already deduce types automatically in C++03 for the arguments of function templates. Let f be a template function that we call as

f(**g(x, y, z) + 3 * x**)

Then the compiler can deduce the type of f’s argument.

**3.4.1 Automatic Variable Type**

When we assign the result of an expression like the preceding one to a variable, we need to know the type of this expression in C++03. On the other hand, if we assign to a type to which the result is not convertible, the compiler will let us know while providing the incompatible types. This shows that the compiler knows the type, and in C++11, this knowledge is shared with the programmer.

The easiest way to use the type information in the previous example is the automatic variable type:

**auto** a= f(g(x, y, z) + 3 * x);

This does not change the fact that C++ is strongly typed. The auto type is different from dynamic types in other languages like Python. In Python, an assignment to a can change the type of a, namely, to that of the assigned expression. In C++11, the variable a has the type of the expression’s result, and this type will never change afterward. Thus, the auto type is not an automatic type that adapts to everything that is assigned to the variable but is determined once only.
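A minimal illustration of this one-time deduction (our own example): assigning a double to an auto variable that was deduced as int converts the value instead of changing the variable's type.

```cpp
#include <type_traits>

// The deduced type is fixed at initialization and never changes:
// a later assignment of a double converts (and truncates) the value.
inline int demo_auto()
{
    auto a = 2;  // a is int, once and for all
    static_assert(std::is_same<decltype(a), int>::value, "a stays int");
    a = 7.9;     // converts to int: a becomes 7, its type is unchanged
    return a;
}
```

In Python, the corresponding assignment would rebind the name to a float; in C++, only the value is converted.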

We can declare multiple auto variables in the same statement as long as they are all initialized with an expression of the same type:

**auto** i= 2 * 7.5, j= std::sqrt(3.7); // okay: both are double

**auto** i= 2 * 4, j= std::sqrt(3.7); // Error: i is int, j double

**auto** i= 2 * 4, j; // Error: j not initialized

**auto** v= g(x, y, z); // result of g

We can qualify auto with const and reference attributes:

**auto**& ri= i; // reference on i

const **auto**& cri= i; // constant reference on i

**auto**&& ur= g(x, y, z); // forward reference to result of g

The type deduction with auto variables works exactly like the deduction of function parameters, as described in *Section 3.1.2*. This means, for instance, that the variable v is not a reference even when g returns a reference. Likewise, the universal reference ur is either an rvalue or an lvalue reference depending on whether the result of g is an rvalue or an lvalue (reference).
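The following sketch (ours) makes this concrete: for a function returning int&, a plain auto variable is a copy, while auto& preserves the reference.

```cpp
#include <type_traits>

// g_ref returns a reference; auto drops it, auto& keeps it.
inline int& g_ref(int& x) { return x; }

inline int auto_ref_demo()
{
    int i = 3;
    auto  v = g_ref(i);  // v is int: an independent copy
    auto& r = g_ref(i);  // r is int&: refers to i
    static_assert(std::is_same<decltype(v), int>::value,  "copy");
    static_assert(std::is_same<decltype(r), int&>::value, "reference");
    r = 5;               // changes i through the reference
    v = 7;               // changes only the copy
    return i;            // 5: untouched by the write to v
}
```

The static_asserts document the deduced types at compile time; the runtime behavior confirms which variable aliases i.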

**3.4.2 Type of an Expression**

The other new feature in C++11 is decltype. It is like a function that returns the type of an expression. If f in the first auto example returns a value, we could also express it with decltype:

**decltype**(f(g(x, y, z) + 3 * x)) a= f(g(x, y, z) + 3 * x);

Obviously, this is too verbose and thus not very useful in this context.

The feature is very important in places where an explicit type is needed: first of all as a template parameter for class templates. We can, for instance, declare a vector whose elements can hold the sum of two other vectors’ elements, e.g., the type of v1[0] + v2[0]. This allows us to express the appropriate return type for the sum of two vectors of different types:

template <typename Vector1, typename Vector2>

auto operator+(const Vector1& v1, const Vector2& v2)

-> vector< **decltype**(v1[0] + v2[0]) >;

This code snippet also introduces another new feature: *Trailing Return Type*. In C++11, we are still obliged to declare the return type of every function. With decltype, it can be more handy to express it in terms of the function arguments. Therefore, we can move the declaration of the return type behind the arguments.

The two vectors may have different types and the resulting vector yet another one. With the expression decltype(v1[0] + v2[0]) we deduce what type we get when we add elements of both vectors. This type will be the element type for our resulting vector.
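In a reduced form, the same trailing-return pattern can be shown with scalars instead of vectors (our simplification): the result type of t + u is spelled after the parameter list, where t and u are already in scope.

```cpp
#include <type_traits>

// Trailing return type: decltype(t + u) may refer to the parameters
// because it appears after the parameter list.
template <typename T, typename U>
auto add(const T& t, const U& u) -> decltype(t + u)
{
    return t + u;
}
```

Adding an int and a double through add yields a double, exactly as the built-in operator would, without us naming the result type explicitly.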

An interesting aspect of decltype is that it only operates on the type level and does not evaluate the expression given as an argument. Thus, the expression from the previous example does not cause an error for empty vectors because v1[0] is not performed but only its type is determined.

The two features auto and decltype differ not only in their application; the type deduction is also different. While auto follows the rules of function template parameters and often drops reference and const qualifiers, decltype takes the expression type as it is. For instance, if the function f in our introductory example returned a reference, the variable a would be a reference. A corresponding auto variable would be a value.
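Both deduction rules can be checked at compile time with type traits (an illustration of ours):

```cpp
#include <type_traits>

// auto follows template-argument deduction and drops references;
// decltype takes the expression type verbatim, reference included.
inline bool deduction_demo()
{
    int i = 0;
    int& ri = i;

    auto a = ri;         // int: the reference is dropped
    decltype(ri) d = i;  // int&: the reference is preserved

    a = 1;               // modifies only the copy
    d = 2;               // modifies i through the reference

    return std::is_same<decltype(a), int>::value
        && std::is_same<decltype(d), int&>::value
        && i == 2 && a == 1;
}
```

The two is_same checks encode exactly the statement above: an auto variable initialized from a reference is a value, a decltype-declared one is not.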

As long as we mainly deal with intrinsic types, we get along without automatic type detection. But with advanced generic and meta-programming, we can greatly benefit from these extremely powerful features.

**3.4.3 decltype(auto)**

This new feature closes the gap between auto and decltype. With decltype(auto), we can declare auto variables that have the same type as with decltype. The following two declarations are identical:

decltype(expr) v= expr; // redundant + verbose when expr long

**decltype(auto)** v= expr; // Ahh! Much better.

The first statement is quite verbose: everything we add to expr we have to add twice in the statement. And with every modification we must pay attention that the two expressions are still identical.

c++14/value_range_vector.cpp

The preservation of qualifiers is also important in automatic **return** types. As an example we introduce a view on vectors that tests whether the values are in a given range. The view will access an element of the viewed vector with operator[] and return it after the range test with exactly the same qualifiers. Obviously a job for decltype(auto). Our example implementation of this view only contains a constructor and the access operator:

template <typename Vector>

class value_range_vector

{

using value_type= typename Vector::value_type;

using size_type= typename Vector::size_type;

public:

value_range_vector(Vector& vref, value_type minv, value_type maxv)

: vref(vref), minv(minv), maxv(maxv)

{}

**decltype(auto)** operator[](size_type i)

{

**decltype(auto)** value= vref[i];

if (value < minv) throw too_small{};

if (value > maxv) throw too_large{};

return value;

}

private:

Vector& vref;

value_type minv, maxv;

};

Our access operator caches the element from vref for the range checks before it is returned. Both the type of the temporary and the **return** type are deduced with decltype(auto). To test that vector elements are returned with the right type, we store one in a decltype(auto) variable and inspect its type:

int main ()

{

using Vec= mtl::vector<double>;

Vec v= {2.3, 8.1, 9.2};

value_range_vector<Vec> w(v, 1.0, 10.0);

**decltype(auto)** val= w[1];

}

The type of val is double& as wanted. The example uses decltype(auto) three times: twice in the view implementation and once in the test. If we replaced only one of them with auto, the type of val would become double.

**3.4.4 Defining Types**

There are two ways to define types: with typedef or with using. The former was introduced in C and has existed in C++ from the beginning. This is also its only advantage: backward compatibility.* ^{7}* For new software that does not need to compile with pre-C++11 compilers, we highly recommend the following:

*7*. This is the only reason why examples in this book sometimes still use typedef.

**Advice**

Use using instead of typedef.

It is more readable and more powerful. For simple type definitions, it is just a question of order:

**typedef** double value_type;

versus

**using** value_type= double;

In a using declaration, the new name is positioned on the left while a typedef puts it on the right side. For declaring an array, the new type name is not the right-most part of a typedef and the type is split into two parts:

typedef **double** da1**[10]**;

In contrast to it, within the using declaration, the type remains in one piece:

using da2= **double[10]**;

The difference becomes even more pronounced for function (pointer) types—which you will hopefully never need in type definitions. std::function in §*4.4.2* is a more flexible alternative. For instance, declaring a function with a float and an int argument that returns a float reads

typedef **float** float_fun1**(float, int)**;

versus

using float_fun2= **float (float, int)**;

In all these examples, the using declaration clearly separates the new type name from the definition.

In addition, the using declaration allows us to define *Template Aliases*. These are definitions with type parameters. Assume we have a template class for tensors of arbitrary order and parameterizable value type:

template <unsigned Order, typename Value>

class tensor { ... };

Now we like to introduce the type names vector and matrix for tensors of first and second order, respectively. This cannot be achieved with typedef but easily by template aliases via using:

template <typename **Value**>

using vector= **tensor<1, Value>**;

template <typename **Value**>

using matrix= **tensor<2, Value>**;

When we throw the output of the following lines:

std::cout << "type of vector<float> is "

<< **typeid(vector<float>)**.name() << '\n';

std::cout << "type of matrix<float> is "

<< **typeid(matrix<float>)**.name() << '\n';

into a name demangler, we will see

type of vector<float> is tensor<1u, float>

type of matrix<float> is tensor<2u, float>

To summarize: if you have experience with typedef, you will appreciate the new opportunities in C++11, and if you are new to the type definition business, you should start with using right away.

**3.5 A Bit of Theory on Templates: Concepts**

*“Gray, dear friend, is all theory and green the life’s golden tree.”*^{8}

—Johann Wolfgang von Goethe

*8*. Author’s translation. Original: “Grau, teurer Freund, ist alle Theorie und grün des Lebens goldner Baum.”

In the previous sections, you might have gotten the impression that template parameters can be substituted by any type. This is in fact not entirely true. The programmer of template classes and functions makes assumptions about the operations that can be performed on the template arguments.

Thus, it is very important to know which argument types are acceptable. We have seen, for instance, that accumulate can be instantiated with int or double. Types without addition, such as a solver class (on page 74), cannot be used in accumulate. What should be accumulated from a set of solvers? All the requirements for the template parameter T of the function accumulate can be summarized as follows:

• T is copy-constructible: T a(b); is compilable when the type of b is T.

• T is plus-assignable: a+= b; compiles when the type of a and b is T.

• T can be constructed from int: T a(0); compiles.

Such a set of type requirements is called a *Concept*. A concept CR that contains all requirements of concept C and possibly additional requirements is called a *Refinement* of C. A type t that holds all requirements of concept C is called a *Model* of C. Plus-assignable types are, for instance, int, float, double, and even string.
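Lacking language support at the time of writing, parts of such a concept can be approximated with C++11 type traits. The sketch below is our approximation, not the proposed language feature: it checks copy construction and construction from int, while plus-assignability would need an expression-based trait.

```cpp
#include <type_traits>

// Approximate check of the accumulate requirements with type traits.
template <typename T>
constexpr bool accumulatable()
{
    return std::is_copy_constructible<T>::value      // T a(b);
        && std::is_constructible<T, int>::value;     // T a(0);
    // a += b would need an additional expression-based trait
}

struct solver {};  // hypothetical type without addition, as on page 74
```

With this helper, models of the (partial) concept can be asserted at compile time, and a solver is rejected because it cannot be constructed from int.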

A complete definition of a template function or class should contain the list of required concepts as is done for functions from the Standard Template Library; see *http://www.sgi.com/tech/stl/*.

Today such requirements are only documentation. Future C++ standards will most likely support concepts as a central language feature. A technical specification mainly by Andrew Sutton, “C++ Extensions for Concepts,” [*46*] is in progress and may be part of C++17.

**3.6 Template Specialization**

On one hand, it is a great advantage that we can use the same implementation for many argument types. For some argument types we may, however, know a more efficient implementation, and this can be realized in C++ with *Template Specialization*. In principle, we could even implement entirely different behavior for certain types, at the price of utter confusion. Thus, the specialization should be more efficient but behave the same. C++ provides enormous flexibility, and we as programmers are in charge of using this flexibility responsibly and of being consistent with ourselves.

**3.6.1 Specializing a Class for One Type**

c++11/vector_template.cpp

In the following, we want to specialize our vector example from *Listing 3–1* for bool. Our goal is to save memory by packing 8 bool values into one byte. Let us start with the class definition:

template **<>**

class vector**<bool>**

{

// ..

};

Although our specialized class is not type-parametric anymore, we still need the template keyword and the empty triangle brackets. The name vector was declared to be a class template before, and we need this seemingly surplus template notation to show that the following definition is a specialization of the *Primary Template*. Thus, defining or declaring a template specialization before the primary template is an error. In a specialization, we must provide within the angle brackets a type for each template parameter. These values may be parameters themselves (or expressions thereof). For instance, if we specialize for one out of three parameters, the two others are still declared as template parameters:

template <typename **T1**, typename **T3**>

class some_container<**T1**, int, **T3**>

{

// ..

};

Back to our boolean vector class: our primary template defines a constructor for an empty vector and one containing n elements. For the sake of consistency, we should define the same. With the non-empty vector, we have to round up the data size when the number of bits is not divisible by 8:

template <>

class vector<bool>

{

public:

explicit vector(int size)

: my_size(size), data( new unsigned char[**(my_size+7) / 8**] )

{}

vector() : my_size(0) {}

private:

int my_size;

std::unique_ptr<unsigned char[]> data;

};

You may have noticed that the default constructor is identical to that of the primary template. Unfortunately, methods are not “inherited” by the specialization. Whenever we write a specialization, we have to define everything from scratch or use a common base class.^{9}
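The common-base-class workaround can be sketched as follows (names like vector_base are ours): the members shared by the primary template and the bool specialization are factored into a base, so the specialization inherits size() instead of redefining it.

```cpp
#include <cassert>
#include <memory>

// Shared bookkeeping for all vector instantiations.
template <typename T>
class vector_base
{
  public:
    explicit vector_base(int size) : my_size(size) {}
    vector_base() : my_size(0) {}
    int size() const { return my_size; }
  protected:
    int my_size;
};

template <typename T>
class vector : public vector_base<T>
{
  public:
    explicit vector(int size) : vector_base<T>(size), data(new T[size]) {}
    vector() = default;
  private:
    std::unique_ptr<T[]> data;
};

template <>
class vector<bool> : public vector_base<bool>
{
  public:
    explicit vector(int size)
      : vector_base<bool>(size), data(new unsigned char[(size + 7) / 8]) {}
    vector() = default;  // size() is inherited, not redefined
  private:
    std::unique_ptr<unsigned char[]> data;
};
```

Both the generic vector and the packed bool specialization now answer size() through the one implementation in vector_base.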

*9*. The author is trying to overcome this verbosity in future standards [*16*].

We are free to omit member functions or variables from the primary template, but for the sake of consistency we should do this only for good reasons, for *very good* reasons. For instance, we might omit the operator+ because we have no addition for bool. The constant access operator is implemented with shifting and bit masking:

template <> class vector<bool>

{

bool operator[](int i) const

{ return **(data[i/8] >> i%8) & 1**; }

};

The mutable access is trickier because we cannot refer to a single bit. The trick is to return a *Proxy* which provides read and write operations for a single bit:

template <> class vector<bool>

{

vector_bool_proxy operator[](int i)

{ return **{data[i/8], i%8}**; }

};

The return statement uses a braced list to call the two-argument constructor. Let us now implement our proxy to manage a specific bit within vector<bool>. Obviously, the class needs a reference to the containing byte and the position within this byte. To simplify further operations, we create a mask that has one bit on the position in question and zero bits on all other positions:

class vector_bool_proxy

{

public:

vector_bool_proxy(unsigned char& **byte**, int **p**)

: **byte(byte)**, **mask(1 << p)** {}

private:

unsigned char& **byte**;

unsigned char **mask**;

};

The reading access is realized with a conversion to bool where we simply mask the referred byte:

class vector_bool_proxy

{

**operator bool**() const { return **byte & mask**; }

};

Only when the considered bit in byte is 1 does the bitwise AND yield a non-zero value, which is evaluated to true in the conversion from unsigned char to bool.

Setting a bit is realized by an assignment operator for bool:

class vector_bool_proxy
{
    vector_bool_proxy& operator=(bool b)
    {
        if (b)
            byte|= mask;
        else
            byte&= ~mask;
        return *this;
    }
};

The assignment is simpler to implement when we distinguish between the assigned values. When the argument is true, we apply an OR with the mask so that the bit in the considered position is switched on. All other positions remain unchanged since OR with 0 has no effect (0 is the identity element of bitwise OR). Conversely, with false as an argument, we first invert the mask and apply it with AND to the byte reference. Then the mask’s zero bit on the active position turns the bit off. On all other positions, the AND with one bit conserves the old bit values.

With this specialization of vector for bool, we use only about an eighth of the memory. Nonetheless, our specialization is (mostly) consistent with the primary template: we can create vectors and read and write them in the same fashion. To be honest, the compressed vector is not perfectly identical to the primary template, e.g., when we take references of elements or when type deduction is involved. However, we made the specialization as similar as possible to the generic version, and in most situations we will not notice the differences: it works the same way.

**3.6.2 Specializing and Overloading Functions**

In this section, we discuss and assess the advantages and disadvantages of template specialization for functions.

**3.6.2.1 Specializing a Function to a Specific Type**

Functions can be specialized in the same manner as classes. Unfortunately, these specializations do not participate in overload resolution, and a less specific overload is prioritized over the more specific template specialization; see [*44*]. For that reason, Sutter and Alexandrescu give the following advice in [*45*, Item 66]:

**Advice**

Do not use function template specialization!

To provide a special implementation for one specific type or type tuple as above, we can simply use overloading. This works better and is even simpler, e.g.:

#include <cmath>

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y) { ... }

double inline power(double x, double y)
{
    return std::pow(x, y);
}

Functions with many specializations are best implemented by means of class specialization. This allows for full and partial specialization without taking all overloading and ADL rules into consideration. We will show this in *Section 3.6.4*.

If you ever feel tempted to write hardware-specific specializations in assembler code, try to resist. If you cannot, please read first the few remarks in *Appendix A.6.3*.

**3.6.2.2 Ambiguities**

In the previous examples, we specialized all parameters of the function. It is also possible to specialize some of them in overloads and leave the remaining parameter(s) as template(s):

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y);

template <typename Base>
Base inline power(const Base& x, int y);

template <typename Exponent>
double inline power(double x, const Exponent& y);

The compiler will find all overloads that match the argument combination and select the most specific, which is supposed to provide the most efficient special-case implementation. For instance, power(3.0, 2u) will match the first and third overloads, where the latter is more specific. To put it in terms of higher math:^{10} type specificity is a partial order that forms a lattice, and the compiler picks the maximum of the available overloads. However, you do not need to dive deeply into algebra to see which type or type combination is more specific.

*10*. For those who like higher mathematics. And only for those.

If we called power(3.0, 2) with the previous overloads, all three would match. However, this time we cannot determine the most specific overload. The compiler will tell us that the call is ambiguous and show us overloads 2 and 3 as candidates. As we implemented the overloads consistently and with optimal performance, we might be happy with either choice, but the compiler will not choose for us. To disambiguate, we must add a fourth overload:

double inline power(double x, int y);

The lattice experts will immediately say: “Of course, we were missing the join in the specificity lattice.” But even without this expertise, most of us understand why the call was ambiguous with the three overloads and why the fourth one rescued us. In fact, the majority of C++ programmers get along without studying lattices.

**3.6.3 Partial Specialization**

When we implement template classes, we will sooner or later run into situations where we want to specialize a template class for another template class. Suppose we have a template complex and vector and want to specialize the latter for all instances of complex. It would be quite annoying doing this one by one:

template <>
class vector<complex<float> >;

template <>
class vector<complex<double> >;      // again ??? :-/

template <>
class vector<complex<long double> >; // how many more ??? :-P

This is not only inelegant, it also destroys our ideal of universal applicability because the complex class supports all Real types and our specialization above only takes a limited number thereof into account. In particular, instances of complex with future user types cannot be considered for obvious reasons.

The solution that avoids the implementation redundancy and the ignorance of new types is *Partial Specialization*. We specialize our vector class for all complex instantiations:

template <typename Real>
class vector<complex<Real> >
{ ... };

If you use a compiler without C++11 support, pay attention to put spaces between closing >; otherwise your compiler may interpret two subsequent > as the shift operator >>, leading to rather confusing errors. Although this book mainly addresses C++11 programming, we still keep the separating spaces for readability.

Partial specialization also works for classes with multiple parameters, for instance:

template <typename Value, typename Parameters>
class vector<sparse_matrix<Value, Parameters> >
{ ... };

We can also specialize for all pointers:

template <typename T>
class vector<T*>
{ ... };

Whenever the set of types is expressible by a *Type Pattern*, we can apply partial specialization on it.

Partial template specialization can be combined with regular template specialization from §*3.6.1*—let us call it *Full Specialization* for distinction. In this case, the full specialization is prioritized over the partial one. Between different partial specializations the most specific is selected. In the following example:

template <typename Value, typename Parameters>
class vector<sparse_matrix<Value, Parameters> >
{ ... };

template <typename Parameters>
class vector<sparse_matrix<float, Parameters> >
{ ... };

the second specialization is more specific than the first one and picked when it matches. In fact, a full specialization is always more specific than any partial one.

**3.6.4 Partially Specializing Functions**

Function templates actually cannot be specialized partially. We can, however, as for the full specialization (§*3.6.2.1*), use overloading to provide special implementations. For that purpose, we write more specific function templates that are prioritized when they match. As an example, we overload the generic abs with an implementation for all complex instances:

template <typename T>
inline T abs(const T& x)
{
    return x < T(0) ? -x : x;
}

template <typename T>
inline T abs(const std::complex<T>& x)
{
    return sqrt(real(x)*real(x) + imag(x)*imag(x));
}

Overloading of function templates is easy to implement and works reasonably well. However, for massively overloaded functions or for overloads spread over many files of a large project, sometimes the intended overload is not called. The reason is the non-trivial interplay of the already challenging namespace resolution with the overload resolution of mixed template and non-template functions.

c++14/abs_functor.cpp

To ensure predictable specialization behavior, it is safest to implement it internally in terms of class template specialization and to provide only a single function template as the user interface. The challenging part here is the return type of this single function when the return types of the specializations vary. As in our abs example: the general code returns the argument type while the more specific complex version returns the underlying value type. This can be handled in a portable way so that it works even with C++03. The newer standards, however, provide features to simplify this task.

We start with the easiest implementation by using C++14:

template <typename T> struct abs_functor;

template <typename T>
decltype(auto) abs(const T& x)
{
    return abs_functor<T>()(x);
}

Our generic abs function creates an anonymous object abs_functor<T>() and calls its operator() with the argument x. Thus, the corresponding specialization of abs_functor needs a default constructor (usually implicitly generated) and an operator() as a unary function accepting an argument of type T. The return type of operator() is automatically deduced. For abs, we could most likely deduce the return type with auto instead since all specializations should return a value. Just for the unlikely case that some specialization might be const- or reference-qualified, we use decltype(auto) to pass on the qualifiers.

When we program with C++11, we have to declare the return type explicitly. At least, this declaration can apply type deduction:

template <typename T>
auto abs(const T& x) -> decltype(abs_functor<T>()(x))
{
    return abs_functor<T>()(x);
}

It is admittedly redundant to repeat abs_functor<T>()(x) and any redundancy is a potential source of inconsistency.

Back in C++03, we cannot use type deduction at all for the return type. Thus, the functor must provide it, say, by a typedef named result_type:

template <typename T>
typename abs_functor<T>::result_type
abs(const T& x)
{
    return abs_functor<T>()(x);
}

Here we have to rely on the implementor(s) of abs_functor that result_type is consistent with the return type of operator().

Finally, we implement the functor with a partial specialization for complex<T>:

template <typename T>
struct abs_functor
{
    typedef T result_type;

    T operator()(const T& x)
    {
        return x < T(0) ? -x : x;
    }
};

template <typename T>
struct abs_functor<std::complex<T> >
{
    typedef T result_type;

    T operator()(const std::complex<T>& x)
    {
        return sqrt(real(x)*real(x) + imag(x)*imag(x));
    }
};

This is a portable implementation working with all three implementations of abs. When we drop the support for C++03, we can omit the typedef in the templates. This abs_functor can be specialized further for any reasonable type pattern without the trouble we may run into with massively overloaded functions.

**3.7 Non-Type Parameters for Templates**

So far, we have used template parameters only for types. Values can be template arguments as well: not values of arbitrary types, but those of all integral types, i.e., integer numbers and bool. For completeness, pointers are also allowed, but we will not explore this here.

c++11/fsize_vector.cpp

Very popular is the definition of short vectors and small matrices with the size as a template parameter:

template <typename T, int Size>
class fsize_vector
{
    using self= fsize_vector;
  public:
    using value_type= T;
    const static int my_size= Size;

    fsize_vector(int s= Size) { assert(s == Size); }

    self& operator=(const self& that)
    {
        std::copy(that.data, that.data + Size, data);
        return *this;
    }

    self operator+(const self& that) const
    {
        self sum;
        for (int i= 0; i < my_size; ++i)
            sum[i]= data[i] + that[i];
        return sum;
    }
    // ...
  private:
    T data[my_size];
};

Since the size is already provided as a template parameter, we do not need to pass it to the constructor. However, for establishing a uniform interface for vectors, we still accept a size argument at construction and check that it matches the template argument.

Comparing this implementation with the dynamically sized vector in *Section 3.3.1*, we will not see many differences. The essential distinction is that the size is now part of the type and that it can be accessed at compile time. As a consequence, the compiler can perform additional optimizations. When we add two vectors of size 3, for instance, the compiler can transform the loop into three statements like this:

self operator+(const self& that) const
{
    self sum;
    sum[0]= data[0] + that[0];
    sum[1]= data[1] + that[1];
    sum[2]= data[2] + that[2];
    return sum;
}

This saves the counter incrementation and the test for the loop end. Possibly the operations are performed in parallel using SSE units. We will talk more about loop unrolling in *Section 5.4*.

Which optimization is induced by additional compile-time information is of course compiler-dependent. One can only find out which transformation is actually done by reading the generated assembler code or indirectly by observing performance and comparing it with other implementations. Reading assembler is difficult, especially with a high optimization level. With less aggressive optimization, we might not see the benefit from the static size.

In the example above, the compiler will probably unroll the loop as shown for small sizes like 3 and keep the loop for larger sizes like 100. Therefore, these compile-time sizes are particularly interesting for small matrices and vectors, e.g., three-dimensional coordinates or rotations.

Another benefit of knowing the size at compile time is that we can store the values in an array so that our fsize_vector uses a single memory block. This makes the creation and destruction much easier compared to dynamically allocated memory that is expensive to manage.

We said before that the size becomes part of the type. The careful reader might have realized that we omitted the checks for whether the vectors have the same size. We do not need these tests anymore: if an argument has the same class type, it implicitly has the same size. Consider the following program snippet:

fsize_vector<float, 3> v;
fsize_vector<float, 4> w;
vector<float> x(3), y(4);

v= w;
x= y;

The last two lines are incompatible vector assignments. The difference is that the incompatibility in the second assignment x= y; is discovered at run time in our assertion. The assignment v= w; does not even compile because fixed-size vectors of dimension 3 only accept vectors of the same dimension as arguments.

If we want, we can declare default values for non-type template arguments. Living in our three-dimensional world, it makes sense to assume that many vectors have dimension 3:

template <typename T, int Size= 3>
class fsize_vector
{ /* ... */ };

fsize_vector<float>     v, w, x, y;
fsize_vector<float, 4>  space_time;
fsize_vector<float, 11> string;

For relativity and string theory, we can afford the extra work of declaring their vector dimensions.

**3.8 Functors**

In this section, we introduce an extremely powerful feature: *Functors*, a.k.a. *Function Objects*. At first glance, they are just classes that provide an operator callable like a function. The crucial difference from ordinary functions is that function objects can be more flexibly applied to each other or to themselves, allowing us to create new function objects. These applications need some time to get used to, and reading this section is probably more challenging than the preceding ones. However, we reach here an entirely new quality of programming and every minute spent on this reading is worth it. This section also paves the way for lambdas (§*3.9*) and opens the door to meta-programming (*Chapter 5*).

As a study case, we develop a mathematical algorithm for computing the finite difference of a differentiable function *f*. The finite difference is an approximation of the first derivative by

*f*′(*x*) ≈ ( *f*(*x* + *h*) − *f*(*x*) ) / *h*

where *h* is a small value also called spacing.

A general function for computing the finite difference is presented in *Listing 3-5*. We implement this in the function fin_diff, which takes an arbitrary function (from double to double) as an argument:

**Listing 3–5: Finite differences with function pointers**

double fin_diff(double f(double), double x, double h)
{
    return ( f(x+h) - f(x) ) / h;
}

double sin_plus_cos(double x)
{
    return sin(x) + cos(x);
}

int main()
{
    cout << fin_diff(sin_plus_cos, 1., 0.001) << '\n';
    cout << fin_diff(sin_plus_cos, 0., 0.001) << '\n';
}

Here we approximated the derivative of sin_plus_cos at *x* = 1 and *x* = 0 with *h* = 0.001. sin_plus_cos is passed as a function pointer (functions can be implicitly converted to function pointers when necessary).

Now we want to compute the second-order derivative. It would make sense to call fin_diff with itself as an argument. Unfortunately, this is not possible since fin_diff has three parameters and does not match its own function pointer parameter with only one parameter.

We can solve this problem with *Functors* or *Function Objects*. These are classes that provide an application operator() so that objects thereof can be called like functions, explaining the term “function objects.” Unfortunately, it is not obvious in many texts whether the term refers to a class or an object. This might not be problematic in those contexts but we need a sharp distinction between classes and objects. Therefore we prefer the word *functor* despite its other meaning in category theory. In this book, functor always refers to a class and an object thereof is accordingly called a functor object. Whenever we use the term function object it is synonymous with functor object.

Back to our example. The previously used function sin_plus_cos implemented as a functor reads as follows:

**Listing 3–6: Function object**

struct sc_f
{
    double operator()(double x) const
    {
        return sin(x) + cos(x);
    }
};

A great advantage of functors is the ability to hold parameters as internal states. So we could scale *x* with α in the sin function, i.e., sin α*x* + cos *x*:

**Listing 3–7: Function object with state**

class psc_f
{
  public:
    psc_f(double alpha) : alpha(alpha) {}

    double operator()(double x) const
    {
        return sin(alpha * x) + cos(x);
    }
  private:
    double alpha;
};

**Notation:** In this section, we introduce a fair number of types and objects. For better distinction, we use the following naming conventions: Functor types are named with the suffix _f like psc_f and objects thereof have the suffix _o. An approximated derivative is prefixed with d_, the second derivative with dd_, and higher derivatives with d followed by its order, like d7_ for the seventh derivative. For brevity’s sake, we will not state for each derivative that it is only approximated (the derivatives of orders around 20 are actually so incorrect that approximation is presumptuous).

**3.8.1 Function-like Parameters**

c++11/derivative.cpp

After defining our functor types, we have to find out how we can pass objects thereof to functions. Our previous definition of fin_diff had a function pointer as an argument which we cannot use for our functor objects. Furthermore, we cannot use a specific argument type when we want to support different functors, e.g., sc_f and psc_f. There are essentially two techniques for accepting arguments of different types: inheritance and templates. The inheritance version is postponed to *Section 6.1.4* until we have actually introduced this feature. Right now, we have to mention that the generic approach is superior in applicability and performance. Thus, we use a type parameter for our functors and functions:

template <typename F, typename T>
T inline fin_diff(F f, const T& x, const T& h)
{
    return ( f(x+h) - f(x) ) / h;
}

int main()
{
    psc_f psc_o(1.0);

    cout << fin_diff(psc_o, 1., 0.001) << endl;
    cout << fin_diff(psc_f(2.0), 1., 0.001) << endl;
    cout << fin_diff(sin_plus_cos, 0., 0.001) << endl;
}

In this example, we create the functor object psc_o and pass it as a template argument to fin_diff. The next call passes the on-the-fly-created object psc_f(2.0) to the differentiation. In the last call of fin_diff, we demonstrate that we can still pass in an ordinary function such as sin_plus_cos.

These three examples show that the parameter f is quite versatile. This raises the question of just how versatile it is. From how we use f, we deduce that it must be a function taking one argument. The STL (§*4.1*) introduces for these requirements the concept UnaryFunction:

• Let f be of type F.

• Let x be of type T, where T is the argument type of F.

• f(x) calls f with one argument and returns an object of the result type.

Since we perform all calculations with values of type T, we should add the requirement that the return type of f is T as well.

**3.8.2 Composing Functors**

So far, we have looked at different kinds of function parameters for our calculations. Unfortunately, we are not much closer to our goal of computing higher derivatives elegantly by passing fin_diff as an argument to itself. The problem is that fin_diff needs a unary function as an argument while being a ternary function itself. We can overcome this discrepancy by defining a unary functor^{11} that holds the function to differentiate and the step size as internal states:

*11*. For conciseness, we call a functor whose objects are unary functions a unary functor.

template <typename F, typename T>
class derivative
{
  public:
    derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return ( f(x+h) - f(x) ) / h;
    }
  private:
    const F& f;
    T        h;
};

Then only x is still passed as a regular function argument to the differentiation. This functor can be instantiated with a functor representing^{12} *f*(*x*), and the result is a functor for the approximated *f*′(*x*):

*12*. This is another abbreviating phrasing: when we say functor ft represents *f*(*x*), we mean that an object of ft computes *f*(*x*).

using d_psc_f= derivative<psc_f, double>;

Here the derivative of *f*(*x*) = sin(α · *x*) + cos *x* is represented by the functor d_psc_f. We can now create a function object for the derivative with α = 1:

psc_f psc_o(1.0);
d_psc_f d_psc_o(psc_o, 0.001);

This allows us to calculate the differential quotient at *x* = 0:

cout "der. of sin(0) + cos(0) is " **d_psc_o(0.0)** '\n';

Well, we could do this before. The fundamental difference from our preceding solutions is the similarity of the original function and its derivative. They are both unary functions created from functors.

Thus, we have finally reached our goal: we can treat *f′*(*x*) the same way we treated *f*(*x*) and build *f″*(*x*) from it. More technically phrased: we can instantiate derivative with the functor d_psc_f of the derived function:

using dd_psc_f= derivative<d_psc_f, double>;

Now we have indeed a functor for the second derivative. We demonstrate this by creating a function object of it and approximating *f″*(0):

dd_psc_f dd_psc_o(d_psc_o, 0.001);
cout << "2nd der. of sin(0) + cos(0) is " << dd_psc_o(0.0) << '\n';

Since dd_psc_f is again a unary functor, we can create one for the third derivative and higher.

In case we need the second derivative of multiple functions, we can invest some more effort and create the second derivative directly, without asking the user to create the first derivative. The following functor creates a function object for the first derivative in its constructor and approximates *f″*(*x*):

template <typename F, typename T>
class second_derivative
{
  public:
    second_derivative(const F& f, const T& h)
      : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return ( fp(x+h) - fp(x) ) / h;
    }
  private:
    T h;
    derivative<F, T> fp;
};

Now we can build a function object for *f″* from *f*:

second_derivative<psc_f, double> dd_psc_2_o(psc_f(1.0), 0.001);

In the same fashion, we could build a generator for each higher-order derivative. Better yet, we will now realize a functor for approximating a derivative of arbitrary order.

**3.8.3 Recursion**

When we think of how we would implement the third, fourth, or in general the *n*th derivative, we realize that they would look much like the second one: calling the (*n*-1)th derivative on x+h and x. We can explore this repetitive scheme with a recursive implementation:

template <typename F, typename T, unsigned N>
class nth_derivative
{
    using prev_derivative= nth_derivative<F, T, N-1>;
  public:
    nth_derivative(const F& f, const T& h)
      : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return ( fp(x+h) - fp(x) ) / h;
    }
  private:
    T h;
    prev_derivative fp;
};

To rescue the compiler from infinite recursion, we must terminate it when we reach the first derivative. Note that we cannot use if or ?: to stop the recursion because both branches are compiled and one of them still contains the infinite recursion. Recursive template definitions are terminated with a specialization like the following:

template <typename F, typename T>
class nth_derivative<F, T, 1>
{
  public:
    nth_derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return ( f(x+h) - f(x) ) / h;
    }
  private:
    const F& f;
    T h;
};

This specialization is identical to the class derivative that we now could throw away. Or we keep it and reuse its functionality by simply deriving from it (more about derivation in *Chapter 6*).

template <typename F, typename T>
class nth_derivative<F, T, 1>
  : public derivative<F, T>
{
    using derivative<F, T>::derivative;
};

Now we can compute any derivative like the 22nd:

nth_derivative<psc_f, double, 22> d22_psc_o(psc_f(1.0), 0.00001);

The new object d22_psc_o is again a unary function object. Unfortunately, it approximates so badly that we are too ashamed to present the results here. From Taylor series, we know that the error of the *f″* approximation is reduced from *O*(*h*) to *O*(*h*^{2}) when a backward difference is applied to the forward difference. This said, maybe we can improve our approximation when we alternate between forward and backward differences:

template <typename F, typename T, unsigned N>
class nth_derivative
{
    using prev_derivative= nth_derivative<F, T, N-1>;
  public:
    nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return N & 1 ? ( fp(x+h) - fp(x) ) / h
                     : ( fp(x) - fp(x-h) ) / h;
    }
  private:
    T h;
    prev_derivative fp;
};

Sadly, our 22nd derivative is still as wrong as before; in fact, slightly worse. This is particularly frustrating once we become aware that we evaluate *f* over four million times. Decreasing *h* doesn’t help either: the tangent better approaches the derivative but, on the other hand, the values of *f*(*x*) and *f*(*x ± h*) approach each other and their differences retain only a few meaningful bits. At least the second derivative improved with our alternating difference scheme, as the Taylor series teaches us. Another consoling fact is that we probably did not pay for the alternation: the template argument N is known at compile time and so is the result of the condition N & 1. Thus, the compiler can reduce the conditional to the active branch.

If nothing else we learned something about C++ and we are confirmed in the

**Truism**

Not even the coolest programming can substitute for solid mathematics.

In the end, this book is primarily about programming. And the functors proved to be extremely expressive for generating new function objects. Nonetheless, if any reader has a good idea for a numerically better recursive computation, feel free to contact the author.

There is only one detail still disturbing us: the redundancy between the functor arguments and those of the constructors. Say we compute the seventh derivative of psc_o:

nth_derivative<psc_f, double, 7> d7_psc_o(psc_o, 0.00001);

The first two arguments of nth_derivative are exactly the types of the constructor arguments. This is redundant, and we would prefer to have them deduced. auto and decltype are no big help here:

auto d7_psc_o= nth_derivative<psc_f, double, 7>(psc_o, 0.00001);

nth_derivative<decltype(psc_o),
               decltype(0.00001), 7> d7_psc_o(psc_o, 0.00001);

More promising is a make-function that takes the constructor arguments and deduces their types like this:

template <typename F, typename T, unsigned N> // Not clever
nth_derivative<F, T, N>
make_nth_derivative(const F& f, const T& h)
{
    return nth_derivative<F, T, N>(f, h);
}

This should deduce F and T, and we only need to declare N explicitly. Unfortunately, this doesn’t work as expected. When we specify a certain template argument explicitly, we are obliged to specify all preceding arguments as well:

auto d7_psc_o= make_nth_derivative<psc_f, double, 7>(psc_o, 0.00001);

Well, this is not particularly useful. We have to change the order of our make-function’s template parameters: N must be declared and F and T can be deduced. Thus, we put N in front:

template <unsigned N, typename F, typename T>
nth_derivative<F, T, N>
make_nth_derivative(const F& f, const T& h)
{
    return nth_derivative<F, T, N>(f, h);
}

Now the compiler can deduce the functor and value type of the seventh derivative:

auto d7_psc_o= make_nth_derivative<7>(psc_o, 0.00001);

We have seen that the order of template parameters mattered for our make-function. In function templates where all parameters are deduced by the compiler, their order is irrelevant. Only when parameters or some of them are explicitly declared do we have to pay attention. The parameters not deduced must be located at the front of the parameter list. To remember this, imagine a template function call with partly deduced parameters: the explicitly declared parameters come first, left of the opening (, and the other parameters are deduced from the arguments to the right of (.

**3.8.4 Generic Reduction**

c++11/accumulate_functor_example.cpp

Recall the function accumulate from *Section 3.3.2.5* that we used to illustrate generic programming. In this section, we will generalize this function to a generic reduction. We introduce a BinaryFunction implementing an operation on two arguments as a function or as a callable class object. Then we can perform any reduction applying the BinaryFunction on all elements of our sequence:

template <typename Iter, typename T, typename BinaryFunction>
T accumulate(Iter it, Iter end, T init, BinaryFunction op)
{
    for (; it != end; ++it)
        init= op(init, *it);
    return init;
}

To add values, we can realize a functor that is parameterized by the type of values:

template <typename T>
struct add
{
    T operator()(const T& x, const T& y) const { return x + y; }
};

Instead of the class, we can also parameterize the operator():

struct times
{
    template <typename T>
    T operator()(const T& x, const T& y) const { return x * y; }
};

This has the advantage that the compiler can deduce the value type:

vector<double> v= {7.0, 8.0, 11.0};
double s= accumulate(v.begin(), v.end(), 0.0, add<double>{});
double p= accumulate(v.begin(), v.end(), 1.0, times{});

Here we computed the sum and the product of vector entries. The add functor requires instantiation with the vector’s value type while the times functor is not a template class and the argument type is deduced in the application.

**3.9 Lambda**

c++11/lambda.cpp

C++11 introduced lambda expressions. A λ-expression is simply shorthand for a functor. It not only makes programs more compact but often more comprehensible as well. Especially for simple calculations, it is clearer to see their implementation at the place where they are used instead of calling a function whose code is somewhere else. Having seen classical functors before makes it quite easy to understand what lambdas are.

*Listing 3–6* realizes a functor for sin *x* + cos *x*. The corresponding λ-expression is

**Listing 3–8: Simple λ-expression**

**[](double x){** return sin(x) + cos(x); **}**

The lambda expression doesn’t only define a functor but immediately creates an object thereof. Thus, we can pass it immediately to a function as an argument:

fin_diff(**[](double x){** return sin(x) + cos(x); **}**, 1., 0.001 )

Parameters can be incorporated directly in the lambda expression. So we can scale the sin argument as we did in functor psc_f (*Listing 3–7*) by just inserting the multiplication and still getting a unary function object:

fin_diff([](double x){ return sin(**2.5***x) + cos(x); }, 1., 0.001)

We can also store it to a variable for reuse:

**auto sc_l**= [](double x){ return sin(x) + cos(x); };

The lambda expressions in the previous examples don’t declare their return types; in such cases they are deduced by the compiler. If the return type cannot be deduced, or we prefer to declare it explicitly, we can provide it as a trailing return type:

[](double x)**->double** { return sin(x) + cos(x); };

c++11/derivative.cpp

Now that we can create function objects on the fly and are not particularly eager to bother with their types, we are glad that our derivative generator is able to deduce types. This allows us to create a function for the approximated seventh derivative of sin 2.5*x* + cos *x* in one single expression:

auto d7_psc_l= **make_nth_derivative<7>(**

[](double x){ return sin(2.5*x) + cos(x); }, 0.0001**)**;

Sadly, this statement is too long for a single line in a textbook, but not in a real program (at least for the author’s taste).

As soon as lambdas came out, many programmers were so excited about them that they implemented every function argument with a lambda—often over many lines and containing other lambdas inside. This might be an intriguing challenge for experienced programmers, but we are convinced that decomposing monstrous nested expressions into readable pieces helps everybody using and maintaining our software.

**3.9.1 Capture**

In the previous section, we parameterized a lambda by simply inserting an operation. This is not very productive for a multitude of parameters, however:

a= fin_diff([](double x){ return sin(**2.5** * x); }, 1., 0.001);

b= fin_diff([](double x){ return sin(**3.0** * x); }, 1., 0.001);

c= fin_diff([](double x){ return sin(**3.5** * x); }, 1., 0.001);

Unfortunately, we cannot access variables or constants from the scope like this:

double **phi**= 2.5;

auto sin_phi= [](double x){ return sin(**phi** * x); }; // Error

The *λ*-expression can only use its own parameters or those *Captured* before.

**3.9.2 Capture by Value**

In order to use phi, we must capture it first:

double **phi**= 2.5;

auto sin_phi= **[phi]**(double x){ return sin(**phi** * x); };

To capture multiple variables, we can give a comma-separated list:

double **phi**= 2.5 , **xi**= 0.2;

auto sin2= **[phi, xi]**(double x){ return sin(**phi***x) + cos(x)***xi**; };

These parameters are copied, but in contrast to function parameters passed by value it is forbidden to modify them. Written as a functor class, the previous λ corresponds to

struct lambda_f

{

lambda_f(double **phi**, double **xi**) : **phi**(**phi**), **xi**(**xi**) {}

double operator()(double x) **const**

{

return sin(**phi** * x) + cos(x) * **xi**;

}

**const double phi**, **xi**;

};

As a consequence, modifying the captured variables later has no effect on the lambda:

double **phi**= 2.5, **xi**= 0.2;

auto px= **[phi, xi]**(double x){ return sin(**phi** * x) + cos(x) * **xi**; };

**phi**= 3.5; **xi**= 1.2;

a= fin_diff(px, 1., 0.001); // still uses phi= 2.5 and xi= 0.2

The variables are captured when the λ is defined, and thus the values at that time are used when the lambda is called.

Furthermore, we cannot modify the values inside the lambda function despite their being copies. The reason is that the lambda’s function body corresponds to a const-qualified operator() as above in lambda_f. In the following lambda, for instance, it is illegal to increment the captured phi:

auto l_inc= [phi](double x) {**phi+= 0.6**; return phi; }; // Error

To allow the modification of the captured values, the lambda must be qualified as mutable:

auto l_mut= [phi](double x) **mutable** {**phi+= 0.6**; return phi; };

The functor class equivalent to this mutable lambda would not be const-qualified:

struct l_mut_f

{

double operator()(double x); // not const

// ...

**double phi**, **xi**; // not const either

};

As a consequence, we cannot pass the lambda l_mut to const-parameters anymore. On the other hand, we don’t really need the mutability here and could instead return phi+0.6 and drop the incrementation.

**3.9.3 Capture by Reference**

Variables can also be captured by reference:

double phi= 2.5, xi= 0.2;

auto pxr = **[&phi,&xi]**(double x){ return sin(**phi** * x) + cos(x) * **xi**; };

phi= 3.5; xi= 1.2;

a= fin_diff(pxr, 1., 0.001); // now uses phi= 3.5 and xi= 1.2

In this example, the values of phi and xi at the time of the function call of λ are used—not those when the lambda was created. The corresponding functor class would look like this:

struct lambda_ref_type

{

lambda_ref_type(double& **phi**, double& **xi**) : **phi**(**phi**), **xi**(**xi**) {}

double operator()(double x)

{

return sin(**phi** * x) + cos(x) * **xi**;

}

**double& phi**;

**double& xi**;

};

Another consequence of the reference semantics is the ability to modify the referred values. This is not only a possible source of side effects but can be used productively. Let’s say we have different dense and sparse matrix classes. For all those classes, we provide the generic traversal function on_each_nonzero with the matrix as the first argument and a function object as the second (passed by value). This allows us to generically compute the Frobenius norm:

‖*A*‖_{F} = sqrt( Σ_{i, j} |*a*_{ij}|^{2} )

Given this formula, we can apparently ignore all zero entries and process the non-zeros only:

template <typename Matrix>

typename Matrix::value_type

frobenius_norm(const Matrix& A)

{

using std::abs; using std::sqrt;

using value_type= typename Matrix::value_type;

value_type **ss**= 0;

on_each_nonzero(A, **[&ss]**(value_type x) { **ss+=** abs(x) * abs(x); });

return sqrt(**ss**);

}

For simplicity’s sake, we assume here that the types of A(0,0) and abs(A(0,0)) are identical. Note that the λ-expression doesn’t return a value since its purpose is to sum up the squared matrix values in the referred variable ss. Here the lack of a return statement implies the void return type.

There are shortcuts for capturing all variables:

• [=]: capture all by copy;

• [&]: capture all by reference;

• [=,&a,&b,&c]: capture all by copy but a, b, c by reference; and

• [&,a,b,c]: capture all by reference but a, b, c by copy.

Scott Meyers advises not using the capture-all feature as it increases the danger of stale references and of ignoring static or member variables; see [*32*, Item 31].

**3.9.4 Generalized Capture**

Capturing is generalized in C++14 by *Init Capture*. It allows us to move variables into a closure and to give new names to context variables or to expressions thereof. Say we have a function returning a Hilbert matrix as unique_ptr:

auto F= **make_unique<Mat>**(Mat{{1., 0.5},{0.5,1./3.}});

We can capture a reference to the pointer, but we risk a stale reference if the closure outlives the pointer. On the other hand, a unique_ptr cannot be copied. To ensure that our matrix lives as long as the closure, we move the data into a unique_ptr owned by the closure:

auto apply_hilbert= [**F= move(F)**](const Vec& x){ return Vec(***F** * x); };

The init capture allows us further to introduce new names for existing variables and to evaluate expressions and associate a name with the result:

int x= 4;

auto y=

[**&r= x**, **x= x + 1**]()

{

r+= 2; // increment r, a reference to outer x

return x + 2; // return x + 2 where x is the outer x + 1

}();

This example from the standard defines a nullary closure returning an int. The closure introduces two local variables: r is a reference to the context’s x and the local value x is initialized with the outer x + 1.

In the example, we can see that an init capture has the form var= expr. An interesting fact (which you have probably noticed) is that var and expr are defined in different scopes. var is defined in the scope of the closure and expr in its context. Therefore, the same name can appear on both sides of =. Conversely, the following capture list:

int x= 4;

auto y= [&**r**= x, x= **r** + 1](){ ... }; // Error: no r in context

is not allowed because r only exists in the closure and can therefore not be used on the right-hand side of a capture.

**3.9.5 Generic Lambdas**

Lambdas in C++11 can deduce the **return** type, but the arguments need to be declared explicitly. This restriction was lifted in C++14. In contrast to function templates, the arguments are not declared in the somewhat verbose template-typename notation but with the keyword auto. For instance, a function that sorts the elements of a (random-access) container in descending order can be implemented as simply as

template <typename C>

void reverse_sort(C& c)

{

sort(begin(c), end(c), [](**auto** x , **auto** y){ return x > y; });

}

In the same fashion, we can simplify our frobenius_norm from *Section 3.9.3*. The lambda summing the squared magnitudes can simply instantiate the argument type:

template <typename Matrix>

inline auto frobenius_norm(const Matrix& A)

{

using std::abs; using std::sqrt;

decltype(abs(A[0][0])) ss= 0;

on_each_nonzero(A, [&ss](**auto** x) { ss+= abs(x) * abs(x); });

return sqrt(ss);

}

In this little function, we used more type deduction mechanisms and liberated ourselves entirely from declaring value_type. We can now also handle the fact that abs might return a type different from value_type.

**3.10 Variadic Templates**

Function and class templates are called *Variadic* when their arity can vary; i.e., they work for an arbitrary number of arguments. More precisely, there may be a minimal number of arguments but not a maximal. Furthermore, the template arguments are allowed to be different types (or integral constants).

At the time of this writing, the programmer community is still on the journey of discovering this powerful feature. Use cases are, for instance, type-safe printf implementations, different kinds of reduction, and all forms of generic forwarding functions. Hopefully, our examples will contribute a little to more frequent variadic template implementations in applications.

We will illustrate the feature with a sum function for mixed types:

template <typename T>

inline T sum(T t) { return t; }

template <typename T, typename **...P**>

inline T sum(T t, **P ...p**)

{

return t + sum(**p...**);

}

Variadic templates are handled by recursion. We break down the so-called *Parameter Pack* and deal with subsets thereof. Usually, one element is split off and combined with the result of the remainder.

Variadic templates introduce the new ellipsis operator denoted by .... The operator on the left side means packing and on the right side unpacking. The different interpretations of the ellipsis are the following:

• typename ...P: pack multiple type arguments into the type pack P;

• <P...>: unpack P when instantiating a class or function template (comes later);

• P ...p: pack multiple function arguments into the variable pack p; and

• sum(p...): unpack variable pack p and call sum with multiple arguments.

Thus, our sum function computes the addition of the first entry with the sum of the others. This sum in turn is computed recursively as well. To terminate the recursion, we write an overload for a single argument. We could have also written an overload for a nullary function (one without arguments) returning 0 as int.

Our implementation has a significant disadvantage: the **return** type is the type of the first argument. This might work more or less in some cases:

auto s= sum(**-7, 3.7f, 9u, -2.6**);

std::cout << "s is " << s << " and its type is " << typeid(s).name() << '\n';

yields

s is 2 and its type is int

The correct result of 3.1 cannot be stored as int,^{13} the type of our first argument -7.

*13*. The output of typeid must be demangled with tools like c++filt.

This is not really satisfying but it can be much worse as in the following computation:

auto s2= sum(-7, 3.7f, 9u, -42.6);

std::cout << "s2 is " << s2 << " and its type is " << typeid(s2).name() << '\n';

which also returns an int:

s2 is -2147483648 and its type is int

The first intermediate result is 9 - 42.6 = -33.6, which yields a very large number when converted to unsigned and is later turned into a very small int value. On the other hand, the calculation

auto s= -7 + 3.7f + 9u + -42.6;

computes the correct result and stores it as double. But before we condemn variadic templates, we have to admit to ourselves that we chose inappropriate types for intermediate values and the final result. We will correct this in *Section 5.2.7* with a better suited return type.

For counting the number of arguments in a parameter pack at compile time, we can use the function-like expression sizeof...:

template <typename ...P>

void count(P ...p)

{

cout << "You have " << **sizeof...**(P) << " parameters.\n";

...

}

The binary I/O from *Section A.2.7* is revised in *Section A.6.4* to allow arbitrary numbers of arguments in the write and read operations.

As the sum example already indicated, the full power of variadic templates will only be unleashed in combination with meta-programming (*Chapter 5*).

**3.11 Exercises**

**3.11.1 String Representation**

Write a generic function to_string that takes an argument of an arbitrary type (as const&) and generates a string by piping it to a std::stringstream and returning the resulting string.

**3.11.2 String Representation of Tuples**

Write a variadic template function that represents an arbitrary number of arguments as a tuple in a string. That is, the function call to_tuple_string(x, y, z) returns a string of the form (x, y, z) by printing each element to a string stream.

Hint: Use a helper function to_tuple_string_aux that is overloaded for different arities.

**3.11.3 Generic Stack**

Write a stack implementation for a generic value type. The maximal size of the stack is defined in the class (hard-wired). Provide the following functions:

• Constructor;

• Destructor if necessary;

• top: show last element;

• pop: remove last element (without returning);

• push: insert new element;

• clear: delete all entries;

• size: number of elements;

• full: whether stack is full;

• empty: whether stack is empty.

Stack over- or underflow must throw an exception.

**3.11.4 Iterator of a Vector**

Add the methods begin() and end() for returning a begin and end iterator to class vector. Add the types iterator and const_iterator to the class as well. Note that pointers are models of the concept of random-access iterators.

Use the STL function sort for ordering vector entries to demonstrate that your iterators work as they should.

**3.11.5 Odd Iterator**

Write an iterator class for odd numbers named odd_iterator. The class must model (realize) the ForwardIterator concept (*http://www.sgi.com/tech/stl/ForwardIterator.html*). That means it must provide the following members:

• Default and copy constructor;

• operator++ to the next odd element, as pre-increment and post-increment;

• operator* as dereference which returns an (odd) int;

• operator== and operator!=; and

• operator=.

with the obvious semantics. In addition, the class should contain a constructor that accepts an int value. This value will be returned in the dereference operator (as long as the iterator is not incremented). This constructor should throw an exception if the value is even. Likewise the default constructor should initialize the internal value with 1 to provide a legal state.

**3.11.6 Odd Range**

Write a class for a range of odd numbers. The member or free functions begin and end should return an odd_iterator as defined in *Exercise 3.11.5*.

The following code should print the odd numbers {7, 9, . . ., 25}:

for (int i : odd_range(7, 27))

std::cout << i << "\n";

**3.11.7 Stack of bool**

Specialize your stack implementation from *Exercise 3.11.3* for bool. Use an unsigned char for 8 bool as in *Section 3.6.1*.

**3.11.8 Stack with Custom Size**

Revise your stack implementation from *Exercise 3.11.3* (and optionally that of *Exercise 3.11.7*) with a user-defined size. The size is passed as the second template argument. The default should be 4096.

**3.11.9 Deducing Non-type Template Arguments**

We have seen that the type of a template argument can be deduced in a function call. Non-type template arguments are in most cases declared explicitly, but they can be deduced as well when they are part of the argument type. As illustration: write a function array_size that accepts a C array of arbitrary type and size as a reference and returns the size of that array. The actual function argument can be omitted since we are only interested in its type. Do you remember? We threatened this exercise in *Section 1.8.7.1*. On the other hand, we revealed the trickiest part of it in the meantime.

**3.11.10 Trapezoid Rule**

A simple method for computing the integral of a function is the trapezoid rule. Suppose we want to integrate the function *f* over the interval [*a*, *b*]. We split the interval into *n* small intervals [*x*_{i}, *x*_{i+1}] of the same length *h* = (*b* - *a*)/*n* and approximate *f* by a piecewise linear function. The integral is then approximated by the sum of the integrals of that function. This leads to the following formula:

*I* ≈ *h* · ( *f*(*a*)/2 + *f*(*b*)/2 + Σ_{j=1}^{n-1} *f*(*a* + *j*·*h*) )

In this exercise, we develop a function for the trapezoid rule, with a functor argument. For comparison, implement this using inheritance and generic programming. As a test case, integrate:

• *f* = exp(-3*x*) for *x* ∈ [0, 4]. Try trapezoid with the following arguments:

double exp3f(double x) {

return std::exp(-3.0 * x);

}

struct exp3t {

double operator()(double x) const {

return std::exp(-3.0 * x);

}

};

• *f* = sin(*x*) if *x* < 1 and *f* = cos(*x*) if *x* ≥ 1, for *x* ∈ [0, 4].

• Can we call trapezoid( std::sin, 0.0, 2.0 ); ?

As a second exercise, develop a functor for computing the finite difference. Then integrate the finite difference to verify that you get the function value back.

**3.11.11 Functor**

Write a functor for 2 cos *x* + *x*^{2} and compute the first and second derivatives with the functor from *Section 3.8.1*.

**3.11.12 Lambda**

Compute the same derivatives as in *Exercise 3.11.11* but this time with a lambda expression.

**3.11.13 Implement make_unique**

Implement your own make_unique. Use std::forward to pass a parameter pack to new.