The C++ Programming Language (2013)

Part III: Abstraction Mechanisms

19. Special Operators

We are all special cases.

– Albert Camus

• Introduction

• Special Operators

Subscripting; Function Call; Dereferencing; Increment and Decrement; Allocation and Deallocation; User-Defined Literals

• A String Class

Essential Operations; Access to Characters; Representation; Member Functions; Helper Functions; Using Our String

• Friends

Finding Friends; Friends and Members

• Advice

19.1. Introduction

Overloading is not just for arithmetic and logical operations. In fact, operators are crucial in the design of containers (e.g., vector and map; §4.4), “smart pointers” (e.g., unique_ptr and shared_ptr; §5.2.1), iterators (§4.5), and other classes concerned with resource management.

19.2. Special Operators

The operators

[] () -> ++ -- new delete

are special only in that the mapping from their use in the code to a programmer’s definition differs slightly from that used for conventional unary and binary operators, such as +, <, and ~ (§18.2.3). The [] (subscript) and () (call) operators are among the most useful user-defined operators.

19.2.1. Subscripting

An operator[] function can be used to give subscripts a meaning for class objects. The second argument (the subscript) of an operator[] function may be of any type. This makes it possible to define vectors, associative arrays, etc.

As an example, we can define a simple associative array type like this:

struct Assoc {
vector<pair<string,int>> vec; // vector of {name,value} pairs

const int& operator[] (const string&) const;
int& operator[](const string&);
};

An Assoc keeps a vector of std::pairs. The implementation uses the same trivial and inefficient search method as in §7.7:

int& Assoc::operator[](const string& s)
// search for s; return a reference to its value if found;
// otherwise, make a new pair {s,0} and return a reference to its value
{
for (auto& x : vec) // auto&: refer to the stored pair, not a copy of it
if (s == x.first) return x.second;

vec.push_back({s,0}); // initial value: 0

return vec.back().second; // return last element (§31.2.2)
}

We can use Assoc like this:

int main() // count the occurrences of each word on input
{
Assoc values;
string buf;
while (cin>>buf) ++values[buf];
for (auto x : values.vec)
cout << '{' << x.first << ',' << x.second << "}\n";
}

The standard-library map and unordered_map are further developments of the idea of an associative array (§4.4.3, §31.4.3) with less naive implementations.

An operator[]() must be a non-static member function.
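The const version of Assoc::operator[] is declared above but not defined. A minimal sketch of how the two overloads might fit together follows. The policy of throwing std::out_of_range for a missing key in the const version is an assumption, chosen because a const Assoc cannot insert a new pair; the helper lookup() is hypothetical and exists only to force a call through a const reference.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Assoc {
    std::vector<std::pair<std::string,int>> vec;   // vector of {name,value} pairs

    const int& operator[](const std::string&) const;
    int& operator[](const std::string&);
};

// non-const version: insert {s,0} if s is not present
int& Assoc::operator[](const std::string& s)
{
    for (auto& x : vec)                  // auto&: refer to the stored pair
        if (s == x.first) return x.second;
    vec.push_back({s,0});
    return vec.back().second;
}

// const version: cannot insert, so a missing key is reported (assumed policy)
const int& Assoc::operator[](const std::string& s) const
{
    for (const auto& x : vec)
        if (s == x.first) return x.second;
    throw std::out_of_range("Assoc: no such key: " + s);
}

// hypothetical helper: subscripting through a const reference selects the const overload
int lookup(const Assoc& a, const std::string& key)
{
    return a[key];
}
```

Which overload is chosen depends only on whether the Assoc is const, so ++values[buf] still inserts on first use, while lookup() can only read.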

19.2.2. Function Call

Function call, that is, the notation expression(expression-list), can be interpreted as a binary operation with the expression as the left-hand operand and the expression-list as the right-hand operand. The call operator, (), can be overloaded in the same way as other operators can. For example:

struct Action {
int operator()(int);
pair<int,int> operator()(int,int);
double operator()(double);
// ...
};

void f(Action act)
{
int x = act(2);
auto y = act(3,4);
double z = act(2.3);
// ...
}

An argument list for an operator()() is evaluated and checked according to the usual argument-passing rules. Overloading the function call operator seems to be useful primarily for defining types that have only a single operation and for types for which one operation is predominant. The call operator is also known as the application operator.

The most obvious, and also the most important, use of the () operator is to provide the usual function call syntax for objects that in some way behave like functions. An object that acts like a function is often called a function-like object or simply a function object (§3.4.3). Such function objects allow us to write code that takes nontrivial operations as parameters. In many cases, it is essential that function objects can hold data needed to perform their operation. For example, we can define a class with an operator()() that adds a stored value to its argument:

class Add {
complex val;
public:
Add(complex c) :val{c} { } // save a value
Add(double r, double i) :val{{r,i}} { }

void operator()(complex& c) const { c += val; } // add a value to argument
};

An object of class Add is initialized with a complex number, and when invoked using (), it adds that number to its argument. For example:

void h(vector<complex>& vec, list<complex>& lst, complex z)
{
for_each(vec.begin(),vec.end(),Add{2,3});
for_each(lst.begin(),lst.end(),Add{z});
}

This will add complex{2,3} to every element of the vector and z to every element of the list. Note that Add{z} constructs an object that is used repeatedly by for_each(): Add{z}’s operator()() is called for each element of the sequence.

This all works because for_each is a template that applies () to its third argument without caring exactly what that third argument really is:

template<typename Iter, typename Fct>
Fct for_each(Iter b, Iter e, Fct f)
{
while (b != e) f(*b++);
return f;
}

At first glance, this technique may look esoteric, but it is simple, efficient, and extremely useful (§3.4.3, §33.4).

Note that a lambda expression (§3.4.3, §11.4) is basically a syntax for defining a function object. For example, we could have written:

void h2(vector<complex>& vec, list<complex>& lst, complex z)
{
for_each(vec.begin(),vec.end(),[](complex& a){ a+={2,3}; });
for_each(lst.begin(),lst.end(),[z](complex& a){ a+=z; }); // z must be captured to be used in the lambda
}

In this case, each of the lambda expressions generates the equivalent of the function object Add.

Other popular uses of operator()() are as a substring operator and as a subscripting operator for multidimensional arrays (§29.2.2, §40.5.2).

An operator()() must be a non-static member function.

Function call operators are often templates (§29.2.2, §33.5.3).
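The Add example can be tried out in a self-contained form. The sketch below assumes that the text's complex stands for std::complex<double> (here given the hypothetical alias Cplx) and uses a hypothetical function add_twice() to apply the same operation once through the named function object and once through a lambda that captures z.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // assumption: stands in for the text's complex

class Add {
    Cplx val;
public:
    Add(Cplx c) : val{c} { }                  // save a value
    Add(double r, double i) : val{r,i} { }

    void operator()(Cplx& c) const { c += val; }   // add the saved value to the argument
};

// add z to every element of v twice: with a function object, then with a lambda
void add_twice(std::vector<Cplx>& v, Cplx z)
{
    std::for_each(v.begin(), v.end(), Add{z});
    std::for_each(v.begin(), v.end(), [z](Cplx& a) { a += z; });
}
```

Both calls store a copy of z: Add keeps it in val, and the lambda keeps it in its capture.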

19.2.3. Dereferencing

The dereferencing operator, -> (also known as the arrow operator), can be defined as a unary postfix operator. For example:

class Ptr {
// ...
X* operator->();
};

Objects of class Ptr can be used to access members of class X in a very similar manner to the way pointers are used. For example:

void f(Ptr p)
{
p->m = 7; // (p.operator->())->m = 7
}

The transformation of the object p into the pointer p.operator->() does not depend on the member m pointed to. That is the sense in which operator->() is a unary postfix operator. However, there is no new syntax introduced, so a member name is still required after the ->. For example:

void g(Ptr p)
{
X* q1 = p->; // syntax error
X* q2 = p.operator->(); // OK
}

Overloading -> is primarily useful for creating “smart pointers,” that is, objects that act like pointers and in addition perform some action whenever an object is accessed through them. The standard-library “smart pointers” unique_ptr and shared_ptr (§5.2.1) provide operator ->.

As an example, we could define a class Disk_ptr for accessing objects stored on disk. Disk_ptr’s constructor takes a name that can be used to find the object on disk, Disk_ptr::operator->() brings the object into main memory when accessed through its Disk_ptr, and Disk_ptr’s destructor eventually writes the updated object back out to disk:

template<typename T>
class Disk_ptr {
string identifier;
T* in_core_address;
// ...
public:
Disk_ptr(const string& s) : identifier{s}, in_core_address{nullptr} { }
~Disk_ptr() { write_to_disk(in_core_address,identifier); }

T* operator->()
{
if (in_core_address == nullptr)
in_core_address = read_from_disk(identifier);
return in_core_address;
}
};

Disk_ptr might be used like this:

struct Rec {
string name;
// ...
};

void update(const string& s)
{
Disk_ptr<Rec> p {s}; // get Disk_ptr for s

p->name = "Roscoe"; // update s; if necessary, first retrieve from disk
// ...
} // p's destructor writes back to disk

Naturally, a realistic program would contain error-handling code and use a less naive way of interacting with the disk.

For ordinary pointers, use of -> is synonymous with some uses of unary * and []. Given a class Y for which ->, *, and [] have their default meaning and a Y* called p, then:

p->m == (*p).m // is true
(*p).m == p[0].m // is true
p->m == p[0].m // is true

As usual, no such guarantee is provided for user-defined operators. The equivalence can be provided where desired:

template<typename T>
class Ptr {
T* p;
public:
T* operator->() { return p; } // dereference to access member
T& operator*() { return *p; } // dereference to access whole object
T& operator[](int i) { return p[i]; } // dereference to access element
// ...
};

If you provide more than one of these operators, it might be wise to provide the equivalence, just as it is wise to ensure that ++x and x+=1 have the same effect as x=x+1 for a simple variable x of some class X if ++, +=, =, and + are provided.

The overloading of -> is important to a class of interesting programs and is not just a minor curiosity. The reason is that indirection is a key concept and that overloading -> provides a clean, direct, and efficient way of representing indirection in a program. Iterators (Chapter 33) provide an important example of this.

Operator -> must be a non-static member function. If used, its return type must be a pointer or an object of a class to which you can apply ->. The body of a template class member function is only checked if the function is used (§26.2.1), so we can define operator->() without worrying about types, such as Ptr<int>, for which -> does not make sense.

Despite the similarity between -> and . (dot), there is no way of overloading operator . (dot).
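The rule that operator->() may return either a pointer or an object of a class that itself has -> can be illustrated with two layers of indirection. In the sketch below, Handle and Counted are hypothetical classes: Handle::operator->() returns a built-in pointer, and Counted::operator->() returns a Handle and counts accesses. The compiler keeps applying -> until it reaches a built-in pointer.

```cpp
#include <string>

struct Rec {
    std::string name;
};

// inner layer: operator-> returns a built-in pointer
class Handle {
    Rec* p;
public:
    explicit Handle(Rec* pp) : p{pp} { }
    Rec* operator->() const { return p; }
};

// outer layer: operator-> returns a Handle (a class that itself has ->)
// and performs an action (counting) on every access
class Counted {
    Handle h;
    int uses = 0;
public:
    explicit Counted(Rec* pp) : h{pp} { }
    Handle operator->() { ++uses; return h; }
    int count() const { return uses; }
};
```

Given a Counted c, the expression c->name means (c.operator->().operator->())->name, so each use of -> on c increments the counter exactly once.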

19.2.4. Increment and Decrement

Once people invent “smart pointers,” they often decide to provide the increment operator ++ and the decrement operator -- to mirror these operators’ use for built-in types. This is especially obvious and necessary where the aim is to replace an ordinary pointer type with a “smart pointer” type that has the same semantics, except that it adds a bit of run-time error checking. For example, consider a troublesome traditional program:

void f1(X a) // traditional use
{
X v[200];
X* p = &v[0];
p--;
*p = a; // oops: p out of range, uncaught
++p;
*p = a; // OK
}

Here, we might want to replace the X* with an object of a class Ptr<X> that can be dereferenced only if it actually points to an X. We would also like to ensure that p can be incremented and decremented only if it points to an object within an array and the increment and decrement operations yield an object within that array. That is, we would like something like this:

void f2(X a) // checked
{
X v[200];
Ptr<X> p(&v[0],v);
p--;
*p = a; // run-time error: p out of range
++p;
*p = a; // OK
}

The increment and decrement operators are unique among C++ operators in that they can be used as both prefix and postfix operators. Consequently, we must define prefix and postfix increment and decrement for Ptr<T>. For example:

template<typename T>
class Ptr {
T* ptr;
T* array;
int sz;
public:
template<int N>
Ptr(T* p, T(&a)[N]); // bind to array a, sz==N, initial value p
Ptr(T* p, T* a, int s); // bind to array a of size s, initial value p
Ptr(T* p); // bind to single object, sz==0, initial value p

Ptr& operator++(); // prefix
Ptr operator++(int); // postfix

Ptr& operator--(); // prefix
Ptr operator--(int); // postfix

T& operator*(); // prefix
};

The int argument is used to indicate that the function is to be invoked for postfix application of ++. This int is never used; the argument is simply a dummy used to distinguish between prefix and postfix application. The way to remember which version of an operator++ is prefix is to note that the version without the dummy argument is prefix, exactly like all the other unary arithmetic and logical operators. The dummy argument is used only for the “odd” postfix ++ and --.

Consider omitting postfix ++ and -- in a design. They are not only odd syntactically, they tend to be marginally harder to implement than the prefix versions, less efficient, and less frequently used. For example:

template<typename T>
Ptr<T>& Ptr<T>::operator++() // return the current object after incrementing
{
// ... check that ptr+1 can be pointed to ...
++ptr;
return *this;
}

template<typename T>
Ptr<T> Ptr<T>::operator++(int) // increment and return a Ptr with the old value
{
// ... check that ptr+1 can be pointed to ...
Ptr<T> old {ptr,array,sz};
++ptr;
return old;
}

The pre-increment operator can return a reference to its object. The post-increment operator must make a new object to return.

Using Ptr, the example is equivalent to:

void f3(X a) // checked
{
X v[200];
Ptr<X> p(&v[0],v,200);
p.operator--(0); // postfix: p--
p.operator*() = a; // run-time error: p out of range
p.operator++(); // prefix: ++p
p.operator*() = a; // OK
}

Completing class Ptr is left as an exercise. A pointer template that behaves correctly with respect to inheritance is presented in §27.2.2.
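One possible way of approaching that exercise is sketched below. It is not the only design: as an assumption, this version stores an offset (pos) rather than a raw current pointer, so that stepping outside the array never forms an invalid pointer, and it defers the range check to operator*(), which throws std::out_of_range. Only the (pointer, array, size) constructor is provided.

```cpp
#include <stdexcept>

template<typename T>
class Ptr {
    T* array;    // start of the array
    int sz;      // number of elements
    int pos;     // offset of the current position; may be out of range
public:
    Ptr(T* p, T* a, int s) : array{a}, sz{s}, pos{static_cast<int>(p-a)} { }

    Ptr& operator++() { ++pos; return *this; }                    // prefix
    Ptr operator++(int) { Ptr old = *this; ++pos; return old; }   // postfix: return old value

    Ptr& operator--() { --pos; return *this; }                    // prefix
    Ptr operator--(int) { Ptr old = *this; --pos; return old; }   // postfix: return old value

    T& operator*()     // assumed policy: check on dereference
    {
        if (pos<0 || sz<=pos) throw std::out_of_range("Ptr: out of range");
        return array[pos];
    }
};
```

The prefix operators return *this by reference, whereas the postfix operators must copy the old state before modifying it, which is the extra cost mentioned above.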

19.2.5. Allocation and Deallocation

Operator new (§11.2.3) acquires its memory by calling an operator new(). Similarly, operator delete frees its memory by calling an operator delete(). A user can redefine the global operator new() and operator delete() or define operator new() and operator delete() for a particular class.

Using the standard-library type alias size_t (§6.2.8) for sizes, the declarations of the global versions look like this:

void* operator new(size_t); // use for individual object
void* operator new[](size_t); // use for array
void operator delete(void*, size_t); // use for individual object
void operator delete[](void*, size_t); // use for array

// for more versions, see §11.2.4

That is, when new needs memory on the free store for an object of type X, it calls operator new(sizeof(X)). Similarly, when new needs memory on the free store for an array of N objects of type X, it calls operator new[](N*sizeof(X)). A new expression may ask for more memory than is indicated by N*sizeof(X), but it will always do so in terms of a number of characters (i.e., a number of bytes). Replacing the global operator new() and operator delete() is not for the fainthearted and not recommended. After all, someone else might rely on some aspect of the default behavior or might even have supplied other versions of these functions.

A more selective, and often better, approach is to supply these operations for a specific class. This class might be the base for many derived classes. For example, we might like to have a class Employee provide a specialized allocator and deallocator for itself and all of its derived classes:

class Employee {
public:
// ...

void* operator new(size_t);
void operator delete(void*, size_t);

void* operator new[](size_t);
void operator delete[](void*, size_t);
};

Member operator new()s and operator delete()s are implicitly static members. Consequently, they don’t have a this pointer and do not modify an object. They provide storage that a constructor can initialize and a destructor can clean up.

void* Employee::operator new(size_t s)
{
// allocate s bytes of memory and return a pointer to it
}

void Employee::operator delete(void* p, size_t s)
{
if (p) { // delete only if p!=0; see §11.2, §11.2.3
// assume p points to s bytes of memory allocated by Employee::operator new()
// and free that memory for reuse
}
}

The use of the hitherto mysterious size_t argument now becomes obvious. It is the size of the object being deleted. Deleting a “plain” Employee gives an argument value of sizeof(Employee); deleting a Manager derived from Employee that does not have its own operator delete() gives an argument value of sizeof(Manager). This allows a class-specific allocator to avoid storing size information with each allocation. Naturally, a class-specific allocator can store such information (as a general-purpose allocator must) and ignore the size_t argument to operator delete(). However, doing so makes it harder to improve significantly on the speed and memory consumption of a general-purpose allocator.

How does a compiler know how to supply the right size to operator delete()? The type specified in the delete operation matches the type of the object being deleted. If we delete an object through a pointer to a base class, that base class must have a virtual destructor (§17.2.5) for the correct size to be given:

Employee* p = new Manager; // potential trouble (the exact type is lost)
// ...
delete p; // hope Employee has a virtual destructor

In principle, deallocation is then done by the destructor (which knows the size of its class).
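The effect of the virtual destructor on the size argument can be observed with a small experiment. In this sketch, the static members last_new and last_delete are hypothetical instrumentation that record the sizes passed in, the data members are placeholders, and the actual allocation is simply delegated to the global operator new() and operator delete().

```cpp
#include <cstddef>
#include <new>

struct Employee {
    static std::size_t last_new;      // size passed to the most recent operator new()
    static std::size_t last_delete;   // size passed to the most recent operator delete()

    virtual ~Employee() { }           // needed for the correct size when deleting through Employee*

    static void* operator new(std::size_t s)
    {
        last_new = s;
        return ::operator new(s);     // delegate to the global allocator
    }

    static void operator delete(void* p, std::size_t s)
    {
        last_delete = s;
        ::operator delete(p);         // delegate to the global deallocator
    }

    int id = 0;                       // placeholder data
};

std::size_t Employee::last_new = 0;
std::size_t Employee::last_delete = 0;

struct Manager : Employee {           // inherits Employee's operator new() and operator delete()
    int level = 0;
    int reports[8] = {};
};
```

After Employee* p = new Manager; delete p; both recorded sizes are sizeof(Manager), because the deallocation is carried out on behalf of Manager's destructor.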

19.2.6. User-defined Literals

C++ provides literals for a variety of built-in types (§6.2.6):

123 // int
1.2 // double
1.2F // float
'a' // char
1ULL // unsigned long long
0xD0 // hexadecimal unsigned
"as" // C-style string (const char[3])

In addition, we can define literals for user-defined types and new forms of literals for built-in types. For example:

"Hi!"s // string, not "zero-terminated array of char"
1.2i // imaginary
101010111000101b // binary
123s // seconds
123.56km // not miles! (units)
1234567890123456789012345678901234567890x // extended-precision

Such user-defined literals are supported through the notion of literal operators that map literals with a given suffix into a desired type. The name of a literal operator is operator"" followed by the suffix. For example:

constexpr complex<double> operator"" i(long double d) // imaginary literal
{
return {0,d}; // complex is a literal type
}

std::string operator"" s(const char* p, size_t n) // std::string literal
{
return string{p,n}; // requires free-store allocation
}

These two operators define suffixes i and s, respectively. I use constexpr to enable compile-time evaluation. Given those, we can write:

template<typename T> void f(const T&);

void g()
{
f("Hello"); // pass C-style string (const char[6])
f("Hello"s); // pass (five-character) string object
f("Hello\n"s); // pass (six-character) string object

auto z = 2+1i; // complex{2,1}
}

The basic (implementation) idea is that after parsing what could be a literal, the compiler always checks for a suffix. The user-defined literal mechanism simply allows the user to specify a new suffix and define what is to be done with the literal before it. It is not possible to redefine the meaning of a built-in literal suffix or to augment the syntax of literals.

There are four kinds of literals that can be suffixed to make a user-defined literal (§iso.2.14.8):

• An integer literal (§6.2.4.1): accepted by a literal operator taking an unsigned long long or a const char* argument or by a template literal operator, for example, 123m or 12345678901234567890X

• A floating-point literal (§6.2.5.1): accepted by a literal operator taking a long double or a const char* argument or by a template literal operator, for example, 12345678901234567890.976543210x or 3.99s

• A string literal (§7.3.2): accepted by a literal operator taking a (const char*, size_t) pair of arguments, for example, "string"s and R"(Foo\bar)"_path

• A character literal (§6.2.3.2): accepted by a literal operator taking a character argument of type char, wchar_t, char16_t, or char32_t, for example, 'f'_runic or u'BEEF'_w.

For example, we could define a literal operator to collect digits for integer values that cannot be represented in any of the built-in integer types:

Bignum operator"" x(const char* p)
{
return Bignum(p);
}

void f(Bignum);

f(123456789012345678901234567890123456789012345x);

Here, the C-style string "123456789012345678901234567890123456789012345" is passed to operator"" x(). Note that I did not put those digits in double quotes. I requested a C-style string for my operator, and the compiler delivered it from the digits provided.
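The same mechanism can be observed without a Bignum class. The sketch below uses a hypothetical suffix _digits whose literal operator takes just a const char* and returns the spelling of the numeric literal as a std::string. The suffix starts with an underscore because suffixes without one are reserved for the standard library.

```cpp
#include <string>

// hypothetical suffix _digits: the operator receives the characters of the
// numeric literal as a zero-terminated C-style string
std::string operator""_digits(const char* p)
{
    return std::string{p};
}
```

With this definition, 123456789012345678901234567890_digits yields a 30-character string even though the value fits no built-in integer type, and 1.50_digits yields "1.50", trailing zero included.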

To get a C-style string from the program source text into a literal operator, we request both the string and its number of characters. For example:

string operator"" s(const char* p, size_t n);

string s12 = "one two"s; // calls operator"" s("one two",7)
string s22 = "two\ntwo"s; // calls operator"" s("two\ntwo",7)
string sxx = R"(two\ntwo)"s; // calls operator"" s("two\\ntwo",8)

In the raw string (§7.3.2.1), "\n" represents the two characters '\' and 'n'.

The rationale for requiring the number of characters is that if we want to have “a different kind of string,” we almost always want to know the number of characters anyway.

A literal operator that takes just a const char* argument (and no size) can be applied to integer and floating-point literals. For example:

string operator"" SS(const char* p); // warning: this will not work as expected

string s12 = "one two"SS; // error: no applicable literal operator
string s13 = 13SS; // OK, but why would anyone do that?

A literal operator converting numerical values to strings could be quite confusing.

A template literal operator is a literal operator that takes its argument as a template parameter pack, rather than as a function argument. For example:

template<char...>
constexpr int operator"" _b3(); // base 3, i.e., ternary

Given that, we get:

201_b3 // means operator"" _b3<'2','0','1'>(); so 9*2+0*3+1 == 19
241_b3 // means operator"" _b3<'2','4','1'>(); so error: 4 isn't a ternary digit

The variadic template techniques (§28.6) can be disconcerting, but they are the only way of assigning nonstandard meanings to digits at compile time.

To define operator"" _b3(), we need some helper functions:

constexpr int ipow(int x, int n) // x to the nth power for n>=0
{
return (n>0) ? x*ipow(x,n-1) : 1;
}

template<char c> // handle the single ternary digit case
constexpr int b3_helper()
{
static_assert('0'<=c && c<'3',"not a ternary digit");
return c-'0'; // convert the digit character to its value
}

template<char c, char c2, char... tail> // peel off one ternary digit (two or more digits, so the single-digit case is unambiguous)
constexpr int b3_helper()
{
static_assert('0'<=c && c<'3',"not a ternary digit");
return ipow(3,1+sizeof...(tail))*(c-'0')+b3_helper<c2,tail...>();
}

Given that, we can define our base 3 literal operator:

template<char... chars>
constexpr int operator"" _b3() // base 3, i.e., ternary
{
return b3_helper<chars...>();
}

Many suffixes will be short (e.g., s for std::string, i for imaginary, m for meter (§28.7.3), and x for extended), so different uses could easily clash. Use namespaces to prevent clashes:

namespace Numerics {
// ...

class Bignum { /* ... */ };

namespace literals {
Bignum operator"" x(char const*);
}
// ...
}

using namespace Numerics::literals;

The standard library reserves all suffixes not starting with an initial underscore, so define your suffixes starting with an underscore or risk your code breaking in the future:

123km // reserved by the standard library
123_km // available for your use
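The two recommendations can be combined as follows. This is a minimal sketch with hypothetical names: a namespace Units holding a trivial Meters type, and a nested namespace literals that defines the underscore-prefixed suffix _km for both integer and floating-point literals.

```cpp
namespace Units {
    struct Meters {
        long double value;
    };

    namespace literals {
        // floating-point form, e.g., 1.5_km
        constexpr Meters operator""_km(long double d) { return Meters{d*1000}; }

        // integer form, e.g., 3_km
        constexpr Meters operator""_km(unsigned long long n) { return Meters{n*1000.0L}; }
    }
}

using namespace Units::literals;   // make the suffix visible where it is wanted
```

A user who does not want _km simply omits the using-directive, and another library's _km in a different namespace will not clash with this one.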

19.3. A String Class

The relatively simple string class presented in this section illustrates several techniques that are useful for the design and implementation of classes using conventionally defined operators. This String is a simplified version of the standard-library string (§4.2, Chapter 36). String provides value semantics, checked and unchecked access to characters, stream I/O, support for range-for loops, equality operations, and concatenation operators. I also added a String literal, which std::string does not (yet) have.

To allow simple interoperability with C-style strings (including string literals (§7.3.2)), I represent strings as zero-terminated arrays of characters. For realism, I implement the short string optimization. That is, a String with only a few characters stores those characters in the class object itself, rather than on the free store. This optimizes string usage for small strings. Experience shows that for a huge number of applications most strings are short. This optimization is particularly important in multi-threaded systems where sharing through pointers (or references) is infeasible and free-store allocation and deallocation are relatively expensive.

To allow Strings to efficiently “grow” by adding characters at the end, I implement a scheme for keeping extra space for such growth similar to the one used for vector (§13.6.1). This makes String a suitable target for various forms of input.

Writing a better string class and/or one that provides more facilities is a good exercise. That done, we can throw away our exercises and use std::string (Chapter 36).

19.3.1. Essential Operations

Class String provides the usual set of constructors, a destructor, and assignment operations (§17.1):

class String {
public:
String(); // default constructor: x{""}

explicit String(const char* p); // constructor from C-style string: x{"Euler"}

String(const String&); // copy constructor
String& operator=(const String&); // copy assignment

String(String&& x); // move constructor
String& operator=(String&& x); // move assignment

~String() { if (short_max<sz) delete[] ptr; } // destructor
// ...
};

This String has value semantics. That is, after an assignment s1=s2, the two strings s1 and s2 are fully distinct, and subsequent changes to one have no effect on the other. The alternative would be to give String pointer semantics. That would be to let changes to s2 after s1=s2 also affect the value of s1. Where it makes sense, I prefer value semantics; examples are complex, vector, Matrix, and string. However, for value semantics to be affordable, we need to pass Strings by reference when we don’t need copies and to implement move semantics (§3.3.2, §17.5.2) to optimize returns.

The slightly nontrivial representation of String is presented in §19.3.3. Note that it requires user-defined versions of the copy and move operations.

19.3.2. Access to Characters

The design of access operators for a string is a difficult topic because ideally access is by conventional notation (that is, using []), maximally efficient, and range checked. Unfortunately, you cannot have all of these properties simultaneously. Here, I follow the standard library by providing efficient unchecked operations with the conventional [] subscript notation plus range-checked at() operations:

class String {
public:
// ...

char& operator[](int n) { return ptr[n]; } // unchecked element access
char operator[](int n) const { return ptr[n]; }

char& at(int n) { check(n); return ptr[n]; } // range-checked element access
char at(int n) const { check(n); return ptr[n]; }

String& operator+=(char c); // add c at end

const char* c_str() { return ptr; } // C-style string access
const char* c_str() const { return ptr; }

int size() const { return sz; } // number of elements
int capacity() const // elements plus available space
{ return (sz<=short_max) ? short_max : sz+space; }
// ...
};

The idea is to use [] for ordinary use. For example:

int hash(const String& s)
{
int h {s[0]};
for (int i {1}; i<s.size(); i++) h ^= s[i]>>1; // unchecked access to s; i<s.size() also terminates for an empty s
return h;
}

Here, using the checked at() would be redundant because we correctly access s only from 0 to s.size()-1.

We can use at() where we see a possibility of mistakes. For example:

void print_in_order(const String& s,const vector<int>& index)
{
for (auto x : index) cout << s.at(x) << '\n';
}

Unfortunately, assuming that people will use at() consistently where mistakes can be made is overly optimistic, so some implementations of std::string (from which the []/at() convention is borrowed) also check []. I personally prefer a checked [] at least during development. However, for serious string manipulation tasks, a range check on each character access could impose quite noticeable overhead.

I provide const and non-const versions of the access functions to allow them to be used for const as well as other objects.
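Since the []/at() convention is borrowed from std::string, its behavior can be tried out with the standard type directly. The sketch below uses a hypothetical function pick() that collects the characters selected by a list of indices; std::string::at() throws std::out_of_range for a bad index, where [] would silently access memory it should not.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// hypothetical helper: collect the characters of s selected by index;
// at() reports any index outside [0:s.size())
std::string pick(const std::string& s, const std::vector<int>& index)
{
    std::string res;
    for (int x : index) res += s.at(x);   // range-checked access
    return res;
}
```

A caller that cannot vouch for the indices can catch std::out_of_range and recover, which is not possible after an unchecked out-of-range access.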

19.3.3. Representation

The representation for String was chosen to meet three goals:

• To make it easy to convert a C-style string (e.g., a string literal) to a String and to allow easy access to the characters of a String as a C-style string

• To minimize the use of the free store

• To make adding characters to the end of a String efficient

The result is clearly messier than a simple {pointer,size} representation, but much more realistic:

class String {
/*
A simple string that implements the short string optimization

size()==sz is the number of elements
if size()<= short_max, the characters are held in the String object itself;
otherwise the free store is used.

ptr points to the start of the character sequence
the character sequence is kept zero-terminated: ptr[size()]==0;
this allows us to use C library string functions and to easily return a C-style string: c_str()

To allow efficient addition of characters at end, String grows by doubling its allocation;
capacity() is the amount of space available for characters
(excluding the terminating 0): sz+space
*/

public:
// ...
private:
static const int short_max = 15;
int sz; // number of characters
char* ptr;
union {
int space; // unused allocated space
char ch[short_max+1]; // leave space for terminating 0
};

void check(int n) const // range check
{
if (n<0 || sz<=n)
throw std::out_of_range("String::at()");
}

// ancillary member functions:
void copy_from(const String& x);
void move_from(String& x);
};

This supports what is known as the short string optimization by using two string representations:

• If sz<=short_max, the characters are stored in the String object itself, in the array named ch.

• If !(sz<=short_max), the characters are stored on the free store and we may allocate extra space for expansion. The member named space is the number of such characters.

In both cases, the number of elements is kept in sz, and we look at sz to determine which implementation scheme is used for a given string.

In both cases, ptr points to the elements. This is essential for performance: the access functions do not need to test which representation is used; they simply use ptr. Only the constructors, assignments, moves, and the destructor (§19.3.4) must care about the two alternatives.

We use the array ch only if sz<=short_max and the integer space only if !(sz<=short_max). Consequently, it would be a waste to allocate space for both ch and space in a String object. To avoid such waste, I use a union (§8.3). In particular, I used a form of union called an anonymous union (§8.3.2), which is specifically designed to allow a class to manage alternative representations of objects. All members of an anonymous union are allocated in the same memory, starting at the same address. Only one member may be used at any one time, but otherwise they are accessed and used exactly as if they were separate members of the scope surrounding the anonymous union. It is the programmer’s job to make sure that they are never misused. For example, all member functions of String that use space must make sure that it really was space that was set and not ch. That is done by looking at sz<=short_max. In other words, String is (among other things) a discriminated union with sz<=short_max as the discriminant.

19.3.3.1. Ancillary Functions

In addition to functions intended for general use, I found that my code became cleaner when I provided three ancillary functions as “building blocks” to help me with the somewhat tricky representation and to minimize code replication. Two of those need to access the representation of String, so I made them members. However, I made them private members because they don’t represent operations that are generally useful and safe to use. For many interesting classes, the implementation is not just the representation plus the public functions. Ancillary functions can lead to less duplication of code, better design, and improved maintainability.

The first such function moves characters into newly allocated memory:

char* expand(const char* ptr, int n) // expand into free store
{
char* p = new char[n];
strcpy(p,ptr); // §43.4
return p;
}

This function does not access the String representation, so I did not make it a member.

The second implementation function is used by copy operations to give a String a copy of the members of another:

void String::copy_from(const String& x)
// make *this a copy of x
{
if (x.sz<=short_max) { // short string: copy the whole representation of x into *this
memcpy(this,&x,sizeof(x)); // §43.5
ptr = ch;
}
else { // copy the elements
ptr = expand(x.ptr,x.sz+1);
sz = x.sz;
space = 0;
}
}

Any necessary cleanup of the target String is the task of callers of copy_from(); copy_from() unconditionally overwrites its target. I use the standard-library memcpy() (§43.5) to copy the bytes of the source into the target. That’s a low-level and sometimes pretty nasty function. It should be used only where there are no objects with constructors or destructors in the copied memory because memcpy() knows nothing about types. Both String copy operations use copy_from().
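
The condition under which memcpy() is a reasonable tool can be checked at compile time. The sketch below is an illustration under assumptions: Rep and clone() are hypothetical names, and Rep merely stands in for a plain representation without constructors or a destructor.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical plain representation: no constructors, no destructor.
struct Rep {
    int sz;
    char ch[16];
};

// memcpy() knows nothing about types, so ask the type system before using it.
static_assert(std::is_trivially_copyable<Rep>::value,
              "Rep can be copied byte by byte");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must be copied by its own copy operations");

Rep clone(const Rep& x)
{
    Rep r;
    std::memcpy(&r, &x, sizeof(x));   // copy the bytes of the source into the target
    return r;
}
```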

The corresponding function for move operations is:

void String::move_from(String& x)
{
if (x.sz<=short_max) { // short string: copy the whole representation into *this
memcpy(this,&x,sizeof(x)); // §43.5
ptr = ch;
}

else { // grab the elements
ptr = x.ptr;
sz = x.sz;
space = x.space;
x.ptr = x.ch; // x = ""
x.sz = 0;
x.ch[0]=0;
}
}

It too unconditionally makes its target a copy of its argument. However, it does not leave its argument owning any free store. I could also have used memcpy() in the long string case, but since a long string representation uses only part of String’s representation, I decided to copy the used members individually.

19.3.4. Member Functions

The default constructor defines a String to be empty:

String::String() // default constructor: x{""}
: sz{0}, ptr{ch} // ptr points to elements, ch is an initial location (§19.3.3)
{
ch[0] = 0; // terminating 0
}

Given copy_from() and move_from(), the constructors, moves, and assignments are fairly simple to implement. The constructor that takes a C-style string argument must determine the number of characters and store them appropriately:

String::String(const char* p)
:sz{static_cast<int>(strlen(p))}, // strlen() returns size_t; {} does not allow narrowing to int
ptr{(sz<=short_max) ? ch : new char[sz+1]},
space{0}
{
strcpy(ptr,p); // copy characters into ptr from p
}

If the argument is a short string, ptr is set to point to ch; otherwise, space is allocated on the free store. In either case, the characters are copied from the argument string into the memory managed by String.

The copy constructor simply copies the representation of its argument:

String::String(const String& x) // copy constructor
{
copy_from(x); // copy representation from x
}

I didn’t bother trying to optimize the case where the size of the source equals the size of the target (as was done for vector; §13.6.3). I don’t know if that would be worthwhile.

Similarly, the move constructor moves the representation from its source (and possibly sets its argument to be the empty string):

String::String(String&& x) // move constructor
{
move_from(x);
}

Like the copy constructor, the copy assignment uses copy_from() to clone its argument’s representation. In addition, it has to delete any free store owned by the target and make sure it does not get into trouble with self-assignment (e.g., s=s):

String& String::operator=(const String& x)
{
if (this==&x) return *this; // deal with self-assignment
char* p = (short_max<sz) ? ptr : 0;
copy_from(x);
delete[] p;
return *this;
}

The String move assignment deletes its target’s free store (if there is any) and then moves:

String& String::operator=(String&& x)
{
if (this==&x) return *this; // deal with self-assignment (x = move(x) is insanity)
if (short_max<sz) delete[] ptr; // delete target
move_from(x); // does not throw
return *this;
}

It is logically possible to move a source into itself (e.g., s=std::move(s)), so again we have to protect against self-assignment (however unlikely).

The logically most complicated String operation is +=, which adds a character to the end of the string, increasing its size by one:

String& String::operator+=(char c)
{
if (sz==short_max) { // expand to long string
int n = sz+sz+2; // double the allocation (+2 because of the terminating 0)
ptr = expand(ptr,n);
space = n-sz-2;
}
else if (short_max<sz) {
if (space==0) { // expand in free store
int n = sz+sz+2; // double the allocation (+2 because of the terminating 0)
char* p = expand(ptr,n);
delete[] ptr;
ptr = p;
space = n-sz-2;
}
else
--space;
}
ptr[sz] = c; // add c at end
ptr[++sz] = 0; // increase size and set terminator

return *this;
}

There is a lot going on here: operator+=() has to keep track of which representation (short or long) is used and whether there is extra space available to expand into. If more space is needed, expand() is called to allocate that space and copy the old characters into the new space. If there was an old free-store allocation, += deletes it only after expand() has returned the new one. Once enough space is available, it is trivial to put the new character c into it and to add the terminating 0.

Note the calculation of available memory for space. Of all the String implementation, that took the longest to get right: it’s a messy little calculation prone to off-by-one errors. That repeated constant 2 feels awfully like a “magic constant.”
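
One way to convince oneself that the 2 is right is to write the bookkeeping down and let the compiler check it. The following is only a sketch of the arithmetic; the function names are hypothetical, and short_max is assumed to be 15. The 2 accounts for the character being added plus the terminating 0.

```cpp
#include <cassert>

// Allocation size chosen by += when the current buffer is full (sz characters in use).
constexpr int new_allocation(int sz) { return sz+sz+2; }

// Spare characters recorded in space for that allocation.
constexpr int spare_after_append(int sz) { return new_allocation(sz)-sz-2; }

// After the append: allocation == new size + terminating 0 + spare characters.
constexpr bool balanced(int sz)
{
    return new_allocation(sz) == (sz+1) + 1 + spare_after_append(sz);
}

// Short string at its limit (assuming short_max==15): 32 chars allocated, 15 spare.
static_assert(new_allocation(15)==32, "allocation when leaving the short form");
static_assert(spare_after_append(15)==15, "spare characters after the 16th character");
static_assert(balanced(15), "16 characters + terminator + 15 spare == 32");
```

Fifteen further appends use up the spare characters, at which point sz is 31 and the same calculation yields an allocation of 64 with 31 spare.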

All String members take care not to modify the existing representation before they are certain that a new one can be put in place. In particular, they don’t delete until after any possible new operations have been done. In fact, the String members provide the strong exception guarantee (§13.2).
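
The ordering “acquire first, release afterward” can be shown in isolation. This is a minimal sketch using a hypothetical class Buf that owns a C-style string; it is not part of String.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical owner of a free-store C-style string.
class Buf {
public:
    explicit Buf(const char* s) : p{dup(s)} { }
    Buf(const Buf& x) : p{dup(x.p)} { }
    ~Buf() { delete[] p; }

    Buf& operator=(const Buf& x)
    {
        if (this==&x) return *this;   // self-assignment: nothing to do
        char* fresh = dup(x.p);       // new may throw; *this is still untouched
        delete[] p;                   // cannot throw
        p = fresh;                    // commit
        return *this;
    }

    const char* c_str() const { return p; }

private:
    static char* dup(const char* s)   // copy s into newly allocated memory
    {
        char* q = new char[std::strlen(s)+1];
        std::strcpy(q, s);
        return q;
    }

    char* p;
};
```

If dup() throws, the target still holds its old value, which is what the strong guarantee requires of an assignment.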

If you don’t like the kind of fiddly code presented as part of the implementation of String, simply use std::string. To a large extent, the standard-library facilities exist to save us from programming at this low level most of the time. Stronger: writing a string class, a vector class, or a map is an excellent exercise. However, once the exercise is done, one outcome should be an appreciation of what the standard offers and a desire not to maintain your own version.

19.3.5. Helper Functions

To complete class String, I provide a set of useful functions: stream I/O, support for range-for loops, comparison, and concatenation. These all mirror the design choices used for std::string. In particular, << just prints the characters without added formatting, and >> skips initial whitespace before reading until it finds terminating whitespace (or the end of the stream):

ostream& operator<<(ostream& os, const String& s)
{
return os << s.c_str(); // §36.3.3
}

istream& operator>>(istream& is, String& s)
{
s = ""; // clear the target string
is>>ws; // skip whitespace (§38.4.5.1)
char ch = ' ';
while(is.get(ch) && !isspace(ch))
s += ch;
return is;
}
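
The reading loop can be tried on its own. The sketch below applies the same logic to std::string so that it is self-contained; read_word() and second_word() are hypothetical names used only for this experiment.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Same loop as the >> above, but returning a std::string.
std::string read_word(std::istream& is)
{
    std::string s;
    is >> std::ws;                               // skip initial whitespace
    char ch = ' ';
    while (is.get(ch) && !std::isspace(static_cast<unsigned char>(ch)))
        s += ch;                                 // stop at whitespace or end of stream
    return s;
}

// Convenience for experiments: the second whitespace-separated word of text.
std::string second_word(const std::string& text)
{
    std::istringstream in{text};
    read_word(in);
    return read_word(in);
}
```

One detail worth checking in your own tests: when the last word is not followed by whitespace, the loop ends because get() fails, so the stream is left in a failed state even though a word was read.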

I provide == and != for comparison:

bool operator==(const String& a, const String& b)
{
if (a.size()!=b.size())
return false;
for (int i = 0; i!=a.size(); ++i)
if (a[i]!=b[i])
return false;
return true;
}

bool operator!=(const String& a, const String& b)
{
return !(a==b);
}

Adding <, etc., would be trivial.
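
For example, < could be built on the standard-library algorithm lexicographical_compare(). The helper below is a sketch; less_chars() is a hypothetical name, and it works on plain pointer-and-length pairs so that it does not depend on String.

```cpp
#include <algorithm>
#include <cassert>

// Lexicographical "less than" for two character sequences given as pointer and length.
bool less_chars(const char* a, int an, const char* b, int bn)
{
    return std::lexicographical_compare(a, a+an, b, b+bn);
}

// A String operator< could then forward to it, for example:
//     bool operator<(const String& a, const String& b)
//     {
//         return less_chars(a.c_str(), a.size(), b.c_str(), b.size());
//     }
```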

To support the range-for loop, we need begin() and end() (§9.5.1). Again, we can provide those as freestanding (nonmember) functions without direct access to the String implementation:

char* begin(String& x) // C-string-style access
{
return x.c_str();
}

char* end(String& x)
{
return x.c_str()+x.size();
}

const char* begin(const String& x)
{
return x.c_str();
}

const char* end(const String& x)
{
return x.c_str()+x.size();
}

Given the member function += that adds a character at the end, concatenation operators are easily provided as nonmember functions:

String& operator+=(String& a, const String& b) // concatenation
{
for (auto x : b)
a+=x;
return a;
}
String operator+(const String& a, const String& b) // concatenation
{
String res {a};
res += b;
return res;
}

I feel that I may have slightly “cheated” here. Should I have provided a member += that added a C-style string to the end? The standard-library string does, but without it, concatenation with a C-style string still works. For example:

String s = "Njal ";
s += "Gunnar"; // concatenate: add to the end of s

This use of += is interpreted as operator+=(s,String("Gunnar")). My guess is that I could provide a more efficient String::operator+=(const char*), but I have no idea if the added performance would be worthwhile in real-world code. In such cases, I try to be conservative and deliver the minimal design. Being able to do something is not by itself a good reason for doing it.

Similarly, I do not try to optimize += by taking the size of a source string into account.

Adding _s as a string literal suffix meaning String is trivial:

String operator"" _s(const char* p, size_t)
{
return String{p};
}

We can now write:

void f(const char*); // C-style string
void f(const String&); // our string

void g()
{
f("Madden's"); // f(const char*)
f("Christopher's"_s); // f(const String&);
}
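
Literal operators are often placed in a namespace of their own so that a user can choose which suffixes to enable (§19.2.6). The sketch below uses std::string rather than String to stay self-contained; the namespace my_literals, the suffix _str, and demo_length() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace my_literals {                        // hypothetical namespace
    std::string operator""_str(const char* p, std::size_t n)
    {
        return std::string(p, n);              // use the length supplied by the compiler
    }
}

std::size_t demo_length()
{
    using namespace my_literals;               // opt in to the _str suffix here only
    return "Christopher's"_str.size();
}
```

Without the using-directive, the suffix is not found, so code that does not ask for it is unaffected.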

19.3.6. Using Our String

The main program simply exercises the String operators a bit:

int main()
{
String s ("abcdefghij");
cout << s << '\n';
s += 'k';
s += 'l';
s += 'm';
s += 'n';
cout << s << '\n';
String s2 = "Hell";
s2 += " and high water";
cout << s2 << '\n';

String s3 = "qwerty";
s3 = s3;
String s4 ="the quick brown fox jumped over the lazy dog";
s4 = s4;
cout << s3 << " " << s4 << "\n";
cout << s + ". " + s3 + String(". ") + "Horsefeathers\n";

String buf;
while (cin>>buf && buf!="quit")
cout << buf << " " << buf.size() << " " << buf.capacity() << '\n';
}

This String lacks many features that you might consider important or even essential. However, for what it does it closely resembles std::string (Chapter 36) and illustrates techniques used for the implementation of the standard-library string.

19.4. Friends

An ordinary member function declaration specifies three logically distinct things:

[1] The function can access the private part of the class declaration.

[2] The function is in the scope of the class.

[3] The function must be invoked on an object (has a this pointer).

By declaring a member function static (§16.2.12), we can give it the first two properties only. By declaring a nonmember function a friend, we can give it the first property only. That is, a function declared friend is granted access to the implementation of a class just like a member function but is otherwise independent of that class.
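
The three combinations can be seen side by side in a small sketch. Counter, bump(), peek(), and value_of() are hypothetical names chosen for the illustration.

```cpp
#include <cassert>

class Counter {
    int n = 0;                                          // private representation
public:
    void bump() { ++n; }                                // member: access, class scope, this
    static int peek(const Counter& c) { return c.n; }   // static member: access, class scope, no this
    friend int value_of(const Counter& c);              // friend: access only
};

int value_of(const Counter& c) { return c.n; }          // an ordinary nonmember function
```

Note the call syntax that follows from each choice: c.bump(), Counter::peek(c), and value_of(c).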

For example, we could define an operator that multiplies a Matrix by a Vector. Naturally, Vector and Matrix hide their respective representations and provide a complete set of operations for manipulating objects of their type. However, our multiplication routine cannot be a member of both. Also, we don’t really want to provide low-level access functions to allow every user to both read and write the complete representation of both Matrix and Vector. To avoid this, we declare the operator* a friend of both:

constexpr int rc_max {4}; // row and column size

class Matrix;

class Vector {
float v[rc_max];
// ...
friend Vector operator*(const Matrix&, const Vector&);
};

class Matrix {
Vector v[rc_max];
// ...
friend Vector operator*(const Matrix&, const Vector&);
};

Now operator*() can reach into the implementation of both Vector and Matrix. That would allow sophisticated implementation techniques, but a simple implementation would be:

Vector operator*(const Matrix& m, const Vector& v)
{
Vector r;
for (int i = 0; i!=rc_max; i++) { // r[i] = m[i] * v;
r.v[i] = 0;
for (int j = 0; j!=rc_max; j++)
r.v[i] += m.v[i].v[j] * v.v[j];
}
return r;
}

A friend declaration can be placed in either the private or the public part of a class declaration; it does not matter where. Like a member function, a friend function is explicitly declared in the declaration of the class of which it is a friend. It is therefore as much a part of that interface as is a member function.

A member function of one class can be the friend of another. For example:

class List_iterator {
// ...
int* next();
};

class List {
friend int* List_iterator::next();
// ...
};

There is a shorthand for making all functions of one class friends of another. For example:

class List {
friend class List_iterator;
// ...
};

This friend declaration makes all of List_iterator’s member functions friends of List.

Declaring a class a friend grants access to every function of that class. That implies that we cannot know the set of functions that can access the granting class’s representation just by looking at the class itself. In this, a friend class declaration differs from the declaration of a member function and a friend function. Clearly, friend classes should be used with caution and only to express closely connected concepts.
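
A minimal sketch of a closely connected pair follows. The representation (a fixed array of three ints) and the exact signature of next() are assumptions made only to keep the example short.

```cpp
#include <cassert>

class List {
    int elems[3] = {1, 2, 3};                 // assumed representation, for illustration
    friend class List_iterator;               // every member of List_iterator gets access
};

class List_iterator {
    const List& l;
    int i = 0;
public:
    explicit List_iterator(const List& x) : l(x) { }

    // Return a pointer to the next element, or nullptr when there are no more.
    const int* next() { return i<3 ? &l.elems[i++] : nullptr; }
};
```

Any member later added to List_iterator would also gain access to elems, which is exactly why the text urges caution.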

It is possible to make a template argument a friend:

template<typename T>
class X {
friend T;
friend class T; // error: "class" cannot be applied to a template parameter; use "friend T;"
// ...
};

Often, there is a choice between making a class a member (a nested class) or a nonmember friend (§18.3.1).

19.4.1. Finding Friends

A friend must be previously declared in an enclosing scope or defined in the non-class scope immediately enclosing the class that is declaring it to be a friend. Scopes outside the innermost enclosing namespace scope are not considered for a name first declared as a friend (§iso.7.3.1.2). Consider a technical example:

class C1 { }; // will become friend of N::C
void f1(); // will become friend of N::C

namespace N {
class C2 { }; // will become friend of C
void f2() { } // will become friend of C

class C {
int x;
public:
friend class C1; // OK (previously defined)
friend void f1();

friend class C3; // OK (defined in enclosing namespace)
friend void f3();
friend class C4; // First declared in N and assumed to be in N
friend void f4();
};

class C3 { }; // friend of C
void f3() { C x; x.x = 1; } // OK: friend of C
} // namespace N

class C4 { }; // not friend of N::C
void f4() { N::C x; x.x = 1; } // error: x is private and f4() is not a friend of N::C

A friend function can be found through its arguments (§14.2.4) even if it was not declared in the immediately enclosing scope. For example:

void f(Matrix& m)
{
invert(m); // Matrix's friend invert()
}

Thus, a friend function should be explicitly declared in an enclosing scope or take an argument of its class or a class derived from that. If not, the friend cannot be called. For example:

// no f() in this scope

class X {
friend void f(); // useless
friend void h(const X&); // can be found through its argument
};

void g(const X& x)
{
f(); // no f() in scope
h(x); // X's friend h()
}
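
The same lookup rule makes a friend defined inside its class usable, provided it takes an argument of that class. A small sketch, with the hypothetical namespace demo and function call_h():

```cpp
#include <cassert>

namespace demo {                                    // hypothetical namespace
    class X {
        int v;
    public:
        explicit X(int i) : v{i} { }
        friend int h(const X& x) { return x.v; }    // not declared in any enclosing scope
    };
}

int call_h()
{
    demo::X x{7};
    return h(x);    // found through the argument x, whose class is demo::X
}
```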

19.4.2. Friends and Members

When should we use a friend function, and when is a member function the better choice for specifying an operation? First, we try to minimize the number of functions that access the representation of a class and try to make the set of access functions as appropriate as possible. Therefore, the first question is not “Should it be a member, a static member, or a friend?” but rather “Does it really need access?” Typically, the set of functions that need access is smaller than we are willing to believe at first. Some operations must be members – for example, constructors, destructors, and virtual functions (§3.2.3, §17.2.5) – but typically there is a choice. Because member names are local to the class, a function that requires direct access to the representation should be a member unless there is a specific reason for it to be a nonmember.

Consider a class X supplying alternative ways of presenting an operation:

class X {
// ...
X(int);

int m1(); // member
int m2() const;

friend int f1(X&); // friend, not member
friend int f2(const X&);
friend int f3(X);
};

Member functions can be invoked for objects of their class only; no user-defined conversions are applied to the leftmost operand of a . or -> (but see §19.2.3). For example:

void g()
{
99.m1(); // error: X(99).m1() not tried
99.m2(); // error: X(99).m2() not tried
}

The global function f1() has a similar property because implicit conversions are not used for non-const reference arguments (§7.7). However, conversions may be applied to the arguments of f2() and f3():

void h()
{
f1(99); // error: f1(X(99)) not tried: non-const X& argument
f2(99); // OK: f2(X(99)); const X& argument
f3(99); // OK: f3(X(99)); X argument
}

An operation modifying the state of a class object should therefore be a member or a function taking a non-const reference argument (or a non-const pointer argument).
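
A compilable version of the distinction might look like the sketch below. The data member v and the function bodies are assumptions added so that the calls can be executed.

```cpp
#include <cassert>

class X {
    int v;
public:
    X(int i) : v{i} { }                 // implicit conversion from int

    friend int f1(X&);                  // may modify its argument
    friend int f2(const X&);
    friend int f3(X);
};

int f1(X& x) { return ++x.v; }          // requires a real X object
int f2(const X& x) { return x.v; }      // accepts a temporary X
int f3(X x) { return x.v; }             // accepts a copy

int demo()
{
    X x{5};
    int a = f1(x);                      // OK: x is an lvalue of type X; x.v becomes 6
    // f1(99);                          // error: no conversion for a non-const X& argument
    return a + f2(99) + f3(1);          // f2(X(99)) and f3(X(1))
}
```

Had f1(99) been accepted, the increment would have been applied to a temporary and silently lost.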

Operators that modify an operand (e.g., =, *=, and ++) are most naturally defined as members for user-defined types. Conversely, if implicit type conversion is desired for all operands of an operation, the function implementing it must be a nonmember function taking a const reference argument or a non-reference argument. This is often the case for the functions implementing operators that do not require lvalue operands when applied to fundamental types (e.g., +, -, and ||). However, such operators often need access to the representations of their operand class. Consequently, binary operators are the most common source of friend functions.

Unless type conversions are defined, there appears to be no compelling reason to choose a member over a friend taking a reference argument, or vice versa. In some cases, the programmer may have a preference for one call syntax over another. For example, most people seem to prefer the notation m2=inv(m) for producing an inverted Matrix from m to the alternative m2=m.inv(). On the other hand, if inv() inverts m itself, rather than producing a new Matrix that is the inverse of m, it should be a member.

All other things considered equal, implement operations that need direct access to a representation as member functions:

• It is not possible to know if someone someday will define a conversion operator.

• The member function call syntax makes it clear to the user that the object may be modified; a reference argument is far less obvious.

• Expressions in the body of a member can be noticeably shorter than the equivalent expressions in a global function; a nonmember function must use an explicit argument, whereas the member can use this implicitly.

• Member names are local to a class, so they tend to be shorter than the names of nonmember functions.

• If we have defined a member f() and we later feel the need for a nonmember f(x), we can simply define it to mean x.f().

Conversely, operations that do not need direct access to a representation are often best represented as nonmember functions, possibly in a namespace that makes their relationship with the class explicit (§18.3.6).
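
These guidelines can be combined in a small sketch. The namespace linalg, the class Mat2, and the functions det() and singular() are hypothetical names, not part of the Matrix used earlier.

```cpp
#include <cassert>

namespace linalg {                                   // hypothetical namespace
    class Mat2 {
        double a, b, c, d;                           // the matrix [a b; c d]
    public:
        Mat2(double a_, double b_, double c_, double d_)
            : a{a_}, b{b_}, c{c_}, d{d_} { }

        double det() const { return a*d - b*c; }     // needs the representation: member
    };

    // A nonmember spelling added later simply means m.det().
    inline double det(const Mat2& m) { return m.det(); }

    // Needs no access to the representation: nonmember in the class's namespace.
    inline bool singular(const Mat2& m) { return det(m)==0; }
}
```

Because singular() lives in the same namespace as Mat2, a call such as singular(m) is found through its argument without further qualification.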

19.5. Advice

[1] Use operator[]() for subscripting and for selection based on a single value; §19.2.1.

[2] Use operator()() for call semantics, for subscripting, and for selection based on multiple values; §19.2.2.

[3] Use operator->() to dereference “smart pointers”; §19.2.3.

[4] Prefer prefix ++ over suffix ++; §19.2.4.

[5] Define the global operator new() and operator delete() only if you really have to; §19.2.5.

[6] Define member operator new() and member operator delete() to control allocation and deallocation of objects of a specific class or hierarchy of classes; §19.2.5.

[7] Use user-defined literals to mimic conventional notation; §19.2.6.

[8] Place literal operators in separate namespaces to allow selective use; §19.2.6.

[9] For nonspecialized uses, prefer the standard string (Chapter 36) to the result of your own exercises; §19.3.

[10] Use a friend function if you need a nonmember function to have access to the representation of a class (e.g., to improve notation or to access the representation of two classes); §19.4.

[11] Prefer member functions to friend functions for granting access to the implementation of a class; §19.4.2.