Chapter 10. Optimize Data Structures

A thing of beauty is a joy forever

John Keats (1818)

If you have never stopped to marvel at the container classes of the C++ standard library (formerly the Standard Template Library, or STL), perhaps you should do so now. At the time it was introduced into the draft C++ standard in 1994, Stepanov’s Standard Template Library was the first reusable library of efficient containers and algorithms. Prior to the STL, every project developed its own linked list and binary tree implementations, possibly adapting source code from other users. C has no equivalent. Standard library containers have allowed many programmers to forget their algorithms and data structures classes and choose exclusively from the standard library’s menu of prebuilt containers for the past 20 years.

Get to Know the Standard Library Containers

There are many things to like about the C++ standard library containers, such as uniform naming and a consistent notion of iterators for traversing the containers. But for optimization purposes, some properties are especially important. These include:

· Big-O performance guarantees for the cost of insertion and deletion

· Amortized constant-time cost of appending to sequence containers

· Ability to finely control the dynamic memory allocated by the container

The C++ standard library containers seem almost similar enough that they can be substituted for one another, notwithstanding their obviously different implementations. But this is an illusion. The standard library containers are old enough to vote. Like other parts of C++, standard library containers have evolved independently of one another. The interfaces are only partially overlapping. Big-O performance of the same operation differs by container. Most importantly, the semantics of some member functions differ from container to container, even when they have the same name. The developer must know each container class in detail to understand how to use them optimally.

Sequence Containers

The sequence containers std::string, std::vector, std::deque, std::list, and std::forward_list keep items in the order in which they are inserted. Therefore, each container has a front and a back. All sequence containers have means to insert items. All but std::forward_list have constant-time member functions to push items onto the back. However, only std::deque, std::list, and std::forward_list can efficiently push items onto the front.

The items in std::string, std::vector, and std::deque are numbered from 0 to size–1, and can be retrieved efficiently by subscripting. std::list and std::forward_list are different, having no subscripting operator.

std::string, std::vector, and std::deque are built on an array-like internal backbone. When an item is inserted, any item following it must be bumped up to the next location in the array, which makes the cost of inserting anywhere but at the end O(n), where n is the number of items. When items are inserted, these internal arrays may be reallocated, which invalidates all iterators and pointers. By contrast, std::list and std::forward_list invalidate iterators and pointers only to list elements that are removed. Two std::list or std::forward_list instances can even be spliced together or merged without invalidating iterators. Insertion in the middle of a std::list or std::forward_list takes constant time, assuming that an iterator already points to the insertion point.

Associative Containers

Associative containers all store inserted items based on the values of the items (or of their keys), rather than on the order of insertion. All associative containers provide efficient, sub-linear-time access to the items they contain.

Maps and sets present different interfaces. Maps store a separately defined key and value, and thus provide an efficient mapping from key to value. Sets store unique, ordered values, and provide efficient tests for the presence of a value. “Multimaps” differ from maps (and similarly “multisets” differ from sets) only in that they permit insertion of multiple items that compare as equal.

In implementation terms, there are four ordered associative containers: std::map, std::multimap, std::set, and std::multiset. The ordered associative containers require that an ordering relationship like operator<() be defined on the keys (std::map) or items themselves (std::set). The ordered associative containers are implemented as balanced binary trees. It is not necessary to sort an ordered associative container. Iterating through them produces items in order by their ordering relationship. Inserting or removing items takes amortized O(log2 n) time, where n is the number of items in the container.

Although it is possible for maps and sets to have different implementations, in practice all four associative containers are implemented as separate façades over the same balanced binary tree data structure. This was true of the compilers I used. I therefore do not present separate timing results for multimaps, sets, and multisets.

Since the introduction of C++11, there are also four unordered associative containers: std::unordered_map, std::unordered_multimap, std::unordered_set, and std::unordered_multiset. These containers appear in Visual C++ as early as 2010. The unordered associative containers require only an equality relationship to be defined on the keys (std::unordered_map) or items (std::unordered_set). Unordered associative containers are implemented as hash tables. Iterating through an unordered associative container produces the items in no defined order. Inserting or removing items takes constant time on average, although worst-case time is O(n).

Associative containers make obvious choices for lookup tables. The developer can also store items that have an ordering relation in a sequence container, sort the container, and do relatively efficient O(log2 n) searches.
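As a minimal sketch of that alternative (the names table and contains are mine, not from the tests), the program sorts a vector once, then runs binary searches against it:

    #include <algorithm>
    #include <string>
    #include <vector>

    // Sketch: a sorted std::vector used as a lookup table.
    // The range must already be sorted, e.g., by std::sort(table.begin(), table.end()).
    bool contains(std::vector<std::string> const& table, std::string const& key) {
        return std::binary_search(table.begin(), table.end(), key); // O(log2 n)
    }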

Experimenting with the Standard Library Containers

I built several types of containers with 100,000 elements, and measured performance for insertion, deletion, and visiting each element. In the sequence containers, I also measured the cost of sorting. These are common building blocks for the infinite uses to which data structures may be applied.

This number of elements is enough that the amortized cost of insertion approaches the asymptotic behavior specified for the container—100,000 elements is enough to exercise the cache memory pretty thoroughly. It isn’t a small container by any means, nor is it so impractically large that it would be rare.

That said, big-O performance does not tell the full story. I found that some containers were many times faster than others, even when an operation had O(1) asymptotic cost on both containers being compared.

I also found that unordered maps, with their O(1) search cost, were faster than maps, but not by the wide margin I had expected. And the cost in memory to obtain that performance was significant.

Most types of container provide several ways in which items can be inserted. I discovered that certain ways of inserting were 10% or 15% faster than others, and often with no clear reason why.

The cost of inserting 100,000 elements into a container comes in two parts: the cost of allocating storage, and the cost of copy constructing the elements into the storage. The cost of allocating storage is fixed for items of a specific size, while the cost of copy-construction is not bounded, depending on the whim of the programmer. In the case of a very expensive item copy constructor, the copying cost will come to dominate the cost of building the container. In this case, all containers will perform approximately the same when tested for insertions.

Most types of container also provide several ways to iterate through items. Again, I found certain ways to be significantly faster than others, with no obvious reason for the differences. Interestingly, there was less difference than I expected in the cost to iterate in the various container types.

I tested the cost of sorting the sequence containers to see how they might fare if substituted for associative containers in applications that search tables. Some containers sort their elements as part of inserting them; other containers cannot be sorted at all.

The results reported here are significant enough to be interesting, but are probably fragile. As implementations evolve over time, the fastest method may change. For instance, in these tests stable_sort() consistently outperformed sort(); I suspect this would not have been the case when stable_sort() was first added to the algorithms library.

Element data type

For the items in the sequence containers, I used a key/value structure. The associative containers create a very similar structure built out of std::pair:

struct kvstruct {
    char key[9];
    unsigned value; // could be anything at all
    kvstruct(unsigned k) : value(k)
    {
        if (strcpy_s(key, stringify(k)))
            DebugBreak();
    }
    bool operator<(kvstruct const& that) const {
        return strcmp(this->key, that.key) < 0;
    }
    bool operator==(kvstruct const& that) const {
        return strcmp(this->key, that.key) == 0;
    }
};

The copy constructor for this class is generated by the compiler, but it is nontrivial, having to bit-copy the contents of one kvstruct to another. As before, my goal was to make copying and comparison at least a little expensive to simulate real-world data structures.

The keys themselves were C-style null-terminated character strings composed of six-digit numbers. The keys were drawn from a uniform random distribution using C++’s <random> header. The same number was stored as an unsigned integer as the item’s value field. Duplicate keys were eliminated, to produce an initial vector of 100,000 distinct values in no particular order.

A note on designing the experiment

Some containers are very inexpensive to insert into or traverse, even with 100,000 items. To get the test to run for a measurable amount of time, I had to repeat the insertion or traversal 1,000 times. But this presented a problem. Each time I inserted items into a container, I had also to clear out the container by deleting the items, which affected the total run time. For instance, the following code measures the cost of assigning one vector to another. It inevitably and inextricably combines the cost of constructing a new copy of random_vector and then deleting it:

{ Stopwatch sw("assign vector to vector + delete x 1000");

std::vector<kvstruct> test_container;

for (unsigned j = 0; j < 1000; ++j) {

test_container = random_vector;

std::vector<kvstruct>().swap(test_container);

}

}

To obtain separate costs for the assignment and the deletion, I constructed a more complex version of this code to accumulate separately the time taken to create the new copy and to delete it:

{ Stopwatch sw("assign vector to vector", false);

Stopwatch::tick_t ticks;

Stopwatch::tick_t assign_x_1000 = 0;

Stopwatch::tick_t delete_x_1000 = 0;

std::vector<kvstruct> test_container;

for (unsigned j = 0; j < 1000; ++j) {

sw.Start("");

test_container = random_vector;

ticks = sw.Show("");

assign_x_1000 += ticks;

std::vector<kvstruct>().swap(test_container);

delete_x_1000 += sw.Stop("") - ticks;

}

std::cout << " assign vector to vector x 1000: "

<< Stopwatch::GetMs(assign_x_1000)

<< "ms" << std::endl;

std::cout << " vector delete x 1000: "

<< Stopwatch::GetMs(delete_x_1000)

<< "ms" << std::endl;

}

The first statement in the loop, sw.Start("");, starts the stopwatch without printing anything. The next statement, test_container = random_vector;, consumes time copying the vector. The third statement, ticks = sw.Show("");, sets ticks to the elapsed time so far.

What value is in ticks? The source of ticks in the Stopwatch instance sw has a 1-millisecond tick. The time taken by the assignment is far less than 1 millisecond, so mostly the value in ticks is 0. But not always. The clock is independent of this code, ticking steadily somewhere in the hardware. So occasionally, a situation arises where, for instance, the stopwatch started during the 987th microsecond of a 1-millisecond tick, and by the time the assignment statement completed, the tick had occurred. In this case, ticks will equal 1. If the assignment takes 500 microseconds, this will happen about half the time. If the assignment takes 10 microseconds, it will happen about 1% of the time. Given enough repetitions of the loop, an accurate aggregate time emerges.

ticks is accumulated in assign_x_1000, the variable that counts the time taken by the assignment. Then, the statement std::vector<kvstruct>().swap(test_container); deletes the contents of the vector test_container. Finally, delete_x_1000 += sw.Stop("") - ticks; gets another tick count that is either zero or one, subtracts the tick count from the end of the assignment, and accumulates the difference in delete_x_1000. The measured cost of deleting the vector 1,000 times is 111 milliseconds, or 0.111 milliseconds per deletion.

Armed now with the cost of deleting a container with 100,000 entries, the remaining code can be tackled with arithmetic. The following code is another loop that fills a container 1,000 times, which also must include the cost of deleting the container:

{ Stopwatch sw("vector iterator insert() + delete x 1000");

std::vector<kvstruct> test_container;

for (unsigned j = 0; j < 1000; ++j) {

test_container.insert(

test_container.begin(),

random_vector.begin(),

random_vector.end());

std::vector<kvstruct>().swap(test_container);

}

}

A particular test run of the preceding code took 696 milliseconds to fill the container and delete the container 1,000 times. If the cost of deleting the vector 1,000 times, as measured earlier, is 111 milliseconds, the cost of a single call to insert() is (696 − 111) / 1000 = 0.585 milliseconds.

MODERN C++ CODING NOTE

There is a little-remarked standard library in C++ for generating random numbers, called <random>. Once I discovered this library, it became one of my favorite tools for generating random search keys. For instance, Example 10-1 shows the code I used to generate the random strings for my container tests.

Example 10-1. Creating a vector of unique kvstruct instances

#include <functional> // std::bind (not shown in the original listing)
#include <random>
#include <set>
#include <vector>

// build count-element vector containing unique random strings
void build_rnd_vector(std::vector<kvstruct>& v, unsigned count) {
    std::default_random_engine e;
    std::uniform_int_distribution<unsigned> d(count, 10*count-1);
    auto randomizer = std::bind(d, e);
    std::set<unsigned> unique;
    v.clear();
    while (v.size() < count) {
        unsigned rv = randomizer();
        if (unique.insert(rv).second == true) { // item inserted
            kvstruct keyvalue(rv);
            v.push_back(keyvalue);
        }
    }
}

The first line of build_rnd_vector() constructs a random number generator, basically a source of randomness. The second line creates a random number distribution, an object that transforms sequences of random numbers from the generator into sequences of numbers that follow some probability distribution. In this case, the distribution is uniform, which means each value is equally likely to occur, with minimum value count and maximum value 10*count-1. So, if count is 100,000, the values provided by the distribution will range from 100,000 to 999,999. That is, they will all be six digits long. The third line creates an object that applies the generator as an argument to the distribution, so that calling the object’s operator() generates a random number.

The generators are all documented and have known properties. There is even a generator called std::random_device that produces values from a source of true randomness, if one is available.

Distributions provide the power in this library. For instance, here are a few useful distributions:

std::uniform_int_distribution<unsigned> die(1, 6);

The distribution of a fair, six-sided die, which produces numbers from 1 to 6 with equal probability. Dice with 4, 20, or 100 sides can be simulated by varying the second argument.

std::binomial_distribution<unsigned> coin(1, 0.5);

The distribution of a fair coin toss, which produces values of 0 or 1 with equal probability. A biased coin could be simulated by adjusting the second argument away from 0.5.

std::normal_distribution<double> iq(100.0, 15.0);

The distribution of assessed intelligence in the human population, returning values of type double such that about two-thirds of the results are between 85.0 and 115.0.

For more refined statistical tastes, there are Poisson and exponential distributions for building event simulations (also known as test drivers), and several population-based distributions.
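As a sketch of how engine and distribution fit together (the seed and variable names are my own choices), the engine is constructed once and applied to the distribution on each call:

    #include <iostream>
    #include <random>

    int main() {
        std::default_random_engine e(12345);               // engine: raw source of randomness
        std::uniform_int_distribution<unsigned> die(1, 6); // distribution: fair six-sided die
        for (int i = 0; i < 5; ++i)
            std::cout << die(e) << " ";                    // one roll per call
        std::cout << std::endl;
        return 0;
    }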

std::vector and std::string

A sort of “product sheet” for these data structures would say:

· Sequence container

· Insert time: from back aggregate O(1), from elsewhere O(n)

· Index time: by position, in O(1)

· Sort in O(n log2 n)

· Search in O(log2 n) if sorted, else O(n)

· Iterators and references invalidated when the internal array is reallocated

· Iterators produce items front-to-back or back-to-front

· Reasonable control over allocated capacity independent of size

Historically, std::string was permitted novel implementations, but C++11 locked down the definition. Visual Studio’s implementation might as well be a derived class of std::vector with specialized member functions for string handling. Comments about std::vector apply equally to Visual Studio’s std::string.

std::vector is a dynamically resizable array (see Figure 10-1). Array items are instances of the template type parameter T that are copy-constructed into the vector. Although the item copy constructors may allocate memory for members, the only calls std::vector makes to the memory manager are to reallocate its internal buffer as items are added. This flat structure makes std::vector unusually efficient. Bjarne Stroustrup, creator of C++, recommends std::vector as the go-to container class unless there is a specific reason why another container is needed. I’ll show why in this section.

Figure 10-1. std::vector possible implementation: a dynamic buffer of instances of its parameter type

Many operations on std::vector are efficient in big-O terms, running in constant time. Among these operations are pushing a new item onto the end of the vector, and obtaining a reference to the ith element. Because of the vector’s simple internal structure, these operations are quite fast in absolute terms as well. Iterators over std::vector are random-access iterators, which means computing the distance between two iterators in the same vector can be done in constant time. This property makes divide-and-conquer searches and sorts of std::vector efficient.

Performance Consequences of Reallocation

std::vector has a size, which describes how many elements are currently in the vector, and a capacity, which describes how big the internal buffer that holds its elements is. When size == capacity, any further insertions trigger an expensive expansion: reallocating the internal buffer, copying the vector elements to the new storage, and invalidating all iterators and references to the old buffer. When reallocation becomes necessary, the capacity of the new buffer is set to some multiple of the new size. The effect of this is that the cost of insertion is constant in the aggregate, though some insertions are expensive while others are not.

One secret of efficient std::vector use is that capacity can be reserved ahead of need by a call to void reserve(size_t n), thus preventing unnecessary reallocate-and-copy cycles.
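The effect is easy to demonstrate. The following sketch is my own illustration, not part of the book’s test harness; it counts reallocations by watching capacity() change across calls to push_back():

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        // v.reserve(100000); // uncomment: the loop then reports 0 reallocations
        std::size_t reallocations = 0;
        std::size_t last_capacity = v.capacity();
        for (int i = 0; i < 100000; ++i) {
            v.push_back(i);
            if (v.capacity() != last_capacity) { // capacity grew: buffer was reallocated
                ++reallocations;
                last_capacity = v.capacity();
            }
        }
        std::cout << reallocations << " reallocations" << std::endl;
        return 0;
    }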

Another secret of std::vector’s efficiency is that it doesn’t automatically return memory to the memory manager if items are removed. If a program pushes a million items onto a vector and then removes them, the vector continues to hold onto a million items’ worth of storage. Developers must keep this fact in mind when std::vector is used in constrained environments.

Several member functions of std::vector affect its capacity, but the standard is cagey about making any guarantees. void clear() sets the size of the container to zero, but is not guaranteed to reallocate the internal buffer to reduce its capacity. In C++11, and anticipated in Visual Studio 2010, void shrink_to_fit() is a hint to the vector to reduce its capacity to match its current size, but reallocation is not mandatory.

To ensure the release of memory from a vector in all versions of C++, use the following trick:

std::vector<Foo> x;

...

vector<Foo>().swap(x);

This statement constructs a temporary, empty vector, swaps its contents with the contents of vector x, then deletes the temporary, so that the memory manager reclaims all memory formerly belonging to x.

Inserting and Deleting in std::vector

There are several ways to fill a vector with data. I explored the cost to construct a vector of 100,000 kvstruct instances, determining that there are definitely faster and slower methods.

The fastest way to fill up a vector is to assign it:

std::vector<kvstruct> test_container, random_vector;
...
test_container = random_vector;

Assignment is very efficient because it knows the size of the vector it is copying, and needs to call the memory manager only once to create the assigned vector’s internal buffer. A test of the previous statement copied a 100,000-entry vector in 0.445 milliseconds.

If the data is in another container, std::vector::insert() will copy it into a vector:

std::vector<kvstruct> test_container, random_vector;
...
test_container.insert(
    test_container.end(),
    random_vector.begin(),
    random_vector.end());

A test of this statement copied a 100,000-entry vector in 0.696 milliseconds.

The member function std::vector::push_back() can efficiently (that is, in constant time) put a new item onto the end of a vector. Since the items are in another vector, there are three ways to get at them:

· The code can use a vector iterator:

    std::vector<kvstruct> test_container, random_vector;
    ...
    for (auto it=random_vector.begin(); it!=random_vector.end(); ++it)
        test_container.push_back(*it);

· The std::vector::at() member function can be used:

    std::vector<kvstruct> test_container, random_vector;
    ...
    for (unsigned i = 0; i < nelts; ++i)
        test_container.push_back(random_vector.at(i));

· The code can use a vector subscript directly:

    std::vector<kvstruct> test_container, random_vector;
    ...
    for (unsigned i = 0; i < nelts; ++i)
        test_container.push_back(random_vector[i]);

The three methods produced similar times of 2.26, 2.05, and 1.99 milliseconds, respectively, in my tests. However, even the fastest of these took about five times as long as the simple assignment statement.

The reason this code is slower is that it inserts items into the vector one at a time. The vector doesn’t know how many items will be inserted, so it makes its internal buffer bigger incrementally. Several times during this loop, the vector’s internal buffer is reallocated, and all the items are copied to the new buffer. std::vector guarantees that in the aggregate, push_back() happens in constant time, but that doesn’t mean it has no cost.

The developer can use extra knowledge to make this loop more efficient, by pre-allocating a buffer big enough to hold the whole copy. The iterator variant of this code looks like this:

std::vector<kvstruct> test_container, random_vector;
...
test_container.reserve(nelts);
for (auto it=random_vector.begin(); it != random_vector.end(); ++it)
    test_container.push_back(*it);

The resulting loop runs in a respectable 0.674 milliseconds.

There are still more ways to insert items into a vector. Another variant of the insert() member function can be used:

std::vector<kvstruct> test_container, random_vector;
...
for (auto it=random_vector.begin(); it != random_vector.end(); ++it)
    test_container.insert(test_container.end(), *it);

This ought to be exactly as expensive as push_back(), but it isn’t (in Visual Studio 2010). All three variants (iterator, at(), and subscript) run in about 2.7 milliseconds. Reserving space in advance reduces this to 1.45 milliseconds, but this is not competitive with any of the previous results.

Finally, we come to std::vector’s super-weakness: inserting at the front. std::vector does not provide a push_front() member, because the time cost is O(n). Inserting at the front is inefficient because every item in the vector must be copied to make room for the new entry. And it is, indeed, inefficient. The following loop:

std::vector<kvstruct> test_container, random_vector;
...
for (auto it=random_vector.begin(); it != random_vector.end(); ++it)
    test_container.insert(test_container.begin(), *it);

takes 8,065 milliseconds. That’s a comma there, not a decimal point. This loop takes almost three thousand times as long as inserting at the back.

So, to fill up a vector efficiently, try assignment, insert() with iterators from another container, push_back(), and insert() at the back of the vector, in that order.

Iterating in std::vector

Vectors are inexpensive to traverse, visiting each element. But as with insertion, the time cost of the available methods differs significantly.

There are three ways to iterate through a vector: with an iterator, with the at() member function, or by subscripting. If the controlled action of the loop is expensive, the difference in cost of the various traversal methods becomes insignificant. However, developers often perform only a simple, fast action on the data in each element. In this example, the loop sums the contents, which takes a negligible amount of time (it also keeps the compiler from optimizing away the whole loop as a no-op):

std::vector<kvstruct> test_container;
...
unsigned sum = 0;
for (auto it=test_container.begin(); it!=test_container.end(); ++it)
    sum += it->value;

std::vector<kvstruct> test_container;
...
unsigned sum = 0;
for (unsigned i = 0; i < nelts; ++i)
    sum += test_container.at(i).value;

std::vector<kvstruct> test_container;
...
unsigned sum = 0;
for (unsigned i = 0; i < nelts; ++i)
    sum += test_container[i].value;

It is reasonable for the developer to expect these loops to be nearly equivalent in cost, but in fact they are not. The iterator version takes 0.236 milliseconds. The at() version is slightly better, at 0.230 milliseconds, but as with insertion, the subscript version is more efficient, taking just 0.129 milliseconds. The subscripting version is 83% faster in Visual Studio 2010.
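C++11 also offers a fourth way to write the loop, the range-based for statement. It is defined in terms of the begin()/end() iterators, so its cost should track the iterator version; this sketch was not one of the measured variants:

    std::vector<kvstruct> test_container;
    ...
    unsigned sum = 0;
    for (auto const& kv : test_container) // expands to the iterator loop
        sum += kv.value;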

Sorting std::vector

You can sort a vector efficiently prior to using a binary search to look up items. The C++ standard library has two sorting algorithms, std::sort() and std::stable_sort(). Both algorithms run in O(n log2 n) time if the container’s iterators are random-access iterators, as std::vector’s are. Both algorithms run somewhat faster on data that is already sorted. Sorting is accomplished by a brief program fragment:

std::vector<kvstruct> sorted_container, random_vector;
...
sorted_container = random_vector;
std::sort(sorted_container.begin(), sorted_container.end());

The results are summarized in Table 10-1.

std::vector                   VS2010 release, i7, 100k items
std::sort() vector            18.61 ms
std::sort() sorted vector     3.77 ms
std::stable_sort()            16.08 ms
std::stable_sort() sorted     5.01 ms

Table 10-1. Cost of sorting a vector of 100,000 items

Lookup with std::vector

The following program fragment looks for every key from random_vector in sorted_container:

std::vector<kvstruct> sorted_container, random_vector;
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it) {
    auto kp = std::lower_bound(
        sorted_container.begin(),
        sorted_container.end(),
        *it);
    if (kp != sorted_container.end() && *it < *kp)
        kp = sorted_container.end();
}

This program looked up 100,000 keys in the sorted vector in 28.92 milliseconds.

std::deque

The “product sheet” for deque looks like this:

· Sequence container

· Insert time: from back or front O(1), from elsewhere O(n)

· Index time: by position, in O(1)

· Sort in O(n log2 n)

· Search in O(log2 n) if sorted, else O(n)

· Iterators and references invalidated when the internal array is reallocated

· Iterators produce items front-to-back or back-to-front

std::deque is a specialized container for creating first-in, first-out queues (FIFO). Insertion and deletion at either end is constant-time. Subscripting is also constant-time. The iterators are random-access like those of std::vector, so std::deque can be sorted in O(n log2 n) time.

Because std::deque makes the same performance guarantees (in big-O terms) as std::vector, and has constant-time insertion at both ends, it’s tempting to wonder what the big deal is about std::vector. However, the constant of proportionality of all these operations is greater for deques than for vectors. Measured performance of common operations involving deques is 3 to 10 times slower than corresponding operations in vectors. Iterating, sorting, and lookup are relative bright spots for deques, being only about 30% slower than for vectors.

std::deque is typically implemented as an array of arrays (Figure 10-2). The two indirections needed to get to an item in the deque reduce cache locality, and the cost of more frequent calls to the memory manager is greater than with std::vector.

Pushing an item onto either end of a deque may cause at most two calls to the allocator: one to add another block of items, and, less frequently, another to extend the deque backbone. This allocation behavior is more complex, and thus harder to reason about, than the allocation behavior of vectors. std::deque does not offer any equivalent to std::vector’s reserve() member function to preallocate its internal data structures. The deque may seem like an obvious container with which to implement a FIFO queue. There is even a container adapter template called std::queue, for which deque is the default implementation. However, there is no guarantee that the allocation performance will be very good in this use.

Figure 10-2. std::deque possible implementation after a few insertions and deletions: a backbone array points to blocks of instances of its parameter type
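For reference, here is a minimal sketch of the std::queue adapter from <queue> (the variable names are illustrative); by default it wraps a std::deque:

    std::queue<kvstruct> fifo; // same as std::queue<kvstruct, std::deque<kvstruct>>
    ...
    fifo.push(kvstruct(42));        // enqueue at the back
    kvstruct oldest = fifo.front(); // copy the oldest item
    fifo.pop();                     // dequeue from the front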

Inserting and Deleting in std::deque

std::deque provides the same interface for insertion as does std::vector, plus the member push_front().

Here is an assignment of one deque to another. This operation took 5.70 milliseconds:

std::deque<kvstruct> test_container;
std::deque<kvstruct> random_deque; // a deque preloaded with the random data
...
test_container = random_deque;

Insertion into a deque using a pair of iterators looks like this. This operation took 5.28 milliseconds:

std::deque<kvstruct> test_container;
std::vector<kvstruct> random_vector;
...
test_container.insert(
    test_container.end(),
    random_vector.begin(),
    random_vector.end());

Here are three ways to copy items from a vector to a deque using push_back():

std::deque<kvstruct> test_container;
std::vector<kvstruct> random_vector;
...
for (auto it=random_vector.begin(); it!=random_vector.end(); ++it)
    test_container.push_back(*it);

for (unsigned i = 0; i < nelts; ++i)
    test_container.push_back(random_vector.at(i));

for (unsigned i = 0; i < nelts; ++i)
    test_container.push_back(random_vector[i]);

A developer could be excused for guessing that these three loops would cost pretty much the same. Maybe at() should be a tiny bit slower due to the extra checking it does. Indeed, the iterator version at 4.33 milliseconds was 15% faster than the subscript version at 5.01 milliseconds, with the at() version in between at 4.76 milliseconds. This is not a tremendous difference; maybe not the big game for which an optimization effort would normally want to hunt.

Results were similar for the three variants of prepending using push_front(). The iterator version took 5.19 milliseconds and the subscripting version 5.55 milliseconds—a difference of only 7%, bordering on the unrepeatable. However, the push_front() results were 20% slower than for push_back().

Inserting at the back and at the front was about twice as expensive as push_back() or push_front(), respectively.

Now take a look at the performance of std::vector versus std::deque. A vector performs an assignment 13 times faster for the same number of entries. A vector is 22 times faster at deletion, 9 times faster at iterator-based insertion, twice as fast at push_back(), and 3 times as fast when inserting at the end.

OPTIMIZATION WAR STORY

When I began performance-testing deque, I found a surprise: Operations on std::deque were a thousand times slower than equivalent std::vector operations. At first I told myself, “It is what it is. Deque must simply be an awful choice for a data structure.” It was only when I ran a “final” set of tests for the tables in this book that my foolishness was revealed.

My normal practice in development was to run the test programs in the debugger, because there’s a big fat button for that on the IDE. I was aware that debug runs are linked to a C++ runtime library with extra debug checking. But I had never seen this make more than a slight difference in performance. For the tables in the book, it was my practice to make a final run outside the debugger, because I got more consistent timings that way. That’s how I found that for std::deque, running under the debugger is monstrously expensive due to diagnostic code added to memory allocation routines. This was an exception to my general experience that measuring relative performance in debug builds produced similar results to measuring relative performance in release builds. It is possible to control whether the debug heap or normal heap is used in debugging. See “Optimizing Pro Tip”.

Iterating in std::deque

Iterating through the elements of a deque took 0.828 milliseconds for the subscript-based version and 0.450 milliseconds for the iterator-based code. Interestingly, the iterator-based traversal is much faster for the deque, while the subscript-based code was faster for the vector. But the fastest deque traversal method is twice as expensive as the fastest vector traversal method, continuing the trend set previously.

Sorting std::deque

std::sort() processed 100,000 deque entries in 24.82 milliseconds, which is 33% more than for the vector. As for the vector, std::stable_sort() was faster at 17.76 milliseconds, and the time was within 10% of the time for the vector. In both cases, the presorted vector was faster to re-sort.

Lookup with std::deque

It took 35 milliseconds to look up all 100,000 keys in a sorted deque. Lookup was only about 20% more expensive for the deque than for the vector.

std::list

The “product sheet” for this data structure says:

· Sequence container

· Insert time: O(1) at any position

· Sort in O(n log2 n)

· Search in O(n)

· Iterators and references never invalidated except for removed items

· Iterators visit list items front-to-back or back-to-front

std::list shares many properties with std::vector and std::deque. Like with vector and deque, items can be pushed onto the back of a list in constant time. Like with a deque (but unlike a vector), items can be pushed onto the front of a list in constant time. Furthermore, unlike either a vector or deque, items can also be inserted into the middle of a list in constant time, given an iterator to the insertion point. Like a deque and a vector, a list can be efficiently sorted. But unlike with vectors and deques, there is no efficient way to search a list. std::find(), which takes O(n) time, is as good as you can do.

The received wisdom is that std::list is too inefficient to be useful, but I did not find that when I tested its performance. While std::list is maybe 10 times as expensive to copy or create as std::vector, it is quite competitive with std::deque. Pushing incrementally onto the tail of a list is less than twice as expensive as for a vector. Traversing and sorting a list are only 30% more expensive than the same operations on a vector. For most of the operations I tested, std::list was less expensive than std::deque.

The same received wisdom holds that std::list, with its forward and reverse links and constant-time size() method, is more expensive than necessary for the features it provides. This thinking led eventually to the inclusion of std::forward_list in C++11. However, a little performance testing shows that std::list is almost identical to std::forward_list in the cost of its operations, at least on PC hardware.

Because std::list has no backbone array that must be reallocated, iterators and references to list items are never invalidated by insertion. They become invalid only if the items they point to are deleted.

One place std::list shines is that lists can be spliced (in O(1) time) and merged without copying list items. Even operations like splice and sort don’t invalidate std::list iterators. Insertion into the middle of a list is constant-time, provided the program already knows where to insert. So, an application that creates lists of items and then shuffles them around might be more efficient using std::list than std::vector.
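Here is a sketch of splicing in the style of the earlier listings (the variable names are mine):

    std::list<kvstruct> list1, list2;
    ...
    // Move all of list2's nodes onto the end of list1 in O(1).
    // No items are copied; iterators into the moved nodes remain valid.
    list1.splice(list1.end(), list2); // list2 is left empty

    // For two sorted lists, merge() likewise relinks nodes without copying:
    //   list1.sort(); list2.sort(); list1.merge(list2);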

std::list interacts with the memory manager in a very simple and predictable way. Each list item is separately allocated when needed. There is no unused capacity hiding in a list (see Figure 10-3).

Figure 10-3. std::list possible implementation after a few insertions and deletions: a doubly linked list of nodes containing the list parameter type T

The storage allocated for each list item is the same size. This helps sophisticated memory managers operate efficiently, and with less risk of fragmenting memory. It is also possible to define a simple custom allocator for std::list that exploits this property to operate efficiently (see “A Fixed-Block Allocator”).

Inserting and Deleting in std::list

The algorithms for copying one list to another by insert(), push_back(), and push_front() are identical to the listings for a vector and deque, except for the data structure declaration at the beginning. The very simple structure of std::list does not offer the compiler much chance to improve the code during compilation. Timings for all these mechanisms are thus consistent for lists, as illustrated in Table 10-2.

std::list, 100k items, VS2010 release, i7   Time      List versus vector
Assign                                      5.10 ms   1046%
Delete                                      2.49 ms   2141%
insert(end())                               3.69 ms   533%
Iterator push_back()                        4.26 ms   88%
at() push_back()                            4.50 ms   120%
Subscript push_back()                       4.63 ms   132%
Iterator push_front()                       4.77 ms
at() push_front()                           4.82 ms
Subscript push_front()                      4.99 ms
Iterator insert at back                     4.75 ms   75%
at() insert at back                         4.84 ms   77%
Subscript insert at back                    4.88 ms   75%
Iterator insert at front                    4.84 ms
at() insert at front                        5.02 ms
Subscript insert at front                   5.04 ms

Table 10-2. Summary of performance experiments on std::list

Inserting at the end of a list was the fastest way to construct a list; for some reason, even faster than operator=().

Iterating in std::list

There is no subscripting operator for lists. The only option to traverse the list uses iterators.

Iterating through the list of 100,000 items took 0.326 milliseconds. This was only 38% more expensive than traversing a vector.

Sorting std::list

The iterators over std::list are bidirectional iterators, less powerful than the random-access iterators over std::vector. A particular property of these iterators is that the cost to find the distance, or number of items between two bidirectional iterators, is O(n). Thus, std::sort() has O(n²) performance on std::list. The compiler will still compile a call to std::sort() on a list, but the performance will be far worse than a developer might expect.

Fortunately, std::list has a built-in sort that is more efficient, at O(n log2 n). Sorting the list using std::list’s built-in sort() took 23.2 milliseconds, only 25% longer than sorting the equivalent vector.
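The call is simply the member function, optionally with a comparison predicate (a sketch):

    std::list<kvstruct> list_container;
    ...
    list_container.sort(); // member sort: O(n log2 n), uses kvstruct::operator<()

    // or with an explicit predicate, here sorting in descending order:
    list_container.sort(
        [](kvstruct const& a, kvstruct const& b) { return b < a; });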

Lookup with std::list

Because std::list provides only bidirectional iterators, the binary-search algorithms are all O(n) for lists. Searching with std::find() is also O(n), where n is the number of entries in the list. This makes std::list a poor candidate to replace associative containers.

std::forward_list

The “product sheet” for std::forward_list says:

· Sequence container

· Insert time: O(1) at any position

· Sort in O(n log2 n)

· Search in O(n)

· Iterators and references never invalidated except for removed items

· Iterators visit list items front-to-back

std::forward_list is a sequence container stripped down to its bare essentials. It contains a single pointer to the head node of the list. It was designed deliberately to make its implementation equivalent to a handcoded, singly linked list. There is no back() or rbegin() member function.

std::forward_list interacts with the memory manager in a very simple and predictable way. Each forward list item is separately allocated when needed. There is no unused capacity hiding in a forward list (see Figure 10-4). The storage allocated for each forward list item is the same size. This helps sophisticated memory managers operate efficiently, and with less risk of fragmenting memory. It is also possible to define a simple custom allocator for std::forward_list that exploits this property to operate efficiently (see “A Fixed-Block Allocator”).

Figure 10-4. std::forward_list possible implementation: a collection of nodes threaded by pointers to the next node

A forward list differs from a list by offering only forward iterators, as the name suggests. A forward list can be traversed with the usual loop:

std::forward_list<kvstruct> flist;
// ...
unsigned sum = 0;
for (auto it = flist.begin(); it != flist.end(); ++it)
    sum += it->value;

Insertion, however, requires a different approach. Instead of an insert() method, std::forward_list has insert_after(). There is also a function called before_begin() to get an iterator to before the first list element (there is no other way to insert before the first element because the elements only have pointers to the next element):

std::forward_list<kvstruct> flist;
std::vector<kvstruct> vect;
// ...
auto place = flist.before_begin();
for (auto it = vect.begin(); it != vect.end(); ++it)
    place = flist.insert_after(place, *it);

std::forward_list wasn’t significantly faster than std::list on my PC. The things that made std::list slow (allocation per item, poor cache locality) were just as big a problem for std::forward_list. std::forward_list may be useful on smaller processors with tighter memory constraints, but for desktop- and handset-class processors, there is little to recommend it.

Inserting and Deleting in std::forward_list

std::forward_list has constant-time insertion at any position, provided there is an iterator pointing to just before that position. Inserting 100,000 entries into a forward list took 4.24 milliseconds, about the same as for std::list.

std::forward_list has a push_front() member function. Inserting 100,000 entries in this way took 4.16 milliseconds, again about the same as for std::list.

Iterating in std::forward_list

There is no subscripting operator for std::forward_list. The only option to traverse the list uses iterators.

Iterating through the forward list of 100,000 items took 0.343 milliseconds. This was only 45% more expensive than traversing a vector.

Sorting std::forward_list

Like std::list, std::forward_list has a built-in sort that runs in O(n log2 n). The performance of the sort was similar to that of std::list, sorting 100,000 entries in 23.3 milliseconds.

Lookup in std::forward_list

Because std::forward_list provides only forward iterators, the binary-search algorithms are O(n) for forward lists. Searching with the far simpler std::find() is also O(n), where n is the number of entries in the forward list. This makes the forward list a poor candidate to replace associative containers.

std::map and std::multimap

The “product sheet” for these data structures says:

· Ordered associative container

· Insert time: O(log2 n)

· Index time: by key O(log2 n)

· Iterators and references never invalidated except for removed items

· Iterators produce items in sorted or reverse-sorted order

std::map maps instances of a key type to corresponding instances of some value type. std::map is a node-based data structure, like std::list. However, a map orders its nodes according to the value of a key. Internally, a map is implemented as a balanced binary tree with additional links to facilitate iterator-based traversal (see Figure 10-5).

Figure 10-5. std::map simplified possible implementation: a balanced binary tree of nodes containing the key and value

Although std::map is implemented using a tree, it is not a tree. There is no way to inspect the links, perform a breadth-first traversal, or do other tree-ish things on a map.

std::map’s interaction with the memory manager is simple and predictable. Each map item is separately allocated when needed. std::map has no backbone array that might be reallocated, so iterators and references to map items are never invalidated by insertion. They become invalid only if the items to which they point are deleted.

The storage allocated for each map item is the same size. This helps sophisticated memory managers operate efficiently, and with less risk of fragmenting memory. It is also possible to define a simple custom allocator for std::map that exploits this property to operate efficiently (see “A Fixed-Block Allocator”).

Inserting and Deleting in std::map

Inserting 100,000 random entries from a vector iterator into a std::map took 33.8 milliseconds.

Inserting into a map is normally O(log2 n) due to the need to traverse the map’s internal tree looking for the insert point. This cost is high enough that std::map provides a version of insert() that takes an extra map iterator serving as a hint that inserting at this point might be more efficient. If the hint is optimal, insertion time goes to amortized O(1).

There is good news and bad news about the insertion hint. The good news is that the insert-with-hint can never be more expensive than an ordinary insert. The bad news is that the optimum value recommended for the insertion hint changed for C++11. Before C++11, the optimum value for the insertion hint was the position before the new item—that is, if items are inserted in sorted order, the position of the previous insertion. Since C++11, the optimal insertion hint is the position after the new item; that is, if items are inserted in sorted order, the position hint should be end(). To make the previously inserted item the optimal position, the program should iterate through the sorted input in reverse, as shown in Example 10-2.

Example 10-2. Insert from sorted vector using C++11-style hint

ContainerT test_container;
std::vector<kvstruct> sorted_vector;
...
std::stable_sort(sorted_vector.begin(), sorted_vector.end());
auto hint = test_container.end();
for (auto it = sorted_vector.rbegin(); it != sorted_vector.rend(); ++it)
    hint = test_container.insert(hint, value_type(it->key, it->value));

As my experience with both GCC and Visual Studio 2010 showed, the standard library implementation may either anticipate or trail the latest standard. As a result, a program optimized using the pre-C++11-style hint might slow down when moved to a newer compiler, even if the compiler does not fully conform to C++11.

I ran an insertion test using three hints: end(), an iterator to the predecessor node as specified for standard libraries prior to C++11, and an iterator to the successor node as specified for C++11. Table 10-3 shows the results. To perform this test, the input also had to be sorted.

Experiment                               Time/call
Sorted vector insert()                   18.0 ms
Sorted vector insert() end hint          9.11 ms
Sorted vector insert() pre-C++11 hint    14.4 ms
Sorted vector insert() C++11 hint        8.56 ms

Table 10-3. Performance of insert-with-hint into a std::map

It appears that Visual Studio 2010 implements the C++11-style hint. But either hint performed better than no hint at all, and better than the no-hint insert() applied to unsorted input.

Optimizing the check-and-update idiom

In a commonly occurring idiom, a program checks to see whether some key is in the map, then performs an action depending on the result. A performance optimization is possible when the action involves inserting or updating the value corresponding to the searched-for key.

The key to understanding the optimization is that map::find() and map::insert() both cost O(log2 n) because of the need to check for the presence of the key and later to find the insertion point. Both of these operations traverse the same set of nodes in the map’s binary tree data structure:

iterator it = table.find(key); // O(log n)
if (it != table.end()) {
    // key found path
    it->second = value;
}
else {
    // key not found path
    it = table.insert(value_type(key, value)).first; // O(log n)
}

If the program captures the result of the first search, it can use that as a hint to insert(), making the insert run in O(1). There are two ways to improve this idiom, depending on the needs of the program. If all you need is to know whether the key was initially present, you can use the version of insert() that returns a pair containing an iterator to the found or inserted entry, and a bool that is true if the entry was inserted and false if an entry with that key was already present. This solution works if the program knows how to initialize the entry before finding out whether it is present or not, or if the value is inexpensive to update:

std::pair<iterator, bool> result = table.insert(value_type(key, value));
if (result.second) {
    // key not found path: the new entry was inserted
}
else {
    // key found path: result.first points to the existing entry
}

The second method involves finding the key or the insertion point through a call to lower_bound() for a C++98-style hint, or upper_bound() for a C++11-style hint. lower_bound() returns an iterator to the first entry in the map whose key is not less than the search key, or end() if there is no such entry. This iterator is the insertion hint if the key must be inserted, and it points to the key if an existing entry must be updated. This method makes no assumptions about the entry to be inserted:

iterator it = table.lower_bound(key);
if (it == table.end() || key < it->first) {
    // key not found path
    table.insert(it, value_type(key, value));
}
else {
    // key found path
    it->second = value;
}

Iterating in std::map

Iterating through the 100,000-item map took 1.34 milliseconds, about 10 times longer than iterating through a vector.

Sorting std::map

Maps are inherently sorted. Iterating through a map produces entries in order by their key and the search predicate in use. Note that it isn’t possible to re-sort a map by another sorting predicate without copying all elements to another map.
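A sketch of that copy, assuming for illustration a map keyed by std::string (not the key type used in the tests):

    std::map<std::string, unsigned> table;
    ...
    // A map cannot be re-sorted in place; instead, copy its entries
    // into a second map that uses a different comparison predicate.
    std::map<std::string, unsigned, std::greater<std::string>>
        reversed(table.begin(), table.end());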

Lookup with std::map

Looking up all 100,000 entries in the map took 42.3 milliseconds. By contrast, it took 28.9 milliseconds to look up all 100,000 entries in a sorted vector and 35.1 milliseconds for a sorted deque, using std::lower_bound(). Table 10-4 summarizes the performance of a vector and a map when used as a lookup table.

          Insert+sort   Lookup
Vector    19.1 ms       28.9 ms
Map       33.8 ms       42.3 ms

Table 10-4. Insertion and lookup time for vector versus map

For a 100,000-element table that is constructed all at once and then searched repeatedly, a vector-based implementation will be faster. If the table will change frequently, with insertions or deletions, re-sorting the vector-based table will consume any advantage it may have had in search time.

std::set and std::multiset

The “product sheet” for these data structures says:

· Ordered associative container

· Insert time: O(log2 n)

· Index time: by key O(log2 n)

· Iterators and references never invalidated except for removed items

· Iterators produce items in sorted or reverse-sorted order

I did not performance-test std::set. On Windows, std::set and std::multiset use the same data structure as std::map (see Figure 10-6), so the performance characteristics are the same as for a map. Although a set could in principle be implemented using a different data structure, there’s no reason why it should be.

Figure 10-6. std::set simplified possible implementation: a balanced binary tree of nodes containing the key and value

One difference between std::map and std::set is that the items returned by lookup are const. This is less of a problem than it seems. If you really want to use the set abstraction, fields in the item type that don’t participate in the ordering relationship can be declared mutable, so they can be modified even through the const access a set provides. Of course, the compiler believes the developer unconditionally, so it’s important not to change members that do participate in the ordering relation, or the set data structure will become invalid.
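Here is a sketch of the mutable technique (the Widget type is my own illustration): only key participates in the ordering, so counter may be modified even through the const reference a set lookup returns:

    struct Widget {
        std::string key;          // participates in ordering; never change in place
        mutable unsigned counter; // does not participate; safe to modify in a set
        bool operator<(Widget const& that) const { return key < that.key; }
    };
    ...
    std::set<Widget> widgets;
    auto it = widgets.find(Widget{"foo", 0});
    if (it != widgets.end())
        it->counter += 1; // legal: counter is mutable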

std::unordered_map and std::unordered_multimap

The “product sheet” for these data structures says:

· Unordered associative container

· Insert time: O(1) average, O(n) worst case

· Index time: by key O(1) average, O(n) worst case

· Iterators invalidated on rehash, references invalidated only for removed items

· Capacity can be increased or decreased independent of size

std::unordered_map maps instances of a key type to corresponding instances of some value type. In this way it is similar to std::map. However, the mapping is accomplished differently. std::unordered_map is implemented as a hash table. Keys are converted to an integer hash, which is used as an array index to find the value in amortized average constant time.

The C++ standard constrains the implementation of std::unordered_map, just as it does for std::string. So, while there are several ways in which a hash table might be implemented, only a design with a dynamically allocated backbone array of buckets pointing to linked lists of dynamically allocated nodes is likely to comply with the standard’s definitions.

An unordered map is expensive to construct. It contains dynamically allocated nodes for each table entry, plus a dynamically resizable bucket array that is periodically reallocated as the table grows (Figure 10-7). It thus consumes a considerable amount of memory to achieve an improvement in search performance. Iterators are invalidated every time the bucket array is reallocated. However, references point to the nodes; they are invalidated only on deletion.

Hash tables like std::unordered_map have several parameters to adjust in pursuit of optimal performance. This is either a strength or a weakness, depending on the developer’s point of view.

The number of entries in the unordered map is its size. The computed ratio size / buckets is called the load factor. A load factor greater than 1.0 implies that some buckets have a chain of multiple entries, reducing lookup performance for these keys (in other words, the hash is not perfect). In real hash tables, collisions among the keys cause entries to appear in chains even when the load factor is less than 1.0. A load factor less than 1.0 implies that there are unused buckets consuming space in the unordered map’s backbone array (in other words, the hash is not minimal). When the load factor is less than 1.0, the value 1 − load factor is a lower bound on the fraction of empty buckets, but since hash functions can be imperfect, the amount of unused space is generally higher.

Figure 10-7. std::unordered_map possible implementation: a hash table whose backbone of buckets points to linked lists of nodes containing the key and value

The load factor is a dependent variable in std::unordered_map. A program can observe its value, but cannot directly set it or predict its value after a reallocation. When a new entry is inserted in the unordered map, if the load factor exceeds a maximum load factor that may be specified by the program, the bucket array is reallocated and all the entries are rehashed to buckets in the new array. Since the number of buckets is always increased by a factor greater than 1, the cost of insertion is amortized O(1). Insertion and lookup performance decline significantly when the maximum load factor is greater than 1.0 (the default value is 1.0). Performance can be improved modestly by reducing the maximum load factor below 1.0.

The initial number of buckets in an unordered map can be specified as a constructor argument. The container will not be reallocated until its size exceeds buckets * load factor. The program can increase the number of buckets in an unordered map by calling its rehash() member function. rehash(size_t n) sets the bucket count to be at least n, reallocates the bucket array, and rebuilds the table, moving all entries to their appropriate buckets in the new array. If n is less than the current number of buckets, rehash() may or may not decrease the size of the table and rehash.

A call to reserve(size_t n) can reserve space sufficient to hold n entries before reallocation. This is equivalent to a call to rehash(ceil(n/max_load_factor())).
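Taken together, the tuning knobs look like the following sketch. The calls are real member functions; the specific numbers are arbitrary:

#include <string>
#include <unordered_map>

int main() {
    // Ask for at least 1,000 buckets up front
    std::unordered_map<std::string, unsigned> table(1000);

    table.max_load_factor(0.75f); // rehash sooner, keeping chains short
    table.reserve(100000);        // room for 100,000 entries before rehashing
    // ... insert entries ...
    table.rehash(300000);         // or set the bucket count explicitly
    return 0;
}

Of these, reserve() is usually the most convenient, since it is phrased in terms of entries rather than buckets.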

A call to the unordered map’s clear() member function erases all the entries and returns all storage to the memory manager. This is a stronger promise than the one provided for vector or string’s clear() member function.

Unlike other C++ standard library containers, std::unordered_map exposes the structure of its implementation by providing an interface for iterating through the buckets, as well as through the entries in a single bucket. Inspecting the length of the chains in each bucket can help reveal problems with the hash function. I used this interface to inspect the quality of the hash functions I tried, as seen in Example 10-3.

Example 10-3. Snooping on std::unordered_map behavior

#include <iostream>

template <typename T> void hash_stats(T const& table) {
    unsigned zeros = 0;      // buckets with no entries
    unsigned ones = 0;       // buckets with exactly one entry
    unsigned many = 0;       // buckets with a collision chain
    unsigned many_sigma = 0; // total entries in multi-entry buckets
    for (unsigned i = 0; i < table.bucket_count(); ++i) {
        // count the entries chained in bucket i
        unsigned how_many_this_bucket = 0;
        for (auto it = table.begin(i); it != table.end(i); ++it) {
            how_many_this_bucket += 1;
        }
        switch (how_many_this_bucket) {
        case 0:
            zeros += 1;
            break;
        case 1:
            ones += 1;
            break;
        default:
            many += 1;
            many_sigma += how_many_this_bucket;
            break;
        }
    }
    std::cout << "unordered_map with " << table.size()
              << " entries" << std::endl
              << "  " << table.bucket_count() << " buckets"
              << " load factor " << table.load_factor()
              << ", max load factor "
              << table.max_load_factor() << std::endl;
    if (ones > 0 && many > 0)
        std::cout << "  " << zeros << " empty buckets, "
                  << ones << " buckets with one entry, "
                  << many << " buckets with multiple entries"
                  << std::endl;
    if (many > 0)
        std::cout << "  average length of multi-entry chain "
                  << ((float)many_sigma) / many << std::endl;
}

I discovered, using a hash function from the Boost project, that 15% of my entries collided, and that the automatic allocation produced a table with a load factor of 0.38, meaning that 62% of the backbone array was unused. The hash functions I tried performed far worse than I expected.

Inserting and Deleting in std::unordered_map

Like std::map, std::unordered_map provides two insertion methods, with and without an insertion hint. But unlike a map, an unordered map does not use the insertion hint. It is provided for interface compatibility only, and it carries a slight performance penalty even though it accomplishes nothing.
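The two signatures look like this sketch; on an unordered map the hint argument is simply ignored:

#include <unordered_map>
#include <utility>

int main() {
    std::unordered_map<unsigned, unsigned> table;
    // Without a hint: returns std::pair<iterator, bool>
    auto result = table.insert(std::make_pair(1u, 100u));
    // With a hint: returns an iterator, but the hint buys nothing here
    table.insert(result.first, std::make_pair(2u, 200u));
    return 0;
}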

The insertion test took 15.5 milliseconds. Insertion performance can be improved somewhat by preallocating sufficient buckets to prevent rehashing, through a call to reserve(). The improvement in performance was only 4% on my test, improving insertion time to 14.9 milliseconds.

Reallocation can be deferred by setting the maximum load factor very high. I tested to see whether performance improved when I did this and then rehashed after all items were inserted. This turned out to be a disastrous deoptimization for the version of unordered_map in Visual Studio 2010’s standard library. Apparently items are inserted onto the end of the collision chain in Visual Studio’s implementation, so the cost of each insertion went from constant to O(n). It is possible that a more sophisticated implementation would not have this weakness.
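The pattern I tested was roughly the following (and, as noted, it backfired on that particular implementation):

#include <unordered_map>

int main() {
    std::unordered_map<unsigned, unsigned> table;
    table.max_load_factor(100.0f);  // let chains grow; defer reallocation
    for (unsigned i = 0; i < 100000; ++i)
        table[i] = i;               // cheap only if insertion prepends to chains
    table.max_load_factor(1.0f);
    table.rehash(0);                // rebuild once, honoring the new maximum
    return 0;
}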

Iterating in std::unordered_map

Example 10-4 contains code to iterate through a std::unordered_map.

Example 10-4. Iterating through the entries in an unordered map

for (auto it = test_container.begin();
     it != test_container.end();
     ++it) {
    sum += it->second;
}

An unordered map cannot be sorted, and iterating through an unordered map produces items in an opaque order, as the container's name implies. Iterating through the unordered map was relatively efficient, at 0.34 milliseconds. This was only 2.8 times slower than the same traversal of a std::vector.

Lookup with std::unordered_map

Lookup is the raison d'être of std::unordered_map. I compared the lookup performance of a 100,000-entry key/value table based on an unordered map versus a sorted table based on a vector and searched with std::lower_bound().
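The two lookups being compared reduce to something like the following sketch; the value type and function names are my own, not the test program's:

#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::string, unsigned> kv_t;

// Hash table: amortized average O(1) per query
unsigned lookup_hashed(std::unordered_map<std::string, unsigned> const& table,
                       std::string const& key) {
    auto it = table.find(key);
    return (it != table.end()) ? it->second : 0;
}

// Sorted vector: O(log n) binary search per query
unsigned lookup_sorted(std::vector<kv_t> const& table, std::string const& key) {
    auto it = std::lower_bound(
        table.begin(), table.end(), kv_t(key, 0),
        [](kv_t const& a, kv_t const& b) { return a.first < b.first; });
    return (it != table.end() && it->first == key) ? it->second : 0;
}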

std::unordered_map performed my lookup test of 100,000 queries in 10.4 milliseconds. As seen in Table 10-5, this is about 4 times faster than std::map, and 2.8 times faster than a sorted std::vector searched with std::lower_bound(). Yes, this is faster, much faster. But hash tables are held in such reverence that I expected a truly astonishing, order-of-magnitude performance increase. This is not what I saw.

Table 10-5. Insertion and lookup time for vector, map, and unordered_map

                 Insert+sort    Lookup
Map              33.8 ms        42.3 ms
Vector           19.1 ms        28.9 ms
Unordered map    15.5 ms        10.4 ms

Compared to std::map, the unordered map is somewhat faster to build and much faster to search. The downside of the unordered map is the amount of storage it uses. In a constrained environment, it may be necessary to use the more compact table based on std::vector. Otherwise, the unordered map is an unambiguous performance win.

Other Data Structures

The standard library containers, helpful as they are, are not the final word in data structures. The Boost library alone contains a number of data structures in imitation of the standard library containers. Boost provides the following libraries containing alternative containers:

boost::circular_buffer

Similar in many respects to std::deque, but more efficient (see the sketch after this list).

Boost.Container

Variations on standard library containers such as a stable vector (a vector in which reallocation doesn’t invalidate iterators), a map/multimap/set/multiset implemented as a container adapter for std::vector, a static vector with variable length up to a fixed maximum, and a vector with optimized behavior when the vector has only a few elements.

dynamic_bitset

Looks like a vector of bits.

Fusion

Containers and iterators on tuples.

Boost Graph Library (BGL)

Algorithms and data structures for graph traversal.

Boost.Heap

A priority queue with better performance and more nuanced behavior than the simple std::priority_queue container adapter.

Boost.Intrusive

Provides intrusive containers (containers that rely on node types that explicitly contain links). The point of intrusive containers is to improve performance in very hot code. This library contains singly and doubly linked lists, associative containers, unordered associative containers, and a variety of explicit balanced tree implementations. make_shared, along with move semantics and the emplace() member functions added to most containers, reduces the need for intrusive containers.

Boost.Lockfree

Lock-free and wait-free queue and stack.

Boost.MultiIndex

Containers that have multiple indices with different behaviors.

And there are doubtless other container classes buried in Boost.
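As a taste of how these alternatives differ from the standard containers, here is a brief sketch of boost::circular_buffer, which overwrites its oldest element once it reaches capacity instead of reallocating:

#include <boost/circular_buffer.hpp>
#include <iostream>

int main() {
    boost::circular_buffer<int> cb(3);    // fixed capacity of 3
    cb.push_back(1);
    cb.push_back(2);
    cb.push_back(3);
    cb.push_back(4);                      // overwrites the oldest element, 1
    std::cout << cb.front() << std::endl; // prints 2; no reallocation occurred
    return 0;
}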

Another significant contribution comes from games company Electronic Arts, who open-sourced their version of the standard library container classes, called EASTL. Electronic Arts’s take on the standard library container classes includes:

· A simpler and more sensible Allocator definition

· Stronger guarantees on some containers, including a guarantee that containers don’t call into the memory manager until the program puts items into them

· More programmability of std::deque

· A bunch of containers similar to the ones provided by Boost

Summary

· Stepanov’s Standard Template Library was the first reusable library of efficient containers and algorithms.

· Big-O performance of container classes does not tell the full story. Some containers are many times faster than others.

· std::vector is the fastest container at insert, delete, iterate, and sort operations.

· Lookup using std::lower_bound in a sorted std::vector can be competitive with std::map.

· std::deque is only slightly faster than std::list.

· std::forward_list is no faster than std::list.

· The hash table std::unordered_map is faster than std::map, but not by the order-of-magnitude difference that its reputation suggests.

· The Internet is a rich source of containers imitating the standard library containers.