Chapter 13. Optimize Memory Management

Efficiency is doing better what is already being done.

Peter F. Drucker (1909–2005), American management consultant

The memory manager is the set of functions and data structures of the C++ runtime system that oversees allocation of memory to dynamic variables. The memory manager must meet many demands. Meeting these demands efficiently is an open research challenge. In many C++ programs, the memory manager’s functions are quite hot. If its performance could be improved, it would have a global effect on the program. For all these reasons, the memory manager is a natural target for optimization efforts.

In my opinion, there are other places to look first for performance improvement that are more likely to be fruitful than fiddling with the memory manager. Memory managers, being very hot code, are usually well wrung out right out of the box. Memory management is, at best, only one aspect of overall program run time, and Amdahl’s Law limits how much overall improvement the developer can obtain, even if the cost of memory management could be driven to zero. In one study of large programs, performance improvement from optimizing memory management ranged from negligible to about 30%.

The C++ memory manager is highly customizable, with a substantial API. Although many programmers never have to use this API, it provides many ways to optimize performance. Several high-performance memory managers exist that can be plugged into C++ by replacing the C functions malloc() and free(). In addition, the developer can substitute specialized memory managers for hot classes and standard library containers.

C++ Memory Management API Refresher

The C++ toolkit for managing dynamic variables was introduced in “C++ Dynamic Variable API Refresher”. This toolkit contains an interface to the memory manager consisting of new- and delete-expressions, memory management functions, and standard library allocator template classes.

The Life Cycle of Dynamic Variables

A dynamic variable goes through five distinct life stages. The most common variation of the new-expression performs the allocate and place stages. After the use stage, a delete-expression performs the destroy and free stages. C++ provides facilities to manage each of these stages separately:

Allocate

The program asks the memory manager to return a pointer to a contiguous region of memory containing at least a specified number of untyped memory bytes. Allocation can fail if memory is not available. The C library function malloc() and the various overloads of the C++ operator new() function manage the allocate phase.

Place

The program establishes the initial value of the dynamic variable, placing the value into the allocated memory. If the variable is a class instance, one of its constructors is called. If the variable is a simple type, it is optionally initialized. Placement can fail if the constructor throws an exception, requiring the allocated storage to be returned to the memory manager. New-expressions participate in this phase.

Use

The program reads values from the dynamic variable, calls member functions of the dynamic variable, and writes values to the dynamic variable.

Destroy

If the variable is a class instance, the program calls its destructor to perform a final operation on the dynamic variable. Destruction is an opportunity for the dynamic variable to return any system resources it holds, complete any cleanup, say any last words, and make ready to go into the good night. Destruction can fail if the destructor throws an exception that is not handled in the destructor body. If this happens, the program unconditionally terminates. Delete-expressions manage this phase. It is possible to destroy a variable without freeing its storage by calling its destructor explicitly.

Free

The program returns the storage formerly belonging to the destroyed dynamic variable to the memory manager. The C-library free() function and the various overloads of the C++ operator delete() function perform the free phase.
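
Putting the stages together, the following sketch (my own illustration, not from the chapter’s example code; the class name Widget is hypothetical) shows each stage handled separately using the facilities named above:

#include <new>       // operator new, placement new, operator delete

class Widget {       // hypothetical payload class
public:
    explicit Widget(int v) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};

void life_cycle_demo() {
    // Allocate: obtain raw, untyped storage from the memory manager
    void* raw = ::operator new(sizeof(Widget));
    // Place: construct the object in that storage with placement new
    Widget* w = new (raw) Widget(42);
    // Use: read and write the dynamic variable
    int v = w->value();
    (void)v;
    // Destroy: run the destructor without freeing the storage
    w->~Widget();
    // Free: return the storage to the memory manager
    ::operator delete(raw);
}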

Memory Management Functions Allocate and Free Memory

C++ provides a collection of memory management functions, rather than C’s simple malloc() and free(). These functions provide the rich behavior of new-expressions described in “New-Expressions Construct Dynamic Variables”. Overloads of operator new() allocate storage for single instances of any type. Overloads of operator new[]() allocate storage for arrays of any type. When both array- and non-array functions operate in the same way, I describe them together as operator new(), with the understanding that there is an equivalent operator new[]().

operator new() implements allocation

A new-expression calls one of several versions of operator new() to obtain memory for a dynamic variable, or operator new[]() to obtain memory for a dynamic array. C++ provides default implementations of these operators. It also implicitly declares them, so they can be called by the program without having to include the <new> header. The program can override the defaults with its own implementation if desired.

operator new() is important to examine for optimization purposes because the default memory manager is expensive. In some situations, the program can allocate memory very efficiently by providing a specialized implementation.

C++ defines several overloads of operator new().

void* ::operator new(size_t)

By default, memory is allocated for all dynamically allocated variables by calling this overload of operator new(), with an argument specifying the minimum number of bytes to allocate. The standard library implementation of this overload throws a std::bad_alloc exception if there is not enough memory to fulfill the request.

The standard library implementations of all other operator new() overloads call this one. A program can globally change how memory is allocated by providing a definition of ::operator new(size_t) in any compilation unit.

Although it is not required by the C++ standard, the standard library version of this overload is usually implemented using a call to malloc().
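
As a rough illustration of such a global replacement, a definition like the following, placed in any compilation unit, takes over all default allocations. This is a minimal sketch rather than a production memory manager: a fully conforming replacement would also respect the new-handler protocol and would usually replace the array, nothrow, and (in C++14) sized-delete overloads as well.

#include <cstdlib>   // std::malloc, std::free
#include <new>       // std::bad_alloc

// Minimal global replacement: forwards to malloc()/free().
// A real replacement would put its own allocation strategy here.
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    std::free(p);
}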

void* ::operator new[](size_t)

Arrays are allocated with a call to this overload. The standard library implementation calls ::operator new(size_t).

void* ::operator new(size_t, const std::nothrow_t&)

A new-expression like Foo* p = new(std::nothrow) Foo(123); calls the non-throwing overload of operator new(). If no memory is available, this version returns nullptr instead of throwing a std::bad_alloc exception. The standard library implementation calls operator new(size_t) and catches any exception it may throw.

void* ::operator new[](size_t, const std::nothrow_t&)

This is the array version of the non-throwing operator new().

A new-expression can invoke an operator new() with an arbitrary signature provided the first argument is of type size_t. All of these operator new() overloads are called placement operator new(). The new-expression matches the types of the arguments in the placement parameters with available operator new() function signatures to determine which function is used.

Two overloads of placement operator new() are provided by the standard library and implicitly declared. These do not allocate memory (the first life stage of a dynamic variable) but contain an extra argument that accepts a pointer to memory that the program has already allocated. They are:

void* ::operator new(size_t, void*)

This is placement operator new() for variables. It accepts a pointer to memory as its second argument and simply returns that pointer.

void* ::operator new[](size_t, void*)

This is the array version of placement operator new(). It accepts a pointer to memory as its second argument and returns that pointer.

These two placement operator new() overloads are called by the placement new-expression new(p) type, where p is a pointer to valid storage. The standard says these overloads can’t be replaced by developer code. If they were replaced, and the replacement did anything other than return its pointer argument, much of the standard library would stop working. This is important to know, because some C++ compilers don’t enforce the prohibition against replacing these overloads, which means they can be replaced—for instance, with code that prints out diagnostics.

Aside from the two placement operator new() overloads listed here, other placement operator new() overloads have no defined meaning in C++ and are available to the developer for any use.

operator delete() frees allocated memory

A delete-expression invokes operator delete() to return memory for a dynamic variable to the runtime system, and operator delete[]() to return memory for a dynamic array.

The new operators and delete operators work together to allocate and free memory. If a program defines an operator new() that allocates memory from a special memory pool or in a special way, it must define a matching operator delete() in the same scope to return memory to the pool from which it was allocated, or behavior is undefined.

Memory management functions from the C library

C++ provides the C-library functions malloc(), calloc(), and realloc() to allocate memory, and the free() function to return memory that is no longer needed. These functions are provided for compatibility with C programs.

void* malloc(size_t size) implements the allocate phase of a dynamic variable’s lifetime, returning a pointer to storage sufficient to hold size bytes, or nullptr if no storage is available.

void free(void* p) implements the free phase, returning the storage pointed to by p to the memory manager.

void* calloc(size_t count, size_t size) implements the allocate phase of a dynamic array. It performs a simple computation to convert a number count of array elements, each of size size bytes, to a number of bytes, obtains the storage as if by a call to malloc(), and zero-initializes it.

void* realloc(void* p, size_t size) changes the size of a block of memory, moving the block to new storage if needed. The contents of the old block are copied to the new block, up to the minimum of the new and old sizes. realloc() must be used very carefully. Sometimes it moves the pointed-to block and frees the old block; if it does this, pointers into the old block become invalid. Sometimes it reuses the existing block, if that block happens to be big enough for the requested size.
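
Because a failed call leaves the old block untouched, a common defensive pattern (a sketch with a hypothetical buffer-growing helper) is to capture realloc()’s result in a separate pointer so the original block is not lost on failure:

#include <cstddef>
#include <cstdlib>

// Grow a buffer of ints without leaking the old block if realloc() fails.
int* grow_buffer(int* old_p, std::size_t new_count) {
    void* new_p = std::realloc(old_p, new_count * sizeof(int));
    if (new_p == nullptr)
        return nullptr;               // old_p is still valid and must still be freed
    return static_cast<int*>(new_p);  // old_p may now be invalid; use the returned pointer
}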

According to the C++ standard, malloc() and free() act upon a region of memory called the “heap,” while overloads of operator new() and operator delete() operate on a region of memory called the “free store.” The careful language of the standard preserves the option for library developers to implement the two sets of functions differently. That said, the requirements for memory management are similar in the C and C++ worlds. It just isn’t sensible for a compiler to have two parallel but different implementations. In all standard library implementations of which I am aware, operator new() calls malloc() to actually allocate storage. A program can globally change how memory is managed by replacing malloc() and free().

New-Expressions Construct Dynamic Variables

A C++ program requests creation of a dynamic variable or array using a new-expression. A new-expression contains the keyword new followed by a type, a pointer to which is returned by the new-expression. The new-expression can also contain an initializer to set the initial value of the variable or each array element. A new-expression returns a typed pointer to a fully initialized C++ variable or array, rather than the simple void pointer to uninitialized storage returned by C++ operator new() or C’s memory management functions.

The syntax of the new-expression, in all its glory, looks like this:

[::] new [(placement-params)] (type) [initializer]

or:

[::] new [(placement-params)] type [initializer]

where the square brackets mark the optional parts.

The difference between these two lines is in the optional parentheses around type, which are sometimes needed to help the compiler distinguish the end of the placement-params from the beginning of a complicated type, or the end of the type from the beginning of the initializer. cppreference, among other sources, has vastly more information on all the syntactic wrinkles of new-expressions.

When type declares an array, the highest (that is, leftmost) array dimension may be described by a non-constant expression, allowing the size of the array to be specified at run time. This is the only place in C++ where a declaration may have a variable size.

A new-expression returns an rvalue pointer to the dynamic variable or the first element of the dynamic array. (The fact that this pointer is an rvalue is important; see “Implement Move Semantics” for details.)

All versions of the new-expression do a lot more than just call an operator new() function to allocate storage. If the call to operator new() is successful, the non-array version constructs a type object. If the constructor throws an exception, its members and bases are destroyed, and the allocated memory is returned by a call to the operator delete() whose signature corresponds to the operator new() function used to allocate the memory. If no operator delete() matches, the memory is not returned to the memory manager, potentially causing a memory leak. The new-expression returns the pointer, rethrows the caught exception, or returns nullptr in the case of the non-throwing new-expression.

The array new-expression works in the same way, but with the added complexity that any of several constructors might throw an exception, requiring the array new-expression to destroy all successfully constructed instances and return the memory to the free store before returning or rethrowing the exception.

Why are there two kinds of new-expression, one for arrays and one for instances? The array new-expression may allocate space to store the number of array elements, as well as the space allocated for the array itself. That way, the array delete-expression need not provide this value. This extra overhead is not necessary for single instances. This C++ behavior was designed at a time when memory was far more precious than it is today.

Non-throwing new

If placement-params consists of the tag std::nothrow, the new-expression does not throw std::bad_alloc. Instead, it returns nullptr without attempting to construct an object.

Historically, there was a time when many C++ compilers did not properly implement exception handling. Code written for these old compilers, or ported from C, needed an allocation function that returned null when it ran out of memory.

Some industries—notably the aerospace and automotive industries—have imposed coding standards that prohibit the throwing of exceptions. This causes a problem because new-expressions are otherwise defined to throw on error; hence the need for a non-throwing new-expression.

There is received wisdom that exception handling is slow, and that therefore a non-throwing new-expression should be faster. However, modern C++ compilers implement an exception handling mechanism with very low runtime cost unless an exception is thrown, so the truth of this received wisdom may be compiler-dependent. See “Use No-Cost Exception Handling” for a more extensive discussion of the cost of exception handling.

Placement new performs placement without allocation

If placement-params is a pointer to existing valid storage, the new-expression does not call the memory manager, but simply places type at the location indicated by the pointer, which must point to sufficient storage to hold type. Placement new is used like this:

char mem[1000];
class Foo {...};
Foo* foo_p = new (mem) Foo(123);

This example places an instance of class Foo on top of the array mem. Placement new calls a class constructor to perform initialization for class instances. For a basic type, placement new performs initialization instead of construction.

Since placement new does not allocate storage, there is no corresponding placement delete. The instance of Foo placed on top of mem by placement new is not automatically destroyed when mem goes out of scope. It is up to the developer to properly destroy instances created with placement new by explicitly calling the class’s destructor. In fact, if an instance of Foo is placed on top of storage declared to be an instance of Bar, it is Bar’s destructor that is called when that storage goes out of scope, with undefined and typically disastrous results. Placement new should thus be used on memory returned by operator new(), or on memory occupied by an array of char or some other basic type.
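
A sketch of this discipline appears below. It is my own example, not the chapter’s; the class and buffer names are hypothetical, and the buffer is declared with alignas because a plain char array is not guaranteed to be suitably aligned for the placed type.

#include <new>

class Gadget {                 // hypothetical class for illustration
public:
    explicit Gadget(int v) : v_(v) {}
private:
    int v_;
};

alignas(Gadget) char gadget_mem[sizeof(Gadget)];  // storage aligned for Gadget

void placement_demo() {
    Gadget* p = new (gadget_mem) Gadget(123);  // place: no allocation happens
    // ... use *p ...
    p->~Gadget();                              // destroy explicitly; there is nothing to free
}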

Placement new is used in the Allocator template parameter of standard library containers, which must place class instances on previously allocated but unused memory. See “Provide Custom Standard Library Allocators” for details.

Custom placement new, the half-formed hinterland of allocation

If placement-params is anything other than std::nothrow or a single pointer, the new-expression is called custom placement new. C++ does not impose any meaning on the custom placement new-expression. It is available for the developer’s use to allocate storage in an unspecified way. The custom placement new-expression searches for an overload of operator new() or operator new[]() whose first argument matches size_t, and whose subsequent arguments match the types of the expression list. If the constructor for the dynamic object throws an exception, the placement new-expression searches for an overload of operator delete() or operator delete[]() whose first parameter is void*, and whose subsequent arguments match the types of the expression list.

Custom placement new is useful if the program needs to establish more than one mechanism to create dynamic variables, or to pass arguments for memory manager diagnostics.

An issue with custom placement new is that there is no way to specify a matching “custom placement delete-expression.” Thus, while the various placement overloads of operator delete() are invoked when the object constructor throws an exception in the new-expression, delete-expressions cannot call these overloads. This creates a conundrum for the developer, since the standard says behavior is undefined if the operator delete() does not match the operator new() that allocated the dynamic variable. The matching placement operator delete() must be declared, because it is called in a new-expression if the object constructor throws an exception. There’s just no way to call it from a delete-expression. However, the standard committee is working to address this issue in a future version of the C++ standard.

The simplest solution is to note that if the fancy placement operator new() is compatible with an ordinary operator delete(), the behavior, while undefined, is predictably OK. Another solution is to note that delete-expressions aren’t so complicated or magical that they cannot be coded as free functions if needed.
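
The following sketch illustrates the diagnostic use and the “compatible” escape hatch just described. It is my own invention, not a standard facility: the file/line parameters and the logging are illustrative, and the custom operator new() forwards to ::operator new() precisely so that an ordinary delete-expression remains predictably OK.

#include <cstdio>
#include <new>

// Custom placement operator new(): records where each allocation came from,
// then forwards to ::operator new() so ordinary delete remains compatible.
void* operator new(std::size_t size, const char* file, int line) {
    std::fprintf(stderr, "alloc %zu bytes at %s:%d\n", size, file, line);
    return ::operator new(size);
}

// Matching placement operator delete(): invoked only if the constructor
// throws inside the custom placement new-expression.
void operator delete(void* p, const char*, int) noexcept {
    ::operator delete(p);
}

struct Thing { Thing() {} };   // hypothetical class for illustration

void diag_demo() {
    Thing* t = new (__FILE__, __LINE__) Thing;  // custom placement new-expression
    delete t;  // ordinary delete-expression; formally unmatched, but "predictably OK" here
}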

Class-specific operator new() gives fine control of allocation

The new-expression looks up operator new() in the scope of the type being created. A class can thus provide implementations of these operators to get fine control over allocation for just that class. If a class does not define a class-specific operator new(), the global one is used. To use the global operator new() instead of a class-specific version, the programmer specifies the global scope operator :: in the new-expression, as in

Foo* foo_p = ::new Foo(123);

A class-specific operator new() is invoked only to allocate instances of the class that defines that function. Member functions of the class that contain new-expressions involving other classes use the operator new() defined for that other class, if any, or the global operator new() by default.

A class-specific operator new() can be efficient because it allocates objects of a single size. Thus, the first free block is always usable. If the class is not used in multiple threads, the class-specific operator new() can dispense with the overhead of making its internal data structures thread-safe.

The class-specific operator new() is defined to be a static member function. This makes sense, because operator new() is called to obtain storage before any instance exists on which an ordinary member function could be invoked.

If a class implements a custom operator new(), it must implement a corresponding operator delete(), or else the global operator delete() is called, with undefined and usually unwanted results.

Delete-Expressions Dispose of Dynamic Variables

A program returns the memory used by a dynamic variable to the memory manager using a delete-expression. The delete-expression handles the last two phases of the dynamic variable life cycle: destroying the variable and freeing the memory it formerly occupied. The delete-expression contains the keyword delete followed by an expression that produces a pointer to the variable to be deleted. The syntax of the delete-expression looks like this:

[::] delete expression

or:

[::] delete [] expression

The first form of the delete-expression deletes a dynamic variable created with a new-expression. The second form deletes a dynamic array created with a new[]-expression. Separate delete-expressions exist for ordinary variables and for arrays because arrays may be created in a different manner from ordinary variables. Most implementations allocate extra storage for a count of the number of array elements allocated, so that the right number of destructors are called. Using the wrong version of the delete-expression for the dynamic variable results in the destructive chaos that the C++ standard calls “undefined behavior.”

Explicit Destructor Calls Destroy Dynamic Variables

It is possible to perform just the destruction of a dynamic variable without freeing its storage by calling the destructor explicitly instead of using a delete-expression. The destructor’s name is just the class name with a tilde (“~”) in front:

foo_p->~Foo();

Explicit destructor calls occur in the same places as placement new: in standard library Allocator templates, where destruction and freeing happen separately.

There is no explicit constructor call... Or is there?

The C++ standard, section 12.1, says “Constructors do not have names,” so the program cannot directly invoke a constructor. The constructor is invoked by means of a new-expression. The standard is fussy about constructors because the memory occupied by a class instance is uninitialized storage before the constructor runs, and an instance afterward. It’s hard to explain this magical transformation.

This isn’t much of a hardship. If a program desires to invoke a constructor explicitly on an already constructed class instance, the placement new syntax is simple enough:

class Blah {
public:
    Blah() {...}
    ...
};

Blah* b = reinterpret_cast<Blah*>(new char[sizeof(Blah)]);  // raw storage, viewed as Blah*
Blah myBlah;
...
new (b) Blah;
new (&myBlah) Blah;

Of course, the linker knows perfectly well that the name of Blah’s constructor is Blah::Blah(). In Visual C++, the statement

b->Blah::Blah();

compiles successfully, and invokes Blah’s constructor. This is a coding horror, making Optimized C++ one of the first C++ books in the Gothic tradition. The Linux C++ compiler GCC is a little more standard-compliant, offering an error message. The constructor is meant to be invoked via placement new.

High-Performance Memory Managers

By default, all requests for storage pass through ::operator new(), and released storage passes through ::operator delete(). These functions form C++’s default memory manager. The default C++ memory manager must meet many demands:

· It must perform efficiently, because it is likely to be hot.

· It must work correctly in multithreaded programs. Access to data structures in the default memory manager must be serialized.

· It must allocate many same-sized objects (like list nodes) efficiently.

· It must allocate many differently sized objects (like strings) efficiently.

· It must allocate very big data structures (I/O buffers, arrays of a million ints), as well as small data structures (like a single pointer).

· To be maximally efficient, it must be aware of alignment boundaries for pointers, cache lines, and virtual memory pages, at least for larger allocated memory blocks.

· Its runtime performance must not degrade over time.

· It must efficiently reuse memory returned to it.

Meeting the many demands of a C++ memory manager is an open and evolving challenge spawning ongoing academic research, with compiler vendors in uneven pursuit of the state of the art. In some situations, a memory manager may not need to meet all these demands. Both these facts provide opportunities for developers to optimize.

The version of ::operator new() shipped with most C++ compilers is a thin wrapper around the C function malloc(). In the early days of C++, these malloc() implementations were designed to meet the relatively simple needs of C programs for a few dynamic buffers, rather than the preceding long list of requirements of C++ programs. Replacing a simple, vendor-provided malloc() with a sophisticated memory manager was so successful an optimization that developers could build a reputation as wizards on the basis of this one simple spell.

A variety of more-or-less self-contained memory manager libraries have been written that claim a substantial performance advantage over the default memory manager. If a program heavily uses dynamic variables, including strings and standard containers, it’s tempting to view these malloc() replacements as a silver bullet, improving performance everywhere for the price of a single linker change, and without all that tedious profiling. But while state-of-the-art memory managers have outstanding performance, there are reasons not to brag prematurely about the benefits of such changes:

· Although state-of-the-art memory managers have demonstrated significant performance improvements over a naïve malloc() implementation, the particular baseline competitor is often not clearly specified, and may be an unrealistic straw man. I have heard a rumor that both Windows and Linux have recently raised their game with state-of-the-art memory managers. Thus, for Linux since 3.7 and Windows since Windows 7, there may be little performance boost from changing memory managers.

· A faster memory manager will help performance only to the extent that allocating and freeing dynamic variables dominates the program’s run time. Even if a program allocates a zillion-node data structure, if this structure is long-lived, improving allocation has limited benefit due to Amdahl’s Law (see “Amdahl’s Law”). While the memory manager code itself may be 3 to 10 times faster than the default, studied whole-program performance improvements range from negligible improvement to 30% in several large open-source programs.

· Reducing the number of calls into the memory manager by means described in Chapter 6 provides a performance boost no matter how fast the allocator is, and can be targeted to hot code using the profiler.

· State-of-the-art memory managers may pay for improved performance by consuming significant amounts of memory for various caches and free-block pools. Constrained environments may not have this additional memory to spend.

For older operating systems and embedded development, here are some malloc() replacements generally regarded as high-performing:

Hoard

Hoard is a commercialized version of a multiprocessor memory allocator from the University of Texas. It claims 3–7× improvement over malloc(). Hoard requires a license for commercial use.

mtmalloc

mtmalloc is a malloc() replacement for highly multithreaded workloads in Solaris. It uses a fast-fit allocator.

ptmalloc (glibc malloc)

ptmalloc is the malloc() provided with Linux 3.7 and later. It has per-thread arenas to reduce contention in multithreaded programs.

TCMalloc (the Thread-Caching malloc())

TCMalloc (located in the gperftools package) is Google’s malloc() replacement. It has a specialized small-object allocator and carefully designed spinlocks for managing large blocks. According to the designers, it is better behaved than glibc’s malloc(). TCMalloc is only tested on Linux.

For small embedded projects, it is not an impossible task to implement your own memory manager. Searching the Web for “fast-fit memory allocation” provides a bunch of links to start from. I implemented a fast-fit memory manager for an embedded project with good results. That said, the design of fully general multithreaded memory managers is another topic that would fill a whole book by itself. The people who write memory managers are specialists. The more sophisticated the program and its operating environment are, the less likely it is that a roll-your-own solution will be performant and bug-free.

Provide Class-Specific Memory Managers

Even a state-of-the-art malloc() is exactly the kind of compromise that creates optimization opportunities. operator new() can also be overridden at the class level (see “Class-specific operator new() gives fine control of allocation”). When code that dynamically creates instances of a class is hot, a class-specific memory manager can improve performance.

If a class implements operator new(), this function is called instead of the global operator new() when requesting memory for instances of the class. A class-specific operator new() can take advantage of extra knowledge not available to the default version. All allocation requests for instances of a particular class request the same number of bytes. Memory managers for same-sized requests are particularly easy to write, and run efficiently for a variety of reasons:

· Fixed-block memory managers efficiently reuse returned memory. They do not experience fragmentation, because all requests are the same size.

· Fixed-block memory managers can be implemented with low or no memory overhead.

· Fixed-block memory managers can provide a guaranteed upper bound on the amount of memory consumed.

· The functions of a fixed-block memory manager that allocate and free memory are internally simple enough that they can efficiently be inlined. The functions of the default C++ memory manager cannot be inlined. They must be function calls so they can be replaced by a developer-defined override. The C memory management functions malloc() and free() must be ordinary functions for the same reason.

· Fixed-block memory managers have good cache behavior. The last node freed can be the next node allocated.

Many developers have never seen a class-specific memory manager. I suspect this is because they have several moving parts that have to be written and strung together, so the learning curve is steep. Even in a big program, only a few classes may be hot enough to benefit from this optimization. It’s not something that needs to be done many times in a given program.

Fixed-Size-Block Memory Manager

Example 13-1 defines a simple fixed-size-block memory manager that allocates blocks from a single, statically declared chunk of storage called the arena. Fixed-size-block memory managers of this kind are most frequently found in embedded projects, as an alternative to allocating from the free store. fixed_block_memory_manager is internally very simple: just a singly linked list of free memory blocks. This simple design will be used for several purposes in this chapter. It’s worth taking a look at it in detail.

Example 13-1. Fixed-size-block memory manager

template <class Arena> struct fixed_block_memory_manager {
    template <int N>
        fixed_block_memory_manager(char(&a)[N]);
    fixed_block_memory_manager(fixed_block_memory_manager&) = delete;
    ~fixed_block_memory_manager() = default;
    void operator=(fixed_block_memory_manager&) = delete;

    void*  allocate(size_t);
    size_t block_size() const;
    size_t capacity() const;
    void   clear();
    void   deallocate(void*);
    bool   empty() const;

private:
    struct free_block {
        free_block* next;
    };
    free_block* free_ptr_;
    size_t      block_size_;
    Arena       arena_;
};

#include "block_mgr.inl"

The constructor, defined in Example 13-2, accepts a C-style array of char as its argument. This array forms the arena from which blocks of memory will be allocated. The constructor is a template function that allows the array size to be captured as a template parameter.

Example 13-2. fixed_block_memory_manager constructor definition

template <class Arena>
    template <int N>
inline fixed_block_memory_manager<Arena>
    ::fixed_block_memory_manager(char(&a)[N])
    : arena_(a), free_ptr_(nullptr), block_size_(0) {
    /* empty */
}

MODERN C++ CODING NOTE

To keep template class definitions tidy, it is possible to define template class member functions outside the template class definition. I keep my member function definitions in a file with the suffix .inl, for “inline definitions.” When the function definition appears outside the template class, however, a more verbose syntax is required to help the compiler connect the definition with the declaration in the template class body. The first line of the previous example, template <class Arena>, declares the template parameter of the class. The second line, template <int N>, applies to the constructor function itself, which is a template function. When the member function definition appears outside the template class body, the inline keyword must be explicitly provided, as inline linkage is assumed only when the definition appears inside the class.

The allocate() member function in Example 13-3 pops a block off the free list, if any blocks are available, and returns it. If the free list is empty, allocate() attempts to get a new list of free blocks from the arena manager, which I will describe in a moment. If the arena manager has no more memory to allocate, it returns nullptr and allocate() throws std::bad_alloc.

Example 13-3. fixed_block_memory_manager’s definition of allocate()

template <class Arena>
inline void* fixed_block_memory_manager<Arena>
    ::allocate(size_t size) {
    if (empty()) {
        free_ptr_ = reinterpret_cast<free_block*>
                        (arena_.allocate(size));
        block_size_ = size;
        if (empty())
            throw std::bad_alloc();
    }
    if (size != block_size_)
        throw std::bad_alloc();
    auto p = free_ptr_;
    free_ptr_ = free_ptr_->next;
    return p;
}

The deallocate() member function is very simple. It pushes a block onto the free list:

template <class Arena>
inline void fixed_block_memory_manager<Arena>
    ::deallocate(void* p) {
    if (p == nullptr)
        return;
    auto fp = reinterpret_cast<free_block*>(p);
    fp->next = free_ptr_;
    free_ptr_ = fp;
}

Here are the remaining member function definitions. Copying and assignment of the memory manager are disabled in the class definition using the C++11 syntax:

template <class Arena>
inline size_t fixed_block_memory_manager<Arena>
    ::capacity() const {
    return arena_.capacity();
}

template <class Arena>
inline void fixed_block_memory_manager<Arena>::clear() {
    free_ptr_ = nullptr;
    arena_.clear();
}
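
Two member functions declared in Example 13-1, block_size() and empty(), are not shown in the text. Definitions consistent with the rest of the class would look something like this (my reconstruction, not the book’s listing):

template <class Arena>
inline size_t fixed_block_memory_manager<Arena>::block_size() const {
    return block_size_;
}

template <class Arena>
inline bool fixed_block_memory_manager<Arena>::empty() const {
    return free_ptr_ == nullptr;   // true when the free list has no blocks
}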

Block Arena

The only complexity in fixed_block_memory_manager arises from how the initial free list is created. This complexity is factored into a separate template class. The implementation presented here is called fixed_arena_controller, and is defined in Example 13-4. As used here, arena means an enclosed space in which some activity takes place: a fixed pool of memory from which fixed_block_memory_manager allocates its blocks.

Example 13-4. Block arena for fixed-block memory manager

struct fixed_arena_controller {
    template <int N>
        fixed_arena_controller(char(&a)[N]);
    fixed_arena_controller(fixed_arena_controller&) = delete;
    ~fixed_arena_controller() = default;
    void operator=(fixed_arena_controller&) = delete;

    void*  allocate(size_t);
    size_t block_size() const;
    size_t capacity() const;
    void   clear();
    bool   empty() const;

private:
    void*  arena_;
    size_t arena_size_;
    size_t block_size_;
};

The purpose of the class fixed_arena_controller is to create a list of memory blocks. All the memory blocks are the same size. The size is set the first time allocate() is called. Each memory block must be big enough to hold the requested number of bytes, but it must also be big enough to hold a pointer that is used when the block is on the free list.

The constructor template function accepts the arena array from fixed_block_memory_manager, saving the size of the array and a pointer to its start:

template <int N>
inline fixed_arena_controller
    ::fixed_arena_controller(char (&a)[N])
    : arena_(a), arena_size_(N), block_size_(0) {
    /* empty */
}

The allocate() member function is where the action takes place. It is called by the allocate() member function of fixed_block_memory_manager when the free list is empty, which happens when the first allocation request occurs.

fixed_arena_controller has a single block of memory to allocate. If that block is used up, allocate() is called again and must return an error indication, which in this case is nullptr. Different kinds of arena controllers might break up big blocks of memory obtained, for instance, by a call to ::operator new(). For a different arena controller, calling allocate() more than once might be OK.

The first time it is called, allocate() sets the block size and capacity in blocks. Actually creating the free list is an exercise in reinterpreting untyped memory bytes into typed pointers. The char array is interpreted as a set of blocks laid end-to-end. The first bytes of each block are a pointer to the next block. The pointer in the last block is set to nullptr.

fixed_arena_controller has no control over the size of the arena array. There may be a few unused bytes at the end that are never allocated. The code for setting the free-block pointers is not pretty. It has to continually reinterpret one kind of pointer as another, stepping outside of C++’s type system into the realm of implementation-defined behavior. This is true of memory managers in general, and is unavoidable.

The allocation code for fixed_arena_controller is simple: it overlays a list of free nodes onto the storage provided to the constructor and returns a pointer to the first element of the list. The code looks like this:

inline void* fixed_arena_controller
    ::allocate(size_t size) {
    if (!empty())
        return nullptr;   // arena already allocated
    block_size_ = std::max(size, sizeof(void*));
    size_t count = capacity();
    if (count == 0)
        return nullptr;   // arena not big enough for even one item
    char* p;
    // Thread the free list through the arena, stepping by block_size_
    // (the request size, rounded up so each block can hold a pointer).
    for (p = (char*)arena_; count > 1; --count, p += block_size_) {
        *reinterpret_cast<char**>(p) = p + block_size_;
    }
    *reinterpret_cast<char**>(p) = nullptr;
    return arena_;
}

Here is the rest of fixed_arena_controller:

inline size_t fixed_arena_controller::block_size() const {
    return block_size_;
}

inline size_t fixed_arena_controller::capacity() const {
    return block_size_ ? (arena_size_ / block_size_) : 0;
}

inline void fixed_arena_controller::clear() {
    block_size_ = 0;
}

inline bool fixed_arena_controller::empty() const {
    return block_size_ == 0;
}

Adding a Class-Specific operator new()

Example 13-5 is a very simple class with a class-specific operator new() and operator delete(). It also contains the static member mgr_, which is the fixed-block memory manager described in “Fixed-Size-Block Memory Manager”. operator new() and operator delete() are inline functions that forward requests to the allocate() and deallocate() member functions of mgr_.

Example 13-5. Class with class-specific operator new

class MemMgrTester {
    int contents_;
public:
    MemMgrTester(int c) : contents_(c) {}

    static void* operator new(size_t s) {
        return mgr_.allocate(s);
    }
    static void operator delete(void* p) {
        mgr_.deallocate(p);
    }
    static fixed_block_memory_manager<fixed_arena_controller> mgr_;
};

mgr_ is declared public so I could reinitialize the free list, via a call to mgr_.clear(), to facilitate writing performance tests. If mgr_ is initialized once at program startup and never needs to be reinitialized, it could as well be a private member.

A memory manager that can be reset like this is called a pool memory manager, and the arena it controls is called a memory pool. A pool memory manager can be useful in cases where a data structure is constructed, used, and then destroyed. If the entire memory pool can be reinitialized quickly, the program can dispense with freeing the data structure node-by-node.
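
A sketch of that usage pattern with the class from Example 13-5 follows. It is my own illustration and assumes mgr_ has been defined as shown in Example 13-6 below; note that the destructors still run, even though the individual frees are skipped.

#include <vector>

void pool_demo() {
    std::vector<MemMgrTester*> items;
    for (int i = 0; i < 100; ++i)
        items.push_back(new MemMgrTester(i));  // blocks come from mgr_'s arena

    for (auto p : items)
        p->~MemMgrTester();   // run destructors, but skip per-block frees

    MemMgrTester::mgr_.clear();  // reset the whole pool at once
}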

mgr_ is declared as a static member of the class MemMgrTester. Somewhere in the program, static members must also be defined. The definition looks like Example 13-6. This code defines a memory arena and then mgr_, whose constructor takes the arena as an argument.

Example 13-6. Initializing the memory manager

char arena[4004];
fixed_block_memory_manager<fixed_arena_controller>
    MemMgrTester::mgr_(arena);

The example code just shown does not define a class-specific operator new[]() to allocate storage for arrays. The fixed-block memory manager would not work for arrays, which by definition may have different numbers of elements. If the program tries to allocate an array of MemMgrTester, the new-expression uses the global operator new[]() because a class-specific one is not defined. Individual instances are allocated with the fixed-block memory manager, while arrays use the global operator new[](), and thus ultimately malloc().

Performance of the Fixed-Block Memory Manager

The fixed-block memory manager is very efficient. The allocate and free methods have a fixed cost, and the code can be inlined. But how much faster are they than malloc()?

I performed two experiments to test the performance of a class-specific operator new(). In the first test, I allocated a million MemMgrTester instances. Allocating with the class-specific operator new() and my fixed-block memory manager took 4 milliseconds. Allocating with the global operator new(), which uses malloc(), took 64 milliseconds. The fixed-block memory manager was 15 times faster than malloc() in the test. This result probably overstates the performance improvement that might be achieved in an actual program, though. Amdahl’s Law implies that the more computation occurs in between allocations, the smaller the performance gain from speeding up allocation will be.

In the second experiment, I created an array of 100 pointers to MemMgrTester. I created a million MemMgrTester instances, assigning each to an array position at random and deleting any instance already there. Using the fixed-block memory manager, the test took 25 milliseconds. It took 107 milliseconds using the default global memory manager. The fixed-block memory manager was 3.3 times faster.

Variations on the Fixed-Block Memory Manager

The basic structure of the fixed-block memory manager is extremely simple. Variations (any of which you may find if you spend time searching for memory managers on the Internet) can be tried to see if any of them better suit the program being optimized:

· Instead of starting with a fixed arena, memory can be allocated using malloc() when the free list is empty. Freed memory blocks are cached on the free list for rapid reuse.

· The arena can be created by a call to malloc() or ::new instead of being fixed. If necessary, a chain of arena blocks can be maintained so that there is no limit on how many small blocks may be allocated. The fixed-block memory manager still retains its advantages of speed and compactness even if it occasionally calls malloc().

· If instances of the class are used for a while and then all discarded, the fixed-block memory manager can be used as a memory pool. In a memory pool, allocation proceeds as normal, but memory is not freed block-by-block. When the program is done with instances of the class, the storage is reclaimed all at once by reinitializing the static arena, or by returning the dynamically allocated arena to the system memory manager. Even though the storage is reclaimed all at once, the allocated objects must still be destroyed by calling their destructors. Many pool allocators on the Internet forget this tiny but important detail.

A general memory manager can be designed that serves each different request size from a different arena, and returns each different request size to a different free list. If all request sizes are rounded to the next-higher power of two, the result is a “fast-fit” memory manager. Typically, the fast-fit memory manager allocates only objects up to a particular maximum size, sending larger requests to the default memory manager. The code for a fast-fit allocator is too big to reproduce in this book, but it’s available on the Web.
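
The core of the fast-fit idea is the rounding step. A sketch of that computation appears below; it is my own illustration of the size-class calculation, not code from a particular library.

#include <cstddef>

// Round a request size up to the next power of two: the "size class"
// whose free list will serve the request in a fast-fit allocator.
inline std::size_t next_power_of_two(std::size_t n) {
    std::size_t size_class = 1;
    while (size_class < n)
        size_class <<= 1;
    return size_class;
}

// Example: requests of 33 to 64 bytes all come from the 64-byte free list.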

Boost has a fixed-block memory manager, called Pool.
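
For reference, here is a minimal sketch of how Boost.Pool is typically used, based on my understanding of its object_pool interface (check the Boost documentation for the current API); the Node class is hypothetical.

#include <boost/pool/object_pool.hpp>

struct Node { int value; explicit Node(int v) : value(v) {} };

void boost_pool_demo() {
    boost::object_pool<Node> pool;   // fixed-size blocks of sizeof(Node)
    Node* n = pool.construct(42);    // allocate and construct from the pool
    pool.destroy(n);                 // destroy and return the block
}   // any blocks not returned are released when the pool is destroyed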

Non-Thread Safe Memory Managers Are Efficient

The fixed-block memory manager owes some of its efficiency to not being thread safe. Non-thread safe memory managers are efficient for two reasons. First, they don’t require synchronization mechanisms to serialize critical regions. Synchronization is expensive because a very slow memory fence (see “Atomicity by mutual exclusion” and “Memory fences”) is at the heart of every synchronization primitive. Even if only one thread calls the memory manager (which is typical), these expensive instructions slow things down.

Second, non-thread safe memory managers are efficient because they never pend on synchronization primitives. When a program has several threads calling into the memory manager, the threads experience contention for the memory manager as a resource. The more threads there are in the system, the greater the contention is, and the more access to the allocator serializes the activity of the threads.

When a class implements a class-specific memory manager, even if the program as a whole is multithreaded, if a given class is only used in one thread it never has to wait. By contrast, calls into the default memory manager experience contention in multithreaded programs even for objects only used in one thread.

Non-thread safe memory managers are also far easier to write than thread-safe ones, since minimizing the critical section so the memory manager runs efficiently can be complex.

Provide Custom Standard Library Allocators

The container classes of the C++ standard library make heavy use of dynamic memory. They are a natural place to look for optimization opportunities, including custom memory managers like the ones described in “Fixed-Size-Block Memory Manager”.

But there is a problem. The variables dynamically allocated in a std::list<T> are not of the user-provided type T. They are of some invisible type, like listitem<T>, that contains the previous and next list item pointers as well as the payload of type T. The dynamically allocated variables in a std::map<K,V> are of another hidden type, like treenode<std::pair<const K, V>>. These class templates are buried in compiler-provided header files. It isn’t possible to modify these classes to add a class-specific operator new() and operator delete(). And besides, templates are generic. The developer only wants to change the memory manager for certain instances of the generic template in a specific program, not for all instances in all programs. Fortunately, the C++ standard provides a mechanism to define the memory manager used by each container. Standard library containers take an Allocator argument that allows the same ability to customize memory management as does a class-specific operator new().

Allocator is a template class that manages memory. At its root, an allocator does three things: it retrieves storage from a memory manager, returns storage to a memory manager, and copy-constructs itself from a related allocator. That ought to be simple, but it is not. Allocators have a long, painful history, covered in “Additional Definitions for C++98 Allocator”. Allocators are regarded by some influential developers as among the most broken parts of C++. Still, if the code is hot enough, and the container is one of the more tractable node-based containers (std::list, std::forward_list, std::map, std::multimap, std::set, or std::multiset), a performance improvement may be available through implementing a custom allocator.

Allocator implementations range from the simple to the mind-numbingly complex. The default allocator, std::allocator<T>, is a thin wrapper around ::operator new(). The developer can provide a nondefault allocator with different behavior.

There are two basic kinds of allocator. The simplest kind is stateless—that is, an allocator type with no non-static state. std::allocator<T>, the default allocator for standard library containers, is stateless. Stateless allocators have desirable properties:

· A stateless allocator can be default-constructed. There is no need to create an explicit instance of a stateless allocator to pass into the constructor of the container class. std::list<myClass, myAlloc> my_list; constructs a list of myClass instances allocated with the stateless allocator myAlloc.

· A stateless allocator does not take up any space in container instances. Most standard library container classes inherit from their allocator, taking advantage of the empty base class optimization to produce a zero-byte base class.

· Two instances of a stateless allocator my_allocator<T> are indistinguishable. That means an object allocated by one stateless allocator can be freed by another. This in turn makes operations like std::list’s splice() member function possible. It is sometimes, but not always, the case that two stateless allocators for different types, like AllocX<T> and AllocX<U>, compare equal. This is true, for instance, of std::allocator.

Equality also means that move assignment and std::swap() can happen efficiently. If the two allocators are not equal, the contents of one container must be deep-copied, using the allocator of the target container.

Note that while equality may happen to be true even of instances of completely unrelated allocator types, like AllocX<T> and AllocY<U>, this property is not valuable. The type of the container includes the type of the allocator. You can’t splice a std::list<T,AllocX> to a std::list<T,AllocY> any more than you can splice a std::list<int> to a std::list<string>.

Of course, the primary disadvantage of a stateless allocator is the same as its primary advantage. All instances of a stateless allocator by their nature get memory from the same memory manager. It’s a global resource, a dependency on a global variable.

An allocator with internal state is more complex to create and to use, for the following reasons:

· In most cases, an allocator with local state can’t be default-constructed. The allocator must be constructed, then passed to the container’s constructor:

char arena[10000];
MyAlloc<Foo> alloc(arena);
std::list<Foo, MyAlloc<Foo>> foolist(alloc);

· The allocator state must be stored in every variable, increasing its size. This is very painful for containers that create a lot of nodes, like std::list and std::map, but these are exactly the containers programmers most desire to customize.

· Two allocators of the same type may not compare equal because they have differing internal state, making some operations on containers using that allocator type impossible and others painfully inefficient.

Allocators with state have the critical advantage, however, that it is easier to create multiple kinds of memory arenas for different purposes when all requests don’t have to go through a single global memory manager.

For a developer writing custom allocators to improve performance, the choice between allocators with or without local state comes down to how many classes the developer intends to fiddle with. If just one class is hot enough to optimize, a stateless allocator is simpler. If the developer wants to optimize more than one or two classes, then allocators with local state will be more flexible. However, such a developer may have trouble pointing to a profiling run to prove the necessity. Writing custom allocators for many containers may not optimize return on investment of the developer’s time.

Minimal C++11 Allocator

If the developer is lucky enough to have a compiler and standard library that fully conform to C++11, he can provide a minimal allocator that requires only a few definitions. Example 13-7 is one that does approximately what std::allocator does.

Example 13-7. Minimal C++11 allocator

template <typename T> struct my_allocator {
    using value_type = T;
    my_allocator() = default;
    template <class U> my_allocator(const my_allocator<U>&) {}

    T* allocate(std::size_t n, void const* = 0) {
        return reinterpret_cast<T*>(::operator new(n*sizeof(T)));
    }
    void deallocate(T* ptr, size_t) {
        ::operator delete(ptr);
    }
};

template <typename T, typename U>
inline bool operator==(const my_allocator<T>&,
                       const my_allocator<U>&) {
    return true;
}

template <typename T, typename U>
inline bool operator!=(const my_allocator<T>& a,
                       const my_allocator<U>& b) {
    return !(a == b);
}
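
Using the allocator is just a matter of naming its type in the container declaration. A brief usage sketch, assuming the my_allocator template above:

#include <list>
#include <string>

std::list<std::string, my_allocator<std::string>> names;  // nodes come from my_allocator

void fill() {
    names.push_back("hello");  // allocation goes through my_allocator::allocate()
    names.pop_back();          // deallocation goes through my_allocator::deallocate()
}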

The minimal allocator contains these few functions:

allocator()

This is the default constructor. If the allocator has a default constructor, the developer does not need to explicitly create an instance to pass into the container’s constructor. The default constructor is typically empty in stateless allocators, and is typically not present at all in allocators with non-static state.

template <typename U> allocator(const allocator<U>&)

This copy constructor makes it possible to convert an allocator<T> into a related allocator for a private class such as allocator<treenode<T>>. This is important because in most containers, no nodes of type T are ever allocated.

The copy constructor is typically empty in stateless allocators, but must copy or clone the state in allocators with non-static state.

T* allocate(size_type n, const void* hint = 0)

This function allocates storage sufficient to hold n objects of type T and returns a pointer to it, or throws std::bad_alloc. hint is meant to help the allocator in an unspecified way related to “locality.” I’ve never seen an implementation that used hint.

void deallocate(T* p, size_t n)

This function releases storage previously returned by allocate(), pointed to by p and sized for n objects of type T, to the memory manager. n must be the same as the argument to allocate() that allocated the storage pointed to by p.

bool operator==(allocator const& a) const
bool operator!=(allocator const& a) const

These functions test two allocator instances of the same type for equality. If the instances compare equal, then objects allocated from one instance may safely be freed on another. That means the two instances allocate objects from the same storage area.

The implication of equality is large. It means, for instance, that std::list items can be spliced from one list to another if and only if the two lists have the same type of allocator and the two allocator instances compare equal. The allocator type is part of the container instance type, so if the allocators are different types, it doesn’t matter if they secretly share the same storage.

Stateless allocators return true unconditionally when tested for equality. Allocators with non-static state must compare their state to determine equality, or simply return false.
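
For an allocator with non-static state, say one that remembers which arena it draws from, the comparison inspects that state. The following sketch is my own illustration: arena_allocator is a hypothetical name, and its allocate() and deallocate() members are only declared here, not implemented.

#include <cstddef>

// Hypothetical stateful allocator: each instance remembers its arena.
template <typename T> struct arena_allocator {
    using value_type = T;
    explicit arena_allocator(void* arena) : arena_(arena) {}
    template <class U>
    arena_allocator(const arena_allocator<U>& other) : arena_(other.arena_) {}

    T* allocate(std::size_t n);        // would carve n*sizeof(T) bytes out of arena_
    void deallocate(T*, std::size_t);  // would return the storage to arena_

    void* arena_;                      // the state that equality must compare
};

template <typename T, typename U>
inline bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b) {
    return a.arena_ == b.arena_;  // equal only if both draw from the same arena
}

template <typename T, typename U>
inline bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b) {
    return !(a == b);
}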

Additional Definitions for C++98 Allocator

C++11 takes significant pains to make allocators easier to write, at the cost of making container classes more complex. The developer who must write an allocator for a pre-C++11 standard library container will find out why.

Allocators weren’t originally meant for (or at least not only meant for) managing memory. The allocator concept evolved in the 1980s, when microprocessors and developers were straining to break out of the confines of a 16-bit address space. PCs of the day formed an address from a segment register plus an offset. Each program was compiled for a memory model, which described the default way in which pointers worked. There were a bunch of memory models. Some were efficient, but limited the amount of memory a program or its data could occupy. Other memory models allowed more memory, but were slower. The C compilers of the day were extended with additional type modifiers so that individual pointers could be declared near or far, based on how much memory the developer wanted to access with them.

Allocators as originally imagined were meant to unmuddle this memory model mess. But by the time allocators arrived in C++, hardware makers had already heard the collective screaming of thousands of C developers, and implemented a uniform memory model with no segment registers. Furthermore, the allocator solution was unworkably inefficient using the compilers of the day.

Prior to C++11, every allocator contained all the functions in the minimal allocator, plus all of the following:

value_type

The type of the object to be allocated.

size_type

An integral type big enough to hold the maximum number of bytes that can be allocated by this allocator.

For allocators used as parameters to standard library container templates, this definition must be typedef size_t size_type;.

difference_type

An integral type big enough to hold the maximum difference between two pointers.

For allocators used as parameters to standard library container templates, this definition must be typedef ptrdiff_t difference_type;.

pointer
const_pointer

The type of a pointer to (const) T.

For allocators used as parameters to standard library container templates, these definitions must be:

typedef T* pointer;
typedef T const* const_pointer;

For other allocators, pointer might be a pointer-like class that implements operator*() to dereference the pointer.

reference
const_reference

The type of a reference to (const) T.

For allocators used as parameters to standard library container templates, the definitions must be:

typedef T& reference;
typedef T const& const_reference;

pointer address(reference)
const_pointer address(const_reference)

Functions that produce a pointer to (const) T, given a reference to (const) T.

For allocators used as parameters to standard library container templates, these definitions are required to be:

pointer address(reference r) { return &r; }
const_pointer address(const_reference r) { return &r; }

These functions were meant to abstract memory models. Unfortunately, they have been hamstrung for compatibility with the standard library containers, which need pointer to actually be a T* so that magical things like random-access iterators and binary search work efficiently.

Even though these definitions have fixed values for allocators used by standard library containers, the definitions are required, as the container code in C++98 uses them. For instance:

typedef typename allocator::size_type size_type;

Some developers derive their allocator templates from std::allocator, to get these definitions without writing them. But this practice is controversial. std::allocator might change someday, after all. It changed a lot in the early years, and it changed again for C++11, so these fears are well founded. Another approach is to simply factor out the most unchanging of these definitions, as in

template <typename T> struct std_allocator_defs {
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;

    pointer address(reference r) { return &r; }
    const_pointer address(const_reference r) { return &r; }
};

Carried to its logical conclusion, these definitions can be made into a traits class, which is what some of the more complex allocator templates on the Web do. This is also what the C++11 minimal allocator does, only the traits class works backward from normal. The traits class looks at the allocator template to see if it has any of these definitions, and provides a standard definition if the allocator does not. Then, the container code references the allocator_traits class, not the allocator, as in

typedef typename std::allocator_traits<MyAllocator<T>>::value_type value_type;
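As a quick illustration (a sketch, not this chapter’s code, using a hypothetical minimal allocator that matches the MyAllocator name above and defines only value_type, allocate(), and deallocate()), the traits class supplies the missing definitions automatically:

#include <cstddef>
#include <memory>
#include <type_traits>

template <typename T> struct MyAllocator {      // hypothetical C++11 minimal allocator
    typedef T value_type;
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// MyAllocator defines neither pointer nor size_type, yet allocator_traits
// derives sensible defaults from value_type:
static_assert(std::is_same<
    std::allocator_traits<MyAllocator<int>>::pointer, int*>::value,
    "pointer defaults to T*");
static_assert(std::is_same<
    std::allocator_traits<MyAllocator<int>>::size_type, std::size_t>::value,
    "size_type defaults to size_t");

The container goes through std::allocator_traits for construct() and destroy() as well; the traits fall back to placement new and an explicit destructor call when the allocator does not supply its own versions.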

With the boilerplate out of the way, it’s time to look at the important definitions (remember, the definitions for the minimal allocator discussed in “Minimal C++11 Allocator” are also included):

void construct(pointer p, const T& val)

This function copy-constructs an instance of T using placement new:

new(p) T(val);

For C++11, this function can be defined so that an argument list can be given to the T constructor:

template <typename U, typename... Args>
void construct(U* p, Args&&... args) {
    new(p) U(std::forward<Args>(args)...);
}

void destroy(pointer p);

This function destroys the instance of T pointed to by p, calling p->~T();.

rebind<U>::other

The declaration of struct rebind is at the very heart of allocators. It typically looks like this:

template <typename U> struct rebind {
    typedef allocator<U> other;
};

rebind gives a formula for creating an allocator for a new type U, given an allocator<T>. Every allocator must provide this formula. It’s how a container like std::list<T> allocates instances of std::list<T>::listnode<T>. Remember that in most containers, no nodes of type T are ever allocated.
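To see the formula in use, here is a sketch of a toy node-based container (toy_list, listnode, and push_front are illustrative names, not the standard library’s implementation) that rebinds its allocator-of-T into an allocator for its private node type:

#include <memory>
#include <new>

template <typename T, typename Allocator = std::allocator<T>>
class toy_list {
    struct listnode {                  // the container's private node type
        listnode* next;
        T         value;
    };
    // rebind the user's allocator-of-T into an allocator-of-listnode
    typedef typename Allocator::template rebind<listnode>::other node_allocator;
    node_allocator node_alloc_;
    listnode*      head_ = nullptr;
public:
    void push_front(const T& v) {
        listnode* n = node_alloc_.allocate(1);   // raw storage for one node; a bare T is never allocated
        new (n) listnode{head_, v};              // construct the node, copying v into place
        head_ = n;
    }
};

A custom allocator passed as the second template argument is rebound in exactly the same way, which is why every C++98 allocator must supply rebind.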

Example 13-8 is a full C++98-style allocator equivalent to the C++11 minimal allocator in Example 13-7.

Example 13-8. C++98 allocator

template <typename T> struct my_allocator_98 :
    public std_allocator_defs<T> {

    // re-export the dependent base class's typedefs so the unqualified
    // names below are found by the compiler
    typedef typename std_allocator_defs<T>::pointer   pointer;
    typedef typename std_allocator_defs<T>::size_type size_type;

    template <typename U> struct rebind {
        typedef my_allocator_98<U> other;
    };

    my_allocator_98() {/*empty*/}
    my_allocator_98(my_allocator_98 const&) {/*empty*/}

    void construct(pointer p, const T& t) {
        new(p) T(t);
    }

    void destroy(pointer p) {
        p->~T();
    }

    size_type max_size() const {
        return size_type(-1) / sizeof(T);   // the most objects of T that could ever be requested
    }

    pointer allocate(
        size_type n,
        typename std::allocator<void>::const_pointer = 0) {
        return reinterpret_cast<T*>(::operator new(n*sizeof(T)));
    }

    void deallocate(pointer p, size_type) {
        ::operator delete(p);
    }
};

template <typename T, typename U>
inline bool operator==(const my_allocator_98<T>&,
                       const my_allocator_98<U>&) {
    return true;
}

template <typename T, typename U>
inline bool operator!=(const my_allocator_98<T>& a,
                       const my_allocator_98<U>& b) {
    return !(a == b);
}

When examining allocator code on the Internet, developers will encounter a variety of different spellings of the types. An extremely cautious and conformant developer will write the signature of the allocate() function as

pointer allocate(
    size_type n,
    typename std::allocator<void>::const_pointer = 0);

while a less cautious developer may write the same signature for an allocator strictly for use with standard library containers as

T* allocate(size_t n, void const* = 0);

The first signature is the most technically standard-conformant, but the second signature will compile successfully and has the advantage of brevity. This is the way things work in the wild and woolly world of templates.

Another issue with allocator code on the Internet is that allocate() must throw std::bad_alloc if the request cannot be fulfilled. So, for instance, the following code, which calls malloc() to allocate memory, is not standard-conforming because malloc() can return nullptr:

pointer allocate(
    size_type n,
    typename std::allocator<void>::const_pointer = 0) {
    return reinterpret_cast<T*>(malloc(n*sizeof(T)));
}
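A conforming version of the same malloc()-based allocate() checks the result and throws. This is a minimal sketch, assuming <cstdlib> and <new> are included:

pointer allocate(
    size_type n,
    typename std::allocator<void>::const_pointer = 0) {
    void* p = malloc(n * sizeof(T));
    if (p == nullptr)
        throw std::bad_alloc();     // never return a null pointer from allocate()
    return reinterpret_cast<T*>(p);
}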

A Fixed-Block Allocator

The standard library container classes std::list, std::map, std::multimap, std::set, and std::multiset all create a data structure from many identical nodes. Such classes can take advantage of simple allocators implemented using the fixed-block memory manager described in “Fixed-Size-Block Memory Manager”. The partial definition in Example 13-9 shows the two functions allocate() and deallocate(). The other definitions are identical to those in the standard allocator shown in Example 13-8.

Example 13-9. Fixed-block allocator

extern fixed_block_memory_manager<fixed_arena_controller>
    list_memory_manager;

template <typename T> class StatelessListAllocator {
public:
    ...
    pointer allocate(
        size_type count,
        typename std::allocator<void>::const_pointer = nullptr) {
        return reinterpret_cast<pointer>
            (list_memory_manager.allocate(count * sizeof(T)));
    }

    void deallocate(pointer p, size_type) {
        list_memory_manager.deallocate(p);
    }
};

As previously mentioned, std::list never attempts to allocate nodes of type T. Instead, std::list rebinds its Allocator template parameter to an allocator for its listnode<T> type, ultimately calling list_memory_manager.allocate(sizeof(listnode<T>)).
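Using the allocator is then just a matter of naming it as the container’s second template argument. Here is a minimal sketch, assuming list_memory_manager has already been given its block of arena storage as described in “Fixed-Size-Block Memory Manager”:

#include <list>

typedef std::list<int, StatelessListAllocator<int>> fixed_block_list;

void example() {
    fixed_block_list flist;        // every node comes from list_memory_manager
    flist.push_back(42);
}                                  // destruction returns the nodes to the fixed-block manager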

The list allocator required a change to the memory manager previously defined. The implementation of std::list that comes with Microsoft Visual C++ 2015 allocates a special sentinel node that is a different size from the other list nodes. It happens to be smaller than a regular list node, so it was possible to make a small change to the fixed-block memory manager that allowed it to work. The modified version is shown in Example 13-10. The change is that instead of testing that the current request size is equal to the saved block size, allocate() only tests that it is not greater than the saved block size.

Example 13-10. Modified allocate() function

template <class Arena>
inline void* fixed_block_memory_manager<Arena>
    ::allocate(size_t size) {
    if (empty()) {
        free_ptr_ = reinterpret_cast<free_block*>
            (arena_.allocate(size));
        block_size_ = size;
        if (empty())
            throw std::bad_alloc();
    }
    if (size > block_size_)
        throw std::bad_alloc();
    auto p = free_ptr_;
    free_ptr_ = free_ptr_->next;
    return p;
}

Performance of the fixed-block allocator

I wrote a program to test the performance of the fixed-block allocator. The program is a loop that repeatedly creates a list of 1,000 integers, then deletes the list. Using the default allocator, creating and deleting the list took 76.2 microseconds. The program created and destroyed a list using the block allocator in 11.6 microseconds, about 5.6 times faster. This is an impressive performance improvement, but it must be regarded with some suspicion. Only the creation and destruction of the list benefit from this optimization. Performance gains in a program that also processes the list will be far more modest.
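The timing loop looked roughly like the sketch below. It is not the exact harness used to produce the numbers above; std::chrono stands in for whatever stopwatch you prefer, and the iteration count is illustrative:

#include <chrono>
#include <list>

template <typename ListType>
long long microseconds_per_create_destroy(unsigned iterations) {
    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i) {
        ListType l;
        for (int k = 0; k < 1000; ++k)
            l.push_back(k);
    }                                   // the list is destroyed at the end of each iteration
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               stop - start).count() / iterations;
}

// Compare the default allocator against the fixed-block allocator:
//   microseconds_per_create_destroy<std::list<int>>(1000);
//   microseconds_per_create_destroy<std::list<int, StatelessListAllocator<int>>>(1000);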

I also built a map of 1,000 integer keys. Creating and destroying the map using the default allocator took 142 microseconds, versus 67.4 microseconds with the block allocator. This more modest 110% improvement shows the effect that additional activity in the program (in this case, rebalancing the tree in which the map is stored) has on the performance improvement achieved by the block allocator.

A Fixed-Block Allocator for Strings

std::string stores its contents in a dynamic array of char. Because the array is reallocated to a larger size as the string grows, it doesn’t seem to be a candidate for the simple fixed-block allocator in the previous section. But sometimes even this limitation can be overcome. A developer who knows the maximum size a string can take in the program can create an allocator that always produces fixed-size blocks of the maximum size. This situation is surprisingly common, because the number of applications for million-character strings is limited.

Example 13-11 is a partial listing of the fixed-block allocator for strings.

Example 13-11. Fixed-block allocator for strings

template <typename T> class StatelessStringAllocator {
public:
    ...
    pointer allocate(
        size_type /*count*/,
        typename std::allocator<void>::const_pointer = nullptr) {
        return reinterpret_cast<pointer>
            (string_memory_manager.allocate(512));
    }

    void deallocate(pointer p, size_type) {
        string_memory_manager.deallocate(p);
    }
};

The important feature of this allocator is that allocate() completely ignores the size request, and returns a fixed-size block.

Performance of the string allocator

I tested the string allocator in a version of remove_ctrl() from Example 4-1 in Chapter 4. This function used std::string inefficiently, creating many temporary strings. Example 13-12 is the modified function.

Example 13-12. Version of remove_ctrl() using the fixed-block string allocator

typedef std::basic_string<
    char,
    std::char_traits<char>,
    StatelessStringAllocator<char>> fixed_block_string;

fixed_block_string remove_ctrl_fixed_block(std::string s) {
    fixed_block_string result;
    for (size_t i = 0; i < s.length(); ++i) {
        if (s[i] >= 0x20)
            result = result + s[i];
    }
    return result;
}

In a timing experiment, the original remove_ctrl() ran in 2,693 milliseconds. The improved version in Example 13-12 performed the same test in 1,124 milliseconds, about 1.4 times faster. This performance improvement is significant, but as we saw in Chapter 4, other optimizations performed even better.

Writing a custom memory manager or allocator can be effective, but it has less benefit than optimizations that remove calls to the memory manager altogether.

Summary

· There may be more fruitful places to look for performance improvement than the memory manager.

· Whole-program performance improvements from replacing the default memory manager ranged from negligible to 30% in several large open-source programs.

· Memory managers for same-sized requests are particularly easy to write, and run efficiently.

· All allocation requests for instances of a particular class request the same number of bytes.

· operator new() can be overridden at the class level.

· The standard library container classes std::list, std::map, std::multimap, std::set, and std::multiset all create a data structure from many identical nodes.

· Standard library containers take an Allocator argument that allows the same ability to customize memory management as does a class-specific operator new().

· Writing a custom memory manager or allocator can be effective, but it has less benefit than optimizations that remove calls to the memory manager altogether.

1 In C++, an n-dimensional array is an array of n–1-dimensional arrays. Thus, the leftmost dimension is the highest dimension.

2 Oh, you did not just think, “Oh sure you can. You just go into /usr/include and ...” That is so unclean! Brrrrr.