Optimized C++ (2016)
Chapter 8. Use Better Libraries
A great library is one nobody notices because it is always there, and always has what people need.
Vicki Myron, author of Dewey, the Small Town Library Cat, and librarian of the town of Spencer, Iowa
Libraries are an area of focus in the optimization process. Libraries provide the primitives from which programs are built up. Library functions and classes are often used at the bottom of nested loops, and are thus often hot. Libraries provided by the compiler or operating system command attention to see that they are used efficiently. Project-specific libraries deserve careful design to ensure that they can be used efficiently.
This chapter discusses issues in the use of the C++ standard library, then examines issues in the design of custom libraries that bear on optimization.
Most of Optimized C++ is about tweaking functions to improve performance. This chapter offers instead advice for designers who need to achieve high performance in their initial design, based on my personal experience. Although I introduce this discussion in the context of library design, this section is also a checklist of how good C++ design techniques contribute to good performance.
Optimize Standard Library Use
C++ provides a compact standard library for the following general uses:
· Determining implementation-dependent behavior, like the largest and smallest values of each numeric type.
· Functions that might not best be written in C++, like strcpy() and memmove().
· Easy to use but fussy to write and validate portable transcendental math functions like sine and cosine, log and power, random numbers, and so on.
· Portable, general data structures like strings, lists, and tables that do not depend on the operating system except for memory allocation.
· Portable, general algorithms for searching, sorting, and transforming data.
· Functions that connect to basic services of the operating system in an operating system–independent way, to perform tasks like memory allocation, threading, timekeeping, and stream I/O. This includes a library of functions inherited from the C programming language for compatibility.
Much of the C++ standard library consists of template classes and functions that produce extremely efficient code.
Philosophy of the C++ Standard Library
In keeping with its mission as a systems programming language, C++ provides a somewhat Spartan standard library. The standard library is meant to be simple, general, and fast. Philosophically, functions and classes enter the C++ standard library either because they cannot be provided any other way, or because they promote very wide reuse across multiple operating systems.
Advantages of the C++ approach include that C++ programs can run on hardware that provides no operating system at all, and that programmers can choose a specialist library tailored to the features of a particular operating system when that is appropriate, or use a cross-platform library when platform independence is the goal.
By contrast, some programming languages, including C# and Java, provide extensive standard libraries that include windowing user interface frameworks, web servers, socket networking, and other large subsystems. An advantage of the all-inclusive approach is that developers need only learn one set of libraries to be effective on any supported platform. But such libraries impose many requirements on the operating systems on which they can be made to run (sometimes deliberately). The libraries provided with a programming language also come to represent a least common denominator, less powerful than the native windowing system or networking capabilities of any specific operating system. They can thus be somewhat limiting to programmers accustomed to the native capabilities of a specific OS.
Issues in Use of the C++ Standard Library
While the following discussion is tailored to the C++ standard library, it applies equally to the standard Linux libraries, the POSIX library, or any other widely used cross-platform library. Usage issues include the following:
Standard library implementations have bugs
Although bugs are an inevitable part of software development, even experienced developers may not have encountered bugs in standard library code. They may thus have come to believe that the various implementations of the standard library are rock solid. I am sorry to burst this happy bubble. In writing this book, I came across several standard library bugs.
Classic “Hello, World!” programs are quite likely to work as expected. However, optimization takes the developer into the dark corners and back alleys of the standard library, where bugs are most likely to lurk. The optimizing developer must be ready for occasional disappointment when a promising optimization cannot be implemented, or when an optimization that works on one compiler produces an error message on another.
How can it happen that 30-year-old code still contains bugs, the reader may ask? For one thing, the standard library has been evolving for all that time. The definition of the library and the code that implements it has always been a work in progress. It’s not 30-year-old code. The standard library is maintained separately from the compiler, which also has bugs. For GCC, the library maintainers are volunteers. For Microsoft’s Visual C++, the standard library is a purchased third-party component that has its own release schedule that differs from both the Visual C++ release cycle and the C++ standard release cycle. Changing requirements of the standard, diffuse responsibility, schedule issues, and the complexity of the standard library all take their inevitable toll on quality. What is actually more interesting is that there are so few bugs in the standard library implementations.
Standard library implementations may not conform to the C++ standard
There is probably no such thing as a “standard-conforming implementation.” In the real world, compiler vendors consider their tools to be shippable if they provide a reasonably large subset of the relevant C++ standard, including its most important features.
As for libraries, these are released on a different schedule from that of the compiler, and the compiler is released on a different schedule from that of the C++ standard. A standard library implementation may either lead or trail the standard conformance of the compiler. Different parts of the library my lead or trail to different extents. For developers interested in optimization, changes in the C++ standard, such as the change in optimal insertion points for new map entries (see “Inserting and Deleting in std::map”), mean that the behavior of some functions may surprise the user, since there is no way to document or to determine the version of the standard to which a particular library conforms.
Some libraries are limited by the compiler. For instance, a library cannot implement move semantics until the compiler supports this feature. Imperfect compiler support may lead to limitations in the standard library classes that can be shipped with the compiler. Sometimes, attempting to use a feature gives a compiler error, and it is impossible for a developer to determine whether it is the compiler or the standard library implementation that is broken.
The optimizing developer who is very familiar with the standard may thus be disappointed when an infrequently used feature turns out to be unimplemented by the compiler he is using.
Performance is not the most important thing to standard library developers
Although performance is important to C++ developers, it is not necessarily the most important factor to the developers of the standard library. Coverage is important, especially if it permits them to check a box on the latest C++ standard’s list of features. Simplicity and maintainability are important, because the library is long-lived. Portability is important if the library implementation must support multiple compilers. Performance may sometimes take a back seat to any of these other important factors.
The path from the standard library function call to the related native function may be long and winding. I once traced an fopen() call through half a dozen layers of functions dedicated to argument translation before it arrived at the Windows OpenFile() call that ultimately asked the operating system to open the file. The goal appeared to have been minimizing the number of lines of code, but the many layers of function call could not have helped with performance.
Library implementations may frustrate optimization attempts
The Linux AIO library (not a standard C++ library, but very useful to the optimizing developer) was intended to provide a very efficient, asynchronous, copy-free interface for reading files. The problem was that AIO required a particular kernel version. Until a reasonably large fraction of Linux distros implemented the kernel update, AIO was coded in terms of older, slower I/O calls. The developer could write AIO calls but not get AIO performance.
Not all parts of the C++ standard library are equally useful
Some C++ features, like the elaborate exception hierarchy, vector<bool>, and standard library Allocators, were added to the standard before being wrung out in use. These features actually make coding harder instead of easier. Fortunately, the standard committee seems to have gotten over its earlier enthusiasm for untested new features. Proposed library features now gestate in the Boost library for several years before being taken up by the standard committee.
The standard library is not as efficient as the best native functions
The standard library does not provide features available on some operating systems, such as asynchronous file I/O. The optimizing developer can only go so far with the standard library. To obtain the final iota of performance, she must descend to native calls, sacrificing portability on the altar of the gods of speed.
Optimize Existing Libraries
Optimizing an existing library is like clearing a minefield. It may be possible. It may be necessary. But it is a difficult job that requires patience and attention to detail, or boom!
The easiest kind of library to optimize is well designed, with good separation of concerns and good test cases. Unfortunately, this kind of library is already probably optimized. The reality is that if you are asked to optimize a library, it is probably a mess of functionality squeezed into functions or classes that do too much or too little.
Making changes to a library introduces the risk that some other program relies on unintended or undocumented behavior of the existing implementation. While making a function run faster is not itself likely to cause problems, some accompanying change in behavior may be more of a problem.
Making changes to an open source library introduces potential incompatibility between your project’s version and the main repository. This becomes a problem when the open source library is next revised to fix bugs or add functionality. Either your changes must manually be propagated to the changed library, or your changes are lost in the upgrade, or your modified library becomes fixed in time and does not get the advantage of changes made by other contributors. It is thus a good idea, when picking open source libraries, to be sure the community is welcoming of your changes, or that the library is very mature and stable.
Still, it’s not completely hopeless. The following sections cover some rules for attacking existing libraries.
Change as Little as Possible
The best advice for library optimizers is to change as little as possible. Don’t add or remove functionality from classes or functions, or change function signatures. These kinds of changes are almost guaranteed to break compatibility between the modified library and programs that use it.
Another reason to make the fewest changes possible is to limit the part of the library code that must be understood.
OPTIMIZATION WAR STORY
I once worked for a company that sold and supported an industrial-strength version of OpenSSH, which is based on a program developed in 1995 by Tatu Ylönen, a researcher at the Helsinki University of Technology, for personal use. Being new to the open source world, I noted that the code had not been written by a highly experienced developer, and was thus not as tidy as it could have been. I made some significant changes, to make the code more easily maintained. Or so I thought at first.
Wonderful though my personal code stylings may have been, I found out they could never be deployed.
In retrospect, the reason should have been obvious. My changes made our code differ too much from the open source variant. We relied on the community for important, security-related bug fixes that were automatic on unmodified code, but time-consuming to reimplement on my heavily modified code. While I might have proposed these changes for inclusion in the open source version, the security community is extremely conservative about changes, and rightly so.
I needed to change only code that affected security improvements requested by our users, and then to the minimum extent possible. Refactoring was never even an option with this code, badly as it seemed to cry out for it.
Add Functions Rather than Change Functionality
One bright spot in the gloomy world of optimizing libraries is that new functions and classes can be added to a library in relative safety. There is of course a risk that a future version of the library will define a class or function with the same name as one of your additions, but that risk is controllable by choice of names, and sometimes reparable with macros.
Here are some reasonably safe upgrades that may improve the performance of existing libraries:
· Add functions that move loops down into the library, reflecting idioms of use in your code.
· Implement move semantics in older libraries by adding function overloads that take rvalue references. (See “Implement Move Semantics” for more details on move semantics.)
Design Optimized Libraries
Faced with a badly designed library, there’s little the optimizing developer can do. Faced with a blank screen, the optimizing developer has more latitude to use best practices and avoid pitfalls.
Some of the items in this checklist are aspirational goals. That is, this section offers no specific advice on how to achieve each goal in a particular library, but only notes that the best and most useful libraries tend to embody these virtues. If a particular library design is straying away from these aspirational goals, a review of the design is appropriate.
Code in Haste, Repent at Leisure
Marry in haste, repent at leisure.
Samuel Johnson (1709–1784), English lexicographer, essayist, and poet
Stability of interface is a core deliverable of a library. A library designed in a rush, or a bunch of independently written functions pulled together into a library, will not match in call and return conventions, allocation behavior, or efficiency. The pressure to make a casually built library consistent will arise almost immediately. However, the time required to fix all functions in the library may prevent developers from acting to fill this need.
Designing an optimized library is like designing other C++ code, only the stakes are higher. Libraries, by definition, are meant for broad use. Any imperfections in the design, implementation, or performance of a library are shared among all its users. Casual coding practices, which can be tolerated in noncritical code, are more problematical when developing libraries. Old-school development methods, including up-front specification and design, documentation, and module tests, find their uses in development of critical code like libraries.
OPTIMIZATION PRO TIP: TEST CASES ARE CRITICAL
Test cases are important for any software. They help verify the correctness of the original design, and help reduce the chance that changes made during optimization will affect a program’s correctness. It should not be surprising that they assume an even greater importance in library code, where the stakes are higher.
Test cases help identify dependencies and couplings during the design of a library. The functions of well-designed libraries can be tested in isolation. If many objects must be instantiated before testing a target function, it is a signal to the designer that there is too much coupling between the components of the library.
Test cases help the library designer practice use of the library. Without this practice, it’s far too easy for even experienced designers to leave out important interface functions. Test cases help the library designer identify rough edges in the design at an early stage, when breaking changes to the library interface are not painful. Using the library helps the designer to identify idioms of use, so that these are encoded in the library design, resulting in more efficient function interfaces.
Test cases make good targets for timing tests. Timing tests ensure that any proposed optimizations actually improve performance. Timing tests themselves can be added to the other test cases to ensure that changes do not rob performance.
Parsimony Is a Virtue in Library Design
For those few readers who don’t use this word in their daily lives, Merriam-Webster online defines parsimony as the quality of being careful with money or resources; economy in the use of means to an end. Readers may know this as the KISS (keep it simple, stupid) principle. Parsimony means a library is focused on a particular task and contains the minimum amount of machinery needed to perform that task.
For instance, it is more parsimonious for a library function to accept a valid std::istream reference argument from which to read data than to open a file named by a filename argument; handling operating system–dependent filename semantics and I/O errors is not core to a data processing library. See “Create a Parsimonious Function Signature” for an example of this. Accepting a pointer to a memory buffer is more parsimonious than allocating storage to return; it means the library doesn’t have to handle out-of-memory exceptions. See “Copy Free Libraries” for an example. Parsimony is the ultimate result of consistently applying good C++ development principles like the single responsibility principle and the interface-segregation principle of SOLID design.
Parsimonious libraries are simple. They contain free functions or simple classes that are complete as they stand. They can be learned and understood one piece at a time. This is the way most programmers learn the C++ standard library, a big library that is nevertheless parsimonious.
Make Memory Allocation Decisions Outside the Library
This is a specific instance of the parsimony rule. Since memory allocation is so expensive, push allocation decisions out of the library if possible. For instance, fill in a buffer given as an argument to a library function, rather than returning a buffer allocated by the library function.
Pushing allocation decisions out of library functions allows the caller to implement the optimizations described in Chapter 6, reusing memory where possible instead of allocating new storage for each call.
Pushing allocation out of the library also reduces the number of times a buffer of data may need to be copied as it is passed from function to function.
If necessary, make memory allocation decisions in derived classes, so that the base class keeps only a pointer to the allocated memory. This way, new classes can be derived that allocate memory in different ways.
Requiring memory to be allocated outside of the library affects function signatures (for instance, passing in a pointer to a buffer versus returning an allocated buffer). It is important to make this decision at the time the library is designed. Trying to change the library once functions are in use will definitely cause breaking changes.
When in Doubt, Code Libraries for Speed
In Chapter 1, I quoted Don Knuth’s admonition, “Premature optimization is the root of all evil.” I suggested this was too dogmatic. But in library design, this advice is especially dangerous.
Good performance is particularly important for a library class or function. The library designer cannot predict when the library may be used in a context in which performance matters. Improving performance after the fact can be difficult or impossible, especially if it involves changes to function signatures or behavior. Even a library used within a single enterprise may be invoked from many programs. If the library is widely disseminated, as with open source projects, there is no way to update or even discover all users. Any change becomes a breaking change.
Functions Are Easier to Optimize than Frameworks
There are two kinds of libraries: function libraries and frameworks. A framework is a conceptually large class that implements a complete program skeleton: a windowing application or a web server, for instance. You decorate the framework with small functions that customize it, to be your specific windowing application or your particular web server.
The second kind of library is a collection of functions and classes that are components that together can be used to implement a program: parsing URIs for a web server or drawing text on a window, for instance.
Both kinds of libraries can embody powerful functionality and enhance productivity. A given set of capabilities can be packaged either as functions (as in Windows SDK) or as a framework (as in Windows MFC, for example). However, from an optimizer’s standpoint, function libraries are easier to work with than frameworks.
Functions have the advantage that they can be tested and their performance tuned in isolation. Invoking a framework turns on all its machinery at once, making it harder to isolate and test a change. Frameworks violate the separation of concerns or single responsibility rule. That makes them hard to optimize.
Functions can be used in focused ways within a larger application: a drawing subroutine here, a URI parser there. Only as much functionality as needed is linked from the library. Frameworks contain “god functions” (see “Beware of ‘God Functions’”) that themselves link in many parts of the framework. This can bloat the executable with code that is not actually ever used.
Well-designed functions make few assumptions about the environment in which they run. By contrast, a framework is based on a big, general model of what the developer wants to do. Inefficiency results any time there is a mismatch between the model and the developer’s actual needs.
Flatten Inheritance Hierarchies
Most abstractions require no more than three layers of class derivation: a base class with common functions, one or more derivations to implement polymorphism, and perhaps a mixin layer of multiple inheritance for really complex cases. For specific cases, a developer’s judgment must prevail. However, inheritance hierarchies much deeper than this are a sign that the abstraction represented by the class hierarchy is not clearly thought out, introducing complexity that leads to lower performance.
From an optimization standpoint, the deeper the inheritance hierarchy, the greater the chance to introduce extra computation when a member function is called (see “Cost of Function Calls”). The constructors and destructors in classes with many layers of parents must traverse that whole long chain to do their jobs. Although these functions are not usually called frequently, they still pose a potential risk of introducing expensive calls into performance-critical operations.
Flatten Calling Chains
As with derived classes, the implementations of most abstractions require no more than three nested function calls: a free function or member function implementing strategy, calling a member function of some class, calling some public or private member function implementing the abstraction or accessing data.
If the data is being accessed in a nested abstraction implemented using a contained class instance, that may result in as much as three more layers. This analysis proceeds recursively down chains of nested abstractions. A library that is cleanly decomposed does not contain long chains of nested abstractions. These add overhead in the form of calls and returns.
Flatten Layered Designs
Sometimes one abstraction must be implemented in terms of another abstraction, creating a layered design. As noted previously, this can be carried to extremes that affect performance.
But other times, a single abstraction is reimplemented in a layered fashion. This may be done for several reasons:
· To implement a layer using the façade pattern that changes calling conventions: perhaps switching from project-specific to operating system–specific codes in arguments, switching from text-string to numeric codes in arguments, or inserting project-specific error handling
· To implement an abstraction in terms of a closely related abstraction for which library code already exists
· To implement a transition between error-returning function calls and exception-throwing function calls
· To implement the PIMPL idiom (see “Eliminate Uses of the PIMPL Idiom”)
· To call into a DLL or plug-in
The designer’s judgment must rule in each situation, because there are excellent reasons for doing most of these things. However, each layer transition is an extra function call and return that saps the performance of every call. The designer must review layer transitions to see if they are necessary, or whether two or more layers can be compressed into one. Here are a few thoughts to guide code review:
· Many instances of the façade pattern in a single project may signal overdesign.
· One sign that a design is too layered is if a given layer occurs more than once, as in an error-returning layer calling an exception-handling layer, which then calls an error-returning layer.
· Nested instances of PIMPL are difficult to justify in terms of PIMPL’s original purpose of providing a recompilation firewall. Most subsystems are simply not big enough to require nested PIMPLs (see “Eliminate Uses of the PIMPL Idiom”).
· Project-specific DLLs are often proposed for encapsulating bug fixes. Few projects ever realize this utility, as bug fixes tend to be released in batches that cross DLL boundaries.
Eliminating layers is a task that can be done only at design time. At design time, there is a business need for the library. Once the library is complete, no matter how flawed it is, the cost of any change must be weighed against the benefit. Experience teaches that no manager will welcome your request to spend a couple of sprints fixing up the library, unless there is a gun to their head.
Avoid Dynamic Lookup
Big programs contain big configuration profiles or long lists of registry entries. Complex data files such as audio or video streams contain optional metadata describing the data. When there are just a few metadata items, it’s easy to define a struct or class to contain them. When there get to be dozens or hundreds, many designers are tempted to think of a lookup table design in which each metadata item can be looked up, given a key string. When the profiles are written in JSON or XML, the temptation grows, because there are libraries for dynamically finding items in JSON or XML files. Some programming languages, such as Objective-C, come with system libraries that work this way. However, dynamically searching a symbol table is a performance killer, for several reason:
· Dynamic lookup is inherently inefficient. Some libraries that find JSON or XML items have O(n) performance with respect to the size of the file per lookup. Table-based lookup may be O(log2n). By contrast, fetching a data item from a struct is O(1) and the constant of proportionality is tiny.
· The library designer may not know all the ways the metadata will be accessed. If an initialization profile is only read once at startup, the cost is probably insignificant. But many kinds of metadata must be read repeatedly during processing, and may change between units of work. While premature optimization may be the root of all evil, a library interface that searches for a key string can never be made faster than the search in the underlying key/value table without a change that breaks existing implementations. Apparently evil has more than one root!
· Once a lookup table–based design is in place, the next question that arises is one of consistency. Does the table contain all the metadata needed for a given transformation? Do command-line argument sets that must appear together actually appear? Although a table-based repository can be checked for consistency, it is an expensive runtime operation involving code and multiple expensive searches. Accessing the data in a simple struct is vastly faster than multiple table lookups.
· A struct-based repository is self-documenting in the sense that all possible metadata items are immediately visible. By contrast, a symbol table is just a big opaque bag of named values. A team using such a repository might choose to carefully document the metadata present at each stage in the program’s execution. But such discipline is rare, in my experience. The alternative is to write endless code that attempts to regenerate missing metadata, never knowing whether the code will ever be called, and thus if it is correct.
Beware of ‘God Functions’
A “god function” is a function implementing high-level strategy that, when used in a program, causes the linker to add many more library functions to the executable file. Increasing the size of the executable exhausts physical memory in embedded systems and increases virtual memory paging in desktop-class computers.
Many existing libraries have god functions that are expensive to use. Good libraries eliminate these functions by design. They are inevitable in libraries designed as frameworks.
OPTIMIZATION WAR STORY
This is a parable I call printf() is not your friend.
“Hello, World” is about the simplest possible C++ (or C) program:
# include <stdio.h>
int main(int, char**) {
printf("Hello, World!\n");
return 0;
}
How many executable bytes should this program contain? If you guessed “around 50 or 100 bytes,” you’re wrong by two orders of magnitude. This program was over 8 KB on an embedded controller I once programmed. And that’s just code, not symbol tables, loader information, or anything else.
Here is another program that does the same thing:
# include <stdio.h>
int main(int, char**) {
puts("Hello, World!");
return 0;
}
This program is virtually the same, differing only in the use of puts() to output the string instead of printf(). But the second program occupies around 100 bytes. What causes the difference in size?
printf() is the culprit. printf() can print each particular type in three or four formats. It can interpret a format string to read a variable number of arguments. printf() is a big function on its own, but what really makes it big is that it pulls in standard library functions that format each basic type. On my embedded controller, it was even worse because the processor did not implement floating-point arithmetic in hardware; an extensive library of functions was used instead. printf() is in fact the poster child for the god function—a function that does so much that it sucks in huge swaths of the C runtime library.
On the other hand, puts() just puts one string on the standard output. It is internally quite simple, and furthermore, it doesn’t cause half the standard library to be linked in.
Summary
· Functions and classes enter the C++ standard library either because they cannot be provided any other way, or because they promote very wide reuse across multiple operating systems.
· Standard library implementations have bugs.
· There is no such thing as a “standard-conforming implementation.”
· The standard library is not as efficient as the best native functions.
· When updating a library, change as little as possible.
· Stability of interface is a core deliverable of a library.
· Test cases are critical for optimizing a library.
· Designing a library is like designing other C++ code, only the stakes are higher.
· Most abstractions require no more than three layers of class derivation.
· The implementations of most abstractions require no more than three nested function calls.