Templates and Hierarchies - Abstraction Mechanisms - The C++ Programming Language (2013)

The C++ Programming Language (2013)

Part III: Abstraction Mechanisms

27. Templates and Hierarchies

Euclid’s and Beethoven’s Fifth; knowing just one of them marks you as semi-educated.

– Stan Kelley-Bootle

Introduction

Parameterization and Hierarchy

Generated Types; Template Conversions

Hierarchies of Class Templates

Templates as Interfaces

Template Parameters as Base Classes

Composing Data Structures; Linearizing Class Hierarchies

Advice

27.1. Introduction

Templates and derivation are mechanisms for building new types out of existing ones, for specifying interfaces, and generally for writing useful code that exploits various forms of commonality:

• A template class defines an interface. The template’s own implementation and those of its specializations can be accessed through that interface. The source code implementing the template (in the template definition) is identical for all parameter types. The implementations of different specializations can be very different, but they should all implement the semantics specified for the primary template. A specialization can add functionality to what the primary template offers.

• A base class defines an interface. The class’s own implementation and those of its derived classes can (using virtual functions) be accessed through that interface. The implementations of different derived classes can be very different, but they should all implement the semantics specified for the base class. A derived class can add functionality to what the base class offers.

From a design perspective, the two approaches are close enough to deserve a common name. Since both allow an algorithm to be expressed once and applied to a variety of types, people refer to both as polymorphic (from Greek “many shapes”). To distinguish them, what virtual functions provide is called run-time polymorphism, and what templates offer is called compile-time polymorphism or parametric polymorphism.

The rough duality of the generic and object-oriented approaches can be deceptive. Object-oriented programmers tend to focus on the design of hierarchies of classes (types) with an interface being an individual class (Chapter 21). Generic programmers tend to focus on the design of algorithms with the concepts for template arguments providing an interface that can accommodate many types (Chapter 24). The ideal for a programmer should be mastery of both techniques to the point where either can be used where most appropriate. In many cases, the optimal design contains elements of both. For example vector<Shape*> is a compile-time polymorphic (generic) container holding elements from a run-time polymorphic (object-oriented) hierarchy (§3.2.4).

Generally, good object-oriented programming requires more foresight than good generic programming because all types in a hierarchy must explicitly share an interface defined as a base class. A template will accept any type that meets its concept as an argument, even if there is no explicitly declared commonality among those types. For example accumulate()3.4.2, §24.2, §40.6.1) will accept vectors of ints and lists of complex<double> even though there are no declared relationship between the two element types and no declared relationship between the two sequence types.

27.2. Parameterization and Hierarchy

As shown in §4.4.1 and §27.2.2, combinations of templates and class hierarchies are the basis for many useful techniques. So:

• When do we choose to use a class template?

• When do we rely on a class hierarchy?

Consider these questions from a slightly simplified and abstract point of view:

template<typename X>
class Ct { //
interface expressed in terms of the parameter
X mem;
public:
X f();
int g();
void h(X);
};


template<>
class Ct<A> { //
specialization (for A)
A* mem; // the representation can differ from that of the primary template
public:
A f();
int g();
void h(A);
void k(int); //
added functionality
};
Ct<A> cta; // specialization for A
Ct<B> ctb; // specialization for B

Given that, we can use f(), g(), and h() on the variables cta and ctb, using the implementations of Ct<A> and Ct<B>, respectively. I used an explicit specialization (§23.5.3.4) to show that implementations can vary from what the primary template offers and that it is possible to add functionality. The simpler case – without added functionality – is by far the more common.

The rough equivalent using a hierarchy is:

class X {
//
...
};

class Cx { // interface expressed in terms of types in scope
X mem;
public:
virtual X& f();
virtual int g();
virtual void h(X&);
};

Class DA : public Cx { //
derived class
public:
X& f();
int g();
void h(X&);
};

Class DB : public Cx { //
derived class
DB* p; // representation can be more extensive than what the base provides
public:
X& f();
int g();
void h(X&);
void k(int); //
added functionality
};

Cx& cxa {*new DA}; // cxa is an interface to a DA
Cx& cxb {*new DB}; // cxb is an interface to a DB

Given that, we can use f(), g(), and h() on the variables cxa and cxb, using the implementations of DA and DB, respectively. I use references in the hierarchical version to reflect that we must manipulate derived class objects through pointers or references to preserve run-time polymorphic behavior.

In either case, we manipulate objects that share a common set of operations. From this simplified and abstract point of view we can observe:

• If the types in the interface to the generated or derived classes need to differ, templates have an advantage. To gain access to differing interfaces for derived classes through a base, we must use some form of explicit casting (§22.2).

• If the implementations of generated or derived classes differ only through a parameter or differ only in a few special cases, templates have an advantage. Irregular implementations can be expressed through derived classes or specializations.

• If the actual types of objects used cannot be known at compile time, a class hierarchy is essential.

• If a hierarchical relationship is required between generated or derived types, hierarchies have an advantage. The base class provides a common interface. Conversions among template specializations must be explicitly defined by the programmer (§27.2.2).

• If explicit use of free store (§11.2) is undesirable, templates have an advantage.

• If run-time efficiency is at such a premium that inlining of operations is essential, templates should be used (because effective use of hierarchy requires the use of pointers or references, which inhibit inlining).

Keeping base classes minimal and type-safe can be a struggle. Expressing the interface in terms of existing types that cannot vary for derived classes can be a struggle. Often the result is a compromise that either overconstrains the base class interface (e.g., we pick a class X with a “rich interface” which must be implemented by all derived classes and “for all time”) or underconstrains (e.g., we use void* or a minimal Object*).

The combination of templates and class hierarchies provides design choices and flexibility beyond what either can offer by itself. For example, a pointer to a base class can be used as a template argument to provide run-time polymorphism (§3.2.4), and a template parameter can be used to specify a base class interface to provide type safety (§26.3.7, §27.3.1).

It is not possible to have a virtual function template (§23.4.6.2).

27.2.1. Generated Types

A class template is usefully understood as a specification of how particular types are to be created. In other words, the template implementation is a mechanism that generates types when needed based on a specification. Consequently, a class template is sometimes called a type generator.

As far as the C++ language rules are concerned, there is no relationship between two classes generated from a single class template. For example:

class Shape {
//
...
};

class Circle : public Shape {
{
//
...
};

Given these declarations, people sometimes think that there must be an inheritance relationship between set<Circle> and set<Shape> or at least between set<Circle*> and set<Shape*>. This is a serious logical error based on a flawed argument: “A Circle is a Shape, so a set of Circles is also a set of Shapes; therefore, I should be able to use a set of Circles as a set of Shapes.” The “therefore” part of this argument doesn’t hold. The reason is that a set of Circles guarantees that the members of the set are Circles; a set of Shapes does not provide that guarantee. For example:

class Triangle : public Shape {
//
...
};

void f(set<Shape*>& s)
{
//
...
s.insert(new Triangle{p1,p2,p3});
}

void g(set<Circle*>& s)
{
f(s); //
error, type mismatch: s is a set<Circle*>, not a set<Shape*>
}

This won’t compile because there is no built-in conversion from set<Circle*>& to set<Shape*>&. Nor should there be. The guarantee that the members of a set<Circle*> are Circles allows us to safely and efficiently apply Circle-specific operations, such as determining the radius, to members of the set. If we allowed a set<Circle*> to be treated as a set<Shape*>, we could no longer maintain that guarantee. For example, f() inserts a Triangle* into its set<Shape*> argument. If the set<Shape*> could have been a set<Circle*>, the fundamental guarantee that aset<Circle*> contains Circle*s only would have been violated.

Logically, we could treat an immutable set<Circle*> as an immutable set<Shape*> because the problem of inserting an inappropriate element into the set cannot occur when we cannot change the set. That is, we could provide a conversion from const set<const Circle*> to const set<const Shape*>. The language doesn’t do so by default, but the designer of set could.

The combination of an array and a base class is particularly nasty because a built-in array does not offer the type safety provided by containers. For example:

void maul(Shape* p, int n) // Danger!
{
for (int i=0; i!=n; ++i)
p[i].draw(); //
looks innocent; it is not
}

void user()
{
Circle image[10]; //
an image is composed of 10 Circles
// ...
maul(image,10); // "maul" 10 Circles
// ...
}

How can we call maul() with image? First, image’s type is converted (decays) from Circle[] to Circle*. Next, Circle* is converted to Shape*. The implicit conversion of an array name to a pointer to the array’s first element is fundamental to C-style programming. Similarly, implicit conversion of a pointer to a derived class to a pointer to its base class is fundamental to object-oriented programming. In combination, they offer the opportunity of disaster.

In the example above, assume that a Shape is an abstract class with size 4 and that Circle adds a center and a radius. Then sizeof(Circle)>sizeof(Shape) and when we look at the layout of image we find something like this:

Image

When maul() tries to invoke a virtual function on p[1], there is no virtual function pointer where it is expected and the call hopefully fails instantly.

Note that no explicit cast was needed to get this disaster:

• Prefer containers over built-in arrays.

• Consider interfaces such as void f(T* p, int count) highly suspect; when T can be a base class and count is an element count, trouble awaits.

• Consider . (dot) suspect when applied to something that is supposed to be run-time polymorphic unless it is obviously applied to a reference.

27.2.2. Template Conversions

There cannot be any default relationship between classes generated from the same template (§27.2.1). However, for some templates we would like to express such a relationship. For example, when we define a pointer template, we would like to reflect inheritance relationships among the objects pointed to. Member templates (§23.4.6) allow us to specify many such relationships where desired. Consider:

template<typename T>
class Ptr { //
pointer to T
T* p;
public:
Ptr(T*);
Ptr(const Ptr&); //
copy constructor
template<typename T2>
explicit operator Ptr<T2>(); //
convert Ptr<T> to Ptr<T2>
// ...
};

We would like to define the conversion operators to provide the inheritance relationships we are accustomed to for built-in pointers for these user-defined Ptrs. For example:

void f(Ptr<Circle> pc)
{
Ptr<Shape> ps {pc}; //
should work
Ptr<Circle> pc2 {ps}; // should give error
}

We want to allow the first initialization if and only if Shape really is a direct or indirect public base class of Circle. In general, we need to define the conversion operator so that the Ptr<T> to Ptr<T2> conversion is accepted if and only if a T* can be assigned to a T2*. That can be done like this:

template<typename T>
template<typename T2>
Ptr<T>::operator Ptr<T2>()
{
return Ptr<T2>{p};
}

The return-statement will compile if and only if p (which is a T*) can be an argument to the Ptr<T2>(T2*) constructor. Therefore, if T* can be implicitly converted into a T2*, the Ptr<T> to Ptr<T2> conversion will work. For example, we can now write:

void f(Ptr<Circle> pc)
{
Ptr<Shape> ps {pc}; //
OK: can convert Circle* to Shape*
Ptr<Circle> pc2 {ps}; // error: cannot convert Shape* to Circle*
}

Be careful to define logically meaningful conversions only. If in doubt, use a named conversion function, rather than a conversion operator. A named conversion function offers fewer opportunities for ambiguities.

The template parameter lists of a template and one of its template members cannot be combined. For example:

template<typename T, typename T2> // error
Ptr<T>::operator Ptr<T2>()
{
return Ptr<T2>(p);
}

An alternative solution to this problem uses type traits and enable_if()28.4).

27.3. Hierarchies of Class Templates

Using object-oriented techniques, a base class is often used to provide a common interface to a set of derived classes. A template can be used to parameterize such an interface, and when that is done it is tempting to parameterize the whole hierarchy of derived classes with the same template parameters. For example, we could parameterize the classical shape example (§3.2.4) with types used to provide an abstraction of the target output “device”:

template<typename Color_scheme, typename Canvas> // questionable example
class Shape {
//
...
};

template<typename Color_scheme, typename Canvas>
class Circle : public Shape {
//
...
};
template<typename Color_scheme, typename Canvas>
class Triangle : public Shape {
//
...
};

void user()
{
auto p = new Triangle<RGB,Bitmapped>{{0,0},{0,60},{30,sqr t(60*60–30 *30)}};
//
...
}

Something along this line is often the first idea (after seeing something like vector<T>) that a programmer used to object-oriented programming considers. However, caution is recommended when mixing object-oriented and generic techniques.

As written, this parameterized shape hierarchy is too verbose for real use. That could be addressed using default template arguments (§25.2.5). However, the verbosity is not the main problem. If only one combination of Color_scheme and Canvas is used in a program, the amount of code generated is almost exactly the same as would have been generated for the unparameterized equivalent. The “almost” comes because the compiler will suppress code generation for the definitions of unused non-virtual member functions of a class template. However, if N combinations ofColor_scheme and Canvas are used in a program, the code for every virtual function will be replicated N times. Because a graphics hierarchy is likely to have many derived classes, many member functions, and many complicated functions, the result is likely to be massive code bloat. In particular, the compiler cannot know whether a virtual function is used or not, so it must generate code for all such functions and for all functions called by such functions. Parameterizing a huge class hierarchy with many virtual member functions is typically a poor idea.

For this shape example, the Color_scheme and Canvas parameters are unlikely to affect the interface much: most member functions will not have them as part of their function type. These parameters are an “implementation detail” that escaped into the interface – with likely serious performance implications. It is not really the whole hierarchy that needs those parameters; it is a few configuration functions and (most likely) a few lower-level drawing/rendering functions. It is generally not a good idea to “overparameterize” (§23.4.6.3): try to avoid parameters that affect only a few members. If only a few member functions are affected by a parameter, try to make those function templates with that parameter. For example:

class Shape {
template<typename Color_scheme, typename Canvas>
void configure(const Color_scheme&, const Canvas&);
//
...
};

How to share the configuration information among different classes and different objects is a separate issue. Clearly, we cannot simply store a Color_scheme and Canvas in the Shape without parameterizing Shape itself with Color_scheme and Canvas. One solution would be forconfigure() to “translate” the information in Color_scheme and Canvas into a standard set of configuration parameters (e.g., a set of integers). Another solution is to give a Shape a Configuration* member, where Configuration is a base class providing a general interface to configuration information.

27.3.1. Templates as Interfaces

A template class can be used to provide a flexible and type-safe interface to a common implementation. The vector from §25.3 is a good example of this:

template<typename T>
class Vector<T*>
: private Vector<void*>
{
//
...
};

In general, this technique can be used to provide type-safe interfaces and for localizing casts to implementations, rather than forcing users to write them.

27.4. Template Parameters as Base Classes

In classical object-oriented programming using class hierarchies, we place information that can be different for different classes in derived classes and access it through virtual functions in a base class (§3.2.4, §21.2.1). That way, we can write common code without worrying about variations in the implementations. However, this technique does not allow us to vary the types used in the interface (§27.2). Also, those virtual function calls can be expensive compared to a simple operation from an inlined function. To compensate, we might pass the specialized information and operations to the base class as template arguments. In fact, a template argument can be used as a base class.

The general problem addressed by the following two subsections is “How can we combine separately specified information into a single compact object with a well-specified interface?” This is a fundamental question and the solutions are of general importance.

27.4.1. Composing Data Structures

Consider writing a balanced binary tree library. Since we are providing a library to be used by many different users, we can’t build the type of the user’s (application’s) data into our tree nodes. We have many alternatives:

• We could put the user’s data in a derived class and access it using virtual functions. But the virtual function calls (or equivalently, run-time resolved and checked solutions) are relatively expensive, and our interface to the user’s data is not expressed in terms of the user’s type, so we still need to use casts to access it.

• We could put a void* in our nodes and let the user use that to refer to data allocated outside our nodes. But that could double the number of allocations, add many (potentially expensive) pointer dereferences, and add the space overhead of a pointer in each node. We would also need to use a cast to access our user data using its correct type. That cast can’t be type checked.

• We could put a Data* into our nodes, where Data is a “universal base class” for our data structures. That would solve the type-checking problem, but that combines the cost and inconvenience of the previous two approaches.

There are more alternatives, but consider:

template<typename N>
struct Node_base { //
doesn't know about Val (the user data)
N* left_child;
N* right_child;

Node_base();

void add_left(N* p)
{
if (left_child==nullptr)
left_child = p;
else
//
...
}
//
...
};

template<typename Val>
struct Node : Node_base<Node<Val>> { //
use derived class as part of its own base
Val v;
Node(Val vv);
//
...
};

Here, we pass the derived class Node<Val> as a template argument to its own base (Node_base). That allows Node_base to use Node<Val> in its interfaces without even knowing its real name!

Note that the layout of a Node is compact. For example, a Node<double> will look roughly equivalent to this:

struct Node_base_double {
double val;
Node_base_double* left_child;
Node_base_double* right_child;
};

Unfortunately, with this design, a user has to be aware of Node_base’s operations and the structure of the resulting tree. For example:

using My_node = Node<double>;

void user(const vector<double>& v)
{
My_node root;
int i = 0;

for (auto x : v) {
auto p = new My_node{x};

if (i++%2) // choose where to insert
root.add_left(p);
else
root.add_right(p);
}
}

However, it is not easy for a user to keep the structure of a tree reasonable. Typically, we would like to let the tree take care of that by implementing a tree-balancing algorithm. However, to balance a tree so that you can search it efficiently, the balancer needs to know the user’s values.

How do we add a balancer to our design? We could hardwire the balancing strategy into Node_base and let Node_base “peek” at the user data. For example, a balanced tree implementation, such as the standard-library map, (by default) requires that a value type provides a less-than operation. That way, the Node_base operations can simply use <:

template<typename N>
struct Node_base {
static_assert(Totally_ordered<N>(), "Node_base: N must have a <");

N* left_child;
N* right_child;
Balancing_info bal;

Node_base();

void insert(N& n)
{
if (n<left_child)
//
... do something ...
else
//
... do something else ...
}
//
...
};

This works nicely. In fact, the more information about nodes we build into the Node_base, the simpler the implementation becomes. In particular, we could parameterize the Node_base with a value type rather than a node type (as is done for std::map), and we would have the tree in a single compact package. However, doing so doesn’t address the fundamental question we are trying to address here: how to combine information from several separately specified sources. Writing everything in one place dodges that problem.

So let us assume that the user will want to manipulate Nodes (e.g., move a node from one tree to another), so that we can’t simply store user data into an anonymous node. Let us further assume that we would like to be able to use a variety of balancing algorithms, so that we need to make the balancer an argument. These assumptions force us to face the fundamental question. The simplest solution is to let Node combine the value type with a balancer type. However, Node doesn’t need to use the balancer, so it just passes it on to Node_base:

template<typename Val, typename Balance>
struct Search_node : public Node_base<Search_node<Val, Balance>, Balance>
{
Val val; //
user data
search_node(Val v): val(v) {}
};

Balance is mentioned twice here because it is part of the node type and because Node_base needs to make an object of type Balance:

template<typename N, typename Balance>
struct Node_base : Balance {
N* left_child;
N* right_child;

Node_base();

void insert(N& n)
{
if (this–>compare(n,left_child)) //
use compare() from Balance
// ... do something ...
else
// ... do something else ...
}
//
...
};

I could have used Balance to define a member, rather than using it as a base. However, some important balancers require no per-node data, so by making Balance a base, I benefit from the empty-base optimization. The language guarantees that if a base class has no non-static data members, no memory will be allocated for it in an object of derived class (§iso.1.8). Also, this design is with minor stylistic differences that of a real binary tree framework [Austern,2003]. We might use these classes like this:

struct Red_black_balance {
//
data and operations needed to implement red-black trees
};

template<typename T>
using Rbnode = Search_node<T,Red_black_balance>; //
type alias for red-black trees

Rbnode<double> my_root; // a red-black tree of doubles

using My_node = Rb_node<double>;

void user(const vector<double>& v)
{
for (auto x : v)
root.insert(*new My_node{x});
}

The layout of a node is compact, and we can easily inline all performance-critical functions. What we achieved by the slightly elaborate set of definitions was type safety and ease of composition. This elaboration delivers a performance advantage compared to every approach that introduces a void* into the data structure or function interfaces. Such a use of void* disables valuable type-based optimization techniques. Choosing a low-level (C-style) programming technique in the key parts of a balanced binary tree implementation implies a significant run-time cost.

We passed the balancer as a separate template argument:

template<typename N, typename Balance>
struct Node_base : Balance {
//
...
};

template<typename Val, typename Balance>
struct Search_node
: public Node_base<Search_node<Val, Balance>, Balance>
{
//
...
};

Some find this clear, explicit, and general; others find it verbose and confusing. The alternative is to make the balancer an implicit argument in the form of an associated type (a member type of Search_node):

template<typename N>
struct Node_base : N::balance_type { //
use N's balance_type
// ...
};

template<typename Val, typename Balance>
struct Search_node
: public Node_base<Search_node<Val,Balance>>
{
using balance_type = Balance;
//
...
};

This technique is heavily used in the standard library to minimize explicit template arguments.

The technique of deriving from a base class is very old. It was mentioned in the ARM (1989) and is sometimes referred to as the Barton-Nackman trick after an early use in mathematical software [Barton,1994]. Jim Coplien called it the curiously recurring template pattern (CRTP) [Coplien,1995].

27.4.2. Linearizing Class Hierarchies

The Search_node example from §27.4.1 uses its template to compress its representation and to avoid using void*. The techniques are general and very useful. In particular, many programs that deal with trees rely on it for type safety and performance. For example, the “Internal Program Representation” (IPR) [DosReis,2011] is a general and systematic representation of C++ code as typed abstract syntax trees. It uses template parameters as base classes extensively, both as an implementation aid (implementation inheritance) and to provide abstract interfaces in the classical object-oriented way (interface inheritance). The design addresses a difficult set of criteria, including compactness of nodes (there can be many millions of nodes), optimized memory management, access speed (don’t introduce unnecessary indirections or nodes), type safety, polymorphic interfaces, and generality.

The users see a hierarchy of abstract classes providing perfect encapsulation and clean functional interfaces representing the semantics of a program. For example, a variable is a declaration, which is a statement, which is an expression, which is a node:

Var –> Decl –> Stmt –> Expr –> Node

Clearly, some generalization has been done in the design of the IPR because in ISO C++ statements cannot be used as expressions.

In addition, there is a parallel hierarchy of concrete classes providing compact and efficient implementations of the classes in the interface hierarchy:

impl::Var –> impl::Decl –> impl::Stmt –> impl::Expr –> impl::Node

In all, there are about 80 leaf classes (such as Var, If_stmt, and Multiply) and about 20 generalizations (such as Decl, Unary, and impl::Stmt).

The first attempt of a design was a classical multiple-inheritance “diamond” hierarchy (using solid arrows to represent interface inheritance and dotted arrows for implementation inheritance):

Image

That worked but led to excessive memory overhead: the nodes were too large because of data needed to navigate the virtual bases. In addition, programs were seriously slowed down by the many indirections to access the many virtual bases in each object (§21.3.5).

The solution was to linearize the dual hierarchy so that no virtual bases were used:

Image

For the full set of classes the chain of derivation becomes:

impl::Var –>
impl::Decl<impl::Var> –>
impl::Stmt<impl::Var> –>
impl::Expr<impl::Var> –>
impl::Node<impl::Var> –>
ipr::Var –>
ipr::Decl –>
ipr::Stmt –>
ipr::Expr–>
ipr::Node

This is represented as a compact object with no internal “management data” except the single vptr3.2.3, §20.3.2).

I will show how that is done. The interface hierarchy, defined in namespace ipr is described first. Starting from the bottom, a Node holds data used to optimize traversal and Node type identification (the code_category) and to ease storage of IPR graphs in files (the node_id). These are fairly typical “implementation details” hidden from the users. What a user will know is that every node in an IPR graph has a unique base of type Node and that this can be used to implement operations using the visitor pattern [Gamma,1994] (§22.3):

struct ipr::Node {
const int node_id;
const Category_code category;

virtual void accept(Visitor&) const = 0; //
hook for visitor classes
protected:
Node(Category_code);
};

Node is meant to be used as a base class only, so its constructor is protected. It also has a pure virtual function, so it cannot be instantiated except as a base class.

An expression (Expr) is a Node that has a type:

struct ipr::Expr : Node {
virtual const Type& type() const = 0;
protected:
Expr(Category_code c) : Node(c) { }
};

Obviously, this is quite a generalization of C++ because it implies that even statements and types have types: it is an aim of the IPR to represent all of C++ without implementing all of C++’s irregularities and limitations.

A statement (Stmt) is an Expr that has a source file location and can be annotated with various information:

struct ipr::Stmt : Expr {
virtual const Unit_location& unit_location() const = 0; //
line in file
virtual const Source_location& source_location() const = 0; // file

virtual const Sequence<Annotation>& annotation() const = 0;
protected:
Stmt(Category_code c) : Expr(c) { }
};

A declaration (Decl) is a Stmt that introduces a name:

struct ipr::Decl : Stmt {
enum Specifier { /*
storage class, virtual, access control, etc. */ };

virtual Specifier specifiers() const = 0;
virtual const Linkage& lang_linkage() const = 0;

virtual const Name& name() const = 0;

virtual const Region& home_region() const = 0;
virtual const Region& lexical_region() const = 0;

virtual bool has_initializer() const = 0;
virtual const Expr& initializer() const = 0;

//
...
protected:
Decl(Category_code c) : Stmt(c) { }
};

As you might expect, Decl is one of the central notions when it comes to representing C++ code. This is where you find scope information, storage classes, access specifiers, initializers, etc.

Finally, we can define a class to represent a variable (Var) as a leaf class (most derived class) of our interface hierarchy:

struct ipr::Var : Category<var_cat, Decl> {
};

Basically, Category is a notational aid with the effect of deriving Var from Decl and giving the Category_code used to optimize Node type identification:

template<Category_code Cat, typename T = Expr>
struct Category : T {
protected:
Category() : T(Cat) { }
};

Every data member is a Var. That includes global, namespace, local, and class static variables and constants.

Compared to representations you find in compilers, this interface is tiny. Except for some data for optimizations in Node, this is just a set of classes with pure virtual functions. Note that it is a single hierarchy with no virtual base classes. It is a straightforward object-oriented design. However, implementing this simply, efficiently, and maintainably is not easy, and IPR’s solution is certainly not what an experienced object-oriented designer would first think of.

For each IPR interface class (in ipr), there is a corresponding implementation class (in impl). For example:

template<typename T>
struct impl::Node : T {
using Interface = T; //
make the template argument type available to users
void accept(ipr::Visitor& v) const override { v.visit(*this); }
};

The “trick” is to establish the correspondence between the ipr nodes and the impl nodes. In particular, the impl nodes must provide the necessary data members and override the abstract virtual functions in the ipr nodes. For impl::Node, we can see that if T is an ipr::Node or any class derived from ipr::Node, then the accept() function is properly overridden.

Now, we can proceed to provide implementation classes for the rest of the ipr interface classes:

template<typename Interface>
struct impl::Expr : impl::Node<Interface> {
const ipr::Type* constraint; //
constraint is the type of the expression

Expr() : constraint(0) { }

const ipr::Type& type() const override { return *util::check(constraint); }
};

If the Interface argument is an ipr::Expr or any class derived from ipr::Expr, then impl::Expr is an implementation for ipr::Expr. We can make sure of that. Since ipr::Expr is derived from ipr::Node, this implies that impl::Node gets the ipr::Node base class that it needs.

In other words, we have managed to provide implementations for two (different) interface classes. We can proceed in this manner:

template<typename S>
struct impl::Stmt : S {
ipr::Unit_location unit_locus; //
logical position in translation unit
ipr::Source_location src_locus; // source file, line, and column
ref_sequence<ipr::Annotation> notes;

const ipr::Unit_location& unit_location() const override { return unit_locus; }
const ipr::Source_location& source_location() const override { return src_locus; }
const ipr::Sequence<ipr::Annotation>& annotation() const override { return notes; }
};

That is, impl:Stmt provides the three data items needed to implement ipr::Stmt’s interface and overrides ipr::Stmt’s three virtual functions to do so.

Basically, all impl classes follow Stmt’s pattern:

template<typename D>
struct impl::Decl : Stmt<Node<D> > {
basic_decl_data<D> decl_data;
ipr::Named_map* pat;
val_sequence<ipr::Substitution> args;

Decl() : decl_data(0), pat(0) { }

const ipr::Sequence<ipr::Substitution>& substitutions() const { return args; }
const ipr::Named_map& generating_map() const override { return *util::check(pat); }
const ipr::Linkage& lang_linkage() const override;
const ipr::Region& home_region() const override;
};

Finally, we can define the leaf class impl::Var:

struct Var : impl::Decl<ipr::Var> {
const ipr::Expr* init;
const ipr::Region* lexreg;

Var();

bool has_initializer() const override;
const ipr::Expr& initializer() const override;
const ipr::Region& lexical_region() const override;
};

Note that Var is not a template; it is a user-level abstraction in our application. The implementation of Var is an example of generic programming, but its use is classical object-oriented programming.

The combination of inheritance and parameterization is powerful. This expressive power can lead to confusion for beginners and occasionally even for experienced programmers when faced with a new application area. However, the benefits of the combination are concrete: type safety, performance, and minimal source code size. The extensibility offered by class hierarchies and virtual functions is not compromised.

27.5. Advice

[1] When having to express a general idea in code, consider whether to represent it as a template or as a class hierachy; §27.1.

[2] A template usually provides common code for a variety of arguments; §27.1.

[3] An abstract class can completely hide implementation details from users; §27.1.

[4] Irregular implementations are usually best represented as derived classes; §27.2.

[5] If explicit use of free store is undesirable, templates have an advantage over class hierarchies; §27.2.

[6] Templates have an advantage over abstract classes where inlining is important; §27.2.

[7] Template interfaces are easily expressed in terms of template argument types; §27.2.

[8] If run-time resolution is needed, class hierarchies are necessary; §27.2.

[9] The combination of templates and class hierarchies is often superior to either without the other; §27.2.

[10] Think of templates as type generators (and function generators); §27.2.1.

[11] There is no default relation between two classes generated from the same template; §27.2.1.

[12] Do not mix class hierarchies and arrays; §27.2.1.

[13] Do not naively templatize large class hierarchies; §27.3.

[14] A template can be used to provide a type-safe interface to a single (weakly typed) implementation; §27.3.1.

[15] Templates can be used to compose type-safe and compact data structures; §27.4.1.

[16] Templates can be used to linearize a class hierarchy (minimizing space and access time); §27.4.2.