Run-Time Type Information - Abstraction Mechanisms - The C++ Programming Language (2013)

The C++ Programming Language (2013)

Part III: Abstraction Mechanisms

22. Run-Time Type Information

Premature optimization is the root of all evil.

– Donald Knuth

On the other hand, we cannot ignore efficiency.

– Jon Bentley

Introduction

Class Hierarchy Navigation

dynamic cast; Multiple Inheritance; static_cast and dynamic_cast; Recovering an Interface

Double Dispatch and Visitors

Double Dispatch; Visitors

Construction and Destruction

Type Identification

Extended Type Information

Uses and Misuses of RTII

Advice

22.1. Introduction

In general, a class is constructed from a lattice of base classes. Such a class lattice is often called a class hierarchy. We try to design classes so that users need not be unduly concerned about the way a class is composed out of other classes. In particular, the virtual call mechanism ensures that when we call a function f() on an object, the same function is called whichever class in the hierarchy provided the declaration of f() used for the call and whichever class defined it. This chapter explains how to gain information about the total object given only the interface provided by a base class.

22.2. Class Hierarchy Navigation

A plausible use of the Ival_boxes defined in §21.2 would be to hand them to a system that controlled a screen and have that system hand objects back to the application program whenever some activity had occurred. We will refer to the combination of GUI library and operating system facilities that control the screen as the system. Objects passed back and forth between the system and the application are commonly referred to as widgets or controls. This is how many user interfaces work. From a language point of view, it is important that the system does not know about our Ival_boxes. The system’s interfaces are specified in terms of the system’s own classes and objects rather than our application’s classes. This is necessary and proper. However, it does have the unpleasant effect that we lose information about the type of objects passed to the system and later returned to us.

Recovering the “lost” type of an object requires us to somehow ask the object to reveal its type. Any operation on an object requires us to have a pointer or reference of a suitable type for the object. Consequently, the most obvious and useful operation for inspecting the type of an object at run time is a type conversion operation that returns a valid pointer if the object is of the expected type and a null pointer if it isn’t. The dynamic_cast operator does exactly that. For example, assume that “the system” invokes my_event_handler() with a pointer to a BBwindow, where an activity has occurred. I then might invoke my application code using Ival_box’s do_something():

void my_event_handler(BBwindow* pw)
{
if (auto pb = dynamic_cast<Ival_box*>(pw)) { //
does pw point to an Ival_box?
// ...
int x = pb–>get_value(); // use the Ival_box
// ...
}
else {
//
... oops! cope with unexpected event ...
}
}

One way of explaining what is going on here is that dynamic_cast translates from the implementation-oriented language of the user-interface system to the language of the application. It is important to note what is not mentioned in this example: the actual type of the object. The object will be a particular kind of Ival_box, say, an Ival_slider, implemented by a particular kind of BBwindow, say, a BBslider. It is neither necessary nor desirable to make the actual type of the object explicit in this interaction between “the system” and the application. An interface exists to represent the essentials of an interaction. In particular, a well-designed interface hides inessential details.

Graphically, the action of pb=dynamic_cast<Ival_box*>(pw) can be represented like this:

Image

The arrows from pw and pb represent the pointers into the object passed, whereas the rest of the arrows represent the inheritance relationships between the different parts of the object passed.

The use of type information at run time is conventionally referred to as “run-time type information,” often abbreviated to RTTI.

Casting from a base class to a derived class is often called a downcast because of the convention of drawing inheritance trees growing from the root down. Similarly, a cast from a derived class to a base is called an upcast. A cast that goes from a base to a sibling class, like the cast fromBBwindow to Ival_box, is called a crosscast.

22.2.1. dynamic_cast

The dynamic_cast operator takes two operands: a type bracketed by < and >, and a pointer or reference bracketed by ( and ). Consider first the pointer case:

dynamic_cast<T*>(p)

If p is of type T* or of a type D* where T is a base class of D, the result is exactly as if we had simply assigned p to a T*. For example:

class BB_ival_slider : public Ival_slider, protected BBslider {
//
...
};

void f(BB_ival_slider* p)
{
Ival_slider* pi1 = p; //
OK
Ival_slider* pi2 = dynamic_cast<Ival_slider*>(p); // OK

BBslider* pbb1 = p; // error: BBslider is a protected base
BBslider* pbb2 = dynamic_cast<BBslider*>(p); // OK: pbb2 becomes nullptr
}

This (the upcast) is the uninteresting case. However, it is reassuring to know that dynamic_cast doesn’t allow accidental violation of the protection of private and protected base classes. Since a dynamic_cast used as an upcast is exactly like a simple assignment, it implies no overhead and is sensitive to its lexical context.

The purpose of dynamic_cast is to deal with the case in which the correctness of the conversion cannot be determined by the compiler. In that case, dynamic_cast<T*>(p) looks at the object pointed to by p (if any). If that object is of class T or has a unique base class of type T, thendynamic_cast returns a pointer of type T* to that object; otherwise, nullptr is returned. If the value of p is nullptr, dynamic_cast<T*>(p) returns nullptr. Note the requirement that the conversion must be to a uniquely identified object. It is possible to construct examples where the conversion fails and nullptr is returned because the object pointed to by p has more than one subobject representing bases of type T22.2).

A dynamic_cast requires a pointer or a reference to a polymorphic type in order to do a downcast or a crosscast. For example:

class My_slider: public Ival_slider { // polymorphic base (Ival_slider has virtual functions)
// ...
};

class My_date : public Date { // base not polymorphic (Date has no virtual functions)
// ...
};

void g(Ival_box* pb, Date* pd)
{
My_slider* pd1 = dynamic_cast<My_slider*>(pb); //
OK
My_date* pd2 = dynamic_cast<My_date*>(pd); // error: Date not polymorphic
}

Requiring the pointer’s type to be polymorphic simplifies the implementation of dynamic_cast because it makes it easy to find a place to hold the necessary information about the object’s type. A typical implementation will attach a “type information object” (§22.5) to an object by placing a pointer to the type information in the virtual function table for the object’s class (§3.2.3). For example:

Image

The dashed arrow represents an offset that allows the start of the complete object to be found given only a pointer to a polymorphic subobject. It is clear that dynamic_cast can be efficiently implemented. All that is involved are a few comparisons of type_info objects representing base classes; no expensive lookups or string comparisons are needed.

Restricting dynamic_cast to polymorphic types also makes sense from a logical point of view. That is, if an object has no virtual functions, it cannot safely be manipulated without knowledge of its exact type. Consequently, care should be taken not to get such an object into a context in which its type isn’t known. If its type is known, we don’t need to use dynamic_cast.

The target type of dynamic_cast need not be polymorphic. This allows us to wrap a concrete type in a polymorphic type, say, for transmission through an object I/O system (§22.2.4), and then “unwrap” the concrete type later. For example:

class Io_obj { // base class for object I/O system
virtual Io_obj* clone() = 0;
};


class Io_date : public Date, public Io_obj { };

void f(Io_obj* pio)
{
Date* pd = dynamic_cast<Date*>(pio);
//
...
}

A dynamic_cast to void* can be used to determine the address of the beginning of an object of polymorphic type. For example:

void g(Ival_box* pb, Date* pd)
{
void* pb2 = dynamic_cast<void*>(pb); //
OK
void* pd2 = dynamic_cast<void*>(pd); // error: Date not polymorphic
}

The object representing a base class, such as Ival_box, in a derived class object is not necessarily the first subobject in that object of the most derived class. So, pb does not necessarily hold the same address as pb2.

Such casts are only useful for interaction with very low-level functions (only such functions deal with void*s). There is no dynamic_cast from void* (because there would be no way of knowing where to find the vptr; §22.2.3).

22.2.1.1. dynamic_cast to Reference

To get polymorphic behavior, an object must be manipulated through a pointer or a reference. When a dynamic_cast is used for a pointer type, a nullptr indicates failure. That is neither feasible nor desirable for references.

Given a pointer result, we must consider the possibility that the result is nullptr, that is, that the pointer doesn’t point to an object. Consequently, the result of a dynamic_cast of a pointer should always be explicitly tested. For a pointer p, dynamic_cast<T*>(p) can be seen as the question “Is the object pointed to by p, if any, of type T?” For example:

void fp(Ival_box* p)
{
if (Ival_slider* is = dynamic_cast<Ival_slider*>(p)) { //
does p point to an Ival_slider?
// ... use is ...
}
else {
//
... *p not a slider; handle alternatives ...
}
}

On the other hand, we may legitimately assume that a reference refers to an object (§7.7.4). Consequently, dynamic_cast<T&>(r) of a reference r is not a question but an assertion: “The object referred to by r is of type T.” The result of a dynamic_cast for a reference is implicitly tested by the implementation of dynamic_cast itself. If the operand of a dynamic_cast to a reference isn’t of the expected type, a bad_cast exception is thrown. For example:

void fr(Ival_box& r)
{
Ival_slider& is = dynamic_cast<Ival_slider&>(r); //
r references an Ival_slider!
// ... use is ...
}

The difference in results of a failed dynamic pointer cast and a failed dynamic reference cast reflects a fundamental difference between references and pointers. If a user wants to protect against bad casts to references, a suitable handler must be provided. For example:

void g(BB_ival_slider& slider, BB_ival_dial& dial)
{
try {
fp(&slider); //
pointer to BB_ival_slider passed as Ival_box*
fr(slider); // reference to BB_ival_slider passed as Ival_box&
fp(&dial); // pointer to BB_ival_dial passed as Ival_box*
fr(dial); // dial passed as Ival_box
}
catch (bad_cast) { //
§30.4.1.1
// ...
}
}

The calls to fp() and the first call to fr() will return normally (assuming that fp() really can cope with a BB_ival_dial), but the second call of fr() will cause a bad_cast exception that will be caught by g().

Explicit tests against nullptr can easily be accidentally omitted. If that worries you, you can write a conversion function that throws an exception instead of returning nullptr in case of failure.

22.2.2. Multiple Inheritance

When only single inheritance is used, a class and its base classes constitute a tree rooted in a single base class. This is simple but often constraining. When multiple inheritance is used, there is no single root. In itself, this doesn’t complicate matters much. However, if a class appears more than once in a hierarchy, we must be a bit careful when we refer to the object or objects that represent that class.

Naturally, we try to keep hierarchies as simple as our application allows (and no simpler). However, once a nontrivial hierarchy has been constructed, we sometimes need to navigate it to find a specific class to use. This need occurs in two variants:

• Sometimes, we want to explicitly name a base class for use as an interface, for example, to resolve an ambiguity or to call a specific function without relying on the virtual function mechanism (an explicitly qualified call; §21.3.3).

• Sometimes, we want to obtain a pointer to a subobject of a hierarchy given a pointer to another, for example, to get a pointer to the complete derived class object from a pointer to a base (a downcast; §22.2.1) or to get a pointer to a base class object from a pointer to another base (a crosscast; §22.2.4).

Here, we consider how to navigate a class hierarchy using type conversions (casts) to gain a pointer of the desired type. To illustrate the mechanisms available and the rules that guide them, consider a lattice containing both a replicated base and a virtual base:

class Component
: public virtual Storable { /*
... */ };
class Receiver
: public Component { /*
... */ };
class Transmitter
: public Component { /*
... */ };
class Radio
: public Receiver, public Transmitter { /*
... */ };

or graphically:

Image

Here, a Radio object has two subobjects of class Component. Consequently, a dynamic_cast from Storable to Component within a Radio will be ambiguous and return a 0. There is simply no way of knowing which Component the programmer wanted:

void h1(Radio& r)
{
Storable* ps = &r; //
a Radio has a unique Storable
// ...
Component* pc = dynamic_cast<Component*>(ps); // pc = 0; a Radio has two Components
// ...
}

In general – and typically – a programmer (and a compiler looking at a single translation unit) does not know the complete class lattice. Instead, code is written with the knowledge of some sublattice. For example, a programmer might know only about the Transmitter part of a Radio and write:

void h2(Storable* ps) // ps might or might not point to a Component
{
if (Component* pc = dynamic_cast<Component*>(ps)) {
//
we have a component!
}
else {
//
it wasn't a Component
}
}

The ambiguity for a pointer to a Radio object is not in general detectable at compile time.

This kind of run-time ambiguity detection is needed only for virtual bases. For ordinary bases, there is always a unique subobject of a given cast (or none) when downcasting (that is, toward a derived class; §22.2). The equivalent ambiguity for virtual bases occurs when upcasting (that is, toward a base), but such ambiguities are caught at compile time.

22.2.3. static_cast and dynamic_cast

A dynamic_cast can cast from a polymorphic virtual base class to a derived class or a sibling class (§22.2.1). A static_cast11.5.2) does not examine the object it casts from, so it cannot:

void g(Radio& r)
{
Receiver* prec = &r; //
Receiver is an ordinary base of Radio
Radio* pr = static_cast<Radio*>(prec); // OK, unchecked
pr = dynamic_cast<Radio*>(prec); // OK, run-time checked

Storable* ps = &r; // Storable is a virtual base of Radio
pr = static_cast<Radio*>(ps); // error: cannot cast from virtual base
pr = dynamic_cast<Radio* >(ps); // OK, run-time checked
}

The dynamic_cast requires a polymorphic operand because there is no information stored in a non-polymorphic object that can be used to find the objects for which it represents a base. In particular, an object of a type with layout constraints determined by some other language – such as Fortran or C – may be used as a virtual base class. For objects of such types, only static type information will be available. However, the information needed to provide run-time type identification includes the information needed to implement the dynamic_cast.

Why would anyone want to use a static_cast for class hierarchy navigation? There is a run-time cost associated with the use of a dynamic_cast22.2.1). More significantly, there are millions of lines of code that were written before dynamic_cast became available. This code relies on alternative ways of making sure that a cast is valid, so the checking done by dynamic_cast is seen as redundant. However, such code is typically written using the C-style cast (§11.5.3); often obscure errors remain. Where possible, use the safer dynamic_cast.

The compiler cannot assume anything about the memory pointed to by a void*. This implies that dynamic_cast – which must look into an object to determine its type – cannot cast from a void*. For that, a static_cast is needed. For example:

Radio* f1(void* p)
{
Storable* ps = static_cast<Storable*>(p); //
trust the programmer
return dynamic_cast<Radio*>(ps);
}

Both dynamic_cast and static_cast respect const and access controls. For example:

class Users : private set<Person> { /* ... */ };

void f2(Users* pu, const Receiver* pcr)
{
static_cast<set<Person>*>(pu); //
error: access violation
dynamic_cast<set<Person>*>(pu); // error: access violation
static_cast<Receiver*>(pcr); // error: can't cast away const
dynamic_cast<Receiver*>(pcr); // error: can't cast away const

Receiver* pr = const_cast<Receiver*>(pcr); // OK
// ...
}

It is not possible to cast to a private base class using static_cast or reinterpret_cast, and “casting away const” (or volatile) requires a const_cast11.5.2). Even then, using the result is safe only provided the object wasn’t originally declared const (or volatile) (§16.2.9).

22.2.4. Recovering an Interface

From a design perspective, dynamic_cast22.2.1) can be seen as a mechanism for asking an object if it provides a given interface.

As an example, consider a simple object I/O system. Users want to read objects from a stream, determine that they are of the expected types, and then use them. For example:

void user()
{
//
... open file assumed to hold shapes, and attach ss as an istream for that file ...

unique_ptr<Io_obj> p {get_obj(ss)}; // read object from stream

if (auto sp = dynamic_cast<Shape*>(p.get())) {
sp–>draw(); //
use the Shape
// ...
}
else {
//
oops: non-shape in Shape file
}
}

The function user() deals with shapes exclusively through the abstract class Shape and can therefore use every kind of shape. The use of dynamic_cast is essential because the object I/O system can deal with many other kinds of objects, and the user may accidentally have opened a file containing perfectly good objects of classes that the user has never heard of.

I used unique_ptr<Io_obj>5.2.1, §34.3.1) so that I would not forget to delete the object allocated by get_obj().

This object I/O system assumes that every object read or written is of a class derived from Io_obj. Class Io_obj must be a polymorphic type to allow the user of get_obj() to use dynamic_cast to recover the “true type” of a returned object. For example:

class Io_obj {
public:
virtual Io_obj* clone() const =0; //
polymorphic
virtual ~Io_obj() {}
};

The critical function in the object I/O system is get_obj(), which reads data from an istream and creates class objects based on that data. Assume that the data representing an object on an input stream is prefixed by a string identifying the object’s class. The job of get_obj() is to read that string and call a function capable of reading and creating an object of the right class. For example:

using Pf = Io_obj*(istream&); // pointer to function returning an Io_obj*

map<string,Pf> io_map; // maps strings to creation functions

string get_word(istream& is); // read a word from is; throw Read_error if the read failed

Io_obj* get_obj(istream& is)
{
string str = get_word(is); //
read initial word
if (auto f = io_map[str]) // look up str to get function
return f(is); // call function
throw Unknown_class{}; // no match for str
}

The map called io_map holds pairs of name strings and functions that can construct objects of the class with that name.

We could derive class Shape from Io_obj as needed by user():

class Shape : public Io_obj {
//
...
};

However, it would be more interesting (and in many cases more realistic) to use an already defined Shape3.2.4) unchanged:

struct Io_circle : Circle, Io_obj {
Io_circle(istream&); //
initialize from input stream
Io_circle* clone() const { return new Io_circle{*this}; } // use copy constructor
static Io_obj* new_circle(istream& is) { return new Io_circle{is}; } // for io_map
};

This is an example of how a class can be fitted into a hierarchy using an abstract class with less foresight than would have been required to build it as a node class in the first place (§21.2.2).

The Io_circle(istream&) constructor initializes an object with data from its istream argument. The new_circle() function is the one put into the io_map to make the class known to the object I/O system. For example:

io_map["Io_circle"]=&Io_circle::new_circle; // somewhere

Other shapes are constructed in the same way:

class Io_triangle : public Triangle, public Io_obj {
//
...
};

io_map["Io_triangle"]=&Io_circle::new_triangle; // somewhere

If the provision of the object I/O scaffolding becomes tedious, a template might help:

template<class T>
struct Io : T, Io_obj {
public:
Io(istream&); //
initialize from input stream
Io* clone() const override { return new Io{*this}; }
static Io* new_io(istream& is) { return new Io{is}; } //
for io_map
};

Given this, we can define Io_circle:

using Io_circle = Io<Circle>;

We still need to define Io<Circle>::Io(istream&) explicitly, though, because it needs to know about the details of Circle. Note that Io<Circle>::Io(istream&) does not have access to T’s private or protected data. The idea is that the transmission format for a type X is what is needed to construct an X using one of X’s constructors. The information of the stream is not necessarily the sequence of X’s member values.

The Io template is an example of a way to fit concrete types into a class hierarchy by providing a handle that is a node in that hierarchy. It derives from its template parameter to allow casting from Io_obj. For example:

void f(io<Shape>& ios)
{
Shape* ps = &ios;
//
...
}

Unfortunately, deriving from the template argument precludes using Io for a built-in type:

using Io_date = Io<Date>; // wrap concrete type
using Io_int = Io<int>; // error: cannot derive from built-in type

This problem can be handled by making the user’s object a member of Io_obj:

template<class T>
struct Io :Io_obj {
T val;

Io(istream&); //
initialize from input stream
Io* clone() const override { return new Io{*this}; }
static Io* new_io(istream& is) { return new Io{is}; } //
for io_map
};

Now we can handle

using Io_int = Io<int>; // wrap built-in type

Having made the value a member rather than a base, we can no longer directly cast an Io_obj<X> to an X, so we provide a function to do that:

template<typename T>
T* get_val<T>(Io_obj* p)
{
if (auto pp = dynamic_cast<Io<T>*>(p))
return &pp–>val;
return nullptr;
}

The user() function now becomes:

void user()
{
//
... open file assumed to hold shapes, and attach ss as an istream for that file ...

unique_ptr<Io_obj> p {get_obj(ss)}; // read object from stream

if (auto sp = get_val<Shape>(p.get())) {
sp–>draw(); //
use the Shape
// ...
}
else {

// ... oops: cope with non-shape in Shape file ...
}
}

This simple object I/O system does not do everything anyone ever wanted, but it almost fits on a single page and the key mechanisms have many uses. It is a blueprint for the “receiver end” of a system for transmitting arbitrary objects across a communication channel in a type-safe manner. More generally, these techniques can be used to invoke a function based on a string supplied by a user and to manipulate objects of unknown type through interfaces discovered through run-time type identification.

In general, the sender part of such an object I/O system will also use RTTI. Consider:

class Face : public Shape {
public:
Shape* outline;
array<Shape*> eyes;
Shape* mouth;

//
...
};

To correctly write out the Shape pointed to by outline, we need to figure out which kind of Shape it is. That’s a job for typeid()22.5). In general, we must also keep a table of (pointer,unique identifier) pairs to be able to transmit linked data structures and to avoid duplicating objects pointed to by more than one pointer (or reference).

22.3. Double Dispatch and Visitors

Classical object-oriented programming is based on selecting a virtual function based on the dynamic type (the type of the most derived class) of an object given only a pointer or a reference to an interface (a base class). In particular, C++ can do this run-time lookup (also called a dynamic dispatch) for one type at a time. In this, C++ resembles Simula and Smalltalk and more recent languages, such as Java and C#. Not being able to select a function based on two dynamic types can be a serious limitation. Also, a virtual function must be a member function. This implies that we cannot add a virtual function to a class hierarchy without modifying the base class(es) that provides the interface and all derived classes that should be affected. This too can be a serious problem. This section describes the basic workarounds for these problems:

§22.3.1 Double Dispatch shows how to select a virtual function based on two types.

§22.3.2 Visitors shows how to use double dispatch to add multiple functions to a class hierarchy with only a single additional virtual function in the hierarchy.

Most realistic examples of these techniques occur when we deal with data structures, such as vectors or graphs or pointers to objects of polymorphic types. In such cases, the actual type of an object (e.g., a vector element or a graph node) can only be known dynamically by (implicitly or explicitly) inspecting the interface provided by a base class.

22.3.1. Double Dispatch

Consider how to select a function based on two arguments. For example:

void do_someting(Shape& s1, Shape& s2)
{
if (s1.intersect(s2)) {
//
the two shapes overlap
}
//
...
}

We would like this to work for any two classes in the class hierarchy rooted in Shape, such as Circle and Triangle.

The basic strategy is to do a virtual function call to select the right function for s1 and then do a second call to select the right function for s2. To simplify, I will leave out the calculation of whether the two shapes actually intersect and just write the code skeleton for selecting the right functions. First we define Shape with a function for intersection:

class Circle;
class Triangle;

class Shape {
public:
virtual bool intersect(const Shape&) const =0;
virtual bool intersect(const Circle&) const =0;
virtual bool intersect(const Triangle&) const =0;
};

Next we need to define Circle and Triangle to override those virtual functions:

class Circle : public Shape {
public:
bool intersect(const Shape&) const override;
virtual bool intersect(const Circle&) const override;
virtual bool intersect(const Triangle&) const override

};

class Triangle : public Shape {
public:
bool intersect(const Shape&) const override;
virtual bool intersect(const Circle&) const override;
virtual bool intersect(const Triangle&) const override;
};

Now each class can handle all possible classes in the Shape hierarchy, so we just have to decide what should be done for each combination:

bool Circle::intersect(const Shape& s) const { return s.intersect(*this); }
bool Circle::intersect(const Circle&) const { cout <<"intersect(circle,circle)\n"; return true; }
bool Circle::intersect(const Triangle&) const { cout <<"intersect(circle,triangle)\n"; return true; }

bool Triangle::intersect(const Shape& s) const { return s.intersect(*this); }
bool Triangle::intersect(const Circle&) const { cout <<"intersect(triangle,circle)\n"; return true; }
bool Triangle::intersect(const Triangle&) const { cout <<"intersect(triangle,triangle)\n"; return true; }

The interesting functions here are Circle::intersect(const Shape&) and Triangle::intersect(const Shape&). These need to handle a Shape& argument because that argument must refer to a derived class. The trick/technique is to simply do a virtual call with the arguments in the reverse order. That done, we are in one of the four functions that can actually do an intersection calculation.

We can test this by making a vector of all pairs of Shape* values and calling intersect() for those:

void test(Triangle& t, Circle& c)
{
vector<pair<Shape*,Shape*>> vs { {&t,&t}, {&t,&c}, {&c,&t}, {&c,&c} };
for (auto p : vs)
p.first–>intersect(*p.second);
}

Using Shape*s ensures that we rely on run-time resolution of the types. We get:

intersect(triangle,triangle)
intersect(triangle,circle)
intersect(circle,triangle)
intersect(circle,circle)

If you consider this elegant, you need to raise your standards, but it gets the task done. As the class hierarchy grows, the need for virtual functions grows exponentially. That is not acceptable in most cases. Expanding this to three or more arguments is trivial, but tedious. Worst of all, each new operation and each new derived class require a modification to every class in the hierarchy: this double-dispatch technique is highly intrusive. Ideally, I would have preferred a simple intercept(Shape&,Shape&) function with overriders specified for the desired combinations of particular shapes. That is possible [Pirkelbauer,2009], but not in C++11.

The awkwardness of double dispatch does not make the problem it is trying to address less important. It is not unusual to want an action, such as intersect(x,y), that depends on the types of two (or more) operands. Workarounds abound. For example, finding the intersection of rectangles is simple and efficient. So, for many applications, people have found it sufficient to define a “bounding box” for each shape and then calculate intersections on bounding boxes. For example:

class Shape {
public:
virtual Rectangle box() const = 0; //
the rectangle encloses the shape
// ...
};

class Circle : public Shape {
public:
Rectangle box() const override;
//
...
};

class Triangle : public Shape {
public:
Rectangle box() const override;
//
...
};

bool intersect(const Rectangle&, const Rectangle&); // simple to calculate

bool intersect(const Shape& s1, const Shape& s2)
{
return intersect(s1.box(),s2.box());
}

Another technique is to precompute a lookup table for combinations of types [Stroustrup,1994]:

bool intersect(const Shape& s1, const Shape& s2)
{
auto i = index(type_id(s1),type_id(s2));
return intersect_tbl[i](s1,s2);
}

Variations of this idea are widely used. Many variants use precomputed values stored in objects to speed up type identification (§27.4.2).

22.3.2. Visitors

The visitor pattern [Gamma,1994] is a partial solution to the exponential growth of virtual functions and overriders and the unpleasent intrusiveness of the (too) simple double-dispatch technique.

Consider how to apply two (or more) operations to every class in a class hierarchy. Basically, we will do a double dispatch for a hierarchy of nodes and a hierarchy of operations to select the correct operation for the correct node. The operations are called visitors; here they are defined inclasses derived from class Visitor. The nodes are a hierarchy of classes with a virtual function accept() that takes Visitor&s. For this example, I use a hierarchy of Nodes that describe language constructs, as is common in tools based on abstract syntax trees (ASTs):

class Visitor;

class Node {
public:
virtual void accept(Visitor&) = 0;
};

class Expr : public Node {
public:
void accept(Visitor&) override;
};

class Stmt : public Node {
public:
void accept(Visitor&) override;
};

So far, so good: the Node hierarchy simply provides a virtual function accept() that takes a Visitor& argument representing what should be done to a Node of a given type.

I do not use const here, because in general an operation from a Visitor may update either the Node “visited” or the Visitor itself.

Now the Node’s accept() performs the double-dispatch trick and passes the Node itself to the Visitor’s accept():

void Expr::accept(Visitor& v) { v.accept(*this); }
void Stmt::accept(Visitor& v) { v.accept(*this); }

The Visitor declares a set of operations:

class Visitor {
public:
virtual void accept(Expr&) = 0;
virtual void accept(Stmt&) = 0;
};

We can then define sets of operations by deriving from Visitor and overriding its accept() functions. For example:

class Do1_visitor : public Visitor {
void accept(Expr&) { cout << "do1 to Expr\n"; }
void accept(Stmt&) { cout << "do1 to Stmt\n"; }
};

class Do2_visitor : public Visitor {
void accept(Expr&) { cout << "do2 to Expr\n"; }
void accept(Stmt&) { cout << "do2 to Stmt\n"; }
};

We can test by making a vector of pairs of pointers to ensure that run-time type resolution is used:

void test(Expr& e, Stmt& s)
{
vector<pair<Node*,Visitor*>> vn {&e,&do1}, {&s,&do1}, {&e,&do2}, {&s,&do2}};
for (auto p : vn)
p.first–>accept(*p.second);
}

We get:

do1 to Expr
do1 to Stmt
do2 to Expr
do2 to Stmt

As opposed to the simple double dispatch, the visitor pattern is heavily used in real-world programming. It is only mildly intrusive (the accept() function), and many variations on the basic idea are used. However, many operations on class hierarchies are hard to express as visitors. For example, an operation that needs access to multiple nodes of different types in a graph cannot be trivially implemented as a visitor. So, I consider the visitor pattern an inelegant workaround. Alternatives exist, for example, [Solodkyy,2012], but not in plain C++11.

Most alternatives to visitors in C++ are based on the idea of explicit iteration over a homogeneous data structure (e.g., a vector or a graph of nodes containing pointers to polymorphic types). At each element or node, a call of a virtual function can perform the desired operation, or some optimization based on stored data can be applied (e.g., see §27.4.2).

22.4. Construction and Destruction

A class object is more than simply a region of memory (§6.4). A class object is built from “raw memory” by its constructors, and it reverts to “raw memory” as its destructors are executed. Construction is bottom-up, destruction is top-down, and a class object is an object to the extent that it has been constructed or destroyed. This order is necessary to ensure that an object is not accessed before it has been initialized. It is unwise to try to access base and member objects early or out of order through “clever” pointer manipulation (§17.2.3). The order of construction and destruction is reflected in the rules for RTTI, exception handling (§13.3), and virtual functions (§20.3.2).

It is unwise to rely on details of the order of construction and destruction, but you can observe that order by calling virtual functions, dynamic_cast22.2), or typeid22.5) at a point where the object isn’t complete. At such a point in a constructor, the (dynamic) type of the object reflects only what is constructed so far. For example, if the constructor for Component in the hierarchy from §22.2.2 calls a virtual function, it will invoke a version defined for Storable or Component, but not one from Receiver, Transmitter, or Radio. At that point of construction, the object isn’t yet a Radio. Similarly, calling a virtual function from a destructor will reflect only what is still not destroyed. It is best to avoid calling virtual functions during construction and destruction.

22.5. Type Identification

The dynamic_cast operator serves most needs for information about the type of an object at run time. Importantly, it ensures that code written using it works correctly with classes derived from those explicitly mentioned by the programmer. Thus, dynamic_cast preserves flexibility and extensibility in a manner similar to virtual functions.

However, it is occasionally essential to know the exact type of an object. For example, we might like to know the name of the object’s class or its layout. The typeid operator serves this purpose by yielding an object representing the type of its operand. Had typeid() been a function, its declaration would have looked something like this:

class type_info;
const type_info& typeid(expression); //
pseudo declaration

That is, typeid() returns a reference to a standard-library type called type_info defined in <typeinfo>:

• Given the name of a type as its operand, typeid(type_name) returns a reference to a type_info that represents the type_name; type_name must be a completely defined type (§8.2.2).

• Given an expression as its operand, typeid(expr) returns a reference to a type_info that represents the type of the object denoted by the expr; the expr must refer to a completely defined type (§8.2.2). If the value of expr is nullptr, typeid(expr) throws a std::bad_typeid.

A typeid() can find the type of an object referred to by a reference or a pointer:

void f(Shape& r, Shape* p)
{
typeid(r); //
type of the object referred to by r
typeid(*p); // type of the object pointed to by p
typeid(p); // type of the pointer, that is, Shape* (uncommon, except as a mistake)
}

If the operand of typeid() is a pointer or a reference of a polymorphic type with the value nullptr, typeid() throws a std::bad_typeid. If the operand of typeid() has a nonpolymorphic type or is not an lvalue, the result is determined at compile time without evaluating the operand expression.

If the object denoted by a dereferenced pointer or a reference to a polymorphic type, the type_info returned is that of the most derived class for the object, that is, the type used when the object was defined. For example:

struct Poly { // polymorphic base class
virtual void f();
//
...
};

struct Non_poly { /* ... */ }; // no virtual functions

struct D1
: Poly{ /*
... */ };
struct D2
: Non_poly { /*
... */ };
void f(Non_poly& npr, Poly& pr)
{
cout << typeid(npr).name() << '\n'; //
writes something like "Non_poly"
cout << typeid(pr).name() << '\n'; // name of Poly or a class derived from Poly
}

void g()
{
D1 d1;
D2 d2;
f(d2,d1); //
writes "Non_poly D1"
f(*static_cast<Poly*>(nullptr),*static_cast<Null_poly*>(nullptr)); // oops!
}

That last call will print just Non_poly (because typeid(npr) is not evaluated) before throwing a bad_typeid.

The definition of type_info looks like this:

class type_info {
//
data
public:
virtual ~type_info(); //
is polymorphic

bool operator==(const type_info&) const noexcept; // can be compared
bool operator!=(const type_info&) const noexcept;

bool before(const type_info&) const noexcept; // ordering
size_t hash_code() const noexcept; // for use by unordered_map and the like
const char* name() const noexcept; // name of type

type_info(const type_info&) = delete; // prevent copying
type_info& operator=(const type_info&) = delete; // prevent copying
};

The before() function allows type_infos to be sorted. In particular, it allows type_ids to be used as keys for ordered containers (such as map). There is no relation between the relationships defined by before and inheritance relationships. The hash_code() function allows type_ids be used as keys for hash tables (such as unordered_map).

It is not guaranteed that there is only one type_info object for each type in the system. In fact, where dynamically linked libraries are used, it can be hard for an implementation to avoid duplicate type_info objects. Consequently, we should use == on type_info objects to test equality, rather than == on pointers to such objects.

We sometimes want to know the exact type of an object so as to perform some service on the whole object (and not just on one of its bases). Ideally, such services are presented as virtual functions so that the exact type needn’t be known. In some cases, no common interface can be assumed for every object manipulated, so the detour through the exact type becomes necessary (§22.5.1). Another, much simpler use has been to obtain the name of a class for diagnostic output:

#include<typeinfo>

void g(Component* p)
{
cout << typeid(*p).name();
}

The character representation of a class’s name is implementation-defined. This C-style string resides in memory owned by the system, so the programmer should not attempt to delete[] it.

22.5.1. Extended Type Information

A type_info object contains only minimal information. Therefore, finding the exact type of an object is often just the first step to acquiring and using more detailed information about that type.

Consider how an implementation or a tool could make information about types available to users at run time. Suppose I have a tool that generates descriptions of object layouts for each class used. I can put these descriptors into a map to allow user code to find the layout information:

#include <typeinfo>

map<string, Layout> layout_table;

void f(B* p)
{
Layout& x = layout_table[typeid(*p).name()]; //
find the Layout based on *p's name
// ... use x ...
}

The resulting data structure looks like this:

Image

Someone else might provide a completely different kind of information:

unordered_map<type_index,Icon> icon_table; // §31.4.3.2

void g(B* p)
{
Icon& i = icon_table[type_index{typeid(*p)}];
//
... use i ...
}

The type_index is a standard-library type for comparing and hashing type_info objects (§35.5.4).

The resulting data structure looks like this:

Image

Associating typeids with information without modifying system headers allows several people or tools to associate different information with types independently of each other. This is important because the likelihood that someone can come up with a single set of information that satisfies every user is close to zero.

22.6. Uses and Misuses of RTTI

We should use explicit run-time type information only when necessary. Static (compile-time) checking is safer, implies less overhead, and – where applicable – leads to better-structured programs. Interfaces based on virtual functions combine static type checking with a run-time lookup in a way that gives both type safety and flexibility. However, programmers sometimes overlook these alternatives and use RTTI where it is unsuitable. For example, RTTI can be used to write thinly disguised switch-statements:

// misuse of run-time type information:

void rotate(const Shape& r)
{
if (typeid(r) == typeid(Circle)) {
//
do nothing
}
else if (typeid(r) == typeid(Triangle)) {
//
... rotate triangle ...
}
else if (typeid(r) == typeid(Square)) {
//
... rotate square ...
}
//
...
}

Using dynamic_cast rather than typeid would improve this code only marginally. Either way, this code is syntactically ugly and also inefficient in that it performs an expensive operation repeatedly.

Unfortunately, this is not a strawman example; such code really does get written. For many people trained in languages without equivalents to class hierachies and virtual functions, there is an almost irresistible urge to organize software as a set of switch-statements. This urge should usually be resisted. Use virtual functions (§3.2.3, §20.3.2) rather than RTTI to handle most cases when run-time discrimination based on type is needed.

Many examples of proper use of RTTI arise when some service code is expressed in terms of one class and a user wants to add functionality through derivation. The use of Ival_box in §22.2 is an example of this. If the user is willing and able to modify the definitions of the library classes, say BBwindow, then the use of RTTI can be avoided; otherwise, it is needed. Even if the user is willing to modify the base classes (e.g., to add a virtual function), such modification may cause its own problems. For example, it may be necessary to introduce dummy implementations of virtual functions in classes for which those functions are not needed or not meaningful. A use of RTTI to implement a simple object I/O system can be found in §22.2.4.

For people with a background in languages that rely heavily on dynamic type checking, such as Smalltalk, pre-generics Java, or Lisp, it is tempting to use RTTI in conjunction with overly general types. Consider:

// misuse of run-time type information:

class Object { // polymorphic
// ...
};

class Container : public Object {
public:
void put(Object*);
Object* get();
//
...
};

class Ship : public Object { /* ... */ };

Ship* f(Ship* ps, Container* c)
{
c–>put(ps); //
put the Ship into the container
// ...
Object* p = c–>get(); // retrieve an Object from the container
if (Ship* q = dynamic_cast<Ship*>(p)) { // run-time check that the Object is a Ship
return q;
}
else {
//
... do something else (typically, error handling) ...
}
}

Here, class Object is an unnecessary implementation artifact. It is overly general because it does not correspond to an abstraction in the application domain and forces the application programmer to use an implementation-level abstraction (Object). Problems of this kind are often better solved by using container templates that hold only a single kind of pointer:

Ship* f(Ship* ps, vector<Ship*>& c)
{
c.push_back(ps); //
put the Ship into the container
// ...
return c.pop_back(); // retrieve a Ship from the container
}

This style of code is less error-prone (better statically type checked) and less verbose than a pure-Object-based alternative. Combined with the use of virtual functions, this technique handles most cases. In a template, a template argument T takes the place of Object and enables static type checking (§27.2).

22.7. Advice

[1] Use virtual functions to ensure that the same operation is performed independently of which interface is used for an object; §22.1.

[2] Use dynamic_cast where class hierarchy navigation is unavoidable; §22.2.

[3] Use dynamic_cast for type-safe explicit navigation of a class hierarchy; §22.2.1.

[4] Use dynamic_cast to a reference type when failure to find the required class is considered a failure; §22.2.1.1.

[5] Use dynamic_cast to a pointer type when failure to find the required class is considered a valid alternative; §22.2.1.1.

[6] Use double dispatch or the visitor pattern to express operations on two dynamic types (unless you need an optimized lookup); §22.3.1.

[7] Don’t call virtual functions during construction or destruction; §22.4.

[8] Use typeid to implement extended type information; §22.5.1.

[9] Use typeid to find the type of an object (and not to find an interface to an object); §22.5.

[10] Prefer virtual functions to repeated switch-statements based on typeid or dynamic_cast; §22.6.