
OpenGL SuperBible: Comprehensive Tutorial and Reference, Sixth Edition (2013)

Part I: Foundations

Chapter 4. Math for 3D Graphics

What You’ll Learn in This Chapter

• What a vector is, and why you should care

• What a matrix is, and why you should care more

• How we use matrices and vectors to move geometry around

• The OpenGL conventions and coordinate spaces

So far, you have learned to draw points, lines, and triangles and have written simple shaders that pass your hard-coded vertex data through unmodified. We haven’t really been rendering in 3D—which is odd for a book on 3D graphics! Well, to turn a collection of shapes into a coherent scene, you must arrange them in relation to one another and to the viewer. In this chapter, you start moving shapes and objects around in your coordinate system. The ability to place and orient your objects in a scene is a crucial tool for any 3D graphics programmer. As you will see, it is actually convenient to describe your objects’ dimensions around the origin and then transform the objects into the desired positions.

Is This the Dreaded Math Chapter?

In most books on 3D graphics programming, yes, this would be the dreaded math chapter. However, you can relax; we take a more moderate approach to these principles than some texts.

One of the fundamental mathematical operations that will be performed by your shaders is the coordinate transform, which boils down to multiplying matrices with vectors and with each other. The keys to object and coordinate transformations are two matrix conventions used by OpenGL programmers. To familiarize you with these matrices, this chapter strikes a compromise between two extremes in computer graphics philosophy. On the one hand, we could warn you, “Please review a textbook on linear algebra before reading this chapter.” On the other hand, we could perpetuate the deceptive reassurance that you can “learn to do 3D graphics without all those complex mathematical formulas.” But we don’t agree with either camp.

In reality, you can get along just fine without understanding the finer mathematics of 3D graphics, just as you can drive your car every day without having to know anything at all about automotive mechanics and the internal combustion engine. But you had better know enough about your car to realize that you need an oil change every so often, that you have to fill the tank with gas regularly, and that you must change the tires when they get bald. This knowledge makes you a responsible (and safe!) automobile owner. If you want to be a responsible and capable OpenGL programmer, the same standards apply. You need to understand at least the basics so you know what can be done and what tools best suit the job. If you are a beginner, you will find that, with some practice, matrix math and vectors will gradually make more and more sense, and you will develop a more intuitive (and powerful) ability to make full use of the concepts we introduce in this chapter.

So even if you don’t already have the ability to multiply two matrices in your head, you need to know what matrices are and that they are the means to OpenGL’s 3D magic. But before you go dusting off that old linear algebra textbook (doesn’t everyone have one?), have no fear: The sb6 library has a component called vmath that contains a number of useful classes and functions that can be used to represent and manipulate vectors and matrices. They can be used directly with OpenGL and are very similar in syntax and appearance to GLSL — the language you’ll be writing your shaders in. So, although you don’t have to do all your matrix and vector manipulation yourself, it’s still a good idea to know what they are and how to apply them. See — you can eat your cake and have it too!

A Crash Course in 3D Graphics Math

There are a good many books on the math behind 3D graphics, and a few of the better ones that we have found are listed in Appendix A, “Further Reading.” We do not pretend here that we are going to cover everything that is important for you to know. We are not even going to try to cover everything you should know. In this chapter, we are just going to cover what you really need to know. If you’re already a math whiz, you should skip immediately to the section ahead on the standard 3D transformations. Not only do you already know what we are about to cover, but most math fans will be somewhat offended that we did not give sufficient space to their favorite feature of homogeneous coordinate spaces. Imagine one of those reality TV shows where you must escape a virtual swamp filled with crocodiles. How much 3D math do you really need to know to survive? That’s what the next two sections are going to be about: 3D math survival skills. The crocodiles do not care if you really know what a homogeneous coordinate space is or not.

Vectors, or Which Way Is Which?

The main input to OpenGL is the vertex, which has a number of attributes that normally include a position. Basically, this is a position in xyz coordinate space, and a given position in space is defined by exactly one and only one unique xyz triplet. An xyz triplet, however, can be represented as a vector (in fact, for the mathematically pure in heart, a position is actually a vector too... there, we threw you a bone). A vector is perhaps the single most important foundational concept to understand when it comes to manipulating 3D geometry. Those three values (x, y, and z) combined represent two important values: a direction and a magnitude.

Figure 4.1 shows a point in space (picked arbitrarily) and an arrow drawn from the origin of the coordinate system to that point in space. The point can be thought of as a vertex when you are stitching together triangles, but the arrow can be thought of as a vector. A vector is, first, simply a direction from the origin toward the point in space. We use vectors all the time in OpenGL to represent directional quantities. For example, the x axis is the vector (1, 0, 0). This says to go positive one unit in the x direction, and zero in the y and z directions. A vector is also how we point where we are going, for example, which way is the camera pointing, or in which direction do we want to move to get away from that crocodile! The vector is so fundamental to the operation of OpenGL that vectors of various sizes are first-class types in GLSL and are given names such as vec3 and vec4 (representing 3- and 4-element vectors, respectively).

Figure 4.1: A point in space is both a vertex and a vector.

The second quantity a vector can represent is the magnitude. The magnitude of a vector is the length of the vector. For our x axis vector (1, 0, 0), the length of the vector is one. A vector with a length of one we call a unit vector. If a vector is not a unit vector and we want to scale it to make it one, we call that normalization. Normalizing a vector scales it such that its length becomes one and the vector is then said to be normalized. Unit vectors are important when we only want to represent a direction and not a magnitude. Also, if vector lengths appear in the equations we’ll be using, they get a whole lot simpler when those lengths are 1! A magnitude can be important as well; for example, it can tell us how far we need to move in a given direction — how far away I need to get from that crocodile.
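To make this concrete, here is a minimal sketch of normalization in plain C++, written against a simple float array rather than any particular math library:

#include <cmath>

// Scale a three-component vector so that its length becomes 1.0.
void normalize3(float v[3])
{
    float len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    if (len != 0.0f)    // avoid dividing a zero-length vector
    {
        v[0] /= len;
        v[1] /= len;
        v[2] /= len;
    }
}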

Vectors (and matrices) are such important concepts in 3D graphics that they are first class citizens in GLSL — the language in which you write your shaders. However, this is not so in languages like C++. To allow you to use them in your C++ programs, the vmath library that is provided with this book’s source code contains classes that can represent vectors and matrices that are named similarly to their GLSL counterparts: For instance, vmath::vec3 can represent a three-component floating-point vector (x, y, z), and vmath::vec4 can represent a four-component floating-point vector (x, y, z, w) and so on. The w coordinate is added to make the vector homogeneous but is typically set to 1.0. The x, y, and z values might later be divided by w, which when it is 1.0, essentially leaves the xyz values alone. The classes in vmath are actually templated classes with type definitions to represent common types such as single- and double-precision floating-point values, and signed- and unsigned-integer variables. vmath::vec3 and vmath::vec4 are defined simply as follows:

typedef Tvec3<float> vec3;
typedef Tvec4<float> vec4;

Declaring a three-component vector is as simple as

vmath::vec3 vVector;

If you include “using namespace vmath” in your source code, you can even write

vec3 vVector;

However, in these examples, we’ll always qualify our use of the vmath library by explicitly using the vmath:: namespace. All of the vmath classes define a number of constructors and copy operators, which means you can declare and initialize vectors as follows:

vec3 vVertex1(0.0f, 0.0f, 1.0f);
vec4 vVertex2 = vec4(1.0f, 0.0f, 1.0f, 1.0f);
vec4 vVertex3(vVertex1, 1.0f);

Now, an array of three-component vertices, such as for a triangle, can be declared as

vec3 vVerts[] = { vec3(-0.5f, 0.0f, 0.0f),
                  vec3( 0.5f, 0.0f, 0.0f),
                  vec3( 0.0f, 0.5f, 0.0f) };

This should look similar to the code that we introduced you to in the section “Drawing Our First Triangle” back in Chapter 2. The vmath library also includes lots and lots of math-related functions and overloads most operators on its classes to allow vectors and matrices to be added, subtracted, multiplied, transposed, and so on.

We need to be careful here not to gloss over that fourth w component too much. Most of the time when you specify geometry with vertex positions, a three-component vertex is all you want to store and send to OpenGL. For many directional vectors, such as a surface normal (a vector pointing perpendicular to a surface that is used for lighting calculations), again, a three-component vector suffices. However, we soon delve into the world of matrices, and to transform a 3D vertex, you must multiply it by a 4 × 4 transformation matrix. The rule is that you must multiply a four-component vector by a 4 × 4 matrix; if you try to use a three-component vector with a 4 × 4 matrix... the crocodiles will eat you! More on what all this means soon. Essentially, if you are going to do your own matrix operations on vectors, then you will probably want four-component vectors in many cases.

Common Vector Operators

Vectors behave as you would expect for operations such as addition, subtraction, unary negation, and so on. These operators perform a per-component calculation and result in a vector of the same size as their inputs. The vmath vector classes override the addition, subtraction, and unary negation operators, along with several others, to provide such functionality. This allows you to use code such as

vmath::vec3 a(1.0f, 2.0f, 3.0f);
vmath::vec3 b(4.0f, 5.0f, 6.0f);
vmath::vec3 c;

c = a + b;
c = a - b;
c += b;
c = -c;

However, there are many more operations on vectors that are explained from a mathematical perspective in the following subsections. They also have implementations in the vmath library, which will be outlined here.

Dot Product

Vectors can be added, subtracted, and scaled by simply adding, subtracting, or scaling their individual x, y, and z components. An interesting and useful operation, however, that can be applied only to two vectors is called the dot product, which is also sometimes known as the inner product. The dot product between two (three-component) vectors returns a scalar (just one value) that is the cosine of the angle between the two vectors scaled by the product of their lengths. If the two vectors are of unit length, the value returned falls between −1.0 and 1.0 and is equal to the cosine of the angle between them. Of course, to get the actual angle between the vectors, you’d need to take the inverse cosine (or arc-cosine) of this value. The dot product is used extensively during lighting calculations and is taken between a surface normal vector and a vector pointing toward a light source in diffuse lighting calculations. We will delve deeper into this type of shader code in the section “Lighting Models” in Chapter 12. Figure 4.2 shows two vectors, v1 and v2, and how the angle between them is represented by θ.

Figure 4.2: The dot product — cosine of the angle between two vectors

Mathematically, the dot product of two vectors v1 and v2 is calculated as

v1 · v2 = v1.x × v2.x + v1.y × v2.y + v1.z × v2.z

The vmath library has some useful functions that use the dot product operation. For starters, you can actually get the dot product itself between two vectors with the function vmath::dot, or with the dot member function of the vector classes.

vmath::vec3 a(...);
vmath::vec3 b(...);

float c = a.dot(b);
float d = dot(a, b);

As we mentioned, the dot product between a pair of unit vectors is a value between −1.0 and +1.0 that represents the cosine of the angle between them. A slightly higher-level function, vmath::angle, actually returns this angle in radians.

float angle(const vmath::vec3& u, const vmath::vec3& v);
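For example, the angle between the x and y axes is 90°, so angle should return approximately π/2 radians. A quick sanity check, assuming the vmath conventions shown above:

vmath::vec3 x_axis(1.0f, 0.0f, 0.0f);
vmath::vec3 y_axis(0.0f, 1.0f, 0.0f);

float a = vmath::angle(x_axis, y_axis); // approximately 1.5708 (pi / 2)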

Cross Product

Another useful mathematical operation between two vectors is the cross product, which is also sometimes known as the vector product. The cross product between two vectors is a third vector that is perpendicular to the plane in which the first two vectors lie. The cross product of two vectors v1 and v2 is defined as

\[ \mathbf{v}_1 \times \mathbf{v}_2 = \|\mathbf{v}_1\| \, \|\mathbf{v}_2\| \sin(\theta) \, \hat{\mathbf{n}} \]

where \( \hat{\mathbf{n}} \) is the unit vector that is perpendicular to both v1 and v2. This means that if you normalize the result of a cross product, you get the normal to the plane. If v1 and v2 are both unit length, and are known to be perpendicular to one another, then you don’t even need to normalize the result as it will also be unit length. Figure 4.3 shows two vectors, v1 and v2, and their cross product v3.

Figure 4.3: A cross product returns a vector perpendicular to its parameters

The cross product of two three-dimensional vectors v1 and v2 can be calculated as

\[ \mathbf{v}_3 = \mathbf{v}_1 \times \mathbf{v}_2 = \begin{bmatrix} v_1.y \, v_2.z - v_1.z \, v_2.y \\ v_1.z \, v_2.x - v_1.x \, v_2.z \\ v_1.x \, v_2.y - v_1.y \, v_2.x \end{bmatrix} \]

Again, the vmath library has functions that take the cross product of two vectors and return the resulting vector: one member function of the three-component vector classes and one global function.

vec3 a(...);
vec3 b(...);

vec3 c = a.cross(b);
vec3 d = cross(a, b);

Unlike the dot product, the order of the vectors is important. In Figure 4.3, v3 is the result of v2 cross v1. If you were to reverse the order of v1 and v2, the resulting vector v3 would point in the opposite direction. Applications of the cross product are numerous, from finding surface normals of triangles to constructing transformation matrices.
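You can see this order dependence directly in code. A quick sketch using the vmath cross function shown above (the results follow the right-hand rule):

vmath::vec3 x_axis(1.0f, 0.0f, 0.0f);
vmath::vec3 y_axis(0.0f, 1.0f, 0.0f);

vmath::vec3 c1 = vmath::cross(x_axis, y_axis); // ( 0, 0,  1): the positive z axis
vmath::vec3 c2 = vmath::cross(y_axis, x_axis); // ( 0, 0, -1): the negative z axis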

Length of a Vector

As we have already discussed, vectors have a direction and a magnitude. The magnitude of a vector is also known as its length. The magnitude of a three-dimensional vector can be found by using the following equation:

\[ \text{length}(\mathbf{v}) = \sqrt{v.x^2 + v.y^2 + v.z^2} \]

This can be generalized as the square root of the sum of the squares of the components of the vector.1 In only two dimensions, this is simply Pythagoras’s theorem — the square of the hypotenuse is equal to the sum of the squares of the other two sides. This extends to any number of dimensions, and the vmath library includes functions to calculate this for you.

1. The sum of the squares of the components of a vector is also the dot product of a vector with itself.

template <typename T, int len>
static inline T length(const vecN<T,len>& v) { ... }
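For example, the classic 3-4-5 right triangle gives a quick check of the length function:

vmath::vec3 v(3.0f, 4.0f, 0.0f);

float l = vmath::length(v); // sqrt(3*3 + 4*4 + 0*0) = 5.0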

Reflection and Refraction

Common operations in computer graphics are calculating reflection and refraction vectors. Given an incoming vector Rin and a normal to a surface N, we wish to know the direction in which Rin will be reflected (Rreflect) and, given a particular index of refraction η, the direction in which Rin will be refracted. We show this in Figure 4.4, with the refracted vectors for various values of η shown as Rrefract,η1 through Rrefract,η4.

Figure 4.4: Reflection and refraction

Although Figure 4.4 shows the system in only two dimensions, we are interested in computing this in three dimensions (this is a 3D graphics book, after all). The math for calculating Rreflect is

\[ \mathbf{R}_{reflect} = \mathbf{R}_{in} - 2 \left( \mathbf{N} \cdot \mathbf{R}_{in} \right) \mathbf{N} \]

and the math for calculating Rrefract for a given value of η is

\[ \mathbf{R}_{refract} = \eta \mathbf{R}_{in} - \left( \eta \left( \mathbf{N} \cdot \mathbf{R}_{in} \right) + \sqrt{1 - \eta^2 \left( 1 - \left( \mathbf{N} \cdot \mathbf{R}_{in} \right)^2 \right)} \right) \mathbf{N} \]

To get the desired result, both Rin and N must be unit-length vectors (i.e., they should be normalized before use). The two vmath functions, reflect() and refract(), implement these equations.
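As an illustration, here is a minimal sketch of how a reflection function could be written in terms of the dot product, assuming the vmath operator overloads described earlier; the actual implementation of vmath::reflect may differ:

// Reflect R_in about the surface whose normal is N.
// Both vectors are assumed to be unit length.
vmath::vec3 my_reflect(const vmath::vec3& R_in, const vmath::vec3& N)
{
    return R_in - N * (2.0f * vmath::dot(N, R_in));
}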

Matrices

The matrix is not just a Hollywood movie trilogy, but an exceptionally powerful mathematical tool that greatly simplifies the process of solving one or more equations with variables that have complex relationships to each other. One common example of this, near and dear to the hearts of graphics programmers, is coordinate transformations. For example, if you have a point in space represented by x, y, and z coordinates, and you need to know where that point is if you rotate it some number of degrees around some arbitrary point and orientation, you would use a matrix. Why? Because the new x coordinate depends not only on the old x coordinate and the other rotation parameters, but also on what the y and z coordinates were as well. This kind of dependency between the variables and solution is just the sort of problem that matrices excel at. For fans of the Matrix movies who have a mathematical inclination, the term matrix is indeed an appropriate title.

Mathematically, a matrix is nothing more than a set of numbers arranged in uniform rows and columns — in programming terms, a two-dimensional array. A matrix doesn’t have to be square, but all of the rows must have the same number of elements and all of the columns must have the same number of elements. The following are a selection of matrices. They don’t represent anything in particular but serve only to demonstrate matrix structure. Note that it is also valid for a matrix to have a single column or row. A single row or column of numbers would more simply be called a vector, as discussed previously. In fact, as you will soon see, we can think of some matrices as a table of column vectors.

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \qquad \begin{bmatrix} 7 & 0 & 5 \\ 2 & 1 & 0 \\ 0 & 6 & 3 \end{bmatrix} \qquad \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad \begin{bmatrix} 4 & 5 & 6 & 7 \end{bmatrix} \]

Matrix and vector are two important terms that you see often in 3D graphics programming literature. When dealing with these quantities, you also see the term scalar. A scalar is just an ordinary single number used to represent magnitude or a specific quantity (you know — a regular old, plain, simple number... like before you cared or had all this jargon added to your vocabulary). Matrices can be multiplied and added together, but they can also be multiplied by vectors and scalar values. Multiplying a point (represented by a vector) by a matrix (representing a transformation) yields a new transformed point (another vector). Matrix transformations are actually not too difficult to understand but can be intimidating at first. Because an understanding of matrix transformations is fundamental to many 3D tasks, you should still make an attempt to become familiar with them. Fortunately, only a little understanding is enough to get you going and doing some pretty incredible things with OpenGL. Over time, and with a little more practice and study, you will master this mathematical tool yourself.

In the meantime, as previously for vectors, you will find a number of useful matrix functions and features available in the vmath library. The source code to this library is also available in the file vmath.h in the book’s source code folder. This 3D math library greatly simplifies many tasks in this chapter and the ones to come. One useful feature of this library is that it lacks incredibly clever and highly optimized code! This makes the library highly portable and easy to understand. You’ll also find it has a very GLSL-like syntax.

In your 3D programming tasks with OpenGL, you will use three sizes of matrix extensively: 2 × 2, 3 × 3, and 4 × 4. The vmath library has matrix data types that match those defined by GLSL, such as

vmath::mat2 m1;
vmath::mat3 m2;
vmath::mat4 m3;

As in GLSL, the matrix classes in vmath define common operators such as addition, subtraction, unary negation, multiplication, and division, along with constructors and relational operators. Again, the matrix classes in vmath are built using templates and include type definitions for single- and double-precision floating-point and signed and unsigned integer matrix types.

Matrix Construction and Operators

OpenGL represents a 4 × 4 matrix not as a two-dimensional array of floating-point values but as a single array of 16 floating-point values. By default, OpenGL uses a column-major or column-primary layout for matrices. That means that, for a 4 × 4 matrix, the first four elements represent the first column of the matrix, the next four elements represent the second column, and so on. This approach is different from many math libraries, which do take the two-dimensional array approach. For example, OpenGL prefers the first of these two examples:

GLfloat matrix[16]; // Nice OpenGL-friendly matrix

GLfloat matrix[4][4]; // Not as convenient for OpenGL programmers

OpenGL can use the second variation, but the first is a more efficient representation. The reason for this becomes clear in a moment. These 16 elements represent the 4 × 4 matrix, as shown below. When the array elements traverse down the matrix columns one by one, we call this column-major matrix ordering. In memory, the 4 × 4 approach of the two-dimensional array (the second option in the preceding code) is laid out in a row-major order. In math terms, the two orientations are the transpose of one another.

\[ \begin{bmatrix} A_{00} & A_{10} & A_{20} & A_{30} \\ A_{01} & A_{11} & A_{21} & A_{31} \\ A_{02} & A_{12} & A_{22} & A_{32} \\ A_{03} & A_{13} & A_{23} & A_{33} \end{bmatrix} \]

Representing the above matrix in column-major order in memory produces an array as follows:

static const float A[] =
{
    A00, A01, A02, A03, A10, A11, A12, A13,
    A20, A21, A22, A23, A30, A31, A32, A33
};

Whereas representing it in row-major order would require a layout such as

static const float A[] =
{
    A00, A10, A20, A30, A01, A11, A21, A31,
    A02, A12, A22, A32, A03, A13, A23, A33
};

The real magic lies in the fact that these 16 values can represent a particular position in space and an orientation of the three axes with respect to the viewer. Interpreting these numbers is not hard at all. The four columns each represent a four-element vector.2 To keep things simple for this book, we focus our attention on just the first three elements of the vectors in the first three columns. The fourth column vector contains the x, y, and z values of the transformed coordinate system’s origin.

2. In fact, the vmath library internally represents matrices as arrays of its own vector classes, with each vector holding a column of the matrix.

The first three elements of the first three columns are just directional vectors that represent the orientation (vectors here are used to represent a direction) of the x, y, and z axes in space. For most purposes, these three vectors are always at 90° angles from each other and are usually each of unit length (unless you are also applying a scale or shear). The mathematical term for this (in case you want to impress your friends) is orthonormal when the vectors are unit length, and orthogonal when they are not. Figure 4.5 shows the 4 × 4 transformation matrix with its components highlighted. Notice that the last row of the matrix is all 0s with the exception of the very last element, which is 1.

Figure 4.5: A 4 × 4 matrix representing rotation and translation

The upper left 3 × 3 submatrix of the matrix shown in Figure 4.5 represents a rotation or orientation. The last column of the matrix represents a translation or position.

The most amazing thing is that if you have a 4 × 4 matrix that contains the position and orientation of a coordinate system, and you multiply a vertex expressed in the identity coordinate system (written as a column matrix or vector) by this matrix, the result is a new vertex that has been transformed to the new coordinate system. This means that any position in space and any desired orientation can be uniquely defined by a 4 × 4 matrix, and if you multiply all of an object’s vertices by this matrix, you transform the entire object to the given location and orientation in space!
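In code, this transformation is a single matrix-vector multiplication. A small sketch using the vmath types (vmath::translate, which builds a translation matrix, is covered later in this chapter):

// Build a matrix that moves coordinates five units down the negative z axis,
// then use it to transform an object-space vertex.
vmath::mat4 transform_matrix = vmath::translate(0.0f, 0.0f, -5.0f);
vmath::vec4 object_vertex(1.0f, 0.0f, 0.0f, 1.0f);
vmath::vec4 transformed_vertex = transform_matrix * object_vertex; // (1, 0, -5, 1)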

Not only this, but if you transform an object’s vertices from one space to another using one matrix, you can then transform those vertices by yet another matrix, transforming them again into another coordinate space. Given matrices A and B and vector v, we know that

A · (B · v)

is equivalent to

(A · B) · v

This is because matrix multiplication is associative. Herein lies the magic — it is possible to stack a whole bunch of transforms together by multiplying the matrices that represent those transforms and using the resulting matrix as a single term in the final product.

The final appearance of your scene or object can depend greatly on the order in which the modeling transformations are applied. This is particularly true of translation and rotation. We can see this as a consequence of the associativity and commutativity rules for matrix multiplication — we can group together sequences of transformations in any way we like as matrix multiplication is associative, but the order that the matrices appear in the multiplication matters because matrix multiplication is not commutative.

Figure 4.6 illustrates a progression of a square rotated first about the z axis and then translated down the newly transformed x axis on the top, and first translating the same square along the x axis and then rotating it around the z axis on the bottom. The difference in the final dispositions of the square occurs because each transformation is performed with respect to the last transformation performed. On the top of Figure 4.6, the square is rotated with respect to the origin first. On the bottom of Figure 4.6, after the square is translated, the rotation is performed around the newly translated origin.

Figure 4.6: Modeling transformations: rotation then translation, and translation then rotation

Understanding Transformations

If you think about it, most 3D graphics aren’t really 3D. We use 3D concepts and terminology to describe what something looks like; then this 3D data is “squished” onto a 2D computer screen. We call the process of squishing 3D data down into 2D data projection. We refer to the projection whenever we want to describe the type of transformation (orthographic or perspective) that occurs during vertex processing, but projection is only one of the types of transformations that occur in OpenGL. Transformations also allow you to rotate objects around; move them about; and even stretch, shrink, and warp them.

Coordinate Spaces in OpenGL

A series of one or more transforms can be represented as a matrix, and multiplication by that matrix effectively moves a vector from one coordinate space to another. There are several coordinate spaces that are commonly used in OpenGL programming. Any number of geometric transformations can occur between the time you specify your vertices and the time they appear on the screen, but the most common are modeling, viewing, and projection. In this section, we examine each of the coordinate spaces commonly used in 3D computer graphics (and summarized in Table 4.1) and the transforms used to move vectors between them.

Model space    Positions relative to a local origin; also called object space
World space    Positions relative to a fixed, global origin
View space     Positions relative to the viewer; also called camera or eye space
Clip space     Vertex positions after projection, as written to gl_Position
NDC space      Positions after the clip-space coordinates are divided by w

Table 4.1. Common Coordinate Spaces Used in 3D Graphics

A matrix that moves coordinates from one space to another is normally named for those spaces. For example, a matrix that transforms an object’s vertices from model space into view space is commonly referred to as a model-view matrix.

Object Coordinates

Most of your vertex data will typically begin life in object space, which is also commonly known as model space. In object space, positions of vertices are interpreted as relative to a local origin. Consider a spaceship model. The origin of the model is probably going to be somewhere logical such as the tip of the craft’s nose, at its center of gravity, or where the pilot might sit. In a 3D modeling program, returning to the origin and zooming out sufficiently should show you the whole spaceship. The origin of a model is often the point about which you might rotate it to place it into a new orientation. It wouldn’t make sense to place the origin far outside the model as rotating the object about that point would apply significant translation as well as rotation.

World Coordinates

The next common coordinate space is world space. This is where coordinates are stored relative to a fixed, global origin. To continue the spaceship analogy, this could be the center of a play-field or other fixed body such as a nearby planet. Once in world space, all objects exist in a common frame. Often, this is the space in which lighting and physics calculations are performed.

View Coordinates

An important concept throughout this chapter is that of view coordinates. These are often referred to as camera or eye coordinates. View coordinates are relative to the position of the observer (hence the terms camera and eye), regardless of any transformations that may occur; you can think of them as “absolute” coordinates. Thus, eye coordinates represent a virtual fixed coordinate system that is used as a common frame of reference.

Figure 4.7 shows the view coordinate system from two viewpoints. On the left, the view coordinates are represented as seen by the observer of the scene (that is, perpendicular to the monitor). On the right, the view coordinate system is translated slightly so you can better see the relation of the z axis. Positive x and y are pointed right and up, respectively, from the viewer’s perspective. Positive z travels away from the origin toward the user, and negative z values travel farther away from the viewpoint into the screen. The screen lies at the z coordinate 0.

Figure 4.7: Two perspectives of view coordinates

When you draw in 3D with OpenGL, you use the Cartesian coordinate system. In the absence of any transformations, the system in use is identical to the eye coordinate system just described.

Clip and Normalized Device Space

Clip space is the coordinate space in which OpenGL performs clipping. When your vertex shader writes to gl_Position, this coordinate is considered to be in clip space. This is always a four-dimensional homogeneous coordinate. Upon exiting clip space, all four of the vertex’s components are divided through by its w component. Obviously, after this, w becomes equal to 1.0. If w is not 1.0 before this division, the x, y, and z components are effectively scaled by the inverse of w. This allows for effects such as perspective foreshortening and projection. The result of the division is considered to be in normalized device coordinate space (NDC space). Clearly, if the resulting w component of a clip-space coordinate is 1.0, then clip space and NDC space become identical.
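As a sketch, the perspective divide that takes a clip-space position into NDC space looks like this (assuming the vmath vector classes allow component access with operator[]):

vmath::vec4 clip_pos(2.0f, 1.0f, -1.0f, 2.0f);

// Divide x, y, and z through by w to produce normalized device coordinates.
vmath::vec3 ndc_pos(clip_pos[0] / clip_pos[3],   //  1.0
                    clip_pos[1] / clip_pos[3],   //  0.5
                    clip_pos[2] / clip_pos[3]);  // -0.5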

Coordinate Transformations

As noted, coordinates may be moved from space to space by multiplying their vector representations by transformation matrices. Transformations are used to manipulate your model and the particular objects within it. These transformations move objects into place, rotate them, and scale them. Figure 4.8 illustrates three of the most common modeling transformations that you will apply to your objects. Figure 4.8 (a) shows translation, in which an object is moved along a given axis. Figure 4.8 (b) shows a rotation, in which an object is rotated about one of the axes. Finally, Figure 4.8 (c) shows the effects of scaling, where the dimensions of the object are increased or decreased by a specified amount. Scaling can occur non-uniformly (the various dimensions can be scaled by different amounts), so you can use scaling to stretch and shrink objects.

Figure 4.8: The modeling transformations

Each of these standard transforms can be represented as a matrix by which you can multiply your vertex coordinates to calculate their position after the transformation. The following subsections discuss the construction of those matrices, both mathematically and using the functions provided in the vmath library.

The Identity Matrix

There are a number of important types of transformation matrices you need to be familiar with before we start trying to use them. The first is the identity matrix. As shown below, the identity matrix contains all zeros except a series of ones that traverse the matrix diagonally. The 4 × 4 identity matrix looks like this:

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Multiplying a vertex by the identity matrix is equivalent to multiplying it by one; it does nothing to it.

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} \]

Objects drawn using the identity matrix are untransformed; they are at the origin (last column), and the x, y, and z axes are defined to be the same as those in eye coordinates.

Obviously, identity matrices for 2 × 2 matrices, 3 × 3 matrices, and matrices of other dimensions exist and simply have ones in their diagonal as you can see above. All identity matrices are square. There are no non-square identity matrices. Any identity matrix is its own transpose. You can make an identity matrix for OpenGL in C++ code like this:

// Using a raw array:
GLfloat m1[] = { 1.0f, 0.0f, 0.0f, 0.0f,   // X Column
                 0.0f, 1.0f, 0.0f, 0.0f,   // Y Column
                 0.0f, 0.0f, 1.0f, 0.0f,   // Z Column
                 0.0f, 0.0f, 0.0f, 1.0f }; // W Column

// Or using the vmath::mat4 constructor:
vmath::mat4 m2( vmath::vec4(1.0f, 0.0f, 0.0f, 0.0f),   // X Column
                vmath::vec4(0.0f, 1.0f, 0.0f, 0.0f),   // Y Column
                vmath::vec4(0.0f, 0.0f, 1.0f, 0.0f),   // Z Column
                vmath::vec4(0.0f, 0.0f, 0.0f, 1.0f) ); // W Column

There are also shortcut functions in the vmath library that construct identity matrices for you; each matrix class has a static member function that produces an identity matrix of the appropriate dimensions:

vmath::mat2 m2 = vmath::mat2::identity();
vmath::mat3 m3 = vmath::mat3::identity();
vmath::mat4 m4 = vmath::mat4::identity();

If you recall, the very first vertex shader we used in the book back in Chapter 2 was a pass-through shader. It did not transform your vertices at all, but simply passed its hard-coded data on untouched, in the default coordinate system with no matrix applied to the vertices at all. We could have multiplied them all by the identity matrix, but that would have been a wasteful and pointless operation.

Translation Matrices

A translation matrix simply moves your vertices along one or more of the three axes. Figure 4.9 shows, for example, translating a cube up the y axis ten units.

Figure 4.9: A cube translated ten units in the positive y direction

The formulation of a 4 × 4 translation matrix is as follows:

\[ \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Here, tx, ty, and tz represent the translation in the x, y, and z axes, respectively. Examining the structure of the translation matrix reveals one of the reasons why we need to use four-dimensional homogeneous coordinates to represent positions in 3D graphics. Consider the position vector v, whose w component is 1.0. Multiplying by a translation matrix of the form above yields

\[ \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix} \]

As you can see, tx, ty, and tz have been added to the components of v, producing translation. Had the w component of v not been 1.0, then using this matrix for translation would have resulted in tx, ty, and tz being scaled by that value, affecting the output of the transform. In practice, position vectors are almost always encoded using four components with w (the last) being 1.0, whereas direction vectors are either encoded simply using three components or as four components with w being zero. Thus, multiplying a four-component direction vector by a translation matrix doesn’t change it at all. The vmath library contains two functions that will construct a 4 × 4 translation matrix for you from either three separate components or from a 3D vector:

template <typename T>
static inline Tmat4<T> translate(T x, T y, T z) { ... }

template <typename T>
static inline Tmat4<T> translate(const vecN<T,3>& v) { ... }
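The effect of w on translation is easy to demonstrate. A short sketch using vmath::translate:

vmath::mat4 T = vmath::translate(10.0f, 0.0f, 0.0f);

vmath::vec4 position(1.0f, 2.0f, 3.0f, 1.0f);   // w = 1.0: a point in space
vmath::vec4 direction(0.0f, 0.0f, 1.0f, 0.0f);  // w = 0.0: a pure direction

vmath::vec4 p = T * position;    // (11, 2, 3, 1): the point moves
vmath::vec4 d = T * direction;   // ( 0, 0, 1, 0): the direction is unchanged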

Rotation Matrices

To rotate an object about one of the three coordinate axes, or indeed any arbitrary vector, you have to devise a rotation matrix. The form of a rotation matrix depends on the axis about which we wish to rotate. To rotate about the x axis, we use

\[ R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Here, Rx(θ) represents a rotation around the x axis by an angle of θ. Likewise, to rotate around the y or z axes, we can use

\[ R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

It is possible to multiply these three matrices together in order to produce a composite transform to rotate by a given amount around each of the three axes in a single matrix-vector multiplication operation. The matrix to do this is

\[ R(\psi, \theta, \phi) = R_x(\psi) \, R_y(\theta) \, R_z(\phi) = \begin{bmatrix} c_\theta c_\phi & -c_\theta s_\phi & s_\theta & 0 \\ c_\psi s_\phi + s_\psi s_\theta c_\phi & c_\psi c_\phi - s_\psi s_\theta s_\phi & -s_\psi c_\theta & 0 \\ s_\psi s_\phi - c_\psi s_\theta c_\phi & s_\psi c_\phi + c_\psi s_\theta s_\phi & c_\psi c_\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Here, sψ, sθ, and sφ indicate the sine of ψ, θ, and φ, respectively, and cψ, cθ, and cφ indicate the cosine of ψ, θ, and φ. If this seems like a huge chunk of math, don’t worry — again, a couple of vmath functions come to the rescue:

template <typename T>
static inline Tmat4<T> rotate(T angle_x, T angle_y, T angle_z);

You can also perform a rotation around an arbitrary axis by specifying x, y, and z values for that vector. To see the axis of rotation, you can just draw a line from the origin to the point represented by (x,y,z). The vmath library also includes code to produce this matrix from an angle-axis representation:

template <typename T>
static inline Tmat4<T> rotate(T angle, T x, T y, T z);

template <typename T>
static inline Tmat4<T> rotate(T angle, const vecN<T,3>& axis);

These two overloads of the vmath::rotate function produce a rotation matrix representing a rotation of angle degrees around the axis specified by x, y, and z for the first variant, or by the vector axis for the second. Here, we perform a rotation around the vector specified by the x, y, and z arguments. The angle of rotation is in the counterclockwise direction measured in degrees and specified by the argument angle. In the simplest of cases, the rotation is around only one of the coordinate system’s cardinal axes (x, y, or z).

The following code, for example, creates a rotation matrix that rotates vertices 45° around an arbitrary axis specified by (1,1,1), as illustrated in Figure 4.10.

vmath::mat4 rotation_matrix = vmath::rotate(45.0, 1.0, 1.0, 1.0);

Figure 4.10: A cube rotated about an arbitrary axis

Notice in this example the use of degrees. This function internally converts degrees to radians because unlike computers, many programmers prefer to think in terms of degrees.

Euler Angles

Euler angles are a set of three angles3 that represent orientation in space. Each angle represents a rotation around one of three orthogonal vectors that define our frame (for example, the x, y, and z axes). As you have read, the order in which matrix transformations are performed is important, as performing some transformations (such as rotations) in different orders will produce different results. This is due to the non-commutative nature of matrix multiplication. Thus, given a set of Euler angles, should you rotate first around the x axis, then around y, and then z, or should you perform the rotations in the opposite order, or even do y first? Well, so long as you’re consistent, it doesn’t really matter.

3. In a three-dimensional frame.

Representation of orientations as a set of three angles has some advantages. For example, this type of representation is fairly intuitive, which is important if you plan to hook the angles up to a user interface. Another benefit is that it’s pretty straightforward to interpolate angles, construct a rotation matrix at each point, and see smooth, consistent motion in your final animation. However, Euler angles also come with a serious pitfall — gimbal lock.

Gimbal lock occurs when a rotation by one angle reorients one of the axes to be aligned with another of the axes. Any further rotation around either of the two now colinear axes will result in the same transformation of the model, removing a degree of freedom from the system. Thus, Euler angles are not suitable for concatenating transforms or accumulating rotations.

To avoid this, you will notice that our vmath::rotate functions are able to take an angle by which to rotate and an axis about which to rotate. Of course, stacking three rotations together, one in each of the x, y, and z axes, allows you to use Euler angles if you must, but it is much preferable to use angle-axis representation for rotations, or to use quaternions to represent transformations and convert them to matrices as needed.

Scaling Matrices

Our final “standard” transformation matrix is a scaling matrix. A scaling transform changes the size of your object by expanding or contracting all the vertices along the three axes by the factors specified. A scaling matrix has the form

\[ \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Here, sx, sy, and sz represent the scaling factors in the x, y, and z dimensions, respectively. Creating a scaling matrix with the vmath library is similar to the method for creating a translation or rotation matrix. Three functions exist to construct this matrix for you:

template <typename T>
static inline Tmat4<T> scale(T x, T y, T z) { ... }

template <typename T>
static inline Tmat4<T> scale(const Tvec3<T>& v) { ... }

template <typename T>
static inline Tmat4<T> scale(T x) { ... }

The first of these scales independently in the x, y, and z axes by the values given in the x, y, and z parameters. The second performs the same function but uses a three-component vector rather than three separate parameters to represent the scale factors. The final function scales by the same amount, x, in all three dimensions. Scaling does not have to be uniform, and you can use it to both stretch and squeeze objects along different directions. For example, a 10 x 10 x 10 cube could be scaled by two in the x and z directions as shown in Figure 4.11.

Figure 4.11: A non-uniform scaling of a cube
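For example, the non-uniform scale shown in Figure 4.11 could be constructed as follows:

// Scale by a factor of two in x and z, leaving y unchanged.
vmath::mat4 scale_matrix = vmath::scale(2.0f, 1.0f, 2.0f);

// Or, scale uniformly by two in all three dimensions.
vmath::mat4 uniform_scale = vmath::scale(2.0f);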

Concatenating Transformations

As you have learned, coordinate transforms can be represented by matrices, and transformation of a vector from one space to another comprises a simple matrix-vector multiplication operation. Multiplying by a sequence of matrices can apply a sequence of transformations. It is not necessary to store the intermediate vectors after each matrix-vector multiplication. Rather, it is possible and generally preferable to first multiply together all of the matrices comprising a single set of related transformations to produce a single matrix representing the entire transformation sequence. This matrix can then be used to transform vectors directly from the source to the destination coordinate spaces.

Remember, order is important. When writing code with vmath or in GLSL, you should always multiply a matrix by a vector and read the sequence of transformations in reverse order. For example, consider the following code sequence:

vmath::mat4 translation_matrix = vmath::translate(4.0f, 10.0f, -20.0f);
vmath::mat4 rotation_matrix = vmath::rotate(45.0f,
                                            vmath::vec3(0.0f, 1.0f, 0.0f));
vmath::vec4 input_vertex = vmath::vec4(...);

vmath::vec4 transformed_vertex = translation_matrix *
                                 rotation_matrix *
                                 input_vertex;

This code first rotates a model 45° around the y axis (due to rotation_matrix) and then translates it 4 units in the x axis, 10 units in the y axis and negative 20 units in the z axis (due to translation_matrix). This places the model in a particular orientation and then moves it into position. Reading the sequence of transformations backwards gives the order of operations (rotation then translation). We could rewrite this code as follows:

vmath::mat4 translation_matrix = vmath::translate(4.0f, 10.0f, -20.0f);
vmath::mat4 rotation_matrix = vmath::rotate(45.0f,
                                            vmath::vec3(0.0f, 1.0f, 0.0f));
vmath::mat4 composite_matrix = translation_matrix * rotation_matrix;
vmath::vec4 input_vertex = vmath::vec4(...);

vmath::vec4 transformed_vertex = composite_matrix *
                                 input_vertex;

Here, composite_matrix is formed by multiplying the translation matrix by the rotation matrix, forming a composite that represents the rotation followed by the translation. This matrix can then be used to transform any number of vertices or other vectors. If you have a lot of vertices to transform, this can greatly speed up your calculation. Each vertex now takes only one matrix-vector multiply rather than two.

Care must be taken here. It’s all too easy to read (or write) the sequence of transformations left to right, as you would read code. If we were to multiply our translation and rotation matrices together in that order, then the first transform would move the origin of the model, and the rotation operation would then take place around that new origin, potentially sending our model flying off into space!

Quaternions

A quaternion is a four-dimensional quantity that is similar in some ways to a complex number. It has a real part and three imaginary parts (as compared to a complex number’s one imaginary part). Just as a complex number has an imaginary part i, a quaternion has three imaginary parts, i, j, and k. Mathematically, a quaternion q is represented as

q = (x + yi + zj + wk)

The imaginary parts of the quaternion have properties similar to the imaginary part of a complex number. In particular,

i² = j² = k² = ijk = −1

Also, the product of any two of i, j, and k gives whichever one was not part of that product (with a sign that depends on the order of the operands). Thus,

i = jk

j = ki

k = ij

Given this, we can see that it is possible to multiply two quaternions together as follows:

\[ \begin{aligned} q_1 q_2 = {} & (x_1 x_2 - y_1 y_2 - z_1 z_2 - w_1 w_2) \\ & + (x_1 y_2 + y_1 x_2 + z_1 w_2 - w_1 z_2)i \\ & + (x_1 z_2 - y_1 w_2 + z_1 x_2 + w_1 y_2)j \\ & + (x_1 w_2 + y_1 z_2 - z_1 y_2 + w_1 x_2)k \end{aligned} \]

As with complex numbers, multiplication of quaternions is non-commutative. Addition and subtraction for quaternions is defined as simple vector addition and subtraction, with the terms being added or subtracted on a component-by-component basis. Other functions such as unary negation and magnitude also behave as expected for a four-component vector. Although a quaternion is a four-component entity, it is common to represent a quaternion as a real scalar part and a three-component imaginary vector part. Such representation is often written

\[ q = (r, \mathbf{v}), \quad \text{where } r = x \text{ and } \mathbf{v} = (y, z, w) \]

Okay, great — but this isn’t the dreaded math chapter, right? This is about computer graphics, OpenGL, and all that fun stuff. Well, here’s where quaternions get really useful. Remember that our rotation functions take an angle and an axis to rotate around? Well, we can represent those two quantities as a quaternion by stuffing the angle in the real part and the axis in the vector part, yielding a quaternion that represents a rotation around any axis.

A sequence of rotations can be represented by a series of quaternions multiplied together, producing a single resulting quaternion that encodes the whole lot in one go. While it’s possible to make a bunch of matrices that represent rotation around the various Cartesian axes and then multiply them all together, that method is susceptible to gimbal lock. If you do the same thing with a sequence of quaternions, gimbal lock cannot occur. For your coding pleasure, vmath includes the vmath::quaternion class that implements most of the functionality described here.
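To make the product rule shown earlier concrete, here is a minimal, hand-rolled sketch of quaternion multiplication using a hypothetical quat struct (the actual vmath::quaternion interface may differ). It follows the notation above, where x is the real part and (y, z, w) is the vector part:

struct quat
{
    float x;        // real part
    float y, z, w;  // imaginary (vector) part
};

quat qmul(const quat& a, const quat& b)
{
    quat r;
    r.x = a.x * b.x - a.y * b.y - a.z * b.z - a.w * b.w; // real term
    r.y = a.x * b.y + a.y * b.x + a.z * b.w - a.w * b.z; // i term
    r.z = a.x * b.z - a.y * b.w + a.z * b.x + a.w * b.y; // j term
    r.w = a.x * b.w + a.y * b.z - a.z * b.y + a.w * b.x; // k term
    return r;
}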

The Model-View Transform

In a simple OpenGL application, one of the most common transformations is to take a model from model space to view space in order to render it. In effect, we move the model first into world space (i.e., place it relative to the world’s origin) and then from there into view space (placing it relative to the viewer). This process establishes the vantage point of the scene. By default, the point of observation in a perspective projection is at the origin (0,0,0) looking down the negative z axis (into the monitor or screen). This point of observation is moved relative to the eye coordinate system to provide a specific vantage point. When the point of observation is located at the origin, as in a perspective projection, objects drawn with positive z values are behind the observer. In an orthographic projection, however, the viewer is assumed to be infinitely far away on the positive z axis and can see everything within the viewing volume.

Because this transform takes vertices from model space (which is also sometimes known as object space) directly into view space and effectively bypasses world space, it is often referred to as the model-view transform and the matrix that encodes this transformation is known as the model-view matrix.

The model transform essentially places objects into world space. Each object is likely to have its own model transform, which will generally consist of a sequence of scale, rotation, and translation operations. The result of multiplying the positions of vertices in model space by the model transform is a set of positions in world space. This transformation is sometimes called the model-world transform.

The view transformation allows you to place the point of observation anywhere you want and look in any direction. Determining the viewing transformation is like placing and pointing a camera at the scene. In the grand scheme of things, you must apply the viewing transformation before any other modeling transformations. The reason is that it appears to move the current working coordinate system with respect to the eye coordinate system. All subsequent transformations then occur based on the newly modified coordinate system. The transform that moves coordinates from world space to view space is sometimes called the world-view transform.

Concatenating the model-world and world-view transform matrices by multiplying them together yields the model-view matrix (i.e., the matrix that takes coordinates from model to view space). There are some advantages to doing this. First, there are likely to be many models in your scene and many vertices in each model. Using a single composite transform to move the model into view space is more efficient than moving it into world space and then into view space as explained earlier. The second advantage has more to do with the numerical accuracy of single-precision floating-point numbers — the world could be huge, and computation performed in world space will have different precision depending on how far the vertices are from the world origin. However, if you perform the same calculations in view space, then precision is dependent on how far vertices are from the viewer, which is probably what you want — a great deal of precision is applied to objects that are close to the viewer at the expense of precision very far from the viewer.

The Lookat Matrix

If you have a vantage point at a known location and a thing you want to look at, you would wish to place your virtual camera at that location and then point it in the right direction. In order to orient the camera correctly, you also need to know which way is up. Otherwise, the camera could spin around its forward axis, and even though it would technically still be pointing in the right direction, this is almost certainly not what you want. So, given an origin, a point of interest, and a direction that we consider to be up, we would like to construct a sequence of transforms, ideally baked together into a single matrix, that will represent a rotation that will point a camera in the correct direction and a translation that will move the origin to the center of the camera. This matrix is known as a lookat matrix and can be constructed using only the math covered in this chapter so far.

First, we know that subtracting two positions gives us a vector that would move a point from the first position to the second, and that normalizing that vector gives us its direction. So, if we take the coordinates of a point of interest, subtract from that the position of our camera, and then normalize the resulting vector, we have a new vector that represents the direction of view from the camera to the point of interest. We call this the forward vector.

Next, we know that if we take the cross product of two vectors, we will receive a third vector that is orthogonal (which means, at a right angle) to both input vectors. Well, we have two vectors — the forward vector we just calculated and the up vector that represents the direction we consider to be upwards. Taking the cross product of those two vectors results in a third vector that is orthogonal to each of them and points sideways with respect to our camera. We call this the sideways vector. However, the up and forward vectors are not necessarily orthogonal to each other, and we need a third orthogonal vector to construct a rotation matrix. To obtain this vector, we can simply apply the same process again — taking the cross product of the forward vector and our sideways vector to produce a third that is orthogonal to both and represents up with respect to the camera.

These three vectors are of unit length and are all orthogonal to one another, and so they form a set of orthonormal basis vectors and represent our view frame. Given these three vectors, we can construct a rotation matrix that will take a point in the standard Cartesian basis and move it into the basis of our camera. In the following math, e is the eye (or camera) position, p is the point of interest, and u is the up vector. Here we go...

First, construct our forward vector, f:

\[ \mathbf{f} = \frac{\mathbf{p} - \mathbf{e}}{\| \mathbf{p} - \mathbf{e} \|} \]

Next, take the cross of f and u to construct a side vector s:

s = f × u

Now, construct a new up vector, u′ in our camera’s reference:

u′ = s × f

Finally, we construct a rotation matrix representing a reorientation into our newly constructed orthonormal basis:

\[ R = \begin{bmatrix} s.x & s.y & s.z & 0 \\ u'.x & u'.y & u'.z & 0 \\ -f.x & -f.y & -f.z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Right, we’re not quite finished. In order to transform objects into the camera’s frame, not only do we need to orient everything correctly, but we also need to move the origin to the position of the camera. We do this by simply translating the resulting vectors by the negative of the camera’s position. Remember how a translation matrix is simply constructed by placing the offset into that rightmost column of the matrix? Well, we can do that here too:

\[ T = \begin{bmatrix} s.x & s.y & s.z & -e.x \\ u'.x & u'.y & u'.z & -e.y \\ -f.x & -f.y & -f.z & -e.z \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Finally, we have our lookat matrix, T. If this seems like a lot of steps to you, you’re in luck. There’s a function in the vmath library that will construct the matrix for you:

template <typename T>
static inline Tmat4<T> lookat(const vecN<T,3>& eye,
                              const vecN<T,3>& center,
                              const vecN<T,3>& up) { ... }

The matrix produced by the vmath::lookat function can be used as the basis for your camera matrix — the matrix that represents the position and orientation of your camera. In other words, this can be your view matrix.
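For example, to build a view matrix for a camera placed at (0.0, 2.0, 5.0), looking at the origin, with the positive y axis as up:

vmath::vec3 eye(0.0f, 2.0f, 5.0f);      // camera position
vmath::vec3 center(0.0f, 0.0f, 0.0f);   // point of interest
vmath::vec3 up(0.0f, 1.0f, 0.0f);       // up direction

vmath::mat4 view_matrix = vmath::lookat(eye, center, up);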

Projection Transformations

The projection transformation is applied to your vertices after the model-view transformation. This projection actually defines the viewing volume and establishes clipping planes. The clipping planes are plane equations in 3D space that OpenGL uses to determine whether geometry can be seen by the viewer. More specifically, the projection transformation specifies how a finished scene (after all the modeling is done) is projected to the final image on the screen. You will learn more about two types of projections — orthographic and perspective.

In an orthographic, or parallel, projection, all the polygons are drawn on-screen with exactly the relative dimensions specified. Lines and polygons are mapped directly to the 2D screen using parallel lines, which means no matter how far away something is, it is still drawn the same size, just flattened against the screen. This type of projection is typically used for rendering two-dimensional images such as the front, top, and side elevations in blueprints or two-dimensional graphics such as text or on-screen menus.

A perspective projection shows scenes more as they appear in real life instead of as a blueprint. The hallmark of perspective projections is foreshortening, which makes distant objects appear smaller than nearby objects of the same size. Lines in 3D space that might be parallel do not always appear parallel to the viewer. With a railroad track, for instance, the rails are parallel, but using perspective projection, they appear to converge at some distant point. The benefit of perspective projection is that you don’t have to figure out where lines converge or how much smaller distant objects are. All you need to do is specify the scene using the model-view transformations and then apply the perspective projection matrix. Linear algebra works all the magic for you.

Figure 4.12 compares orthographic and perspective projections on two different scenes. As you can see, in the orthographic projection shown on the left, the cubes do not appear to change in size as they move further from the viewer. However, in the perspective projection shown on the right, the cubes get smaller and smaller as they get further from the viewer.

Figure 4.12: A side-by-side example of an orthographic versus perspective projection

Orthographic projections are used most often for 2D drawing purposes where you want an exact correspondence between pixels and drawing units. You might use them for a schematic layout, text, or perhaps a 2D graphing application. You also can use an orthographic projection for 3D renderings when the depth of the rendering is very small in comparison to the distance from the viewpoint. Perspective projections are used for rendering scenes that contain wide-open spaces or objects that need to have foreshortening applied. For the most part, perspective projections are typical for 3D graphics. In fact, looking at a 3D object with an orthographic projection can be somewhat unsettling.

Perspective Matrices

Once your vertices are in view space, we need to get them into clip space, which we do by applying our projection matrix, which may represent a perspective or orthographic projection (or some other projection altogether). A commonly used perspective matrix is called a frustum matrix. A frustum matrix is a projection matrix that produces a perspective projection such that clip space takes the shape of a rectangular frustum, which is a truncated rectangular pyramid. Its parameters are the distances to the near and far planes and the view-space coordinates of the left, right, top, and bottom clipping planes. It takes the following form:

\[
\begin{pmatrix}
\dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\[1ex]
0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\[1ex]
0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2nf}{f-n} \\[1ex]
0 & 0 & -1 & 0
\end{pmatrix}
\]

Here, l, r, b, t, n, and f are the left, right, bottom, top, near, and far clipping plane coordinates, respectively.

The vmath function to do this is vmath::frustum:

static inline mat4 frustum(float left,
                           float right,
                           float bottom,
                           float top,
                           float n,
                           float f) { ... }
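As a quick example of calling it (the plane positions below are arbitrary values that happen to produce a symmetric frustum):

#include "vmath.h"

// Build a perspective projection whose near plane spans [-1, +1] in
// both x and y, with the near and far planes at view-space distances
// of 1 and 1000 units, respectively.
vmath::mat4 build_frustum_matrix(void)
{
    return vmath::frustum(-1.0f, 1.0f,    // left, right
                          -1.0f, 1.0f,    // bottom, top
                           1.0f,          // near plane
                           1000.0f);      // far plane
}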

Another common method for constructing a perspective matrix is to directly specify a field of view as an angle (in degrees, perhaps), an aspect ratio (generally derived by dividing the window's width by its height), and the view-space positions of the near and far planes. This is somewhat simpler to specify, and it only produces symmetric frusta. However, that is almost always what you'll want. The vmath function to do this is vmath::perspective:

static inline mat4 perspective(float fovy /* in degrees */,
                               float aspect,
                               float n,
                               float f) { ... }
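Here is a minimal sketch of how you might call it; the 50-degree field of view and the near and far distances are arbitrary example values:

#include "vmath.h"

// Build a symmetric perspective projection whose aspect ratio is
// derived from the current window dimensions.
vmath::mat4 build_perspective_matrix(int window_width, int window_height)
{
    float aspect = (float)window_width / (float)window_height;

    return vmath::perspective(50.0f,     // vertical field of view, in degrees
                              aspect,    // width divided by height
                              0.1f,      // near plane
                              1000.0f);  // far plane
}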

Orthographic Matrices

If you wish to use an orthographic projection for your scene, then you can construct a (somewhat simpler) orthographic projection matrix. An orthographic projection matrix is simply a scaling matrix that linearly maps view-space coordinates into clip-space coordinates. The parameters to construct the orthographic projection matrix are the left, right, top, and bottom coordinates in view space of the bounds of the scene, and the positions of the near and far planes. The form of the matrix is

\[
\begin{pmatrix}
\dfrac{2}{r-l} & 0 & 0 & -\dfrac{r+l}{r-l} \\[1ex]
0 & \dfrac{2}{t-b} & 0 & -\dfrac{t+b}{t-b} \\[1ex]
0 & 0 & -\dfrac{2}{f-n} & -\dfrac{f+n}{f-n} \\[1ex]
0 & 0 & 0 & 1
\end{pmatrix}
\]

Again, l, r, b, t, n, and f are the left, right, bottom, top, near, and far plane coordinates.

Again, there’s a vmath function to construct this matrix for you, vmath::ortho:

static inline mat4 ortho(float left,
                         float right,
                         float bottom,
                         float top,
                         float near,
                         float far) { ... }
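For instance, a sketch of the pixel-exact 2D setup described earlier (mapping one drawing unit to one pixel, with the origin at the lower-left corner of the window) might look like this:

#include "vmath.h"

// Build an orthographic projection in which one unit equals one pixel.
vmath::mat4 build_ortho_matrix(int window_width, int window_height)
{
    return vmath::ortho(0.0f, (float)window_width,    // left, right
                        0.0f, (float)window_height,   // bottom, top
                        -1.0f, 1.0f);                 // near, far
}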

Interpolation, Lines, Curves, and Splines

Interpolation is a term used to describe the process of finding values that lie between a set of known points. Consider the equation of the line passing through points A and B:

\[ P = A + t\,\vec{AB} \]

where P is any point on the line and \(\vec{AB}\) is the vector from A to B:

\[ \vec{AB} = B - A \]

We can therefore write this equation as

P = A + t (B – A) or

P = (1 – t) A + tB

It is easy to see that when t is zero, P is equal to A, and when t is one, P is equal to A + B – A, which is simply B. Such a line is shown in Figure 4.13.

Figure 4.13: Finding a point on a line

If t lies between 0.0 and 1.0, then P is going to end up somewhere between A and B. Values of t outside this range will push P off the ends of the line. You should be able to see that by smoothly varying t, we can move point P from A to B and back. This is known as linear interpolation. The values of A and B (and therefore P) can have any number of dimensions. For example, they could be scalar values; two-dimensional values such as points on a graph; three-dimensional values such as coordinates in 3D space, colors, and so on; or even higher-dimensional quantities such as matrices, arrays, or even whole images. In many cases, linear interpolation doesn't make much sense (for example, linearly interpolating between two matrices generally doesn't produce a meaningful result), but angles, positions, and other coordinates can normally be interpolated safely.

Linear interpolation is such a common operation in graphics that GLSL includes a built-in function specifically for this purpose, mix:

vec4 mix(vec4 A, vec4 B, float t);

The mix function comes in several versions taking various dimensionalities of vectors or scalars as the A and B inputs and taking scalars or matching vectors for t.
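On the C++ side, the same operation is a one-liner; here is a minimal sketch using the vmath vector types (the helper name lerp is our own, not part of vmath):

#include "vmath.h"

// Linearly interpolate between A and B, mirroring GLSL's mix().
vmath::vec4 lerp(const vmath::vec4& A, const vmath::vec4& B, float t)
{
    return A + (B - A) * t;    // equivalent to (1 - t)A + tB
}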

Curves

If moving everything along a straight line between two points were all we wanted to do, then this would be enough. However, in the real world, objects move in smooth curves and accelerate and decelerate smoothly. A curve can be represented by three or more control points. For most curves, there are more than three control points, two of which form the endpoints while the others define the shape of the curve. Consider the simple curve shown in Figure 4.14.

Figure 4.14: A simple Bézier curve

The curve shown in Figure 4.14 has three control points, A, B, and C. A and C are the endpoints of the curve, and B defines its shape. If we join points A and B with one line and points B and C with another, then we can interpolate along the two lines using simple linear interpolation to find a new pair of points, D and E. Now, given these two points, we can join them with yet another line and interpolate along it to find a new point, P. As we vary our interpolation parameter, t, point P moves in a smooth curved path from A to C. Expressed mathematically, this is

D = A + t(B – A)
E = B + t(C – B)
P = D + t(E – D)

Substituting for D and E and doing a little crunching, we come up with the following:

P = A + t(B – A) + t((B + t(C – B)) – (A + t(B – A)))

P = A + t(B – A) + tB + t²(C – B) – tA – t²(B – A)

P = A + t(B – A + B – A) + t²(C – B – B + A)

P = A + 2t(B – A) + t²(C – 2B + A)

You should recognize this as a quadratic equation in t. The curve that it describes is known as a quadratic Bézier curve. We can actually implement this very easily in GLSL using the mix function, as all we're doing is linearly interpolating (mixing) the results of two previous interpolations.

vec4 quadratic_bezier(vec4 A, vec4 B, vec4 C, float t)
{
    vec4 D = mix(A, B, t);    // D = A + t(B - A)
    vec4 E = mix(B, C, t);    // E = B + t(C - B)

    vec4 P = mix(D, E, t);    // P = D + t(E - D)

    return P;
}
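To tie the code back to the algebra, here is a C++ sketch that evaluates the same curve both by nested interpolation and directly from the expanded polynomial derived above; for any t, the two functions should return the same point (the function names are our own):

#include "vmath.h"

// Quadratic Bezier by repeated linear interpolation, as in the GLSL above.
vmath::vec4 quadratic_bezier_nested(const vmath::vec4& A,
                                    const vmath::vec4& B,
                                    const vmath::vec4& C, float t)
{
    vmath::vec4 D = A + (B - A) * t;    // D = A + t(B - A)
    vmath::vec4 E = B + (C - B) * t;    // E = B + t(C - B)

    return D + (E - D) * t;             // P = D + t(E - D)
}

// The same curve from the expanded form P = A + 2t(B - A) + t^2(C - 2B + A).
vmath::vec4 quadratic_bezier_poly(const vmath::vec4& A,
                                  const vmath::vec4& B,
                                  const vmath::vec4& C, float t)
{
    return A + (B - A) * (2.0f * t) + (C - B * 2.0f + A) * (t * t);
}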

By adding a fourth control point as shown in Figure 4.15, we can increase the order by one and produce a cubic Bézier curve.

Figure 4.15: A cubic Bézier curve

We now have four control points, A, B, C, and D. The process for constructing the curve is similar to the quadratic Bézier curve. We form a first line from A to B, a second from B to C, and a third from C to D. Interpolating along each of the three lines gives rise to three new points, E, F, and G. Using these three points, we form two more lines, one from E to F and another from F to G, interpolating along which gives rise to points H and I, between which we can interpolate to find our final point, P. Therefore, we have the equations shown below.

E = A + t(B – A)
F = B + t(C – B)
G = C + t(D – C)

H = E + t(F – E)
I = F + t(G – F)

P = H + t(I – H)

If you think the equations above look familiar, you’d be right — our points E, F, and G form a quadratic Bézier curve that we use to interpolate to our final point P. If we were to substitute the equations for E, F, and G into the equations for H and I, substitute those into the equation for P, and crunch through the expansions, we would be left with a cubic equation with terms in t³ — hence the name cubic Bézier curve. Again, we can implement this simply and efficiently in terms of linear interpolations in GLSL using the mix function:

vec4 cubic_bezier(vec4 A, vec4 B, vec4 C, vec4 D, float t)
{
    vec4 E = mix(A, B, t);    // E = A + t(B - A)
    vec4 F = mix(B, C, t);    // F = B + t(C - B)
    vec4 G = mix(C, D, t);    // G = C + t(D - C)

    vec4 H = mix(E, F, t);    // H = E + t(F - E)
    vec4 I = mix(F, G, t);    // I = F + t(G - F)

    vec4 P = mix(H, I, t);    // P = H + t(I - H)

    return P;
}

Just as the structure of the equations for a cubic Bézier curve “includes” the equations for a quadratic curve, so too does the code to implement them. In fact, we can layer these curves on top of each other, using the code for one to build the next:

vec4 cubic_bezier(vec4 A, vec4 B, vec4 C, vec4 D, float t)
{
    vec4 E = mix(A, B, t);    // E = A + t(B - A)
    vec4 F = mix(B, C, t);    // F = B + t(C - B)
    vec4 G = mix(C, D, t);    // G = C + t(D - C)

    return quadratic_bezier(E, F, G, t);
}

Now that we see this pattern, we can take it further and produce even higher order curves. For example, a quartic Bézier curve (one with five control points, which expands to a fourth-degree polynomial in t) can be implemented as

vec4 quartic_bezier(vec4 A, vec4 B, vec4 C, vec4 D, vec4 E, float t)
{
    vec4 F = mix(A, B, t);    // F = A + t(B - A)
    vec4 G = mix(B, C, t);    // G = B + t(C - B)
    vec4 H = mix(C, D, t);    // H = C + t(D - C)
    vec4 I = mix(D, E, t);    // I = D + t(E - D)

    return cubic_bezier(F, G, H, I, t);
}

This layering could theoretically be applied over and over for any number of control points. However, in practice, curves with more than four control points are not commonly used. Rather, we use splines.

Splines

A spline is effectively a long curve made up of several smaller curves (such as Béziers) that locally define the spline’s shape. At the very least, the control points representing the ends of the curves are shared between segments,⁴ and often one or more of the interior control points are either shared or linked in some way between adjacent segments. Any number of curves can be joined together in this way, allowing arbitrarily long paths to be formed. Take a look at the curve shown in Figure 4.16.

4. This is what sticks the curves together to form a spline. These control points are known as welds, and the control points in between are sometimes referred to as knots.

Figure 4.16: A cubic Bézier spline

In Figure 4.16, the curve is defined by ten control points, A through J, which form three cubic Bézier curves. The first is defined by A, B, C, and D; the second shares D and further uses E, F, and G; and the third shares G and adds H, I, and J. This type of spline is known as a cubic Bézier spline because it is constructed from a sequence of cubic Bézier curves. This is also known as a cubic B-spline — a term that may be familiar to anyone who has read much about graphics in the past.

To interpolate point P along the spline, we simply divide it into three regions, allowing t to range from 0.0 to 3.0. Between 0.0 and 1.0, we interpolate along the first curve, moving from A to D. Between 1.0 and 2.0, we interpolate along the second curve, moving from D to G, and when t is between 2.0 and 3.0, we interpolate along the final curve between G and J.

Thus, the integer part of t determines the curve segment along which we are interpolating, and the fractional part of t is used to interpolate along that segment. Of course, we can scale t as we wish. For example, if we take a value between 0.0 and 1.0 and multiply it by the number of segments in the curve, we can continue to use our original range of values for t regardless of the number of control points in a curve. With three segments, a value of t = 0.5 scales to 1.5: the integer part, 1, selects the second segment, and the fractional part, 0.5, takes us halfway along it.

The following code will interpolate a vector along a cubic Bézier spline with ten control points (and thus three segments):

vec4 cubic_bspline_10(vec4 CP[10], float t)
{
    // Clamp to the ends of the spline.
    if (t <= 0.0)
        return CP[0];
    if (t >= 1.0)
        return CP[9];

    // Scale t into the range [0.0, 3.0); the integer part selects the
    // segment, and the fractional part interpolates along it.
    float f = t * 3.0;
    int i = int(floor(f));
    float s = fract(f);    // note: fract(f), not fract(t)

    vec4 A = CP[i * 3];
    vec4 B = CP[i * 3 + 1];
    vec4 C = CP[i * 3 + 2];
    vec4 D = CP[i * 3 + 3];

    return cubic_bezier(A, B, C, D, s);
}

If we use a spline to determine the position or orientation of an object, we will find that we must be very careful about our choice of control point locations in order to keep motion smooth and fluid. The rate of change in the value of our interpolated point P (i.e., its velocity) is the differential of the equation of the curve with respect to t. If this function is discontinuous, then P will suddenly change direction and our objects will appear to jump around. Furthermore, the rate of change of P’s velocity (its acceleration) is the second-order derivative of the spline equation with respect to t. If the acceleration is not smooth, then P will appear to suddenly speed up or slow down.

A function that has a continuous first derivative is known as C¹ continuous, and likewise a curve that has a continuous second derivative is known as C² continuous. Bézier curve segments are both C¹ and C² continuous, but to ensure that we maintain continuity over the welds of a spline, we need to ensure that each segment starts off where the previous one ended in position, direction of movement, and rate of change. Well, a rate of travel in a particular direction is simply a velocity. So, rather than assigning arbitrary control points to our spline, we can assign a velocity at each weld. If the same velocity of the curve at each weld is used in the computation of the curve segments on either side of that weld, then we will have a spline function that is both C¹ and C² continuous.

This should make sense if you take another look at Figure 4.16 — there are no kinks, and the curve is nice and smooth through the welds (points D and G). Now look at the control points on either side of the welds. For example, take points C and E, which surround D. C and E form a straight line, and D lies right in the middle of it. In fact, we can call the line segment from D to E the velocity at D, or \(\vec{v}_D\). Given the position of point D (the weld) and the velocity of the curve \(\vec{v}_D\) at D, C and E can be calculated as

\[ C = D - \vec{v}_D \qquad E = D + \vec{v}_D \]

Likewise, if \(\vec{v}_A\) represents the velocity at A, B can be calculated as

\[ B = A + \vec{v}_A \]

Thus, you should be able to see that given the positions and velocities at the welds of a cubic B-spline, we can dispense with all of the other control points and compute them on the fly as we evaluate each segment. A cubic B-spline that is represented this way (as a set of weld positions and velocities) is known as a cubic Hermite spline, or sometimes simply a cspline. The cspline is an extremely useful tool for producing smooth and natural animations.
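As a rough illustration of why this representation is convenient, the following C++ sketch evaluates one segment of a cspline directly from two welds and their velocities, reconstructing the interior Bézier control points using the relationships given above (the function is our own, not part of vmath):

#include "vmath.h"

// Evaluate one cspline segment between welds P0 and P1, whose
// velocities are V0 and V1, at parameter t in [0, 1].
vmath::vec4 cspline_segment(const vmath::vec4& P0, const vmath::vec4& V0,
                            const vmath::vec4& P1, const vmath::vec4& V1,
                            float t)
{
    // Recover the interior control points: B = A + vA on the way out
    // of the first weld, C = D - vD on the way into the second.
    vmath::vec4 B = P0 + V0;
    vmath::vec4 C = P1 - V1;

    // Evaluate the resulting cubic Bezier curve by nested interpolation.
    vmath::vec4 E = P0 + (B - P0) * t;
    vmath::vec4 F = B + (C - B) * t;
    vmath::vec4 G = C + (P1 - C) * t;

    vmath::vec4 H = E + (F - E) * t;
    vmath::vec4 I = F + (G - F) * t;

    return H + (I - H) * t;
}

Because the segments on either side of a weld are built from that weld's single position and velocity, the curve leaves each weld moving in exactly the direction, and at exactly the rate, at which it arrived.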

Summary

In this chapter, you learned some mathematical concepts crucial to using OpenGL for the creation of 3D scenes. Even if you can’t juggle matrices in your head, you now know what matrices are and how they are used to perform the various transformations. You also learned how to construct and manipulate the matrices that represent the viewer and viewport properties. You should now understand how to place your objects in the scene and determine how they are viewed on-screen. This chapter also introduced the powerful concept of a frame of reference, and you saw how easy it is to manipulate frames and convert them into transformations.

Finally, we introduced the use of the vmath library that accompanies this book. This library is written entirely in portable C++ and provides you with a handy toolkit of miscellaneous math and helper routines that can be used along with OpenGL.

Surprisingly, we did not cover a single new OpenGL function call in this entire chapter. Yes, this was the math chapter, and if you think of math as nothing but dry formulas and calculations, you might not even have noticed. Vectors, matrices, and the application thereof are absolutely crucial to being able to use OpenGL to render 3D objects and worlds. However, it’s important to note that OpenGL doesn’t impose any particular math convention upon you and does not itself provide any math functionality. Even if you use a different 3D math library, or even roll your own, you will still find yourself following the patterns laid out in this chapter for manipulating your geometry and 3D worlds. Now, go ahead and start making some!