Programming 3D Applications with HTML5 and WebGL (2013)
Part I. Foundations
Chapter 1. Introduction
We live in a 3D world. People move, think, and experience in three dimensions.
Much of our media is also 3D—though it is usually presented on flat screens. Animated films are created from computer-generated 3D images. Online map services allow us to explore our destination, virtually, in a 3D environment. Most video games, whether running on dedicated consoles or mobile phones, are rendered in 3D. Even the news has gone 3D: the sight of a CNN analyst meandering through a virtual set, comically awkward a few years ago, has become an accepted part of the broadcast milieu as cable channels vie for increasing attention in a 24-hour news cycle.
3D graphics is nearly as old as the computer itself, tracing its roots back to the 1960s. It has been used in applications spanning engineering, education, training, architecture, finance, sales and marketing, gaming, and entertainment. Historically, 3D applications have relied on high-end computer systems and expensive software. But that has changed in the last decade. 3D processing hardware is now shipped in every computer and mobile device, with the consumer smartphone of today possessing more graphics power than the professional workstation of 15 years ago. More importantly, the software required to render 3D is now not only universally accessible, it’s also free. It’s called a web browser.
Figure 1-1 shows an excerpt from 100,000 Stars, a browser-based 3D flythrough simulation of our stellar neighbors in the Milky Way. Using the mouse, you can rotate about the galactic plane and zoom in on a star of interest. Stars are represented with renderings that approximate their apparent magnitude and color. Each star is labeled with its common name; when you mouse over the label, it highlights. Click on the label, and an overlay appears displaying the Wikipedia entry for that star. Click on a hyperlink in the overlay text, and the browser will launch that link in a new tab. 100,000 Stars is a stunningly produced interactive experience featuring beautiful renderings, pulsing animations, a majestic soundtrack, and an artfully integrated 2D user interface.
Figure 1-1. The 100,000 Stars project by Google; image courtesy Google, Inc.
100,000 Stars was created as an experiment by Google’s Data Arts team to demonstrate the rich capabilities of the Chrome browser. While the application is experimental, the technologies underlying it are not: it was built with HTML5 features available today in most browsers. The galaxy and stars are rendered in real time via WebGL, the new standard for hardware-accelerated 3D web graphics; the labels are placed relative to their stars through 3D transforms now available in CSS3; and the overlays blend seamlessly with the 3D content because browsers combine, or composite, all page elements into a unified presentation.
Just a few years ago, an experience like 100,000 Stars could only have been achieved in a native client application requiring a large download and installation, produced by developers using complex tools in a time-consuming and expensive development process. Today, it can be built with a browser, free and open source tools, and a standard web technology stack. What’s more, you can instantly access updates by simply reloading the page, load information from anywhere on the Web via URL, and click hyperlinks in the 3D scene to access more information.
This book is about taking advantage of the awesome power of the modern browser to create a new breed of connected, visual application. Some of this breed will look a lot like its ancestors, essentially ports of traditional 3D products, refactored to reach new customers and reduce costs. But far more exciting are the possibilities for novel consumer applications in advertising, product marketing, customer support, education, training, tourism, gaming, and entertainment—to name a few. 3D brings a new dimension to the interactive experience; combined with web technology, the third dimension is now accessible to everyone on the planet.
100,000 Stars is a tour de force in interactive media development. Michael Chang, one of the creators, wrote a great case study of the project. To see what went into its development, go to http://www.html5rocks.com/en/tutorials/casestudies/100000stars/.
HTML5: A New Visual Medium
HTML has come a long way since the days of static pages, forms, and the Submit button. In the early 2000s, browsers introduced rich interaction by allowing portions of a page to be changed dynamically via Ajax techniques. Still, the ways in which pages could be changed with Ajax were constrained by the graphical features of HTML and CSS. If a developer wished to go beyond those limits, he had to use media plugins such as Flash and QuickTime.
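Now, with HTML5, developers have several graphics features built into the browser, no plugins required:
§ WebGL, a JavaScript API for hardware-accelerated 3D rendering within the Canvas element.
§ The Canvas element’s 2D drawing API, for programmatic 2D graphics and animation.
§ CSS3 3D transforms, transitions, and custom filters for advanced page effects. CSS has evolved over the past several years to include hardware-accelerated 3D rendering and animation features accessible through the style sheet language.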
Each of these features has its strengths, weaknesses, and technical tradeoffs, and each has a role to play in delivering interactive and visually compelling 3D experiences. Which ones you use can depend on several factors—what you are trying to build, which platforms you have to support, performance concerns, and so on. Let’s say, for example, that you are creating a first-person shooter game and you need the highest-quality graphics. This will be hard to pull off without using WebGL’s extensive access to the rendering hardware. On the other hand, maybe you are developing a fancy channel tuner interface for a video website, including live video thumbnails, rotation effects on rollovers, and dissolve transitions between clips; in that case, CSS3 might have everything you need to deliver a killer experience.
And one standard to rule them all…
What most web developers think of informally as HTML5 is actually a collection of technologies and standards. Some of these are already fully ratified by the World Wide Web Consortium (W3C) and implemented in all browsers. Others are less mature as standards, but nevertheless widely supported. Still others, such as WebGL, are mature and stable standards, but not controlled by the W3C.
The Browser as Platform
HTML5 brings rich graphics to the Web, but this would not amount to much without other essential browser improvements. In particular, a handful of advances have paved the way for true rich Internet application development with HTML5:
§ Compositing. The browser is responsible for combining, or compositing, the various elements on the page quickly and without unwanted visual artifacts. As content has become more dynamic, browsers have made huge improvements in compositing, including using the 3D hardware-rendering pipeline for all visual elements, both 2D and 3D.
§ requestAnimationFrame(). This function was introduced as an improvement over using setInterval() and setTimeout() to drive animations. It can greatly enhance performance and eliminate visual artifacts by allowing the developer to redraw the contents of canvas elements in the same pass in which the browser redraws built-in page elements.
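A minimal sketch of a requestAnimationFrame loop, in TypeScript. The browser passes each callback a timestamp in milliseconds; the sketch advances the animation by the time elapsed since the previous frame rather than by a fixed step per frame, so the spin rate does not depend on the display’s frame rate. The names `SPIN_RATE`, `advanceAngle`, and `startSpin` are illustrative, not part of any API, and the loop only starts where `requestAnimationFrame` exists (that is, in a browser).

```typescript
// Illustrative spin rate: one full turn (2π radians) every 4000 ms.
const SPIN_RATE = (2 * Math.PI) / 4000;

// Advance an angle by the time elapsed since the previous frame, wrapping at 2π.
// Using elapsed time keeps the speed the same whether the browser
// delivers 30 or 60 frames per second.
function advanceAngle(angle: number, prevTime: number, now: number): number {
  const elapsed = now - prevTime;
  return (angle + SPIN_RATE * elapsed) % (2 * Math.PI);
}

type FrameCallback = (time: number) => void;

// Start a requestAnimationFrame loop that reports the current angle each frame.
// Returns false (and does nothing) where requestAnimationFrame is unavailable.
function startSpin(onFrame: (angle: number) => void): boolean {
  const g = globalThis as any;
  if (typeof g.requestAnimationFrame !== "function") return false;

  let angle = 0;
  let prevTime: number | undefined;
  const tick: FrameCallback = (now) => {
    if (prevTime !== undefined) angle = advanceAngle(angle, prevTime, now);
    prevTime = now;
    onFrame(angle);                // e.g., update a mesh's rotation and redraw
    g.requestAnimationFrame(tick); // ask for another callback before the next repaint
  };
  g.requestAnimationFrame(tick);
  return true;
}
```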
HTML5 browsers also include features for multithreaded programming (Web Workers), full-duplex TCP/IP networking (WebSockets), local data storage, and more that developers can use to deliver world-class application functionality. These features—taken together with WebGL, CSS3 3D, and the Canvas element—represent a revolutionary new platform for delivering connected visual applications on any computer or device.
Figure 1-2. Epic Citadel demonstration running in Firefox: 60 fps browser gaming powered by WebGL and asm.js; image courtesy Epic Games
As of this writing, 3D feature coverage is not complete across the various browsers. Also, each browser supports a slightly different subset. We will explore these issues in detail in subsequent chapters, but here are the highlights:
§ WebGL is supported in all desktop browsers. Microsoft introduced WebGL support in Internet Explorer version 11 in late 2013. While the implementation lags behind the other desktop browsers, Microsoft will likely catch up quickly.
§ WebGL is supported in nearly all mobile browsers: mobile Chrome (Android), mobile Firefox (Android and Firefox OS), Amazon Silk (Kindle Fire HDX), Intel’s new Tizen operating system, and BlackBerry 10. WebGL is supported in a limited fashion in mobile Safari (in the iAds framework only).
§ CSS 3D transforms are supported in all browsers and mobile platforms. CSS Custom Filters are supported only experimentally in desktop Chrome, Safari, mobile Safari, and BlackBerry 10—not in IE or Firefox.
Clearly, this is not an optimal situation, but it’s the sort of thing that comes with the web application development territory. Cross-browser support has always been notoriously difficult; with the explosion of features in HTML5 and the proliferation of devices and operating systems, it hasn’t gotten any better. The only consolation is that the alternative is far worse: native applications are even harder to build, test, deploy, and port. Oh well…such is the life of a web developer in the 21st century.
With all these standards, we should be approaching a state where we have to write our code only once. However, as we have become painfully aware, the mantra “write once—run anywhere” has been replaced by the lament “write once—debug everywhere.”
3D Graphics Basics
This section provides a basic introduction to 3D graphics core concepts and terminology. Developers experienced with 2D Canvas drawing and animation may find some of the ideas new. If so, please take time to become familiar with them, as we will use them throughout the book. If you already have experience with 3D and/or OpenGL development, feel free to skip to the next chapter.
What Is 3D?
Given that you picked up this book, chances are you have at least an informal idea about what I am talking about when I use the term 3D graphics. But to make sure you are clear, we are going to get formal and examine a definition. Here is the Wikipedia entry:
3D computer graphics (in contrast to 2D computer graphics) are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be stored for viewing later or displayed in real-time.
Let’s break this down into its components: 1) the data is represented in a 3D coordinate system; 2) it is ultimately drawn (rendered) as a 2D image (for example, on your computer monitor); and 3) it can be displayed in real time: when the 3D data changes as it is being animated or manipulated by the user, the rendered image is updated without a perceivable delay. This last part is key for creating interactive applications. In fact, it is so important that it has spawned a multibillion-dollar industry dedicated to specialized graphics hardware supporting real-time 3D rendering, with several companies you have probably heard of such as NVIDIA, ATI, and Qualcomm leading the charge.
As important as what this definition says is what it doesn’t say: 3D graphics does not require special input hardware like trackballs and joysticks—though those can greatly enhance a 3D experience. Nor does it require custom display hardware: no stereo glasses required, no OmniMax theater tickets as the price of entry. 3D graphics are most commonly rendered on a flat, 2D display. This is not to say that 3D can’t be displayed in stereo and seen with glasses or on a stereo TV—simply that it’s not a requirement.
3D programming requires new skills and knowledge beyond that of the typical web developer. However, armed with a little starter knowledge and the right tools, we can get going fairly quickly. The remainder of this chapter is devoted to understanding basic 3D programming concepts that will be used throughout the book. It is by no means exhaustive—entire books are devoted to learning the subject in detail—but it should be enough to get started. If you already have experience with 3D programming, feel free to move on to Chapter 2.
3D Coordinate Systems
If you are familiar with 2D Cartesian coordinate systems such as the window coordinates of an HTML document, you know about x and y values. These 2D coordinates define where <div> tags are located on a page, or where the virtual pen or brush draws in the HTML Canvas element. Similarly, 3D drawing takes place (not surprisingly) in a 3D coordinate system, where the additional coordinate, z, describes depth (i.e., how far into or out of the screen an object is drawn). The coordinate systems we will work with in this book are arranged as depicted in Figure 1-3, with x running horizontally (left to right), y running vertically, and positive z coming out of the screen. If you are already comfortable with the concept of the 2D coordinate system, the transition to a 3D coordinate system should be straightforward.
Note that WebGL defines positive y as going from the bottom to the top of the window, while the 2D Canvas API and CSS transforms define positive y as going down. This is unfortunate, but it reflects the different heritages of the two technologies: WebGL is based on long-lived graphics standards that use the y-up convention, while Canvas and CSS are based on the y-down convention of HTML page coordinates, itself a descendant of time-worn window-system coordinate schemes. If you end up working in both technologies on a project, you will have to keep this distinction straight. But it could be worse: z could also be reversed! Fortunately, it’s not.
Figure 1-3. A 3D coordinate system; Creative Commons Attribution-Share Alike 3.0 unported license
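To make the y-axis mismatch concrete, the TypeScript sketch below converts a position in canvas pixel coordinates (origin at the top left, y pointing down) into WebGL’s normalized device coordinates, in which x and y both run from −1 to 1 with the origin at the center and y pointing up. The function names are our own, and the mapping assumes a canvas whose drawing buffer is `width` × `height` pixels, with the WebGL viewport covering all of it.

```typescript
// Convert canvas pixel coordinates (y-down, origin top-left) to WebGL
// normalized device coordinates (y-up, origin at the center, range -1..1).
function pixelToNdc(px: number, py: number, width: number, height: number): [number, number] {
  const x = (px / width) * 2 - 1;  // left edge -> -1, right edge -> +1
  const y = 1 - (py / height) * 2; // top edge -> +1, bottom edge -> -1 (the flip)
  return [x, y];
}

// The inverse mapping, from normalized device coordinates back to pixels.
function ndcToPixel(x: number, y: number, width: number, height: number): [number, number] {
  return [((x + 1) / 2) * width, ((1 - y) / 2) * height];
}
```

For an 800 × 600 canvas, the top-left pixel (0, 0) maps to (−1, 1) and the center (400, 300) maps to (0, 0). Only the y computation needs the subtraction from 1; x runs left to right in both systems.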
Meshes, Polygons, and Vertices
While there are several ways to draw 3D graphics, by far the most common is to use a mesh. A mesh is an object composed of one or more polygonal shapes, constructed out of vertices (x, y, z triples) defining coordinate positions in 3D space. The polygons most typically used in meshes are triangles (groups of three vertices) and quads (groups of four vertices). 3D meshes are often referred to as models.
Figure 1-4 illustrates a 3D mesh. The dark lines outline the quads that compose the mesh, defining the shape of the face. (You would not see these lines in the final rendered image; they are included for reference.) The x, y, and z components of the mesh’s vertices define the shape only; surface properties of the mesh, such as the color and shading, are defined through additional attributes, as we will discuss shortly.
Figure 1-4. A 3D mesh; Creative Commons Attribution-Share Alike 3.0 unported license
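As a sketch of how mesh data is commonly laid out for WebGL, the TypeScript below stores vertex positions in a flat `Float32Array` of x, y, z triples and describes each polygon by indices into that array. WebGL’s drawing primitives include triangles but not quads, so the illustrative helper `quadsToTriangles` (our own name) splits each quad of four indices into two triangles that share a diagonal.

```typescript
// A unit square in the z = 0 plane: four vertices, each an (x, y, z) triple.
const positions = new Float32Array([
  -0.5, -0.5, 0, // vertex 0: bottom left
   0.5, -0.5, 0, // vertex 1: bottom right
   0.5,  0.5, 0, // vertex 2: top right
  -0.5,  0.5, 0, // vertex 3: top left
]);

// One quad, listed counterclockwise as seen from the +z side (in front of
// the screen); counterclockwise is WebGL's default front-face winding.
const quadIndices = [0, 1, 2, 3];

// Split each group of four indices (a, b, c, d) into triangles (a, b, c) and (a, c, d).
function quadsToTriangles(quads: number[]): Uint16Array {
  if (quads.length % 4 !== 0) throw new Error("quad index count must be a multiple of 4");
  const tris: number[] = [];
  for (let i = 0; i < quads.length; i += 4) {
    const [a, b, c, d] = quads.slice(i, i + 4);
    tris.push(a, b, c, a, c, d);
  }
  return new Uint16Array(tris);
}

const triangleIndices = quadsToTriangles(quadIndices);
const vertexCount = positions.length / 3; // 4 vertices
```

Both triangles reuse vertices 0 and 2, so the indexed form stores each position only once; this is a common way to hand mesh geometry to the GPU, for example through an element array buffer drawn with `gl.drawElements`.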
Materials, Textures, and Lights
You define the surface of a mesh using additional attributes beyond the x, y, and z vertex positions. Surface attributes can be as simple as a single solid color, or they can be complex, comprising several pieces of information that define, for example, how light reflects off the object or how shiny the object looks. You can also represent surface information using one or more bitmaps, known as texture maps (or simply textures). Textures can define the literal surface look (such as an image printed on a T-shirt), or they can be combined with other textures to achieve sophisticated effects such as bumpiness or iridescence. In most graphics systems, the surface properties of a mesh are referred to collectively as materials. Materials typically rely on the presence of one or more lights, which (as you may have guessed) define how a scene is illuminated.
The head in Figure 1-4 has a material with a purple color and shading defined by a light source emanating from the left of the model. Note the shadows on the right side of the face.
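One simple and widely used lighting model, Lambertian (diffuse) reflection, shows how a material and a light combine: a surface is brightest where its normal (the direction it faces) points straight at the light, and it fades to black as it turns away. The TypeScript sketch below implements that model only; it is not necessarily how Figure 1-4 was rendered, and the `Vec3` and `Rgb` types and the function names are our own.

```typescript
type Vec3 = [number, number, number];
type Rgb = [number, number, number]; // components in 0..1

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(dot(v, v));
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Lambertian diffuse term: material color * light color * max(0, N · L),
// where N is the surface normal and L points from the surface toward the light.
function diffuse(materialColor: Rgb, lightColor: Rgb, normal: Vec3, toLight: Vec3): Rgb {
  const intensity = Math.max(0, dot(normalize(normal), normalize(toLight)));
  return [
    materialColor[0] * lightColor[0] * intensity,
    materialColor[1] * lightColor[1] * intensity,
    materialColor[2] * lightColor[2] * intensity,
  ];
}

// A purple material lit by a white light placed to the left (-x) of the model.
const purple: Rgb = [0.5, 0.2, 0.6];
const white: Rgb = [1, 1, 1];
const toLight: Vec3 = [-1, 0, 0];

const litSide = diffuse(purple, white, [-1, 0, 0], toLight); // faces the light: full color
const darkSide = diffuse(purple, white, [1, 0, 0], toLight); // faces away: black
```

With the light on the left, a normal pointing right receives no diffuse light at all, which is why the right side of a model lit this way is dark. Practical materials usually add further terms, such as ambient light and specular highlights.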
Transforms and Matrices
3D meshes are defined by the positions of their vertices. It would get really tedious to change a mesh’s vertex positions every time you want to move it to a different part of the view, especially if the mesh were continually animating. For this reason, most 3D systems support transforms, operations that allow you to move the mesh by a relative amount without having to loop through every vertex, explicitly changing its position. Transforms allow you to scale, rotate, and translate (move) a rendered mesh without actually changing any values in its vertices.
Figure 1-5 depicts 3D transforms in action. In this scene we see three cubes. Each of these objects is a cube mesh that contains the same values for its vertices. To move, rotate, or scale the mesh, we do not modify the vertices; rather, we apply transforms. The red cube on the left has been translated 4 units to the left (−4 on the x-axis), and rotated about its x- and y-axes. (Note that rotation values are specified in radians—units that will be discussed in more detail in Chapter 4.) The blue cube on the right has been translated 4 units to the right, and scaled to be 1.5 times larger in all three dimensions. The green cube in the center has not been transformed.
Figure 1-5. 3D transforms: translation, rotation, and scale
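A 3D transform is typically represented by a transformation matrix, a mathematical entity containing an array of values used to compute the transformed positions of vertices. Most WebGL transforms use a 4×4 matrix: an array of 16 numbers organized into 4 rows and 4 columns. Figure 1-6 shows the layout of a 4×4 matrix. With the 16 elements numbered m0 through m15, the translation is stored in elements m12, m13, and m14, corresponding to the x, y, and z translation values. (This numbering follows column-major order, the layout WebGL expects: each run of four consecutive elements holds one column.) Rotation and scale share the upper-left 3×3 block: elements m0 through m2, m4 through m6, and m8 through m10. When a matrix contains no rotation, the x, y, and z scale values sit on the diagonal of the matrix, in elements m0, m5, and m10. Rotation changes those diagonal elements and also fills in the off-diagonal ones: m1 and m2, m4 and m6, and m8 and m9. Multiplying a vertex position, written as the four-component vector (x, y, z, 1), by this matrix yields the transformed position.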
Figure 1-6. A 4×4 transformation matrix; adapted with permission
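The TypeScript sketch below builds 4×4 transformation matrices as plain 16-element arrays, with translation in elements 12, 13, and 14 to match the element numbering above, and applies them to a vertex. Toolkits provide their own equivalents of these functions; the names here are our own. The last lines reproduce the blue cube of Figure 1-5: scaled by 1.5, then translated 4 units to the right.

```typescript
// Column-major 4×4 matrix: element (row, col) lives at index col * 4 + row.
type Mat4 = number[];
type Vec3 = [number, number, number];

function identity(): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
}

// Translation goes in elements 12, 13, 14.
function translation(tx: number, ty: number, tz: number): Mat4 {
  const m = identity();
  m[12] = tx;
  m[13] = ty;
  m[14] = tz;
  return m;
}

// Scale goes on the diagonal: elements 0, 5, 10.
function scaling(sx: number, sy: number, sz: number): Mat4 {
  const m = identity();
  m[0] = sx;
  m[5] = sy;
  m[10] = sz;
  return m;
}

// Rotation about the y-axis by `angle` radians (right-handed).
function rotationY(angle: number): Mat4 {
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  const m = identity();
  m[0] = c;
  m[2] = -s;
  m[8] = s;
  m[10] = c;
  return m;
}

// Matrix product a * b; applying the result to a point applies b first, then a.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = [];
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Transform a point as the 4-vector (x, y, z, 1); assumes an affine matrix
// (bottom row 0, 0, 0, 1), so the result's w stays 1 and is dropped.
function transformPoint(m: Mat4, p: Vec3): Vec3 {
  const [x, y, z] = p;
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

// The blue cube of Figure 1-5: scale by 1.5, then translate 4 units along +x.
const blueCube = multiply(translation(4, 0, 0), scaling(1.5, 1.5, 1.5));
const corner = transformPoint(blueCube, [1, 1, 1]); // [5.5, 1.5, 1.5]
```

Order matters: `multiply(scaling(1.5, 1.5, 1.5), translation(4, 0, 0))` would translate first and then scale the translation as well, leaving the cube’s center at x = 6 instead of 4.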
If you are a linear algebra geek like I am, you probably feel comfortable with this idea. If not, please don’t break into a cold sweat. The toolkits used to develop the examples in this book allow us to treat matrices like black boxes: we just say translate, rotate, or scale, and the right thing happens.
Cameras, Perspective, Viewports, and Projections
Every rendered scene requires a point of view from which the user will be viewing it. 3D systems typically use a camera, an object that defines where (relative to the scene) the user is positioned and oriented, as well as other real-world camera properties such as the size of the field of view, which defines perspective (i.e., objects farther away appearing smaller). The camera’s properties combine to deliver the final rendered image of a 3D scene into a 2D viewport defined by the window or canvas.
Cameras are almost always represented via a couple of matrices. The first matrix defines the position and orientation of the camera, much like the matrix used for transforms (as just discussed). The second is a specialized matrix, called the projection matrix, that maps the 3D coordinates seen by the camera into the 2D drawing space of the viewport. I know: more math. But the details of camera matrices are nicely hidden in most tools, so you usually can just point, shoot, and render.
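The first of those camera matrices, often called the view matrix, can be built from the camera’s position, a point it looks at, and an “up” direction. The TypeScript sketch below follows the common OpenGL-style `lookAt` construction, in which the camera looks down its own negative z-axis; `lookAt` and the helper names are our own, not a specific library’s API. The result is the inverse of the camera’s placement, so it moves the whole scene into the camera’s frame of reference.

```typescript
type Vec3 = [number, number, number];
type Mat4 = number[]; // column-major, 16 elements

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(dot(v, v));
  return [v[0] / len, v[1] / len, v[2] / len];
};

// View matrix for a camera at `eye` looking toward `target`.
function lookAt(eye: Vec3, target: Vec3, up: Vec3): Mat4 {
  const zAxis = normalize(sub(eye, target)); // camera looks along -zAxis
  const xAxis = normalize(cross(up, zAxis)); // camera's right
  const yAxis = cross(zAxis, xAxis);         // camera's true up
  // The rotation's rows are the camera axes; the last column moves `eye` to the origin.
  return [
    xAxis[0], yAxis[0], zAxis[0], 0,
    xAxis[1], yAxis[1], zAxis[1], 0,
    xAxis[2], yAxis[2], zAxis[2], 0,
    -dot(xAxis, eye), -dot(yAxis, eye), -dot(zAxis, eye), 1,
  ];
}

// Apply a column-major affine matrix to a point (w = 1).
function transformPoint(m: Mat4, p: Vec3): Vec3 {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
  ];
}

// A camera 5 units out along +z, looking at the origin with y up.
const view = lookAt([0, 0, 5], [0, 0, 0], [0, 1, 0]);
```

The origin, 5 units in front of this camera, ends up at (0, 0, −5) in view space: in this convention, objects the camera can see have negative z.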
Figure 1-7 depicts the core concepts of the camera, viewport, and projection. At the lower left we see an icon of an eye; this represents the location of the camera. The red vector pointing to the right (in this diagram, labeled as the x-axis) represents the direction in which the camera is pointing. The blue cubes are the objects in the 3D scene. The green and red rectangles are, respectively, the near and far clipping planes. These two planes define the boundaries of a subset of the 3D space, known as the view volume or view frustum. Only objects within the view volume are actually rendered to the screen. The near clipping plane is equivalent to the viewport, where we will see the final rendered image.
Figure 1-7. Camera, viewport, and projection; adapted with permission
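The projection matrix introduced above can also be sketched in a few lines. The TypeScript below builds the standard OpenGL-style perspective matrix from a vertical field of view, the viewport’s aspect ratio, and the distances to the near and far clipping planes, then divides by w to produce normalized device coordinates. A point in front of the camera lies inside the view volume exactly when all three of its normalized coordinates fall between −1 and 1. The function names are our own.

```typescript
type Mat4 = number[]; // column-major, 16 elements

// OpenGL-style perspective projection (camera looks down -z).
// fovY is the vertical field of view in radians; near and far are positive distances.
function perspective(fovY: number, aspect: number, near: number, far: number): Mat4 {
  const f = 1 / Math.tan(fovY / 2);
  const nf = 1 / (near - far);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) * nf, -1,
    0, 0, 2 * far * near * nf, 0,
  ];
}

// Project a view-space point to clip space, then divide by w to get
// normalized device coordinates (NDC). Returns null for points with w <= 0.
function projectToNdc(m: Mat4, p: [number, number, number]): [number, number, number] | null {
  const [x, y, z] = p;
  const cx = m[0] * x + m[4] * y + m[8] * z + m[12];
  const cy = m[1] * x + m[5] * y + m[9] * z + m[13];
  const cz = m[2] * x + m[6] * y + m[10] * z + m[14];
  const cw = m[3] * x + m[7] * y + m[11] * z + m[15];
  if (cw <= 0) return null; // at or behind the camera
  return [cx / cw, cy / cw, cz / cw];
}

// Inside the view volume (frustum) when every NDC coordinate is within [-1, 1].
function insideViewVolume(ndc: [number, number, number] | null): boolean {
  return ndc !== null && ndc.every((c) => c >= -1 && c <= 1);
}

const proj = perspective(Math.PI / 2, 1, 1, 100); // 90° field of view, square viewport
const nearPoint = projectToNdc(proj, [1, 0, -2]); // x ≈ 0.5
const farPoint = projectToNdc(proj, [1, 0, -4]);  // x ≈ 0.25
```

The two points differ only in distance, yet the farther one lands half as far from the center of the viewport; that shrinking with distance is the perspective effect. A point closer than the near plane or beyond the far plane fails `insideViewVolume` and would not be drawn.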
Cameras are extremely powerful, as they ultimately define the viewer’s relationship to a 3D scene and provide a sense of realism. They also provide another weapon in the animator’s arsenal: by moving the camera around dynamically, you can create cinematic effects and control the narrative experience.
Shaders
In order to render the final image for a mesh, a developer must define exactly how vertices, transforms, materials, lights, and the camera interact with one another to create that image. The developer does this using shaders. A shader (also known as a programmable shader) is a chunk of program code that implements algorithms to get the pixels for a mesh onto the screen. The graphics hardware understands vertices, textures, and little else; it has no concept of material, light, transform, or camera. Those high-level structures are interpreted by the shader program. Shaders are typically defined in a high-level C-like language and compiled into code that can be used by the graphics-processing unit (GPU).
All modern computers and devices come equipped with a graphics-processing unit, a separate processor from the CPU that is dedicated to rendering 3D graphics. The majority of the 3D programming techniques discussed in this book assume the presence of a GPU.
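To make this concrete, here is a minimal pair of WebGL shaders written in GLSL ES 1.0 (the shading language version WebGL 1 uses), held as TypeScript strings, plus a helper that compiles one shader and reports errors. The vertex shader applies transform and camera matrices to each vertex; the fragment shader paints every pixel a single color. The attribute and uniform names (`aPosition`, `uModelView`, `uProjection`, `uColor`) are our own choices, and `ShaderGL` is a hand-written typing of just the few `WebGLRenderingContext` members the helper calls, so the sketch type-checks outside a browser; in a page you would pass the real context from `canvas.getContext("webgl")`.

```typescript
// Vertex shader: runs once per vertex and outputs its clip-space position.
const VERTEX_SRC = `
attribute vec3 aPosition;
uniform mat4 uModelView;
uniform mat4 uProjection;
void main() {
  gl_Position = uProjection * uModelView * vec4(aPosition, 1.0);
}`;

// Fragment shader: runs once per pixel (fragment) and outputs its color.
const FRAGMENT_SRC = `
precision mediump float;
uniform vec4 uColor;
void main() {
  gl_FragColor = uColor;
}`;

// The subset of WebGLRenderingContext used below (hand-written, see text).
interface ShaderGL {
  readonly VERTEX_SHADER: number;
  readonly FRAGMENT_SHADER: number;
  readonly COMPILE_STATUS: number;
  createShader(type: number): object | null;
  shaderSource(shader: object, source: string): void;
  compileShader(shader: object): void;
  getShaderParameter(shader: object, pname: number): unknown;
  getShaderInfoLog(shader: object): string | null;
  deleteShader(shader: object | null): void;
}

// Compile one shader; throw with the driver's log if compilation fails,
// since a shader that fails to compile means nothing gets drawn.
function compileShader(gl: ShaderGL, type: number, source: string): object {
  const shader = gl.createShader(type);
  if (shader === null) throw new Error("createShader failed");
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error("Shader compile error: " + (log || "(no log)"));
  }
  return shader;
}
```

In a browser you would call `compileShader` once for each source (`gl.VERTEX_SHADER` with `VERTEX_SRC`, `gl.FRAGMENT_SHADER` with `FRAGMENT_SRC`), then combine the two with `gl.createProgram`, `gl.attachShader`, and `gl.linkProgram` before drawing.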
Shaders put amazing power at the programmer’s fingertips: full control over every pixel, each time the image is rendered. Shaders power the incredible visuals we see in Hollywood special effects, “CG” animated films, and real-time rendering in today’s video games. With shader support now in web browsers, we can get the same production value as a top video game in our WebGL applications, as well as fine control over how CSS elements are presented and animated on a page.
Figure 1-8 shows a WebGL water simulation rendered by a programmable shader. The rippling water and dancing lights are incredibly realistic, and you can interact with the scene while it is simulating, all in real time. Reminder: this is running in a web browser!
Figure 1-8. WebGL water simulation using programmable shaders, by Evan Wallace; reproduced with permission
Shader-based effects aren’t limited to WebGL; they can also be applied to DOM elements through an experimental technology called CSS Custom Filters. We will discuss this feature in Chapter 6.
Here are a few subtle things to note about shaders relative to the technologies we will cover in the book:
§ WebGL and CSS Custom Filters both use shaders defined in the OpenGL ES Shading Language (called GLSL ES). There are some differences between the shaders you write for WebGL versus CSS, but the base languages are identical.
§ WebGL requires the developer to supply shaders in order for objects to be drawn. If no shader is supplied, or there is an error in compiling or loading the shader, nothing will render on the screen.
§ With CSS3 Filters, shaders are optional. A CSS3 filter that uses shaders is referred to as a custom filter.
§ The 2D Canvas API does not support programmable shaders. If you plan to employ 2D Canvas drawing as a fallback to WebGL rendering, you will need to account for this in your rendering code. More on this in Chapter 7.
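Because the 2D Canvas API has no shader stage, a fallback that needs per-pixel effects has to compute them in JavaScript. One approach is to fill an RGBA pixel buffer, the format `ImageData` uses, by calling an ordinary function for every pixel, much as a fragment shader runs once per pixel on the GPU. The TypeScript sketch below does this on the CPU; `shadePixels` and the gradient function are hypothetical examples, and copying the buffer onto a canvas is left to browser code. Expect it to be much slower than a real shader on large canvases.

```typescript
// A CPU "shader": a function from pixel coordinates to an RGBA color (0..255).
type PixelShader = (x: number, y: number, width: number, height: number) => [number, number, number, number];

// Run `shader` once per pixel and pack the results into an RGBA byte buffer,
// row by row from the top-left corner (the layout ImageData uses).
function shadePixels(width: number, height: number, shader: PixelShader): Uint8ClampedArray {
  const data = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const [r, g, b, a] = shader(x, y, width, height);
      const i = (y * width + x) * 4;
      data[i] = r;
      data[i + 1] = g;
      data[i + 2] = b;
      data[i + 3] = a;
    }
  }
  return data;
}

// Example: a horizontal gradient from black (left) to purple (right).
const gradient: PixelShader = (x, _y, width) => {
  const t = width > 1 ? x / (width - 1) : 0;
  return [128 * t, 0, 160 * t, 255];
};

const pixels = shadePixels(4, 2, gradient);
// In a browser: ctx.putImageData(new ImageData(pixels, 4, 2), 0, 0);
```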
Shaders represent a bit of a learning curve, with new concepts, another programming language, and great care required. If you find this daunting, don’t worry. There are many popular open source libraries and tools to choose from that hide the gory details of shaders. You may even be able to get through your entire 3D programming career without ever writing a line of GLSL code—though I recommend you try it anyway, just to be able to say you did.
Those are the basics of 3D graphics. Each of the technologies in the book treats the details a little differently, but the concepts translate fairly well across each technology. In the next several chapters we are going to dive deep into the details of creating and animating 3D content with WebGL, CSS3, and Canvas 2D.