Chart components - The pillars of information visualization - D3.js in Action (2015)

D3.js in Action (2015)

Part 2. The pillars of information visualization

The next five chapters provide an exhaustive look into the layouts, components, behaviors, and controls that D3 provides to create the varieties of data visualization you’ve seen all over the web. In chapter 4 you’ll learn how to create line and area charts, deploying D3 axes to make them readable, as well as how to build complex multipart boxplots that encode several different data variables at the same time. Chapter 5 walks through seven different D3 layouts, from the simple pie chart to the exotic Sankey diagram, and shows you how to implement each layout in a few different ways. Chapter 6 focuses entirely on representing network structures, showing you how to visualize them using arc diagrams, adjacency matrices, and force-directed layouts, and introduces several new techniques like SVG markers. Chapter 7 also focuses on a single domain, this time geospatial data, and demonstrates how to leverage D3’s incredible geospatial functionality to build different kinds of maps. Chapter 8 shifts to creating more traditional DOM elements using D3 data-binding that result in a spreadsheet and simple image gallery. Whether you’re interested in all of these areas or diving deeply into just one, part 2 provides you with the tools to represent any kind of data using advanced data visualization not available in standard charting libraries and applications.

Chapter 4. Chart components

This chapter covers

· Creating and formatting axis components

· Using line and area generators for charts

· Creating complex shapes consisting of multiple types of SVG elements

D3 provides an enormous library of examples of charts, and GitHub is also packed with implementations. It’s easy to format your data to match the existing data used in an implementation and, voilà, you have a chart. Likewise, D3 includes layouts that allow you to create complex data visualizations from a properly formatted dataset. But before you get started with default layouts—which allow you to create basic charts like pie charts, as well as more exotic charts—you should first understand the basics of creating the elements that typically make up a chart and in the process produce charts like those seen in figure 4.1. This chapter focuses on widely used pieces of charts created with D3, such as a labeled axis or a line. It also touches on the formatting, data modeling, and analytical methods most closely tied to creating charts.

Figure 4.1. The charts we’ll create in this chapter using D3 generators and components. From left to right: a line chart, a boxplot, and a streamgraph.

Obviously, this isn’t your first exposure to charts, because you created a scatterplot and bar chart in chapter 2. This chapter introduces you to components and generators. A D3 component, like an axis, is a function for drawing all the graphical elements necessary for an axis. A generator, liked3.svg.line(), lets you draw a straight or curved line across many points. The chapter begins by showing you how to add axes to scatterplots as well as create line charts, but before the end you’ll create an exotic yet simple chart: the streamgraph. By understanding how D3 generators and components work, you’ll be able do more than re-create the charts that other people have made and posted online (many of which they’re just re-creating from somewhere else).

A chart (and notice here that I don’t use the term graph because that’s a synonym for network) refers to any flat layout of data in a graphical manner. The datapoints, which can be individual values or objects in arrays, may contain categorical, quantitative, topological, or unstructured data. In this chapter we’ll use several datasets to create the charts shown in figure 4.1. Although it may seem more useful to use a single dataset for the various charts, as the old saying goes, “Horses for courses,” which is to say that different charts are more suitable to different kinds of datasets, as you’ll see in this chapter.

4.1. General charting principles

All charts consist of several graphical elements that are drawn or derived from the dataset being represented. These graphical elements may be graphical primitives, like circles or rectangles, or more-complex, multipart, graphical objects like the boxplots we’ll look at later in the chapter. Or they may be supplemental pieces like axes and labels. Although you use the same general processes you explored in previous chapters to create any of these elements in D3, it’s important to differentiate between the methods available in D3 to create graphics for charts.

You’ve learned how to directly create simple and complex elements with data-binding. You’ve also learned how to measure your data and transform it for display. Along with these two types of functions, D3 functionality can be placed into three broader categories: generators, components, and layouts, which are shown in figure 4.2 along with a general overview of how they’re used.

Figure 4.2. The three main types of functions found in D3 can be classified as generators, components, and layouts. You’ll see components and generators in this chapter and layouts in the next chapter.

4.1.1. Generators

D3 generators consist of functions that take data and return the necessary SVG drawing code to create a graphical object based on that data. For instance, if you have an array of points and you want to draw a line from one point to another, or turn it into a polygon or an area, a few D3 functions can help you with this process. These generators simplify the process of creating a complex SVG <path> by abstracting the process needed to write a <path> d attribute. In this chapter, we’ll look at d3.svg.line and d3.svg.area, and in the next chapter you’ll seed3.svg.arc, which is used to create the pie pieces of pie charts. Another generator that you’ll see in chapter 5 is d3.svg.diagonal, used for drawing curved connecting lines in dendrograms.

4.1.2. Components

In contrast with generators, which produce the d attribute string necessary for a <path> element, components create an entire set of graphical objects necessary for a particular chart component. The most commonly used D3 component (which you’ll see in this chapter) is d3.svg.axis, which creates a bunch of <line>, <path>, <g>, and <text> elements that are needed for an axis based on the scale and settings you provide the function. Another component is d3.svg.brush (which you’ll see later), which creates all the graphical elements necessary for a brush selector.

4.1.3. Layouts

In contrast to generators and components, D3 layouts can be rather straightforward, like the pie chart layout, or complex, like a force-directed network layout. Layouts take in one or more arrays of data, and sometimes generators, and append attributes to the data necessary to draw it in certain positions or sizes, either statically or dynamically. You’ll see some of the simpler layouts in chapter 5, and then focus on the force-directed network layout and other network layouts in chapter 6.

4.2. Creating an axis

Scatterplots, which you worked with in chapters 1 and 2, are a simple and extremely effective charting method for displaying data. For most charts, the x position is a point in time and the y position is magnitude. For example, in chapter 2 you placed your tweets along the x-axis according to when the tweets were made and along the y-axis according to their impact factor. In contrast, a scatterplot places a single symbol on a chart with its xy position determined by quantitative data for that datapoint. For instance, you can place a tweet on the y-axis based on the number of favorites and on the x-axis based on the number of retweets. Scatterplots are common in scientific discourse and have grown increasingly common in journalism and public discourse for presenting data such as the cost compared to the quality of health care.

4.2.1. Plotting data

Scatterplots require multidimensional data. Each datapoint needs to have more than one piece of data connected with it, and for a scatterplot that data must be numerical. You need only an array of data with two different numerical values for a scatterplot to work. We’ll use an array where every object represents a person for whom we know the number of friends they have and the amount of money they make. We can see if having more or less friends positively correlates to a high salary.

var scatterData = [{friends: 5, salary: 22000},

{friends: 3, salary: 18000}, {friends: 10, salary: 88000},

{friends: 0, salary: 180000}, {friends: 27, salary: 56000},

{friends: 8, salary: 74000}];

If you think these salary numbers are too high or too low, pretend they’re in a foreign currency with an exchange rate that would make them more reasonable.

Representing this data graphically using circles is easy. You’ve done it several times:

d3.select("svg").selectAll("circle")

.data(scatterData).enter()

.append("circle").attr("r", 5).attr("cx", function(d,i) {

return i * 10;

}).attr("cy", function(d) {

return d.friends;

});

By designating d.friends for the cy position, we get circles placed with their depth based on the value of the friends attribute. Circles placed lower in the chart represent people in our dataset who have more friends. Circles are arranged from left to right using the old array-position trick you learned earlier in chapter 2. In figure 4.3, you can see that it’s not much of a scatterplot.

Figure 4.3. Circle positions indicate the number of friends and the array position of each datapoint.

Next, we need to build scales to make this fit better on our SVG canvas:

var xExtent = d3.extent(scatterData, function(d) {

return d.salary;

});

var yExtent = d3.extent(scatterData, function(d) {

return d.friends;

});

var xScale = d3.scale.linear().domain(xExtent).range([0,500]);

var yScale = d3.scale.linear().domain(yExtent).range([0,500]);

d3.select("svg").selectAll("circle")

.data(scatterData).enter().append("circle")

.attr("r", 5).attr("cx", function(d) {

return xScale(d.salary);

}).attr("cy", function(d) {

return yScale(d.friends);

});

The result, in figure 4.4, is a true scatterplot, with points representing people arranged by number of friends along the y-axis and amount of salary along the x-axis.

Figure 4.4. Any point closer to the bottom has more friends, and any point closer to the right has a higher salary. But that’s not clear at all without labels, which we’re going to make.

This chart, like most charts, is practically useless without a way of expressing to the reader what the position of the elements means. One way of accomplishing this is using well-formatted axis labels. Although we could use the same method for binding data and appending elements to create lines and ticks (which are just lines representing equidistant points along an axis) and labels for an axis, D3 provides d3.svg.axis(), which we can use to create these elements based on the scales we used to display the data. After we create an axis function, we define how we want our axis to appear. Then we can draw the axis via a selection’s .call() method from a selection on a <g> element where we want these graphical elements to be drawn.

var yAxis = d3.svg.axis().scale(yScale).orient("right");

d3.select("svg").append("g").attr("id", "yAxisG").call(yAxis);

var xAxis = d3.svg.axis().scale(xScale).orient("bottom");

d3.select("svg").append("g").attr("id", "xAxisG").call(xAxis);

Notice that the .call() method of a selection invokes a function with the selection that’s active in the method chain, and is the equivalent of writing

xAxis(d3.select("svg").append("g").attr("id", "xAxisG"));

Figure 4.5 shows a result that’s more legible, with the xy positions of the circles denoted by labels in a pair of axes. The labels are derived from the scales that we used to create each axis, and provide the context necessary to interpret this chart.

Figure 4.5. The same scatterplot from figure 4.4, but with a pair of labeled axes. The x-axis is drawn in such a way as to obscure one of the points.

The axis lines are thick enough to overlap with one of our scatterplot points because the domain of the axis being drawn is a path. Recall from chapter 3 that paths are by default filled in black. We can adjust the display by setting the fill style of those two axis domain paths to "none". Doing so reveals that the ticks for the axes aren’t being drawn, because those elements don’t have default “stroke” styles applied.

Figure 4.6 demonstrates why we don’t see any of our ticks and why we have thick black regions for our axis domains. To improve our axes, we need to style them properly.

Figure 4.6. Elements of an axis created from d3.svg.axis are a <path.domain> with a size equal to the extent of the axis, a <g.tick.major> that contains a <line> and a <text> for each major tick, and a <line.tick.minor> for each minor tick (this will only be the case when using the deprecated tickSubdivide function in D3 version 3.2 and earlier). Not shown, and invisible, is the <g> element that’s called and in which these elements are created. In our example, region 1 is filled with black and none of the lines have strokes, because that’s the default way that SVG draws <line> and <path> elements.

4.2.2. Styling axes

These elements are standard SVG elements created by the axis function, and they don’t have any more or less formatting than any other elements would when first created. This may seem counterintuitive, but SVG is meant to be paired with CSS, so it’s better that elements don’t have any “helpful” styles assigned to them, or you’d have a hard time overwriting those styles with your CSS. For now, we can set the domain path to fill:none and the lines to stroke: black using d3.select() and .style() to see what we’re missing, as shown in figure 4.7.

Figure 4.7. If we change the <path.domain> fill value to "none" and set its and the <line> stroke values to "black", we see the ticks and the stroke of <path.domain>. It also reveals our hidden datapoint.

If we set the .orient() option of the y-axis to "left" or the .orient() option of the x-axis to "top", is seems like they aren’t drawn. This is because they’re drawn outside the canvas, like our earlier rectangles. To move our axes around, we need to adjust the .attr("translate")of their parent <g> elements, either when we draw them or later. This is why it’s important to assign an ID to our elements when we append them to the canvas. We can move the x-axis to the bottom of this drawing easily:

d3.selectAll("#xAxisG").attr("transform","translate(0,500)");

Here’s our updated code. It uses the .tickSize() function to change the ticks to lines and manually sets the number of ticks using the ticks() function:

The effect all these functions is uninspiring, as shown in figure 4.8.

Figure 4.8. Setting axis ticks to the size of your canvas also sets <path.domain> to the size of your canvas. Because paths are, by default, filled with black, the result is illegible.

Let’s examine the elements created by the axis code and shown in figure 4.8 as a giant black square. The <g> element that we created with the ID of "xAxisG" contains <g> elements that each have a line and text:

<g class="tick major" transform="translate(0,0)" style="opacity: 1;">

<line x2="6" y2="0"></line>

<text x="9" y="0" dy=".32em" style="text-anchor: start;">0</text>

</g>

Notice that the <g> element has been created with classes, so we can style the child elements (our line and our label) using CSS, or select them with D3. This is necessary if we want our axes to be displayed properly, with lines corresponding to the labeled points. Why? Because along with lines and labels, the axis code has drawn the <path.domain> to cover the entire region contained by the axis elements. This domain element needs to be set to "fill: none", or we’ll end up with a big black square. You’ll also see examples where the tick lines are drawn with negative lengths to create a slightly different visual style. For our axis to make sense, we could continue to apply inline styles by using d3.select to modify the styles of the necessary elements, but instead we should use CSS, because it’s easier to maintain and doesn’t require us to write styles on the fly in JavaScript. The following listing shows a short CSS style sheet that corresponds to the elements created by the axis function.

Listing 4.1. ch4stylesheet.css

With this in place, we get something a bit more legible, as shown in figure 4.9.

Figure 4.9. With <path.domain> fill set to "none" and CSS settings also corresponding to the tick <line> elements, we can draw a rather attractive grid based on our two axes.

Take a look at the elements created by the axis() function in figure 4.9, and see in figure 4.10 how the CSS classes are associated with those elements.

Figure 4.10. The DOM shows how tick <line> elements are appended along with a <text> element for the label to one of a set of <g.tick.major> elements corresponding to the number of ticks.

As you create more-complex information visualization, you’ll get used to creating your own elements with classes referenced by your style sheet. You’ll also learn where D3 components create elements in the DOM and how they’re classed so that you can style them properly.

4.3. Complex graphical objects

Using circles or rectangles for your data won’t work with some datasets, for example, if an important aspect of your data has to do with distribution, like user demographics or statistical data. Often, the distribution of data gets lost in information visualization, or is only noted with a reference to standard deviation or other first-year statistics terms that indicate the average doesn’t tell the whole story. One particularly useful way of representing data that has a distribution (such as a fluctuating stock price) is the use of a boxplot in place of a traditional scatterplot. The boxplot uses a complex graphic that encodes distribution in its shape. The box in a boxplot typically looks like the one shown in figure 4.11. It uses quartiles that have been preprocessed, but you could easily use d3.scale.quartile() to create your own values from your own dataset.

Figure 4.11. A box from a boxplot consists of five pieces of information encoded in a single shape: (1) the maximum value, (2) the high value of some distribution, such as the third quartile, (3) the median or mean value, (4) the corresponding low value of the distribution, such as the first quartile, and (5) the minimum value.

Take a moment to examine the amount of data that’s encoded in the graphic in figure 4.11. The median value is represented as a gray line. The rectangle shows the amount of whatever you’re measuring that falls in a set range that represents the majority of the data. The two lines above and below the rectangle indicate the minimum and maximum values. Everything except the information in the gray line is lost when you map only the average or median value at a datapoint.

To build a reasonable boxplot, we’ll need a set of data with interesting variation in those areas. Let’s assume we want to plot the number of registered visitors coming to our website by day of the week so that we can compare our stats week to week (or so that we can present this info to our boss, or for some other reason). We have the data for the age of the visitors (based on their registration details) and derived the quartiles from that. Maybe we used Excel, Python, or d3.scale.quartile(), or maybe it was part of a dataset we downloaded. As you work with data, you’ll be exposed to common statistical summaries like this and you’ll have to represent them as part of your charts, so don’t be too intimidated by it. We’ll use a CSV format for the information.

The following listing shows our dataset with the number of registered users that visit the site each day, and the quartiles of their ages.

Listing 4.2. boxplots.csv

day,min,max,median,q1,q3,number

1,14,65,33,20,35,22

2,25,73,25,25,30,170

3,15,40,25,17,28,185

4,18,55,33,28,42,135

5,14,66,35,22,45,150

6,22,70,34,28,42,170

7,14,65,33,30,50,28

When we map the median age as a scatterplot, as in figure 4.12, it looks like there’s not too much variation in our user base throughout the week. We do that by drawing scatterplot points for each day at the median age of the visitor for that day. We’ll also invert the y-axis so that it makes a bit more sense.

Figure 4.12. The median age of visitors (y-axis) by day of the week (x-axis) as represented by a scatterplot. It shows a slight dip in age on the second and third days.

Listing 4.3. Scatterplot of average age

But to get a better view of this data, we’ll need to create a boxplot. Building a boxplot is similar to building a scatterplot, but instead of appending circles for each point of data, you append a <g> element. It’s a good rule to always use <g> elements for your charts, because they allow you to apply labels or other important information to your graphical representations. But that means you’ll need to use the transform attribute, which is how <g> elements are positioned on the canvas. Elements appended to a <g> base their coordinates off of the coordinates of their parent. When applying x and y attributes to child elements, you need to set them relative to the parent <g>.

Rather than selecting all the <g> elements and appending child elements one at a time, as we did in earlier chapters, we’ll use the .each() function of a selection, which allows us to perform the same code on each element in a selection, to create the new elements. Like any D3 selection function, .each() allows you to access the bound data, array position, and DOM element. Earlier on, in chapter 1, we achieved the same functionality by using selectAll to select the <g> elements and directly append <circle> and <text> elements. That’s a clean method, and the only reasons to use .each() to add child elements are if you prefer the syntax, you plan on doing complex operations involving each data element, or you want to add conditional tests to change whether or what child elements you’re appending. You can see how to use .each() to add child elements in action in the following listing, which takes advantage of the scales we created in listing 4.3 and draws rectangles on top of the circles we’ve already drawn.

Listing 4.4. Initial boxplot drawing code

The new rectangles indicating the distribution of visitor ages, as shown in figure 4.13, are not only offset to the right, but also showing the wrong values. Day 7, for instance, should range in value from 30 to 50, but instead is shown as ranging from 13 to 32. We know it’s doing that because that’s the way SVG draws rectangles. We have to update our code a bit to make it accurately reflect the distribution of visitor ages:

Figure 4.13. The <rect> elements represent the scaled range of the first and third quartiles of visitor age. They’re placed on top of a gray <circle> in each <g> element, which is placed on the chart at the median age. The rectangles are drawn, as per SVG convention, from the <g> down and to the right.

We’ll use the same technique we used to create the chart in figure 4.14 to add the remaining elements of the boxplot (described in detail in figure 4.15) by including several append functions in the .each() function. They all select the parent <g> element created during the data-binding process and append the shapes necessary to build a boxplot.

Figure 4.14. The <rect> elements are now properly placed so that their top and bottom correspond with the visitor age between the first and third quartiles of visitors for each day. The circles are completely covered, except for the second rectangle where the first quartile value is the same as the median age, and so we can see half the gray circle peeking out from underneath it.

Figure 4.15. How a boxplot can be drawn in D3. Pay particular attention to the relative positioning necessary to draw child elements of a <g>. The 0 positions for all elements are where the parent <g> has been placed, so that <line.max>, <rect.distribution>, and <line.range> all need to be drawn with an offset placing their top-left corner above this center, whereas <line.min> is drawn below the center and <line.median> has a 0 y-value, because our center is the median value.

Listing 4.5. The .each() function of the boxplot drawing five child elements

Listing 4.6 fulfills the requirement that we should also add an x-axis to remind us which day each box is associated with. This takes advantage of the explicit .tick-Values() function you saw earlier. It also uses negative tickSize() and the corresponding offset of the <g> that we use to call the axis function.

Listing 4.6. Adding an axis using tickValues

The end result of all this is a chart where each of our datapoints is represented, not by a single circle, but by a multipart graphical element designed to emphasize distribution.

The boxplot in figure 4.16 encodes not just the median age of visitors for that day, but the minimum, maximum, and distribution of the age of the majority of visitors. This expresses in detail the demographics of visitorship clearly and cleanly. It doesn’t include the number of visitors, but we could encode that with color, make it available on a click of each boxplot, or make the width of the boxplot correspond to the number of visitors.

Figure 4.16. Our final boxplot chart. Each day now shows not only the median age of visitors but also the range of visiting ages, allowing for a more extensive examination of the demographics of site visitorship.

We looked at boxplots because a boxplot allows you to explore the creation of multipart objects while using lines and rectangles. But what’s the value of a visualization like this that shows distribution? It encodes a graphical summary of the data, providing information about visitor age for the site on Wednesday, such as, “Most visitors were between the ages of 18 and 28. The oldest was 40. The youngest was 15. The median age was 25.” It also allows you to quickly perform visual queries, checking to see if the median age of one day was within the majority of visitor ages of another day.

We’ll stop exploring boxplots, and take a look at a different kind of complex graphical object: an interpolated line.

4.4. Line charts and interpolations

You create line charts by drawing connections between points. A line that connects points, and the shaded regions inside or outside the area constrained by the line, tell a story about the data. Although a line chart is technically a static data visualization, it’s also a representation of change, typically over time.

We’ll start with a new dataset in listing 4.7 that better represents change over time. Let’s imagine we have a Twitter account and we’ve been tracking the number of tweets, favorites, and retweets to determine at what time we have the greatest response to our social media. Although we’ll ultimately deal with this kind of data as JSON, we’ll want to start with a comma-delimited file, because it’s the most efficient for this kind of data.

Listing 4.7. tweetdata.csv

day,tweets,retweets,favorites

1,1,2,5

2,6,11,3

3,3,0,1

4,5,2,6

5,10,29,16

6,4,22,10

7,3,14,1

8,5,7,7

9,1,35,22

10,4,16,15

First we pull this CSV in using d3.csv() as we did in chapter 2, and then we create circles for each datapoint. We do this for each variation on the data, with the .day attribute determining x position and the other datapoint determining y position. We create the usual x and y scales to draw the shapes in the confines of our canvas. We also have a couple of axes to frame our results. Notice that we differentiated between the three datatypes by coloring them differently.

Listing 4.8. Callback function to draw a scatterplot from tweetdata

The graphical results of this code, as shown in figure 4.17, which take advantage of the CSS rules we defined earlier, aren’t easily interpreted.

Figure 4.17. A scatterplot showing the datapoints for 10 days of activity on Twitter, with the number of tweets in light gray, the number of retweets in dark gray, and the number of favorites in black

4.4.1. Drawing a line from points

By drawing a line that intersects each point of the same category, we can compare the number of tweets, retweets, and favorites. We can start by drawing a line for tweets using d3.svg.line(). This line generator expects an array of points as data, and we’ll need to tell the generator what values constitute the x and y coordinates for each point. By default, this generator expects a two-part array, where the first part is the x value and the second part is the y value. We can’t use that, because our x value is based on the day of the activity and our y value is based on the amount of activity.

The .x() accessor function of the line generator needs to point at the scaled day value, while the .y() accessor function needs to point to the scaled value of the appropriate activity. The line function itself takes the entire dataset that we loaded from tweetdata, and returns the SVG drawing code necessary for a line between the points in that dataset. To generate three lines, we use the dataset three times, with a slightly different generator for each. We not only need to write the generator function and define how it accesses the data it uses to draw the line, but we also need to append a <path> to our canvas and set its d attribute to equal the generator function we defined.

Listing 4.9. New line generator code inside the callback function

We draw the line above the circles we already drew, and the line generator produces the plot shown in figure 4.18.

Figure 4.18. The line generator takes the entire dataset and draws a line where the x,y position of every point on the canvas is based on its accessor. In this case, each point on the line corresponds to the day, and tweets are scaled to fit the x and y scales we created to display the data on the canvas.

4.4.2. Drawing many lines with multiple generators

If we build a line constructor for each datatype in our set and call each with its own path, as shown in the following listing, then you can see the variation over time for each of your datapoints. Listing 4.10 demonstrates how to build those generators with our dataset, and figure 4.19 shows the results of that code.

Figure 4.19. The dataset is first used to draw a set of circles, which creates the scatterplot from the beginning of this section. The dataset is then used three more times to draw each line.

Listing 4.10. Line generators for each tweetdata

4.4.3. Exploring line interpolators

D3 provides a number of interpolation methods with which to draw these lines, so that they can more accurately represent the data. In cases like tweetdata, where you have discrete points that represent data accurately and not samples, then the default “linear” method shown in figure 4.19 is appropriate. But in other cases, a different interpolation method for the lines, like the ones shown in figure 4.20, may be appropriate. Here’s the same data but with the d3.svg.line() generator using different interpolation methods:

Figure 4.20. Light gray: “basis” interpolation; dark gray: “step” interpolation; black: “cardinal” interpolation

What’s the best interpolation?

Interpolation modifies the representation of data. Experiment with this drawing code to see how the different interpolation settings show different information than other interpolators. Data can be visualized in different ways, all correct from a programming perspective, and it’s up to you to make sure the information you’re visualizing reflects the actual phenomena.

Data visualization deals with the visual representation of statistical principles, which means it’s subject to all the dangers of the misuse of statistics. The interpolation of lines is particularly vulnerable to misuse, because it changes a clunky-looking line into a smooth, “natural” line.

4.5. Complex accessor functions

All of the previous chart types we built were based on points. The scatterplot is points on a grid, the boxplot consists of complex graphical objects in place of points, and line charts use points as the basis for drawing a line. In this and earlier chapters, we’ve dealt with rather staid examples of information visualization that we might easily create in any traditional spreadsheet. But you didn’t get into this business to make Excel charts. You want to wow your audience with beautiful data, win awards for your aesthetic je ne sais quoi, and evoke deep emotional responses with your representation of change over time. You want to make streamgraphs like the one in figure 4.21.

Figure 4.21. Behold the glory of the streamgraph. Look on my works, ye mighty, and despair! (figure from The New York Times, February 23, 2008; http://mng.bz/rV7M)

The streamgraph is a sublime piece of information visualization that represents variation and change, like the boxplot. It may seem like a difficult thing to create, until you start to put the pieces together. Ultimately, a streamgraph is what’s known as a stacked chart. The layers accrete upon each other and adjust the area of the elements above and below, based on the space taken up by the components closer to the center. It appears organic because that accretive nature mimics the way many organisms grow, and seems to imply the kinds of emergent properties that govern the growth and decay of organisms. We’ll interpret its appearance later, but first let’s figure out how to build it.

The reason we’re looking at a streamgraph is because it’s not that exotic. A streamgraph is a stacked graph, which means it’s fundamentally similar to your earlier line charts. By learning how to make it, you can better understand another kind of generator, d3.svg.area(). The first thing you need is data that’s amenable to this kind of visualization. Let’s follow the New York Times, from which we get the streamgraph in figure 4.21, and work with the gross earnings for six movies over the course of nine days. Each datapoint is therefore the amount of money a movie made on a particular day.

Listing 4.11. movies.csv

day,movie1,movie2,movie3,movie4,movie5,movie6

1,20,8,3,0,0,0

2,18,5,1,13,0,0

3,14,3,1,10,0,0

4,7,3,0,5,27,15

5,4,3,0,2,20,14

6,3,1,0,0,10,13

7,2,0,0,0,8,12

8,0,0,0,0,6,11

9,0,0,0,0,3,9

10,0,0,0,0,1,8

To build a streamgraph, you need to get more sophisticated with the way you access data and feed it to generators when drawing lines. In our earlier example, we created three different line generators for our dataset, but that’s terribly inefficient. We also used simple functions to draw the lines. But we’ll need more than that to draw something like a streamgraph. Even if you think you won’t want to draw streamgraphs (and there are reasons why you may not, which we’ll get into at the end of this section), the important thing to focus on when you look at listing 4.11 is how you use accessors with D3’s line and, later, area generators.

Listing 4.12. The callback function to draw movies.csv as a line chart

The line-drawing code produces a cluttered line chart, as shown in figure 4.22. As you learned in chapter 1, lines and filled areas are almost exactly the same thing in SVG. You can differentiate them by a Z at the end of the drawing code that indicates the shape is closed, or the presence or absence of a "fill" style. D3 provides d3.svg.line and d3.svg.area generators to draw lines or areas. Both of these constructors produce <path> elements, but d3.svg.area provides helper functions to bound the lower end of your path to produce areas in charts. This means we need to define a .y0() accessor that corresponds to our y accessor and determines the shape of the bottom of our area. Let’s see how d3.svg.area() works.

Figure 4.22. Each movie column is drawn as a separate line. Notice how the “cardinal” interpolation creates a graphical artifact, where it seems like some movies made negative money.

Listing 4.13. Area accessors

Should you always draw filled paths with d3.svg.area?

No. Counterintuitively, you should use d3.svg.line to draw filled areas. To do so, though, you need to append Z to the created d attribute. This indicates that the path is closed.

Open path

Closed path changes

Explanation

movieArea = d3.svg.line()
.x(function(d) {
return xScale(d.day)
})
.y(function(d) {
return yScale(d[x])
})
.interpolate("cardinal");

You write the constructor for the linedrawing code the same regardless of whether you want a line or shape, filled or unfilled.

d3.select("svg")
.append("path")
.attr("d", movieArea(data))
.attr("fill", "none")
.attr("stroke", "black")
.attr("stroke-width", 3);

d3.select("svg")
.append("path")
.attr("d", movieArea(data)
+ "Z")
.attr("fill", "none")
.attr("stroke", "black")
.attr("stroke-width", 3);

When you call the constructor, you append a <path> element. You specify whether the line is “closed” by concatenating a Z to the string created by your line constructor for the d attribute of the <path>.

When you add a Z to the end of an SVG <path> element’s d attribute, it draws a line connecting the two end points.

d3.select("svg")
.append("path")
.attr("d", movieArea(data))
.attr("fill", "none")
.attr("stroke", "black")
.attr("stroke-width", 3);

d3.select("svg")
.append("path")
.attr("d", movieArea(data)
+ "Z")
.attr("fill", "gray")
.attr("stroke", "black")
.attr("stroke-width", 3);

You may think that only a closed path could be filled, but the fill of a path is the same whether or not you close the line by appending Z.

The area of a path filled is always the same, whether it’s closed or not.

You use d3.svg.line when you want to draw most shapes and lines, whether filled or unfilled, or closed or open. You should use d3.svg.area() when you want to draw a shape where the bottom of the shape can be calculated based on the top of the shape as you’re drawing it. It’s suitable for drawing bands of data, such as that found in a stacked area chart or streamgraph.

By defining the y0 function of d3.svg.area, we’ve mirrored the path created and filled it as shown in figure 4.23, which is a step in the right direction. Notice that we’re presenting inaccurate data now, because the area of the path is twice the area of the data. We want our areas to draw one on top of the other, so we need .y0() to point to a complex stacking function that makes the bottom of an area equal to the top of the previously drawn area. D3 comes with a stacking function, .stack(), which we’ll look at later, but for the purpose of our example, we’ll write our own.

Figure 4.23. By using an area generator and defining the bottom of the area as the inverse of the top, we can mirror our lines to create an area chart. Here they’re drawn with semitransparent fills, so that we can see how they overlap.

Listing 4.14. Callback function for drawing stacked areas

The stacked area chart in figure 4.24 is already complex. To make it a proper streamgraph, the stacks need to alternate. This requires a more complicated stacking function.

Figure 4.24. Our stacked area code represents a movie by drawing an area, where the bottom of that area equals the total amount of money made by any movies drawn earlier for that day.

Listing 4.15. A stacking function that alternates vertical position of area drawn

The streamgraph in figure 4.25 has some obvious issues, but we’re not going to correct them. For one thing, we’re over-representing the gross of the first movie by drawing it at twice the height. If we wanted to, we could easily make the stacking function account for this by halving the values of that first area. Another issue is that the areas being drawn are different from the areas being displayed, which isn’t a problem when our data visualization is going to be read from only one perspective and not multiple perspectives.

Figure 4.25. A streamgraph that shows the accreted values for movies by day. The problems of using different interpolation methods are clear. The basis method here shows some inaccuracies, and the difficulty of labeling the scale is also apparent.

But the purpose of this section is to focus on building complex accessor functions to create, from scratch, the kinds of data visualization you’ve seen and likely thought of as exotic. Let’s assume this data is correct and take a moment to analyze the effectiveness of this admittedly attractive method of visualizing data. Is this really a better way to show movie grosses than a simpler stacked graph or line chart? That depends on the scale of the questions being addressed by the chart. If you’re trying to discover overall patterns of variation in movie grosses, as well as spot interactions between them (for instance, seeing if a particularly high-grossing-over-time movie interferes with the opening of another movie), then it may be useful. If you’re trying to impress an audience with a complex-looking chart, it would also be useful. Otherwise, you’ll be better off with something simpler than this. But even if you only build less-visually impressive charts, you’ll still use the same techniques we’ve gone over in this section.

4.6. Summary

In this chapter you’ve learned the basics of creating charts:

· Integrating generators and components with the selection and binding process

· Learning about D3 components and the axis component to create chart elements like an x-axis and a y-axis

· Interpolating graphical elements, such as lines or areas from point data, using D3 generators

· Creating complex SVG objects that use the <g> element’s ability to create child shapes, which can be drawn based on the bound dataset, using .each()

· Exploring the representation of multidimensional data using boxplots

· Combining and extending these methods to implement a sophisticated charting method, the streamgraph, while learning how such charts may outstrip their audience’s ability to successfully interpret such data

These skills and methods will help you to better understand the D3 layouts, which we’ll explore in more detail in the following chapters. The incredible breadth of data visualization techniques possible with D3 is based on the fundamental similarity between different methods of displaying data, at the visual level, at the functional level, and at the data level. By understanding how the processes work and how they can be combined to create more interactive and rich representation, you’ll be better equipped to choose and deploy the right one for your data.