Data Science For Dummies (2016)
Part 3
Creating Data Visualizations That Clearly Communicate Meaning
Chapter 10
Using D3.js for Data Visualization
IN THIS CHAPTER
Understanding the core features and characteristics of D3.js data visualizations
Getting familiar with basic concepts in HTML, JavaScript, CSS, and PHP
Figuring out the more advanced features of D3.js
The truth of the matter is, you can come up with the perfect data visualization to meet the exact needs of your target audience gathered in that meeting room down the hall, but what works in the physical meeting room may not work in the virtual meeting room out on the World Wide Web. In this chapter, I show you how you can use D3.js technology to build custom web-based visualizations, the type of visualizations you need if you’re showcasing your work online. The power of programming in D3.js is that it enables you to create absolutely stunning, dynamic data visualizations that your audience can interact with and explore, straight from their browsers, with no add-ons required.
D3.js is not the only option available for constructing dynamic, web-based data visualizations; it’s just an excellent one. Other options include the jQuery JavaScript library (http://jquery.com), the JpGraph PHP library (http://jpgraph.net), HighCharts (www.highcharts.com), and iCharts (http://icharts.net).
Introducing the D3.js Library
D3.js is an open-source JavaScript library that’s taken the data visualization world by storm since its first release in 2011. It was created (and is maintained) by Mike Bostock, a famous data visualization guru and graphics editor at The New York Times. You can use this library to create high-quality Data-Driven Documents (D3) in a fraction of the time and with a fraction of the effort required to code in plain (a.k.a. vanilla) JavaScript.
In essence, D3.js is a collection of classes and functions that let you accomplish, with just a little code, what would otherwise take much longer stretches of lower-level JavaScript. D3.js focuses on the kinds of operations typically used in data visualization: you use its commands to do things like draw axes, plot elements, and recalculate positions when resizing graphs.
If your goal is to create dynamic web-based data visualizations — visualizations that change in response to user interactions — D3.js is the perfect JavaScript library to use.
If you want users to be able to interact with your data visualization and choose what data to display, you need to create a dynamic visualization.
With dynamic data visualizations, your users can
· Interact with the visualization to choose what data to display
· See additional data when they hover the mouse cursor over the visualization or click parts of it
· Drill down into deeper levels of related data to see more detailed views on interesting parts of the data
· Explore animated visualizations that show changes over time
· Choose from a variety of different transitions between views
The D3.js library is still being developed. With Mike Bostock and countless other users contributing new types of visualizations, the library’s capabilities are expanding daily. The D3.js design philosophy is rather open-ended. It doesn’t limit you to using predefined, cookie-cutter data visualizations. Rather, this library can accommodate the individual creativity and imagination of each unique user.
Knowing When to Use D3.js (and When Not To)
The main purpose of D3.js in the data visualization world is to make creative, interactive, web-based visualizations using only a small bit of code. Because D3.js is so powerful and open-ended, it’s also more complex than some other JavaScript libraries, web applications, and software packages. D3.js is itself a JavaScript library, so you write D3.js code in ordinary JavaScript syntax and need a basic working knowledge of JavaScript to use it. Beyond that, you need to learn a whole vocabulary and programming style specific to D3.js before getting started. This chapter covers the basics of getting started in D3.js, but if you want more extensive training, work through the tutorials that Mike Bostock has developed and hosts at his GitHub repository: https://github.com/mbostock/d3/wiki/Tutorials.
With the help of a few D3.js learning resources, you might be able to learn both JavaScript and D3.js at the same time. But many online tutorials presuppose a level of programming experience that you might not have.
Because you have to face a significant learning curve in mastering even the basic concepts of D3.js, use this library only if you want to create unique, interactive, scalable web-based visualizations. Otherwise, stick with static data visualizations: the simpler, less open-ended frameworks available for creating them are easier to use and can provide everything you might need.
If you choose to go with D3.js, however, you can find thousands of online open-source examples from which to learn and create. In addition to Mike Bostock’s tutorials, another good place to start is with the Dashing D3.js Tutorials at www.dashingd3js.com/table-of-contents.
These tutorials will get you started in learning how to build visualizations with data. From there, you can get into more advanced topics, such as building drillable sunburst diagrams and adjustable, force-directed network graphs. You can even use D3.js to make a heat map calendar, to visualize time-series trends in your data. The options and alternatives expand almost daily.
Getting Started in D3.js
I want to introduce you to the underlying concepts you need to master in order to create dynamic, web-based data visualizations using D3.js. In the following sections, I cover the basics of JavaScript, HTML, CSS, and PHP as they pertain to creating visualizations using D3.js. I also tell you how you can maximize the portion of client-side work that’s done by the JavaScript language. (Client-side work is the portion that’s done on the user’s computer, rather than on the network server.) If you’re not content with stopping there, you can find out about the HTML Document Object Model (DOM) and how JavaScript uses it to make dynamic changes to a web page’s HyperText Markup Language (HTML).
In the section “Bringing in the JavaScript and SVG,” later in this chapter, I talk about how you can use D3.js to efficiently display and change Scalable Vector Graphics (SVG), an XML-based image format that’s useful for serving images to interactive visualizations and animations, in your data visualization. You can find out how to make sweeping style changes across a web page by using Cascading Style Sheets (CSS) in the section “Bringing in the Cascading Style Sheets (CSS),” later in this chapter. And lastly, this chapter’s section “Bringing in the web servers and PHP” explains how to minimize the amount of client-side work by running PHP programs on a web server.
Bringing in the HTML and DOM
HyperText Markup Language (HTML) is the backbone of a web page. It delivers the static content you see on many websites, especially older ones. Plain-HTML pages are recognizable by their plain text and limited interactivity: the only interactive features you get are perhaps some hyperlinks that lead you to other boring static pages throughout the site.
You can use HTML to display plain text with a series of tags that give instructions to the client’s browser. The following HTML code is pretty basic, but at least it gives you an idea of what’s involved:
<html>
<head>
<title>This is a simple HTML page</title>
</head>
<body>
<p>This is a paragraph.</p>
</body>
</html>
Just in case you’re not aware, HTML relies on start tags and end tags. The preceding sample has a couple of nice examples, like <p> </p> and <body> </body>.
JavaScript interacts with a web page through the HTML Document Object Model (DOM). Through the DOM, JavaScript can read and change HTML tags and their content. The DOM represents tags as a hierarchy of objects, much like objects in an object-oriented programming language such as JavaScript. For example, the <body> tag in the preceding HTML is a child of the top-level <html> tag; it has one sibling, <head>, and one child, <p>. In the DOM, you can identify that <p> tag by its path from the top of the tree down to the tag (html > body > p, for example). The DOM also lets you select elements based on their attributes, such as their id or class.
In D3.js, the purpose of HTML is to provide a bare scaffold of static tags and web page content that JavaScript can manipulate through the DOM to produce dynamic, changeable HTML pages. D3.js is built on top of this bare backbone of HTML. Although HTML is static, it becomes dynamic in D3.js when a programmer or user interaction causes predetermined scripts to make on-the-fly changes to the underlying HTML code. As a result, the HTML displayed in the browser is often different from the HTML that was originally sent to it.
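To make the idea of a path concrete, here is a minimal sketch that models the sample page as a tree of plain JavaScript objects and walks it to recover the path to a tag. The node shape ({ tag, children }) and the pathTo function are hypothetical simplifications for illustration; in a real browser, the DOM builds this tree for you, and a call such as document.querySelector("html > body > p") selects the same element by that path.

```javascript
// Hypothetical plain-object stand-in for the DOM tree of the sample page.
// Each node records only its tag name and its child nodes.
var page = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", children: [] }] },
    { tag: "body", children: [{ tag: "p", children: [] }] }
  ]
};

// Depth-first search: returns the list of tag names from the root down to
// the first node whose tag matches `target`, or null if there is no match.
function pathTo(node, target) {
  if (node.tag === target) {
    return [node.tag];
  }
  for (var i = 0; i < node.children.length; i++) {
    var sub = pathTo(node.children[i], target);
    if (sub !== null) {
      return [node.tag].concat(sub); // prepend this node to the child's path
    }
  }
  return null;
}

console.log(pathTo(page, "p").join(" > ")); // html > body > p
```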
Bringing in the JavaScript and SVG
Using the JavaScript language gives you a simple way to get work done client-side (on the user’s machine). The slowest part of any interaction between a user and a website is in sending data from the server to the client’s computer over the Internet. That interaction can be vastly accelerated if, instead of sending all the information needed for a browser display, you send a much shorter, simpler chain of instructions that the client’s web browser can use to re-create that information and then create the web page using the client computer’s own processing speed. This is how client-side work is carried out.
If your goal is to retain a decent level of security without needing plug-ins or special permissions to run code in the browser, JavaScript offers you a terrific solution. What’s more, JavaScript is fast! Because it’s a programming language intended for browsers, JavaScript is unencumbered by the advanced features that make other languages more versatile but less speedy.
A note about Microsoft Internet Explorer: Different versions of Internet Explorer support different versions of JavaScript, a complex problem that can sometimes cause even experienced JavaScript programmers to pull their hair out. As a rule of thumb, D3.js doesn’t run on IE versions older than IE9, because those versions don’t support SVG.
In D3.js, graphics are typically rendered as Scalable Vector Graphics (SVG), a vector image format for delivering images to interactive visualizations and web-based animations. D3.js usually draws SVG elements directly into the web page to build 2-dimensional, interactive, web-based data visualizations. Vector graphics often require a lot less bandwidth than raster images because they contain only instructions for how to draw shapes, as opposed to final pixel-by-pixel renderings. If your goal is to rapidly deploy web-based graphics that also scale without losing quality, SVG is a perfect solution. SVG is optimal for use in creating graphical elements such as bars, lines, and markers.
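The point that vector graphics are drawing instructions rather than pixels is easy to see in the markup itself. The following sketch uses plain JavaScript rather than D3.js to turn a small dataset into an SVG string with one <rect> per value; the barChartSvg function name is hypothetical. The size of the output depends on the number of bars, not on how many pixels the chart covers.

```javascript
// Hypothetical helper: turns an array of numbers into SVG markup for a simple
// bar chart. Every bar is one short <rect> element, a drawing instruction.
function barChartSvg(values, width, height) {
  var max = Math.max.apply(null, values); // tallest value fills the full height
  var barWidth = width / values.length;   // equal-width bars, no gaps
  var rects = values.map(function (v, i) {
    var h = (v * height) / max;           // bar height in pixels
    // SVG's y axis points down, so a bar of height h starts at height - h.
    return '<rect x="' + i * barWidth + '" y="' + (height - h) +
      '" width="' + barWidth + '" height="' + h + '"/>';
  });
  return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
    '" height="' + height + '">' + rects.join("") + "</svg>";
}

var svg = barChartSvg([5, 20, 15, 25, 10], 400, 200);
console.log(svg);
```

D3.js produces the same kind of <rect> elements, but it inserts them into the live page through the DOM instead of building a string.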
You can use D3.js to select, add, modify, or remove SVG elements on a page, in just the same way you do with HTML elements. Since D3.js is most useful for working with web-based graphical elements, most of your D3.js scripting will interact with SVG elements.
Bringing in the Cascading Style Sheets (CSS)
The purpose of using Cascading Style Sheets (CSS) is to define the look of repeated visual elements, such as fonts, line widths, and colors. You can use CSS to specify the visual characteristics of page elements all at one time and then efficiently apply these characteristics to an entire HTML document (or to only the parts of the document defined in the DOM, if you wish). If you want to make sweeping, all-at-once changes to the look and feel of your web page elements, use CSS.
If you’re building visualizations on several pages of a website, you can put your CSS in a separate .css file and link to it from each page instead of repeating the CSS in every page. Then any change you make to that .css file applies to all the linked pages at once.
As an example, the basic CSS for a simple web page might include the following:
<style type="text/css">
p {
font-family: arial, verdana, sans-serif;
font-size: 12pt;
color: black;
}
.highlighted {
color: red;
}
</style>
The preceding example would render the text in this HTML example:
<p>This text is black and <span class="highlighted">this text is red.</span></p>
The preceding CSS and HTML code renders the text in the <p> tag in black by default, whereas the inner <span>, which is assigned to the highlighted class, renders its text in red.
D3.js works hand in hand with CSS to style both text elements and drawn elements (such as SVG bars and lines), so you can define and change the overall look of your visualization in one compact, easy-to-read block of code.
Bringing in the web servers and PHP
Though one of the main purposes behind using JavaScript and D3.js is to maximize the portion of work that’s carried out on the client’s machine, some work is just better carried out on the web server. (In case you aren’t familiar with the term, a web server can be thought of as a server-based computer that sends information over the Internet to users when they visit a website.)
In web programming, the words client and user can be used interchangeably. Both words refer to either the end user or the end user’s computer.
As an example of how web servers work, think of a sales site that has millions of products available for purchase. When you go to search for a type of product, of course the website doesn’t send the company’s entire product catalog to your computer and expect your PC to do the work of paring down the product information. Instead, the site’s web server processes the search parameters you’ve defined and then sends only the information that’s relevant to your particular search.
In web programming, you commonly have a SQL database set up as a main information source and also a PHP program that defines the HTML code to be sent to the client’s computer. You use PHP programs to query the SQL database and determine what information to send over to the client. PHP is a scripting language that’s run on a server and produces on-the-fly HTML code in response to user interactions.
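The following minimal sketch illustrates the server-side filtering idea. It’s written in JavaScript to match the other examples in this chapter rather than in PHP, and an in-memory array stands in for the SQL database; the catalog contents and the handleSearch function are hypothetical. A real PHP program would run a SQL query with a WHERE clause instead, but the principle is the same: only the matching rows travel over the network.

```javascript
// Hypothetical product catalog that lives on the server (stand-in for a SQL table).
var catalog = [
  { id: 1, name: "Desk lamp", category: "lighting", price: 25 },
  { id: 2, name: "Floor lamp", category: "lighting", price: 80 },
  { id: 3, name: "Office chair", category: "furniture", price: 150 },
  { id: 4, name: "Bookshelf", category: "furniture", price: 90 }
];

// Hypothetical server-side handler: applies the user's search parameters and
// returns only the matching rows, serialized as JSON for the browser.
function handleSearch(rows, category, maxPrice) {
  var matches = rows.filter(function (row) {
    return row.category === category && row.price <= maxPrice;
  });
  return JSON.stringify(matches);
}

var response = handleSearch(catalog, "lighting", 50);
console.log(response); // [{"id":1,"name":"Desk lamp","category":"lighting","price":25}]
```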
In pre-D3.js days, you’d have had to spend a lot more time and bandwidth constructing web-based, interactive data visualizations. Thanks to the effectiveness of the PHP/D3.js combination, things are simpler now. In response to a request from the user, PHP retrieves the relevant information on the server and sends it to the client’s computer in the form of HTML with embedded CSS, JavaScript, and D3.js code. At this point, D3.js can take over and expand the HTML. If necessary, D3.js can even make on-the-fly HTML expansions in response to additional user interactions. This process uses only a fraction of the bandwidth and time that a PHP-only or JavaScript-only setup would require.
Implementing More Advanced Concepts and Practices in D3.js
In this section, you can see how more advanced D3.js features can help you use the library more efficiently to create dynamic data visualizations. In the following section, I explain how to use so-called chain syntax (syntax that chains together functions and methods into a single statement) to minimize your coding requirements. You can also find out how to use the scale and axis functions to automatically define or change the proportions and elements for graphic outputs in the section “Getting to know scales,” later in this chapter. In the section “Getting to know transitions and interactions,” later in this chapter, you see how to use transitions and interactions to maximize audience exploration, analysis, and learning from your end-product data visualization.
For the rest of this chapter, refer to Listing 10-1 (a complete HTML file that contains CSS, JavaScript, and D3.js elements) when working through the examples. Don’t get too stressed out by the listing itself. I know it’s lengthy, but simply look through it, and then focus on the snippets I pull from it for later discussion.
And for future reference, the code shown in Listing 10-1 produces the interactive bar chart that’s shown in Figure 10-1.
FIGURE 10-1: An interactive, web-based bar chart made in D3.js.
LISTING 10-1 An HTML File with CSS, JavaScript, and D3.js Elements
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>D3.js example</title>
<script type='text/javascript' src="http://d3js.org/d3.v3.min.js"></script>
<style type='text/css'>
rect:hover {
fill: brown;
}
</style>
<script type='text/javascript'>//<![CDATA[
window.onload=function(){
var column_data = [{
position: 0,
quantity: 5
}, {
position: 1,
quantity: 20
}, {
position: 2,
quantity: 15
}, {
position: 3,
quantity: 25
}, {
position: 4,
quantity: 10
}];
var total_width = 400;
var total_height = 200;
var scale_y = d3.scale.linear()
.domain([0, d3.max(column_data, function (d) {
return d.quantity;
})])
.range([0, total_height]);
var scale_x = d3.scale.ordinal()
.domain(d3.range(column_data.length))
.rangeRoundBands([0, total_width], 0.05);
var position = function (d) {
return d.position;
};
var svg_container = d3.select("body")
.append("svg")
.attr("width", total_width)
.attr("height", total_height);
svg_container.selectAll("rect")
.data(column_data, position)
.enter()
.append("rect")
.attr("x", function (d, i) {
return scale_x(i);
})
.attr("y", function (d) {
return total_height - scale_y(d.quantity);
})
.attr("width", scale_x.rangeBand())
.attr("height", function (d) {
return scale_y(d.quantity);
})
.attr("fill", "teal");
var sort = function () {
var bars_to_sort = function (a, b) {
return b.quantity - a.quantity;
};
svg_container.selectAll("rect")
.sort(bars_to_sort)
.transition()
.delay(0)
.duration(300)
.attr("x", function (d, n) {
return scale_x(n);
});
};
d3.select("#sort").on("click", sort);
d3.select("#unsort").on("click", unsort);
function unsort() {
svg_container.selectAll("rect")
.sort(function (a, b) {
return a.position - b.position;
})
.transition()
.delay(0)
.duration(1200)
.attr("x", function (d, i) {
return scale_x(i);
});
};
}//]]>
</script>
</head>
<body>
<button id="sort">Sort</button>
<button id="unsort">Unsort</button>
<p>
</body>
</html>
Getting to know chain syntax
As I mention in the section “Bringing in the HTML and DOM,” earlier in this chapter, you can use D3.js to turn a bare scaffold of HTML into a complex visualization by modifying page elements. The D3.js library uses an efficient style of syntax called chain syntax (also known as method chaining). The purpose of chain syntax is to chain together several methods, allowing you to perform multiple actions in a single statement. Instead of using name-value pair syntax (like what you would see in CSS), D3.js chains together multiple method calls, each of which returns a selection that the next method in the chain operates on.
The fundamental concept behind D3.js is the data join, which produces three selections: Update, Enter, and Exit. These selections let you select, add, or remove HTML (or SVG) elements that are bound, or assigned, to your data. The Update selection contains existing elements that are paired with data items. When there are more data items than elements, the Enter selection holds placeholders for the surplus data items, and you typically call append() on it to add new elements. When there are more elements than data items, the Exit selection holds the surplus elements so that you can remove them.
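Chaining works because each method returns an object that the next method can be called on. The following sketch shows the mechanism with a hypothetical MiniSelection class that only records attributes; it is not D3.js’s selection implementation.

```javascript
// Hypothetical, minimal chainable object. It only records attribute values.
function MiniSelection(tag) {
  this.tag = tag;
  this.attrs = {};
}

// The setter returns `this`, which is what allows the next call to be chained.
MiniSelection.prototype.attr = function (name, value) {
  this.attrs[name] = value;
  return this;
};

// Three actions, one statement.
var bar = new MiniSelection("rect")
  .attr("width", 75)
  .attr("height", 160)
  .attr("fill", "teal");

console.log(JSON.stringify(bar.attrs)); // {"width":75,"height":160,"fill":"teal"}
```

In D3.js, attr() likewise returns the selection it was called on, whereas methods such as enter() and append() return a new selection, which is how the chain in Listing 10-1 moves from the data, to placeholders, to new rect elements in one statement.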
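The bookkeeping behind the three selections can be sketched without D3.js at all. The following hypothetical dataJoin function compares the keys of elements already on the page with the keys of the incoming data and sorts everything into update, enter, and exit groups. D3.js itself works on DOM elements rather than arrays of keys, so treat this as an illustration of the logic, not of the library’s implementation.

```javascript
// Hypothetical sketch of a key-based data join. `existingKeys` stands in for
// the keys of elements already on the page, `data` is the incoming dataset,
// and `key` extracts the key from each data item.
function dataJoin(existingKeys, data, key) {
  var dataKeys = data.map(key);
  return {
    // Data items that already have a matching element: update these.
    update: data.filter(function (d) {
      return existingKeys.indexOf(key(d)) !== -1;
    }),
    // Data items with no matching element yet: create elements for these.
    enter: data.filter(function (d) {
      return existingKeys.indexOf(key(d)) === -1;
    }),
    // Element keys with no matching data item: remove these elements.
    exit: existingKeys.filter(function (k) {
      return dataKeys.indexOf(k) === -1;
    })
  };
}

// Same key function as Listing 10-1.
var position = function (d) { return d.position; };

var join = dataJoin(
  [0, 1, 7], // pretend elements with keys 0, 1, and 7 are already on the page
  [{ position: 0, quantity: 5 }, { position: 1, quantity: 20 }, { position: 2, quantity: 15 }],
  position
);
console.log(join.update.length, join.enter.length, JSON.stringify(join.exit)); // 2 1 [7]
```

In Listing 10-1, the SVG container starts out with no rect elements, so all five data items fall into the enter group, and append("rect") creates one bar for each of them.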
Taking another look at a section of Listing 10-1, notice the code block that draws the bars of the graph:
svg_container.selectAll("rect")
.data(column_data, position)
.enter()
.append("rect")
.attr("x", function (d, i) {
return scale_x(i);
})
// ...the chain continues with .attr() calls for y, width, height, and fill (see Listing 10-1)
This D3.js script starts from svg_container, the SVG element that the listing appends to the page body earlier. The first method, selectAll("rect"), selects all the rect elements inside that container (at this point, there aren’t any). The script then binds each item in column_data to a rect, using the position function as a key to match data items to elements. To handle data items that don’t (yet) have a rect in the container, the enter() method creates placeholder elements that further methods can turn into real document elements. That is exactly what the next method, append("rect"), does: it creates a new rect element for each new data item. The key idea is that datasets can be bound to document elements and used to create, destroy, and modify them in a variety of very flexible ways.
Getting to know scales
In D3.js, the scale function maps input domains to output ranges so that the output data visualization graphics are drawn at appropriate, to-scale proportions. Looking at the following section of Listing 10-1, notice how scale variables are defined using D3.js:
var scale_y = d3.scale.linear()
.domain([0, d3.max(column_data, function (d) {
return d.quantity;
})])
.range([0, total_height]);
var scale_x = d3.scale.ordinal()
.domain(d3.range(column_data.length))
.rangeRoundBands([0, total_width], 0.05);
One of the main features of D3.js is its ability to do difficult and tedious calculations under the hood, and a key part of that work is scaling plots: a scale converts each value in your data into a pixel position or length in your graph.
From this snippet, you can see that the total height of the graph is passed to scale_y as its .range(), and the span of the data (from 0 to the largest quantity) is passed as its .domain(). D3.js then works out how each value maps to a pixel position, so you don’t need to calculate that yourself.
The following section from Listing 10-1 shows that the greatest quantity value in the data, which the y scale maps to the full chart height, is 25:
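To see what the two scales compute, here is a plain-JavaScript sketch of the underlying arithmetic. The linearScale function mirrors what d3.scale.linear() does for a simple numeric domain and range (straight-line interpolation, no clamping). The bandPositions function is a simplified approximation of the idea behind rangeRoundBands: it divides the width into equal steps and leaves a padding fraction of each step empty, and D3.js’s own rounding rules differ in detail. Both function names are hypothetical.

```javascript
// Stand-in for d3.scale.linear(): straight-line mapping from domain to range.
function linearScale(domain, range) {
  return function (x) {
    return range[0] +
      ((x - domain[0]) * (range[1] - range[0])) / (domain[1] - domain[0]);
  };
}

// Simplified band layout across [0, width]: each step is a gap of
// padding * step followed by one band, and one more gap of that size follows
// the last band, so every gap is equal. Approximates rangeRoundBands only.
function bandPositions(n, width, padding) {
  var step = width / (n + padding);
  var positions = [];
  for (var i = 0; i < n; i++) {
    positions.push(padding * step + i * step); // left edge of band i
  }
  return { positions: positions, bandWidth: step * (1 - padding) };
}

var scale_y = linearScale([0, 25], [0, 200]);
console.log(scale_y(25), scale_y(10)); // 200 80

var bands = bandPositions(5, 400, 0.05);
console.log(bands.positions, bands.bandWidth);
```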
var column_data = [{
position: 0,
quantity: 5
}, {
position: 1,
quantity: 20
}, {
position: 2,
quantity: 15
}, {
position: 3,
quantity: 25
}, {
position: 4,
quantity: 10
}];
var total_width = 400;
var total_height = 200;
By automatically mapping the range of your data (0 to 25) to the actual pixel height of your graph, you can change either or both parameters without having to do any manual recalculations.
Besides automatically handling the placement of axis labels and tick marks, D3.js can also handle the calculations involved in date ranges. This functionality leaves you free to concentrate on the overall look and feel of the visualization, instead of having to tinker with its mechanics.
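As a worked example of that mapping, the following sketch computes the height and y attributes that Listing 10-1 assigns to each bar, using plain JavaScript with Math.max in place of d3.max. With a domain of 0 to 25 and a height of 200 pixels, each unit of quantity is 200 / 25 = 8 pixels, so the bar for a quantity of 20 is 160 pixels tall. Because the SVG y axis points downward, that bar starts at y = 200 - 160 = 40.

```javascript
var column_data = [
  { position: 0, quantity: 5 },
  { position: 1, quantity: 20 },
  { position: 2, quantity: 15 },
  { position: 3, quantity: 25 },
  { position: 4, quantity: 10 }
];
var total_height = 200;

// Largest quantity in the data: the top of the y domain (25 here).
var max_quantity = Math.max.apply(null, column_data.map(function (d) {
  return d.quantity;
}));

// Same mapping as scale_y in the listing: 0..max_quantity onto 0..total_height.
function scale_y(q) {
  return (q * total_height) / max_quantity;
}

// SVG's y axis points down, so each bar starts at total_height minus its height.
var bars = column_data.map(function (d) {
  return { height: scale_y(d.quantity), y: total_height - scale_y(d.quantity) };
});

console.log(JSON.stringify(bars[3])); // {"height":200,"y":0}
```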
Getting to know transitions and interactions
The true beauty of D3.js lies in how you can use it to easily incorporate dynamic elements into your web-based data visualization. If you want to encourage users to explore and analyze the data in your dynamic visualization, create features that offer a lot of user interactivity. Also, incorporating transitions into your dynamic visualization can help you capture the interest of your audience. Transitions in D3.js build the aesthetic appeal of a data visualization by incorporating elements of motion into the design.
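Under the hood, a transition repeatedly computes in-between values of an attribute, such as a bar’s x position, between its start and end values over the transition’s duration. The following sketch shows that calculation with linear easing; the function names are hypothetical. D3.js’s default easing is not linear (in version 3 it is cubic-in-out), so real transitions start and end more gently than this sketch.

```javascript
// Linear interpolation between a start value and an end value (t from 0 to 1).
function interpolate(from, to, t) {
  return from + (to - from) * t;
}

// Value of an animated attribute `elapsed` ms after the transition is
// scheduled, given a `delay` and a `duration` in ms (linear easing).
function valueAt(from, to, elapsed, delay, duration) {
  var t = (elapsed - delay) / duration;
  t = Math.max(0, Math.min(1, t)); // hold the start value before, the end value after
  return interpolate(from, to, t);
}

// A bar sliding from x = 0 to x = 240 over 300 ms, halfway through.
console.log(valueAt(0, 240, 150, 0, 300)); // 120
```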
As a prime example of D3.js interactivity, take a look at the following code from Listing 10-1:
<style type='text/css'>
rect:hover {
fill: brown;
}
</style>
Here, a single piece of CSS code changes the color of the bars whenever the user hovers the cursor over them.
And, looking at another snippet taken from Listing 10-1 (shown next), you can see code that defines sort and unsort functions and attaches them to the Sort and Unsort buttons, which transition the bars between sorted and unsorted states (if you tried for the same effect using vanilla JavaScript, it would be more tedious and time-consuming):
var sort = function () {
var bars_to_sort = function (a, b) {
return b.quantity - a.quantity;
};
svg_container.selectAll("rect")
.sort(bars_to_sort)
.transition()
.delay(0)
.duration(300)
.attr("x", function (d, n) {
return scale_x(n);
});
};
d3.select("#sort").on("click", sort);
d3.select("#unsort").on("click", unsort);
function unsort() {
svg_container.selectAll("rect")
.sort(function (a, b) {
return a.position - b.position;
})
.transition()
.delay(0)
.duration(1200)
.attr("x", function (d, i) {
return scale_x(i);
});
};
}//]]>
</script>
</head>
<body>
<button id="sort">Sort</button>
<button id="unsort">Unsort</button>
<p>
</body>
</html>
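The two comparator functions do all the ordering work in the sort and unsort code. The following sketch applies the same comparisons to a plain copy of column_data, without D3.js, to show the resulting orders; the function names byQuantityDesc and byPosition are hypothetical. In the listing, selection.sort() reorders the rect elements with these comparisons, and the transition then moves each bar to scale_x of its new index.

```javascript
var column_data = [
  { position: 0, quantity: 5 },
  { position: 1, quantity: 20 },
  { position: 2, quantity: 15 },
  { position: 3, quantity: 25 },
  { position: 4, quantity: 10 }
];

// Same comparisons as the listing: descending quantity, then original position.
function byQuantityDesc(a, b) { return b.quantity - a.quantity; }
function byPosition(a, b) { return a.position - b.position; }

// slice() copies the array so the original order is left untouched.
var sorted = column_data.slice().sort(byQuantityDesc);
console.log(sorted.map(function (d) { return d.quantity; }).join(",")); // 25,20,15,10,5

var unsorted = sorted.slice().sort(byPosition);
console.log(unsorted.map(function (d) { return d.position; }).join(",")); // 0,1,2,3,4
```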
The D3.js wiki has a gallery of visualizations that give you an idea of this library’s enormous potential (https://github.com/d3/d3/wiki/Gallery). Across the web, people are finding new ways to use the library or adding to it for improved usability in specialized applications. As a modern interactive data visualization designer, you can use skills in D3.js to create almost anything you can imagine.