Data Visualization with JavaScript (2015)
Chapter 8. Managing Data in the Browser
So far in the book, we’ve looked at a lot of visualization tools and techniques, but we haven’t spent much time considering the data part of data visualization. The emphasis on visualization is appropriate in many cases. Especially if the data is static, we can take all the time we need to clean and organize it before it’s even represented in JavaScript. But what if the data is dynamic and we have no choice but to import the raw source directly into our JavaScript application? We have much less control over data from third-party REST APIs, Google Docs spreadsheets, or automatically generated CSV files. With those types of data sources, we often need to validate, reformat, recalculate, or otherwise manipulate the data in the browser.
This chapter considers a JavaScript library that is particularly helpful for managing large data sets in the web browser: Underscore.js (http://underscorejs.org/). We’ll cover the following aspects of Underscore.js:
§ Functional programming, the programming style that Underscore.js encourages
§ Working with simple arrays using Underscore.js utilities
§ Enhancing JavaScript objects
§ Manipulating collections of objects
The format of this chapter differs from the other chapters in the book. Instead of covering a few examples of moderate complexity, we’ll look at a lot of simple, short examples. Each section collects several related examples together, but each of the short examples is independent. The first section differs even further. It’s a brief introduction to functional programming cast as a step-by-step migration from the more common imperative programming style. Understanding functional programming is very helpful, as its philosophy underlies almost all of the Underscore.js utilities.
This chapter serves as a tour of the Underscore.js library with a special focus on managing data. (As a concession to the book’s overall focus on data visualization, it also includes several illustrations.) We’ll see many of the Underscore.js utilities covered here at work in a larger web application project in the subsequent chapters.
Using Functional Programming
When we’re working with data that’s part of a visualization, we often have to iterate through the data one item at a time to transform, extract, or otherwise manipulate it to fit our application. Using only the core JavaScript language, our code may rely on a for loop like the following:
for (var i=0, len=data.length; i<len; i++) {
// Code continues...
}
Although this style, known as imperative programming, is a common JavaScript idiom, it can present a few problems in large, complex applications. In particular, it might result in code that’s harder than necessary to debug, test, and maintain. This section introduces a different programming style—functional programming—that eliminates many of those problems. As you’ll see, functional programming can result in code that’s much more concise and readable, and therefore often much less error prone.
To compare these two programming styles, let’s consider a simple programming problem: writing a function to calculate the Fibonacci numbers. The first two Fibonacci numbers are 0 and 1, and subsequent numbers are the sum of the two preceding values. The sequence starts like this:
§ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . .
Step 1: Start with an Imperative Version
To begin, let’s consider a traditional, imperative approach to the problem. Here’s a first attempt:
var fib = function(n) {
// If 0th or 1st, just return n itself
if (n < 2) return n;
// Otherwise, initialize variable to compute result
var f0=0, f1=1, f=1;
// Iterate until we reach n
for (i=2; i<=n; i++) {
// At each iteration, slide the intermediate
// values down a step
f0 = f1 = f;
// And calculate sum for the next pass
f = f0 + f1;
}
// After all the iterations, return the result
return f;
}
This fib() function takes as its input a parameter n and returns as its output the nth Fibonacci number. (By convention, the 0th and 1st Fibonacci numbers are 0 and 1.)
Step 2: Debug the Imperative Code
If you aren’t checking closely, you might be surprised to find that the preceding trivial example contains three bugs. Of course, it’s a contrived example and the bugs are deliberate, but can you find all of them without reading any further? More to the point, if even a trivial example can hide so many bugs, can you imagine what might be lurking in a complex web application?
To understand why imperative programming can introduce these bugs, let’s fix them one at a time.
One bug is in the for loop:
for (i=2; i<=n; i++) {
The conditional that determines the loop termination checks for a less-than-or-equal (<=) value; instead, it should check for a less-than (<) value.
A second bug occurs in this line:
f0 = f1 = f;
Although we think and read left to right (at least in English), JavaScript executes multiple assignments from right to left. Instead of shifting the values in our variables, this statement simply assigns the value of f to all three. We need to break the single statement into two:
f0 = f1;
f1 = f;
The final bug is the most subtle, and it’s also in the for loop. We’re using the local variable i, but we haven’t declared it. As a result, JavaScript will treat it as a global variable. That won’t cause our function to return incorrect results, but it could well introduce a conflict—and a hard-to-find bug—elsewhere in our application. The correct code declares the variable as local:
for (var i=2; i<n; i++) {
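Applying all three fixes gives the corrected imperative version below. It is a sketch assembled from the fixes just described, keeping the original variable names and comments, so the repairs can be seen side by side.

```javascript
// Imperative fib() with all three fixes applied
var fib = function(n) {
  // If 0th or 1st, just return n itself
  if (n < 2) return n;
  // f0 and f1 hold the two preceding values; f holds the current one
  var f0 = 0, f1 = 1, f = 1;
  // Third fix: declare i locally. First fix: loop while i < n.
  for (var i = 2; i < n; i++) {
    // Second fix: slide the intermediate values down one at a time
    f0 = f1;
    f1 = f;
    // And calculate the sum for the next pass
    f = f0 + f1;
  }
  // After all the iterations, return the result
  return f;
};

console.log([0, 1, 2, 3, 4, 5, 10].map(fib)); // logs the values 0, 1, 1, 2, 3, 5, 55
```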
Step 3: Understand the Problems Imperative Programming May Introduce
The bugs in this short and straightforward piece of code are meant to demonstrate some problematic features of imperative programming in general. In particular, conditional logic and state variables, by their very nature, tend to invite certain errors.
Consider the first bug. Its error was using an incorrect test (<= instead of <) for the conditional that terminates the loop. Precise conditional logic is critical for computer programs, but such precision doesn’t always come naturally to most people, including programmers. Conditional logic has to be perfect, and sometimes making it perfect is tricky.
The other two errors both relate to state variables, f0 and f1 in the first case and i in the second. Here again there’s a difference between how programmers think and how programs operate. When programmers write the code to iterate through the numbers, they’re probably concentrating on the specific problem at hand. It may be easy to neglect the potential effect on other areas of the application. More technically, state variables can introduce side effects into a program, and side effects may result in bugs.
Step 4: Rewrite Using Functional Programming Style
Proponents of functional programming claim that by eliminating conditionals and state variables, a functional programming style can produce code that’s more concise, more maintainable, and less prone to errors than imperative programming.
The “functional” in “functional programming” does not refer to functions in programming languages but rather to mathematical functions such as y=f(x). Functional programming attempts to emulate mathematical functions in the context of computer programming. Instead of iterating over values by using a for loop, functional programming often uses recursion, where a function calls itself multiple times to make a calculation or manipulate values.
Here’s how we can implement the Fibonacci algorithm with functional programming:
var fib = function(n) { return n < 2 ? n : fib(n-1) + fib(n-2); }
Notice that this version has no state variables and, except for the edge case to handle 0 or 1, no conditional statements. It’s much more concise, and notice how the code mirrors almost word-for-word the statement of the original problem: “The first two Fibonacci numbers are 0 and 1” corresponds to n < 2 ? n, and “subsequent numbers are the sum of the two preceding values” corresponds to fib(n-1) + fib(n-2).
Functional programming implementations often express the desired outcome directly. They can therefore minimize the chance of misinterpretations or errors in an intermediate algorithm.
Step 5: Evaluate Performance
From what we’ve seen so far, it may seem that we should always adopt a functional programming style. Certainly functional programming has its advantages, but it can have some significant disadvantages as well. The Fibonacci code is a perfect example. Since functional programming eschews the notion of loops, our example relies instead on recursion.
In our specific case the fib() function calls itself twice at every level until the recursion reaches 0 or 1. Since each intermediate call itself results in more intermediate calls, the number of calls to fib() increases exponentially. Finding the 28th Fibonacci number by executing fib(28) results in over one million calls to the fib() function.
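The call count is easy to check by instrumenting the recursive version. In the sketch below, the `calls` counter and the `callsByArg` tally are illustrative additions only; they are not part of the chapter's code.

```javascript
// Recursive fib() instrumented to count how often it runs.
// `calls` and `callsByArg` exist only for this illustration.
var calls = 0;
var callsByArg = [];

var fib = function(n) {
  calls++;
  callsByArg[n] = (callsByArg[n] || 0) + 1;
  return n < 2 ? n : fib(n-1) + fib(n-2);
};

var result = fib(28);
console.log(result);         // 317811
console.log(calls);          // 1028457 calls in total
console.log(callsByArg[3]);  // 121393 calls to fib(3) alone
```

Because every call to fib(n) triggers calls to fib(n-1) and fib(n-2), the total follows a Fibonacci-like recurrence (it works out to 2 × fib(n+1) - 1), so it grows by a factor of roughly 1.6 for each step up in n.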
As you might imagine, the resulting performance is simply unacceptable. Table 8-1 shows the execution times for both the functional and the imperative versions of fib().
Table 8-1. Execution Times for fib()
Version | Parameter | Execution time (ms)
Imperative | 28 | 0.231
Functional | 28 | 296.9
As you can see, the functional programming version is over a thousand times slower. In the real world, such performance is rarely acceptable.
Step 6: Fix the Performance Problem
Fortunately, we can reap the benefits of functional programming without suffering the performance penalty. We simply turn to the tiny but powerful Underscore.js library. As the library’s web page explains,
Underscore is a utility-belt library for JavaScript that provides . . . functional programming support.
Of course, we need to include that library in our web pages. If you’re including libraries individually, Underscore.js is available on many content delivery networks (CDNs), such as CloudFlare.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
<!-- Content goes here -->
<script
src="//cdnjs.cloudflare.com/ajax/libs/underscore.js/1.4.4/underscore-min.js">
</script>
</body>
</html>
With Underscore.js in place, we can now optimize the performance of our Fibonacci implementation.
The problem with the recursive implementation is that it results in many unnecessary calls to fib(). For example, executing fib(28) requires more than 100,000 calls to fib(3). And each time fib(3) is called, the return value is recalculated from scratch. It would be better if the implementation called fib(3) only once, and every subsequent time it needed to know the value of fib(3) it reused the previous result instead of recalculating it from scratch. In effect, we’d like to implement a cache in front of the fib() function. The cache could eliminate the repetitive calculations.
This approach is known as memoizing, and the Underscore.js library has a simple method to automatically and transparently memoize JavaScript functions. Not surprisingly, that method is called memoize(). To use it, we first wrap the function we want to memoize within the Underscore object. Just as jQuery uses the dollar sign ($) for wrapping, Underscore.js uses the underscore character (_). After wrapping our function, we simply call the memoize() method. Here’s the complete code:
var fib = _( function(n) {
return n < 2 ? n : fib(n-1) + fib(n-2);
} ).memoize()
As you can see, we haven’t really lost any of the readability or conciseness of functional programming. And it would still be a challenge to introduce a bug in this implementation. The only real change is performance, and it’s substantially better, as shown in Table 8-2.
Table 8-2. Execution Times for fib(), Continued
Version | Parameter | Execution time (ms)
Imperative fib() | 28 | 0.231
Functional fib() | 28 | 296.9
Memoized fib() | 28 | 0.352
Just by including the Underscore.js library and using one of its methods, our functional implementation has nearly the same performance as the imperative version.
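To see why memoizing closes the gap, here is a minimal hand-rolled cache in plain JavaScript. It is a sketch of the idea, not Underscore.js's actual source: the helper name `memoizeSketch` is hypothetical, and, like Underscore's default behavior, it keys the cache on the first argument only. The `calls` counter is included just to show how much work the cache saves.

```javascript
// A minimal memoizer (hypothetical helper): cache each result by the
// function's first argument so repeated calls reuse the stored value.
function memoizeSketch(func) {
  const cache = new Map();
  return function (n) {
    if (!cache.has(n)) {
      cache.set(n, func(n)); // compute once...
    }
    return cache.get(n);     // ...and reuse thereafter
  };
}

let calls = 0; // counts how often the underlying function actually runs

const fib = memoizeSketch(function (n) {
  calls++;
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
});

console.log(fib(28)); // 317811
console.log(calls);   // 29: once each for fib(0) through fib(28)
```

Compared with the million-plus calls of the plain recursive version, the cache reduces the work to one computation per distinct argument, which is why the memoized timing lands close to the imperative one.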
For the rest of this chapter, we’ll look at many of the other improvements and utilities that Underscore.js provides. With its support for functional programming, Underscore.js makes it significantly easier to work with data in the browser.
Working with Arrays
If your visualization relies on a significant amount of data, that data is most likely contained in arrays. Unfortunately, it’s very tempting to resort to imperative programming when you are working with arrays. Arrays suggest the use of programming loops, and, as we saw earlier, programming loops are an imperative construct that often causes errors. If we can avoid loops and rely on functional programming instead, we can improve the quality of our JavaScript. The core JavaScript language includes a few utilities and methods to help applications cope with arrays in a functional style, but Underscore.js adds many others. This section describes many of the Underscore.js array utilities that are most helpful for data visualizations.
Extracting Elements by Position
If you need only a subset of an array for your visualization, Underscore.js has many utilities that make it easy to extract the right one. For the following examples, we’ll consider a simple array (shown in Figure 8-1).
var arr = [1,2,3,4,5,6,7,8,9];
Figure 8-1. Underscore.js has many utilities to make working with arrays easy.
Underscore.js’s first() method provides a simple way to extract the first element of an array, or the first n elements (see Figure 8-2):
> _(arr).first()
1
> _(arr).first(3)
[1, 2, 3]
Figure 8-2. The first() function returns the first element or the first n elements in an array.
Notice that first() (without any parameter) returns a single element, while first(n) returns an array of elements. That means, for example, that first() and first(1) have different return values (1 versus [1] in the example).
As you might expect, Underscore.js also has a last() method to extract elements from the end of an array (see Figure 8-3).
> _(arr).last()
9
> _(arr).last(3)
[7, 8, 9]
Figure 8-3. The last() function returns the last element or the last n elements in an array.
Without any parameters, last() returns the last element in the array. With a parameter n, it returns a new array with the last n elements from the original.
The more general versions of both of these functions (.first(3) and .last(3)) would require some potentially tricky (and error-prone) code to implement in an imperative style. In the functional style that Underscore.js supports, however, our code is clean and simple.
What if you want to extract from the beginning of the array, but instead of knowing how many elements you want to include in the result, you know only how many elements you want to omit? In other words, you need “all but the last n” elements. The initial() method performs this extraction (see Figure 8-4). As with all of these methods, if you omit the optional parameter, Underscore.js assumes a value of 1.
> _(arr).initial()
[1, 2, 3, 4, 5, 6, 7, 8]
> _(arr).initial(3)
[1, 2, 3, 4, 5, 6]
Figure 8-4. The initial() function returns all but the last element or all but the last n elements in an array.
Finally, you may need the opposite of initial(). The rest() method skips past a defined number of elements in the beginning of the array and returns whatever remains (see Figure 8-5).
> _(arr).rest()
[2, 3, 4, 5, 6, 7, 8, 9]
> _(arr).rest(3)
[4, 5, 6, 7, 8, 9]
Figure 8-5. The rest() function returns all but the first element or all but the first n elements in an array.
Again, these functions would be tricky to implement using traditional, imperative programming, but they are a breeze with Underscore.js.
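For comparison, the four extractions above can be sketched in plain JavaScript on top of the native Array.prototype.slice() method. The function names below are hypothetical stand-ins that mimic the behavior described in this section, including the default of 1 for initial() and rest(); they are not Underscore's own code.

```javascript
// Hypothetical plain-JavaScript stand-ins built on Array.prototype.slice().
// first() and last() return a single element when n is omitted,
// and an array of up to n elements otherwise.
const first = (arr, n) => (n === undefined ? arr[0] : arr.slice(0, n));
const last = (arr, n) =>
  n === undefined ? arr[arr.length - 1] : arr.slice(Math.max(arr.length - n, 0));

// initial() and rest() omit one element unless told otherwise.
const initial = (arr, n = 1) => arr.slice(0, Math.max(arr.length - n, 0));
const rest = (arr, n = 1) => arr.slice(n);

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
console.log(first(arr), first(arr, 3)); // 1, then the values 1, 2, 3
console.log(last(arr), last(arr, 3));   // 9, then the values 7, 8, 9
console.log(initial(arr, 3));           // the values 1 through 6
console.log(rest(arr, 3));              // the values 4 through 9
```

Because slice() never modifies the original array, each helper returns a fresh result and the input data stays intact.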
Combining Arrays
Underscore.js includes another set of utilities for combining two or more arrays. These include functions that mimic standard mathematical set operations, as well as more-sophisticated combinations. For the next few examples, we’ll use two arrays, one containing the first few Fibonacci numbers and the other containing the first five nonnegative even integers (see Figure 8-6).
var fibs = [0, 1, 1, 2, 3, 5, 8];
var even = [0, 2, 4, 6, 8];
Figure 8-6. Underscore.js also has many utilities to work with multiple arrays.
The union() method is a straightforward combination of multiple arrays. It returns an array containing all elements that are in any of the inputs, and it removes any duplicates (Figure 8-7).
> _(fibs).union(even)
[0, 1, 2, 3, 5, 8, 4, 6]
Figure 8-7. The union() function creates the union of multiple arrays, removing any duplicates.
Notice that union() removes duplicates whether they appear in separate inputs (0, 2, and 8) or in the same array (1).
NOTE
Although this chapter considers combinations of just two arrays, most of these Underscore.js methods can accept an unlimited number of parameters. For example, _.union(a,b,c,d,e) returns the union of five different arrays. You can even find the union of an array of arrays with the JavaScript apply() function with something like _.union.apply(_, arrOfArrs).
The intersection() method acts just as you would expect, returning only those elements that appear in all of the input arrays (Figure 8-8).
> _(fibs).intersection(even)
[0, 2, 8]
Figure 8-8. The intersection() function returns elements in common among multiple arrays.
The difference() method is the opposite of intersection(). It returns those elements in the first input array that are not present in the other inputs (Figure 8-9).
> _(fibs).difference(even)
[1, 1, 3, 5]
Figure 8-9. The difference() function returns elements that are present only in the first of multiple arrays.
If you need to eliminate duplicate elements but have only one array—making union() inappropriate—then you can use the uniq() method (Figure 8-10).
> _(fibs).uniq()
[0, 1, 2, 3, 5, 8]
Figure 8-10. The uniq() function removes duplicate elements from an array.
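The four set-style operations can be approximated in plain JavaScript with a Set and the native filter() method. The sketch below is hypothetical (these are not Underscore's implementations); a Set keeps the first occurrence of each value in insertion order, which reproduces the result order shown in the examples above.

```javascript
// Hypothetical plain-JavaScript sketches of uniq(), union(),
// intersection(), and difference().
const uniq = (arr) => [...new Set(arr)];
const union = (...arrays) => uniq(arrays.flat());
// Elements of the first array (without duplicates) found in every other array.
const intersection = (arr, ...others) =>
  uniq(arr).filter((x) => others.every((other) => other.includes(x)));
// Elements of the first array found in none of the others (duplicates kept).
const difference = (arr, ...others) =>
  arr.filter((x) => !others.some((other) => other.includes(x)));

const fibs = [0, 1, 1, 2, 3, 5, 8];
const even = [0, 2, 4, 6, 8];
console.log(union(fibs, even));        // the values 0, 1, 2, 3, 5, 8, 4, 6
console.log(intersection(fibs, even)); // the values 0, 2, 8
console.log(difference(fibs, even));   // the values 1, 1, 3, 5
console.log(uniq(fibs));               // the values 0, 1, 2, 3, 5, 8

// Any number of arrays works, including an array of arrays via spread syntax.
const arrOfArrs = [fibs, even, [13, 21]];
console.log(union(...arrOfArrs));      // the values 0, 1, 2, 3, 5, 8, 4, 6, 13, 21
```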
Finally, Underscore.js has a zip() method. Its name comes not from the popular compression format but from the way the method works, a bit like a zipper. It takes multiple input arrays and combines them, element by element, into an output array. That output is an array of arrays, where the inner arrays are the combined elements.
> var naturals = [1, 2, 3, 4, 5];
> var primes = [2, 3, 5, 7, 11];
> _.zip(naturals, primes)
[ [1,2], [2,3], [3,5], [4,7], [5,11] ]
The operation is perhaps most clearly understood through a picture; see Figure 8-11.
Figure 8-11. The zip() function pairs elements from multiple arrays together into a single array.
This example demonstrates an alternative style for Underscore.js. Instead of wrapping an array within the _ object as we’ve done so far, we call the zip() method on the _ object itself. The alternative style seems a better fit for the underlying functionality in this case, but if you prefer _(naturals).zip(primes), you’ll get the exact same result.
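The zipper behavior is short enough to sketch in plain JavaScript. The zip() below is a hypothetical stand-in: the i-th inner array collects the i-th element of every input, and in this sketch an input that runs out early simply contributes undefined.

```javascript
// Hypothetical plain-JavaScript zip(): pair up elements by position.
const zip = (...arrays) => {
  // The output is as long as the longest input (0 if there are no inputs).
  const length = Math.max(0, ...arrays.map((a) => a.length));
  return Array.from({ length }, (unused, i) => arrays.map((a) => a[i]));
};

const naturals = [1, 2, 3, 4, 5];
const primes = [2, 3, 5, 7, 11];
console.log(zip(naturals, primes)); // pairs [1,2], [2,3], [3,5], [4,7], [5,11]
```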
Removing Invalid Data Values
One of the banes of visualization applications is invalid data values. Although we’d like to think that our data sources ensure that all the data they provide is scrupulously correct, that is, unfortunately, rarely the case. More seriously, if JavaScript encounters an invalid value, the most common result is an unhandled exception, which halts execution of the script that encountered it.
To avoid such an unpleasant error, we should validate all data sets and remove invalid values before we pass the data to graphing or charting libraries. Underscore.js has several utilities to help.
The simplest of these Underscore.js methods is compact(). This function removes any data values that JavaScript treats as false from the input array. Eliminated values include the Boolean value false, the numeric value 0, an empty string, and the special values NaN (not a number; for example, 0/0), undefined, and null.
> var raw = [0, 1, false, 2, "", 3, NaN, 4, , 5, null];
> _(raw).compact()
[1, 2, 3, 4, 5]
It is worth emphasizing that compact() removes elements with a value of 0. If you use compact() to clean a data array, be sure that 0 isn’t a valid data value in your data set.
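In plain JavaScript, passing the built-in Boolean function to filter() has the same effect as compact(). When 0 is a legitimate value, a stricter test is usually safer. The second filter below assumes that valid data points are finite numbers, an assumption for illustration; it keeps 0 while dropping false, the empty string, NaN, null, and the empty slot (Infinity would be dropped too).

```javascript
const raw = [0, 1, false, 2, "", 3, NaN, 4, , 5, null];

// Same effect as compact(): drop every falsy value, including 0.
const compacted = raw.filter(Boolean);

// Assumes valid data points are finite numbers: 0 survives.
// filter() never visits the empty slot between 4 and 5.
const numeric = raw.filter(Number.isFinite);

console.log(compacted); // the values 1, 2, 3, 4, 5
console.log(numeric);   // the values 0, 1, 2, 3, 4, 5
```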
Another common problem with raw data is excessively nested arrays. If you want to eliminate extra nesting levels from a data set, the flatten() method is available to help.
> var raw = [1, 2, 3, [[4]], 5];
> _(raw).flatten()
[1, 2, 3, 4, 5]
By default, flatten() removes all nesting, even multiple levels of nesting, from arrays. If you set the shallow parameter to true, however, it removes only a single level of nesting.
> var raw = [1, 2, 3, [[4]], 5];
> _(raw).flatten(true)
[1, 2, 3, [4], 5]
Finally, if you have specific values that you want to eliminate from an array, you can use the without() method. Its parameters provide a list of values that the function should remove from the input array.
> var raw = [1, 2, 3, 4];
> _(raw).without(2, 3)
[1, 4]
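JavaScript engines that implement ES2019 or later also offer native counterparts to the other two utilities in this section. Array.prototype.flat() takes a depth argument that defaults to 1, much like flatten(true), while flat(Infinity) removes all nesting; a filter() with includes() reproduces without(). A sketch for comparison:

```javascript
const nested = [1, 2, 3, [[4]], 5];
console.log(nested.flat(Infinity)); // like flatten(): the values 1, 2, 3, 4, 5
console.log(nested.flat());         // like flatten(true): 1, 2, 3, [4], 5

// Like without(2, 3): keep every element that is not in the exclusion list.
const raw = [1, 2, 3, 4];
const excluded = [2, 3];
console.log(raw.filter((v) => !excluded.includes(v))); // the values 1, 4
```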
Finding Elements in an Array
JavaScript has always defined the indexOf() method for strings. It returns the position of a given substring within a larger string. Recent versions of JavaScript have added this method to array objects, so you can easily find the first occurrence of a given value in an array. Unfortunately, older browsers (specifically IE8 and earlier) don’t support this method.
Underscore.js provides its own indexOf() method to fill the gap those older browsers create. If Underscore.js finds itself running in an environment with native support for array indexOf, then it defers to the native method to avoid any performance penalty.
> var primes = [2, 3, 5, 7, 11];
> _(primes).indexOf(5)
2
To begin your search somewhere in the middle of the array, you can specify that starting position as the second argument to indexOf().
> var arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];
> _(arr).indexOf(5, 4)
6
You can also search backward from the end of an array using the lastIndexOf() method.
> var arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];
> _(arr).lastIndexOf(5)
6
If you don’t want to start at the very end of the array, you can pass in the starting index as an optional parameter.
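Where the native methods are available, the same searches can be written directly on the array. Both native methods accept the same optional starting position; for lastIndexOf(), the search runs backward from that index. A missing value yields -1.

```javascript
const arr = [2, 3, 5, 7, 11, 7, 5, 3, 2];

console.log(arr.indexOf(5));        // 2: first 5 from the start
console.log(arr.indexOf(5, 4));     // 6: first 5 at or after index 4
console.log(arr.lastIndexOf(5));    // 6: last 5 in the array
console.log(arr.lastIndexOf(5, 5)); // 2: last 5 at or before index 5
console.log(arr.indexOf(13));       // -1: not found
```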
Underscore.js provides a few helpful optimizations for sorted arrays. Both the uniq() and the indexOf() methods accept an optional Boolean parameter. If that parameter is true, then the functions assume that the array is sorted. The performance improvements this assumption allows can be especially significant for large data sets.
The library also includes the special sortedIndex() function. This function also assumes that the input array is sorted. It finds the position at which a specific value should be inserted to maintain the array’s sort order.
> var arr = [2, 3, 5, 7, 11];
> _(arr).sortedIndex(6)
3
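If the array is sorted by some computed value rather than by the elements themselves, you can also pass sortedIndex() an iterator function that computes the value used for each comparison.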
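A sorted array lets an insertion point be found with a binary search, which needs only about log2(n) comparisons instead of a scan of every element. The sketch below shows that technique in plain JavaScript; the function name and the optional `key` parameter (standing in for an iterator) are illustrative assumptions, not Underscore's source.

```javascript
// Binary search for the lowest index at which `value` could be inserted
// into the sorted array `arr` without breaking its order.
// `key` (hypothetical) computes the value each element is sorted by.
function sortedIndex(arr, value, key = (x) => x) {
  const target = key(value);
  let low = 0;
  let high = arr.length;
  while (low < high) {
    const mid = (low + high) >>> 1; // midpoint, rounded down
    if (key(arr[mid]) < target) {
      low = mid + 1; // insertion point is to the right of mid
    } else {
      high = mid;    // insertion point is at mid or to its left
    }
  }
  return low;
}

console.log(sortedIndex([2, 3, 5, 7, 11], 6)); // 3

// Objects sorted by a computed value: teams ordered by wins.
const teams = [
  { name: "Chicago Cubs", wins: 61 },
  { name: "New York Mets", wins: 74 },
  { name: "Cincinnati Reds", wins: 97 },
];
console.log(sortedIndex(teams, { name: "Atlanta Braves", wins: 94 }, (t) => t.wins)); // 2
```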
Generating Arrays
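The final array utility I’ll mention is a convenient method to generate arrays. The range() method tells Underscore.js to create an array of evenly spaced numbers. With a single argument, it returns that many elements, counting up from 0. With two or three arguments, the first is the starting value (the default is 0), the second is the stopping value, which is not itself included, and the optional third is the increment between adjacent values (the default is 1).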
> _.range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> _.range(20, 30)
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
> _.range(0, 1000, 100)
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
The range() function can be quite useful if you need to generate x-axis values to match an array of y-axis values.
> var yvalues = [0.1277, 1.2803, 1.7697, 3.1882]
> _.zip(_.range(yvalues.length),yvalues)
[ [0, 0.1277], [1, 1.2803], [2, 1.7697], [3, 3.1882] ]
Here we use range() to generate the matching x-axis values, and use zip() to combine them with the y-values.
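The arithmetic behind range() is simple enough to sketch in plain JavaScript. The version below is hypothetical and assumes a nonzero step. It computes the number of elements as the ceiling of (stop - start) / step, never less than zero, which is also why a stop value below the start produces an empty array.

```javascript
// Hypothetical plain-JavaScript range(): one argument means range(stop);
// otherwise range(start, stop, step). The stop value is excluded.
// Assumes step is nonzero.
function range(start, stop, step = 1) {
  if (stop === undefined) {
    stop = start;
    start = 0;
  }
  const length = Math.max(Math.ceil((stop - start) / step), 0);
  return Array.from({ length }, (unused, i) => start + i * step);
}

console.log(range(10));           // 0 through 9
console.log(range(20, 30));       // 20 through 29
console.log(range(0, 1000, 100)); // 0, 100, ..., 900
console.log(range(20, 10));       // empty: stop is below start

// Pairing generated x-values with existing y-values, without zip():
const yvalues = [0.1277, 1.2803, 1.7697, 3.1882];
console.log(range(yvalues.length).map((x) => [x, yvalues[x]]));
```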
Enhancing Objects
Although the previous section’s examples show numeric arrays, often our visualization data consists of JavaScript objects instead of simple numbers. That’s especially likely if we get the data via a REST interface, because such interfaces usually deliver data in JavaScript Object Notation (JSON). If we need to enhance or transform objects without resorting to imperative constructs, Underscore.js has another set of utilities that can help. For the following examples, we can use a simple pizza object (see Figure 8-12).
var pizza = {
size: 10,
crust: "thin",
cheese: true,
toppings: [ "pepperoni","sausage"]
};
Figure 8-12. Underscore.js has many utilities for working with arbitrary JavaScript objects.
Working with Keys and Values
Underscore.js includes several methods to work with the keys and values that make up objects. For example, the keys() function creates an array consisting solely of an object’s keys (see Figure 8-13).
> _(pizza).keys()
[ "size", "crust", "cheese", "toppings"]
Figure 8-13. The keys() function returns the keys of an object as an array.
Similarly, the values() function creates an array consisting solely of an object’s values (Figure 8-14).
> _(pizza).values()
[10, "thin", true, [ "pepperoni","sausage"]]
Figure 8-14. The values() function returns just the values of an object as an array.
The pairs() function creates a two-dimensional array. Each element of the outer array is itself an array that contains an object’s key and its corresponding value (Figure 8-15).
> _(pizza).pairs()
[
[ "size",10],
[ "crust","thin"],
[ "cheese",true],
[ "toppings",[ "pepperoni","sausage"]]
]
Figure 8-15. The pairs() function converts an object into an array of array pairs.
To reverse this transformation and convert an array into an object, we can use the object() function.
> var arr = [ [ "size",10], [ "crust","thin"], [ "cheese",true],
[ "toppings",[ "pepperoni","sausage"]] ]
> _(arr).object()
{ size: 10, crust: "thin", cheese: true, toppings: [ "pepperoni","sausage"]}
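Newer versions of JavaScript provide built-in counterparts to these four utilities: Object.keys(), Object.values() and Object.entries() (ES2017), and Object.fromEntries() (ES2019). They are shown here on the same pizza object for comparison; whether they are available depends on the browsers you need to support.

```javascript
const pizza = {
  size: 10,
  crust: "thin",
  cheese: true,
  toppings: ["pepperoni", "sausage"],
};

const keys = Object.keys(pizza);           // like keys()
const values = Object.values(pizza);       // like values()
const pairs = Object.entries(pizza);       // like pairs(): [key, value] arrays
const rebuilt = Object.fromEntries(pairs); // like object(): back to an object

console.log(keys);    // "size", "crust", "cheese", "toppings"
console.log(rebuilt); // same properties as pizza
```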
Finally, we can swap the roles of keys and values in an object with the invert() function (Figure 8-16).
> _(pizza).invert()
{10: "size", thin: "crust", true: "cheese", "pepperoni,sausage":
"toppings"}
Figure 8-16. The invert() function swaps keys and values in an object.
As the preceding example shows, Underscore.js can even invert an object if the value isn’t a simple type. In this case it takes an array, ["pepperoni","sausage"], and converts it to a value by joining the individual array elements with commas, creating the key "pepperoni,sausage".
Note also that JavaScript requires that all of an object’s keys are unique. That’s not necessarily the case for values. If you have an object in which multiple keys have the same value, then invert() keeps only the last of those keys in the inverted object. For example, _({key1: value, key2: value}).invert() returns {value: key2}.
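Both behaviors follow from how JavaScript treats object keys: every key is converted to a string, and assigning to an existing key overwrites it. A hypothetical plain-JavaScript invert() makes both effects visible.

```javascript
// Hypothetical invert(): swap each [key, value] pair. Object keys are
// always strings, so 10 becomes "10", true becomes "true", and an array
// becomes its comma-joined string. When two keys share a value, the later
// entry overwrites the earlier one, so the last key wins.
const invert = (obj) =>
  Object.fromEntries(Object.entries(obj).map(([key, value]) => [value, key]));

const pizza = { size: 10, crust: "thin", cheese: true, toppings: ["pepperoni", "sausage"] };
const inverted = invert(pizza);
console.log(inverted["pepperoni,sausage"]); // toppings

const dup = invert({ key1: "value", key2: "value" });
console.log(dup.value); // key2: only the last key survives
```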
Cleaning Up Object Subsets
When you want to clean up an object by eliminating unnecessary attributes, you can use Underscore.js’s pick() function. Simply pass it a list of attributes that you want to retain (Figure 8-17).
> _(pizza).pick( "size","crust")
{size: 10, crust: "thin"}
Figure 8-17. The pick() function selects specific properties from an object.
We can also do the opposite of pick() by using omit() and listing the attributes that we want to delete (Figure 8-18). Underscore.js keeps all the other attributes in the object.
> _(pizza).omit( "size","crust")
{cheese: true, toppings: [ "pepperoni","sausage"]}
Figure 8-18. The omit() function removes properties from an object.
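Both operations amount to filtering an object's [key, value] pairs. The pick() and omit() below are hypothetical plain-JavaScript sketches built from Object.entries() and Object.fromEntries(); neither one modifies the original object.

```javascript
// Hypothetical pick() and omit(): keep or drop the listed keys.
const pick = (obj, ...keys) =>
  Object.fromEntries(Object.entries(obj).filter(([key]) => keys.includes(key)));
const omit = (obj, ...keys) =>
  Object.fromEntries(Object.entries(obj).filter(([key]) => !keys.includes(key)));

const pizza = { size: 10, crust: "thin", cheese: true, toppings: ["pepperoni", "sausage"] };
console.log(pick(pizza, "size", "crust")); // size and crust only
console.log(omit(pizza, "size", "crust")); // cheese and toppings only
```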
Updating Attributes
When you are updating objects, a common requirement is to make sure that an object includes certain attributes and that those attributes have appropriate default values. Underscore.js includes two utilities for this purpose.
The two utilities, extend() and defaults(), both start with one object and adjust its properties based on those of other objects. If the secondary objects include attributes that the original object lacks, these utilities add those properties to the original. The utilities differ in how they handle properties that are already present in the original. The extend() function overrides the original properties with new values (see Figure 8-19):
> var standard = { size: 12, crust: "regular", cheese: true }
> var order = { size: 10, crust: "thin",
toppings: [ "pepperoni","sausage"] };
> _.extend(standard, order)
{ size: 10, crust: "thin", cheese: true,
toppings: [ "pepperoni","sausage"] }
Meanwhile, defaults() leaves the original properties unchanged (Figure 8-20):
> var order = { size: 10, crust: "thin",
toppings: [ "pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> _.defaults(order, standard)
{ size: 10, crust: "thin",
toppings: [ "pepperoni","sausage"], cheese: true }
Figure 8-19. The extend() function updates and adds missing properties to an object.
Figure 8-20. The defaults() function adds missing properties to an object.
Note that both extend() and defaults() modify the original object directly; they do not make a copy of that object and return the copy. Consider, for example, the following:
> var order = { size: 10, crust: "thin",
toppings: [ "pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> var pizza = _.extend(standard, order)
{ size: 10, crust: "thin", cheese: true,
toppings: [ "pepperoni","sausage"] }
This code sets the pizza variable as you would expect, but it also sets the standard variable to that same object. More specifically, the code modifies standard with the properties from order, and then it sets a new variable pizza equal to standard. The modification of standard is probably not intended. If you need to use either extend() or defaults() in a way that does not modify input parameters, start with an empty object.
> var order = { size: 10, crust: "thin",
toppings: [ "pepperoni","sausage"] };
> var standard = { size: 12, crust: "regular", cheese: true }
> var pizza = _.extend({}, standard, order)
{ size: 10, crust: "thin", cheese: true,
toppings: [ "pepperoni","sausage"] }
This version gets us the desired pizza object without modifying standard.
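In newer JavaScript, Object.assign() behaves much like extend(): it copies properties from the sources onto its first argument and modifies that argument. Object spread syntax (ES2018) always builds a new object, so it is a convenient non-mutating alternative. Listing the defaults first also approximates defaults(order, standard), with one caveat: a property that order explicitly sets to undefined would override the default in the spread version, whereas defaults() would fill it in.

```javascript
const standard = { size: 12, crust: "regular", cheese: true };
const order = { size: 10, crust: "thin", toppings: ["pepperoni", "sausage"] };

// Non-mutating, like _.extend({}, standard, order): later objects win,
// and neither input is changed.
const pizza = { ...standard, ...order };
console.log(pizza.size, pizza.crust, pizza.cheese); // 10, "thin", true
console.log(standard.size);                         // still 12

// Object.assign() modifies its first argument, like _.extend(base, order).
const base = { size: 12, crust: "regular", cheese: true };
Object.assign(base, order);
console.log(base.size); // 10: base itself was changed
```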
Manipulating Collections
So far we’ve seen various Underscore.js tools that are suited specifically for either arrays or objects. Next, we’ll see some tools for manipulating collections in general. In Underscore.js both arrays and objects are collections, so the tools in this section can be applied to pure arrays, pure objects, or data structures that combine both. In this section, we’ll try out these utilities on an array of objects, since that’s the data structure we most often deal with in the context of data visualization.
Here’s a small data set we can use for the examples that follow. It contains a few statistics from the 2012 Major League Baseball season.
var national_league = [
{ name: "Arizona Diamondbacks", wins: 81, losses: 81,
division: "west" },
{ name: "Atlanta Braves", wins: 94, losses: 68,
division: "east" },
{ name: "Chicago Cubs", wins: 61, losses: 101,
division: "central" },
{ name: "Cincinnati Reds", wins: 97, losses: 65,
division: "central" },
{ name: "Colorado Rockies", wins: 64, losses: 98,
division: "west" },
{ name: "Houston Astros", wins: 55, losses: 107,
division: "central" },
{ name: "Los Angeles Dodgers", wins: 86, losses: 76,
division: "west" },
{ name: "Miami Marlins", wins: 69, losses: 93,
division: "east" },
{ name: "Milwaukee Brewers", wins: 83, losses: 79,
division: "central" },
{ name: "New York Mets", wins: 74, losses: 88,
division: "east" },
{ name: "Philadelphia Phillies", wins: 81, losses: 81,
division: "east" },
{ name: "Pittsburgh Pirates", wins: 79, losses: 83,
division: "central" },
{ name: "San Diego Padres", wins: 76, losses: 86,
division: "west" },
{ name: "San Francisco Giants", wins: 94, losses: 68,
division: "west" },
{ name: "St. Louis Cardinals", wins: 88, losses: 74,
division: "central" },
{ name: "Washington Nationals", wins: 98, losses: 64,
division: "east" }
];
Working with Iteration Utilities
In the first section, we saw some of the pitfalls of traditional JavaScript iteration loops as well as the improvements that functional programming can provide. Our Fibonacci example eliminated iteration by using recursion, but many algorithms don’t lend themselves to a recursive implementation. In those cases, we can still use a functional programming style, however, by taking advantage of the iteration utilities in Underscore.js.
The most basic Underscore utility is each(). It executes an arbitrary function on every element in a collection and often serves as a direct functional replacement for the traditional for (i=0; i<len; i++) loop.
> _(national_league).each(function(team) { console.log(team.name); })
Arizona Diamondbacks
Atlanta Braves
// Console output continues...
Washington Nationals
If you’re familiar with the jQuery library, you may know that jQuery includes a similar $.each() utility. There are two important differences between the Underscore.js and jQuery versions, however. First, the parameters passed to the iterator function differ: Underscore.js passes (element, index, list) for arrays and (value, key, list) for plain objects, while jQuery passes (index, value). Second, at least as of this writing, the Underscore.js implementation can execute much faster than the jQuery version, depending on the browser. (jQuery also includes a $.map() function that’s similar to the Underscore.js map() method described next.)
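To make the parameter order concrete, the following plain-JavaScript sketch mimics how Underscore.js calls the iterator function. It needs no library. eachEntry() is a hypothetical helper written only for this illustration and is not Underscore’s implementation. The three-team array is a subset of the data set above.

```javascript
// A few teams from the data set above (subset for brevity)
const teams = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" }
];

// Hypothetical helper (not Underscore's code): calls
// iterator(element, index, list) for arrays and
// iterator(value, key, list) for plain objects, like _.each()
function eachEntry(list, iterator) {
  if (Array.isArray(list)) {
    list.forEach(function (element, index) {
      iterator(element, index, list);
    });
  } else {
    Object.keys(list).forEach(function (key) {
      iterator(list[key], key, list);
    });
  }
  return list; // _.each() also returns the original list, for chaining
}

// Array: the element comes first, then its index
const arrayCalls = [];
eachEntry(teams, function (team, i) {
  arrayCalls.push(i + ": " + team.name);
});

// Object: the value comes first, then its key
const objectCalls = [];
eachEntry(teams[0], function (value, key) {
  objectCalls.push(key + "=" + value);
});

console.log(arrayCalls);
console.log(objectCalls);
```

jQuery’s $.each() reverses the order and passes the index or key first. An iterator written for one library therefore can’t simply be reused with the other.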
The Underscore.js map() method iterates through a collection and transforms each element with an arbitrary function. It returns a new collection containing the transformed elements. Here, for example, is how to create an array of all the teams’ winning percentages:
> _(national_league).map(function(team) {
return Math.round(100*team.wins/(team.wins + team.losses));
})
[50, 58, 38, 60, 40, 34, 53, 43, 51, 46, 50, 49, 47, 58, 54, 60]
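Modern browsers also provide a native Array.prototype.map() that passes the element first to its callback. For a plain array, the result above can therefore be reproduced without Underscore.js, as the sketch below shows. Underscore’s map() also accepts plain objects, which the native method does not.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Rounded winning percentage for each team, in data-set order
const percentages = national_league.map(function (team) {
  return Math.round(100 * team.wins / (team.wins + team.losses));
});

console.log(percentages);
// [50, 58, 38, 60, 40, 34, 53, 43, 51, 46, 50, 49, 47, 58, 54, 60]
```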
The reduce() method iterates through a collection and returns a single value. One parameter initializes this value, and the other parameter is an arbitrary function that updates the value for each element in the collection. We can use reduce(), for example, to count how many teams have a winning percentage over .500 (that is, more wins than losses).
> _(national_league).reduce(
➊ function(count, team) {
➋ return count + (team.wins > team.losses);
},
➌ 0 // Starting point for reduced value
)
7
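As the comment at ➌ indicates, we start our count at 0. That value is passed as the first parameter (count) to the function at ➊, and the function returns an updated value at ➋. The expression team.wins > team.losses evaluates to true or false. Adding it to count works because JavaScript converts true to 1 and false to 0.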
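The same count can be written with the native Array.prototype.reduce(), which takes the reducer function followed by the starting value. This sketch shows two versions. The first is the compact boolean-coercion form used above. The second is a more explicit variant. Both should produce the same count.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Compact form: (wins > losses) is true or false, and + coerces it to 1 or 0
const winningTeams = national_league.reduce(function (count, team) {
  return count + (team.wins > team.losses);
}, 0); // starting value for count

// Explicit form: the conditional makes the add-one-or-nothing step visible
const winningTeamsExplicit = national_league.reduce(function (count, team) {
  return team.wins > team.losses ? count + 1 : count;
}, 0);

console.log(winningTeams, winningTeamsExplicit); // 7 7
```

The explicit form takes a few more characters. In exchange, it doesn’t depend on the reader knowing JavaScript’s type-coercion rules.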
NOTE
If you’ve followed the development of “big data” implementations such as Hadoop or Google’s search, you may know that the fundamental algorithm behind those technologies is MapReduce. Although the context differs, the same concepts underlie the map() and reduce() utilities in Underscore.js.
Finding Elements in a Collection
Underscore.js has several methods to help us find elements or sets of elements in a collection. We can, for example, use find() to get a team with more than 90 wins.
> _(national_league).find( function(team) { return team.wins > 90; })
{ name: "Atlanta Braves", wins: 94, losses: 68, division: "east" }
The find() function returns the first element in the collection that meets the criterion, or undefined if no element does. To find all elements that meet our criterion, use the filter() function.
> _(national_league).filter( function(team) { return team.wins > 90; })
[ { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
{ name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
{ name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
{ name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
]
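The difference between find() and filter() carries over to the native Array.prototype.find() and Array.prototype.filter() methods. The native find() requires an ECMAScript 2015 engine. The sketch below uses the native versions. Note that find() yields undefined when nothing matches, whereas filter() always returns an array, possibly an empty one.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// The same criterion used in the examples above
function isBigWinner(team) {
  return team.wins > 90;
}

const firstBigWinner = national_league.find(isBigWinner);  // first match only
const allBigWinners = national_league.filter(isBigWinner); // every match, in order

// No team won more than 100 games, so find() returns undefined
// and filter() returns an empty array
function wonOver100(team) {
  return team.wins > 100;
}
const noneFound = national_league.find(wonOver100);
const noneFiltered = national_league.filter(wonOver100);

console.log(firstBigWinner.name, allBigWinners.length); // Atlanta Braves 4
```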
The opposite of the filter() function is reject(). It returns an array of elements that don’t meet the criterion.
> _(national_league).reject( function(team) { return team.wins > 90; })
[ { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
{ name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
// Console output continues...
{ name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" }
]
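Conceptually, reject() is just filter() with the test inverted. The plain-JavaScript sketch below makes that explicit. rejectItems() is a hypothetical helper for illustration and is not part of Underscore.js or the native Array API.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

function isBigWinner(team) {
  return team.wins > 90;
}

// Hypothetical helper: reject() expressed as filter() with the test negated
function rejectItems(list, predicate) {
  return list.filter(function (item, index, array) {
    return !predicate(item, index, array);
  });
}

const bigWinners = national_league.filter(isBigWinner);
const notBigWinners = rejectItems(national_league, isBigWinner);

// Together, the two results partition the original collection
console.log(bigWinners.length + notBigWinners.length); // 16
```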
If your criterion can be described as a property value, you can use a simpler version of filter(): the where() function. Instead of an arbitrary function to check for a match, where() takes for its parameter a set of properties that must match. We can use it to extract all the teams in the Eastern Division.
> _(national_league).where({division: "east"})
[ { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
{ name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
{ name: "New York Mets", wins: 74, losses: 88, division: "east" },
{ name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
{ name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
]
The findWhere() method combines the functionality of find() with the simplicity of where(). It returns the first element in a collection with properties that match specific values.
> _(national_league).findWhere({name: "Atlanta Braves"})
{name: "Atlanta Braves", wins: 94, losses: 68, division: "east"}
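Both where() and findWhere() boil down to a predicate that compares each listed property against the candidate element. The sketch below builds such a predicate by hand and passes it to the native filter() and find(). matchesProps() is a hypothetical helper for this illustration. It compares values with strict equality (===), which is an assumption that suits simple strings and numbers like these.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Hypothetical helper: returns a predicate that is true when every
// property in `props` strictly equals (===) the same property on the item
function matchesProps(props) {
  const keys = Object.keys(props);
  return function (item) {
    return keys.every(function (key) {
      return item[key] === props[key];
    });
  };
}

// Like where(): every match, as an array
const eastTeams = national_league.filter(matchesProps({ division: "east" }));

// Like findWhere(): the first match only, or undefined
const braves = national_league.find(matchesProps({ name: "Atlanta Braves" }));

console.log(eastTeams.length, braves.wins); // 5 94
```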
Another Underscore.js utility that’s especially handy is pluck(). This function creates an array by extracting a single specified property from each element of a collection. We could use it to extract an array of nothing but team names, for example.
> _(national_league).pluck("name")
[
"Arizona Diamondbacks",
"Atlanta Braves",
// Data continues...
"Washington Nationals"
]
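Conceptually, pluck() is a special case of map() that returns one property from every element. A plain-JavaScript sketch with a hypothetical pluckProp() helper looks like this. The sketch also shows a pitfall. Asking for a property that the objects don’t have produces an array of undefined values rather than an error, so a mistyped key is easy to miss.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Hypothetical helper mirroring pluck(): map each object to one property
function pluckProp(list, key) {
  return list.map(function (item) {
    return item[key];
  });
}

const names = pluckProp(national_league, "name");

// "team" is not a property of these objects, so every entry is undefined
const missing = pluckProp(national_league, "team");

console.log(names[0], names[names.length - 1]); // Arizona Diamondbacks Washington Nationals
```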
Testing a Collection
Sometimes we don’t necessarily need to transform a collection; we simply want to check some aspect of it. Underscore.js provides several utilities to help with these tests.
The every() function tells us whether all elements in a collection pass an arbitrary test. We could use it to check if every team in our data set had at least 70 wins.
> _(national_league).every(function(team) { return team.wins >= 70; })
false
Perhaps we’d like to know if any team had at least 70 wins. In that case, the any() function (also available under the name some()) provides the answer.
> _(national_league).any(function(team) { return team.wins >= 70; })
true
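The native Array.prototype.every() and Array.prototype.some() methods answer the same two questions for plain arrays. In the sketch below, a follow-up filter() lists the teams responsible for the false result from every(). For an empty array, every() returns true and some() returns false, which is worth remembering when the data might be missing.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

function hasAtLeast70Wins(team) {
  return team.wins >= 70;
}

const everyTeam = national_league.every(hasAtLeast70Wins); // false
const anyTeam = national_league.some(hasAtLeast70Wins);    // true

// The teams that make every() return false
const under70 = national_league
  .filter(function (team) { return !hasAtLeast70Wins(team); })
  .map(function (team) { return team.name; });

console.log(everyTeam, anyTeam, under70);
```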
Underscore.js also lets us use arbitrary functions to find the maximum and minimum elements in a collection. If our criterion is the number of wins, we use max() to find the “maximum” team.
> _(national_league).max(function(team) { return team.wins; })
{ name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
Not surprisingly, the min() function works the same way.
> _(national_league).min(function(team) { return team.wins; })
{ name: "Houston Astros", wins: 55, losses: 107, division: "central" }
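To see what max() and min() do with a scoring function, here is a plain-JavaScript sketch built on the native reduce(). The helper extremeBy() is hypothetical. On ties it keeps the earlier element. For an empty array it returns undefined, whereas Underscore’s max() and min() are documented to return -Infinity and Infinity in that case.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Hypothetical helper: returns the element whose score "beats" all others
// according to isBetter; the first element wins ties
function extremeBy(list, score, isBetter) {
  if (list.length === 0) {
    return undefined;
  }
  // Without an initial value, reduce() starts with list[0] as `best`
  return list.reduce(function (best, item) {
    return isBetter(score(item), score(best)) ? item : best;
  });
}

function wins(team) {
  return team.wins;
}

const mostWins = extremeBy(national_league, wins, function (a, b) { return a > b; });
const fewestWins = extremeBy(national_league, wins, function (a, b) { return a < b; });

console.log(mostWins.name, fewestWins.name); // Washington Nationals Houston Astros
```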
Rearranging Collections
To sort a collection, we can use the sortBy() method and supply an arbitrary function to provide sortable values. Here’s how to reorder our collection in order of increasing wins.
> _(national_league).sortBy(function(team) { return team.wins; })
[ { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
{ name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
// Data continues...
{ name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
]
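A native equivalent needs two extra details that sortBy() handles for us. First, Array.prototype.sort() reorders the array in place, so the sketch sorts a copy. Second, sort() takes a comparison function rather than a function that returns a sort key. The sketch assumes an engine that implements ECMAScript 2019 or later, where sort() is guaranteed to be stable. Teams with equal wins therefore keep their original relative order, as Underscore’s sortBy() also promises.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// slice() makes a shallow copy so the original order is preserved;
// a negative result puts a before b, so this sorts by increasing wins
const byWinsAscending = national_league.slice().sort(function (a, b) {
  return a.wins - b.wins;
});

console.log(byWinsAscending[0].name);  // Houston Astros
console.log(national_league[0].name);  // Arizona Diamondbacks (unchanged)
```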
We could also reorganize our collection by grouping its elements according to a property. The Underscore.js function that helps in this case is groupBy(). One possibility is reorganizing the teams according to their division.
> _(national_league).groupBy("division")
{ west: [
{ name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
{ name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
{ name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
{ name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
{ name: "San Francisco Giants", wins: 94, losses: 68, division: "west" }
],
east: [
{ name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
{ name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
{ name: "New York Mets", wins: 74, losses: 88, division: "east" },
{ name: "Philadelphia Phillies", wins: 81, losses: 81,
division: "east" },
{ name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
],
central: [
{ name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
{ name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
{ name: "Houston Astros", wins: 55, losses: 107, division: "central" },
{ name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
{ name: "Pittsburgh Pirates", wins: 79, losses: 83,
division: "central" },
{ name: "St. Louis Cardinals", wins: 88, losses: 74,
division: "central" }
]
}
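The idea behind groupBy() can be sketched with the native reduce(): walk the collection once and append each element to an array stored under its key. groupByProp() below is a hypothetical helper. As in the output above, the resulting keys appear in the order in which each division is first encountered.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Hypothetical helper mirroring groupBy() for a property name:
// each group is an array of the original objects, in data-set order
function groupByProp(list, key) {
  return list.reduce(function (groups, item) {
    const groupKey = item[key];
    if (!Object.prototype.hasOwnProperty.call(groups, groupKey)) {
      groups[groupKey] = []; // first time we've seen this key
    }
    groups[groupKey].push(item);
    return groups;
  }, {});
}

const byDivision = groupByProp(national_league, "division");

console.log(Object.keys(byDivision)); // [ 'west', 'east', 'central' ]
```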
We can also use the countBy() function to simply count the number of elements in each group.
> _(national_league).countBy("division")
{west: 5, east: 5, central: 6}
NOTE
Although we’ve used a property name ("division") for groupBy() and countBy(), both methods also accept an arbitrary function if the criterion for grouping isn’t a simple property.
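The NOTE’s point about arbitrary functions is easy to see in a plain-JavaScript sketch of countBy(). The hypothetical countByKey() helper below accepts either a property name or a function. The function form classifies teams by record, which no single property of these objects expresses directly.

```javascript
// 2012 National League records, copied from the data set above
const national_league = [
  { name: "Arizona Diamondbacks", wins: 81, losses: 81, division: "west" },
  { name: "Atlanta Braves", wins: 94, losses: 68, division: "east" },
  { name: "Chicago Cubs", wins: 61, losses: 101, division: "central" },
  { name: "Cincinnati Reds", wins: 97, losses: 65, division: "central" },
  { name: "Colorado Rockies", wins: 64, losses: 98, division: "west" },
  { name: "Houston Astros", wins: 55, losses: 107, division: "central" },
  { name: "Los Angeles Dodgers", wins: 86, losses: 76, division: "west" },
  { name: "Miami Marlins", wins: 69, losses: 93, division: "east" },
  { name: "Milwaukee Brewers", wins: 83, losses: 79, division: "central" },
  { name: "New York Mets", wins: 74, losses: 88, division: "east" },
  { name: "Philadelphia Phillies", wins: 81, losses: 81, division: "east" },
  { name: "Pittsburgh Pirates", wins: 79, losses: 83, division: "central" },
  { name: "San Diego Padres", wins: 76, losses: 86, division: "west" },
  { name: "San Francisco Giants", wins: 94, losses: 68, division: "west" },
  { name: "St. Louis Cardinals", wins: 88, losses: 74, division: "central" },
  { name: "Washington Nationals", wins: 98, losses: 64, division: "east" }
];

// Hypothetical helper mirroring countBy(): `criterion` may be a property
// name or a function that returns the group key for an item
function countByKey(list, criterion) {
  const keyOf = typeof criterion === "function"
    ? criterion
    : function (item) { return item[criterion]; };
  return list.reduce(function (counts, item) {
    const key = keyOf(item);
    counts[key] = (counts[key] || 0) + 1;
    return counts;
  }, {});
}

const perDivision = countByKey(national_league, "division");

// Function criterion: group teams by their win-loss record
const byRecord = countByKey(national_league, function (team) {
  if (team.wins > team.losses) return "winning";
  if (team.wins < team.losses) return "losing";
  return "even";
});

console.log(perDivision); // { west: 5, east: 5, central: 6 }
console.log(byRecord);    // { even: 2, winning: 7, losing: 7 }
```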
As a final trick, Underscore.js lets us randomly reorder a collection using the shuffle() function.
> _(national_league).shuffle()
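Underscore’s documentation describes shuffle() as returning a shuffled copy of the list, produced with a version of the Fisher-Yates shuffle. The sketch below is a straightforward Fisher-Yates implementation in plain JavaScript. shuffledCopy() is a hypothetical name for this illustration. The sketch relies on Math.random(), which is fine for reordering a chart but not for anything security-sensitive.

```javascript
// A few team names from the data set above
const teamNames = [
  "Atlanta Braves",
  "Miami Marlins",
  "New York Mets",
  "Philadelphia Phillies",
  "Washington Nationals"
];

// Fisher-Yates: walk backward through a copy, swapping each slot with a
// randomly chosen slot at or before it; the input array is not modified
function shuffledCopy(list) {
  const result = list.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    const temp = result[i];
    result[i] = result[j];
    result[j] = temp;
  }
  return result;
}

const shuffled = shuffledCopy(teamNames);
console.log(shuffled); // order varies from run to run
```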
Summing Up
Although this chapter takes a different approach than the rest of the book, its ultimate focus is still on data visualizations. As we’ve seen in earlier chapters (and as you’ll certainly encounter in your own projects), the raw data for our visualizations isn’t always perfect as delivered. Sometimes we need to clean the data by removing invalid values, and other times we need to rearrange or transform it so that it’s appropriate for our visualization libraries.
The Underscore.js library contains a wealth of tools and utilities to help with those tasks. It lets us easily manage arrays, modify objects, and transform collections. Furthermore, Underscore.js supports an underlying philosophy based on functional programming, so code that uses Underscore.js tends to be more readable and less prone to bugs.