Big Data Bootcamp: What Managers Need to Know to Profit from the Big Data Revolution (2014)
Introduction
Although earthquakes have been happening for millions of years and we have lots of data about them, we still can’t predict exactly when and where they’ll happen. Thousands of people die every year as a result and the costs of material damage from a single earthquake can run into the hundreds of billions of dollars.
The problem is that based on the data we have, earthquakes and almost-earthquakes look roughly the same, right up until the moment when an almost-earthquake becomes the real thing. But by then, of course, it’s too late.
And if scientists were to warn people every time they thought they recognized the data for what appeared to be an earthquake, there would be a lot of false-alarm evacuations. What’s more, much like the boy who cried wolf, people would eventually tire of false alarms and decide not to evacuate, leaving them in danger when the real event happened.
When Good Predictions Aren’t Good Enough
To make a good prediction, therefore, a few things need to be true. We must have enough data about the past to identify patterns. The events associated with those patterns have to happen consistently. And we have to be able to differentiate what looks like an event but isn’t from an actual event. This is known as ruling out false positives.
But a good prediction alone isn’t enough to be useful. For a prediction to be useful, we have to be able to act on a prediction early enough and fast enough for it to matter.
When a real earthquake is happening, the data very clearly indicates as much. The ground shakes, the earth moves, and, once the event is far enough along, the power goes out, explosions occur, poisonous gas escapes, and fires erupt. By that time, of course, it doesn’t take a lot of computers or talented scientists to figure out that something bad is happening.
So to be useful, the data that represents the present needs to look like that of the past far enough in advance for us to act on it. If we can only make the match a few seconds before the actual earthquake, it doesn’t matter. We need sufficient time to get the word out, mobilize help, and evacuate people.
What’s more, we need to be able to perform the analysis of the data itself fast enough to matter. Suppose we had data that could tell us a day in advance that an earthquake was going to happen. If it takes us two days to analyze that data, the data and our resulting prediction wouldn’t matter.
This at its core is both the challenge and the opportunity of Big Data. Just having data isn’t enough. We need relevant data early enough and we have to be able to analyze it fast enough that we have sufficient time to act on it. The sooner an event is going to happen, the faster we need to be able to make an accurate prediction. But at some point we hit the law of diminishing returns. Even if we can analyze immense amounts of data in seconds to predict an earthquake, such analysis doesn’t matter if there’s not enough time left to get people out of harm’s way.
Enter Big Data: Speedier Warnings and Lives Saved
On October 22, 2012, six scientists and a former government official were sentenced to six years in jail after being accused of inappropriately reassuring villagers about a possible upcoming earthquake. The earthquake struck the town of L’Aquila, Italy, in 2009; some 300 villagers died.
Could Big Data have helped the geologists make better predictions?
Every year, some 7,000 earthquakes of magnitude 4.0 or greater occur around the world. Earthquakes are measured either on the well-known Richter scale, which assigns a number based on the amplitude of the seismic waves an earthquake produces, or on the more recent moment magnitude scale (MMS), which measures an earthquake in terms of the amount of energy released.1
When it comes to predicting earthquakes, there are three key questions that must be answered: when, where, and how big? In The Charlatan Game,2 Matthew A. Mabey of Brigham Young University argues that while there are precursors to earthquakes, “we can’t yet use them to reliably or usefully predict earthquakes.”
Instead, the best we can do is prepare for earthquakes, which happen a lot more often than people realize. Preparation means building bridges and buildings that are designed with earthquakes in mind and getting emergency kits together so that infrastructure and people are better prepared when a large earthquake strikes.
Earthquakes, as we all learned back in our grade school days, are caused by the rubbing together of tectonic plates—those pieces of the Earth that shift around from time to time.
Not only does such rubbing happen far below the Earth’s surface, but the interactions of the plates are complex. As a result, good earthquake data is hard to come by, and understanding which plate activity causes which earthquake is virtually impossible.3
Ultimately, accurately predicting earthquakes—answering the questions of when, where, and how big—will require much better data about the natural elements that cause earthquakes to occur and their complex interactions.
Therein lies a critical lesson about Big Data: predictions are different than forecasts. Scientists can forecast earthquakes but they cannot predict them. When will San Francisco experience another quake like that of 1906, which resulted in more than 3,000 casualties? Scientists can’t say for sure.
They can forecast the probability that a quake of a certain magnitude will happen in a certain region in a certain time period. They can say, for example, that there is an 80% likelihood that a magnitude 8.4 earthquake will happen in the San Francisco Bay Area in the next 30 years. But they cannot say with complete certainty when and where that earthquake will happen or how big it will be. That is the difference between a forecast and a prediction.4
But if there is a silver lining in the ugly cloud that is earthquake forecasting, it is that while earthquake prediction is still a long way off, scientists are getting smarter about buying potential earthquake victims a few more seconds. For that we have Big Data methods to thank.
Unlike traditional earthquake sensors, which can cost $3,000 or more, basic earthquake detection can now be done using low-cost sensors that attach to standard computers or even using the motion sensing capabilities built into many of today’s mobile devices for navigation and game-playing.5
The Stanford University Quake-Catcher Network (QCN) comprises the computers of some 2,000 volunteers who participate in the program’s distributed earthquake detection network. In some cases, the network can provide up to 10 seconds of early notification to those about to be impacted by an earthquake. While that may not seem like a lot, it can mean the difference between being in a moving elevator or a stationary one or being out in the open versus under a desk.
The QCN is a great example of the kinds of low-cost sensor networks that are generating vast quantities of data. In the past, capturing and storing such data would have been prohibitively expensive. But, as we will talk about in future chapters, recent technology advances have made the capture and storage of such data significantly cheaper—in some cases more than a hundred times cheaper than in the past.
Having access to both more and better data doesn’t just present the possibility for computers to make smarter decisions. It lets humans become smarter too. We’ll find out how in just a moment—but first let’s take a look at how we got here.
Big Data Overview
When it comes to Big Data, it’s not how much data we have that really matters, but what we do with that data.
Historically, much of the talk about Big Data has centered around the three Vs: volume, velocity, and variety. Volume refers to the quantity of data you’re working with.6 Velocity means how quickly that data is flowing. Variety refers to the diversity of data that you’re working with, such as marketing data combined with financial data, or patient data combined with medical research and environmental data.
But the most important “V” of all is value. The real measure of Big Data is not its size but rather the scale of its impact: the value that Big Data delivers to your business or personal life. Data for data’s sake serves very little purpose. But data that has a positive and outsized impact on our business or personal lives truly is Big Data.
When it comes to Big Data, we’re generating more and more data every day. From the mobile phones we carry with us to the airplanes we fly in, today’s systems are creating more data than ever before. The software that operates these systems gathers immense amounts of data about what these systems are doing and how they are performing in the process. We refer to these measurements as event data and the software approach for gathering that data as instrumentation.
For example, in the case of a web site that processes financial transactions, instrumentation allows us to monitor not only how quickly users can access the web site, but also the speed at which the site can read information from a database, the amount of memory consumed at any given time by the servers the site is running on, and, of course, the kinds of transactions users are conducting on the site. By analyzing this stream of event data, software developers can dramatically improve response time, which has a significant impact on whether users and customers remain on a web site or abandon it.
In the case of web sites that handle financial or commerce transactions, developers can also use this kind of event stream data to reduce fraud by looking for patterns in how clients use the web site and detecting unusual behavior. Big Data-driven insights like these lead to more transactions processed and higher customer satisfaction.
Big Data provides insights into the behavior of complex systems in the real world as well. For example, an airplane manufacturer like Boeing can measure not only internal metrics such as engine fuel consumption and wing performance but also external metrics like air temperature and wind speed.
This is an example of how quite often the value in Big Data comes not from one data source by itself, but from bringing multiple data sources together. Data about wind speed alone might not be all that useful. But bringing data about wind speed, fuel consumption, and wing performance together can lead to new insights, resulting in better plane designs. These in turn provide greater comfort for passengers and improved fuel efficiency, resulting in lower operating costs for airlines.
When it comes to our personal lives, instrumentation can lead to greater insights about an altogether different complex system—the human body. Historically, it has often been expensive and cumbersome for doctors to monitor patient health and for us as individuals to monitor our own health. But now, three trends have come together to reduce the cost of gathering and analyzing health data.
These key trends are the widespread adoption of low-cost mobile devices that can be used for measurement and monitoring, the emergence of cloud-based applications to analyze the data these devices generate, and of course the Big Data itself, which in combination with the right analytics software and services can provide us with tremendous insights. As a result, Big Data is transforming personal health and medicine.
Big Data has the potential to have a positive impact on many other areas of our lives as well, from enabling us to learn faster to helping us stay in the relationships we care about longer. And as we’ll learn, Big Data doesn’t just make computers smarter—it makes human beings smarter too.
How Data Makes Us Smarter
If you’ve ever wished you were smarter, you’re not alone. The good news, according to recent studies, is that you can actually increase the size of your brain by adding more data.
To become licensed to drive, London cab drivers have to pass a test known somewhat ominously as “the Knowledge,” demonstrating that they know the layout of downtown London’s 25,000 streets as well as the location of some 20,000 landmarks. This task frequently takes three to four years to complete, if applicants are able to complete it at all. So do these cab drivers actually get smarter over the course of learning the data that comprises the Knowledge?7
It turns out that they do.
Data and the Brain
Scientists once thought that the human brain was a fixed size. But brains are “plastic” in nature and can change over time, according to a study by Professor Eleanor Maguire of the Wellcome Trust Centre for Neuroimaging at University College London.8
The study tracked the progress of 79 cab drivers, only 39 of whom ultimately passed the test. While drivers cited many reasons for not passing, such as a lack of time and money, certainly the difficulty of learning such an enormous body of information was one key factor. According to the City of London web site, there are just 25,000 licensed cab drivers in total, or about one cab driver for every street.9
After learning the city’s streets for years, drivers evaluated in the study showed “increased gray matter” in an area of the brain called the posterior hippocampus. In other words, the drivers actually grew more cells in order to store the necessary data, making them smarter as a result.
Now, these improvements in memory did not come without a cost. It was harder for drivers with expanded hippocampi to absorb new routes and to form new associations for retaining visual information, according to another study by Maguire.10
Similarly, in computers, advantages in one area also come at a cost to other areas. Storing a lot of data can mean that it takes longer to process that data. Storing less data may produce faster results, but those results may be less informed.
Take for example the case of a computer program trying to analyze historical sales data about merchandise sold at a store so it can make predictions about sales that may happen in the future.
If the program only had access to quarterly sales data, it would likely be able to process that data quickly, but the data might not be detailed enough to offer any real insights. Store managers might know that certain products are in higher demand during certain times of the year, but they wouldn’t be able to make pricing or layout decisions that would impact hourly or daily sales.
Conversely, if the program tried to analyze historical sales data tracked on a minute-by-minute basis, it would have much more granular data that could generate better insights, but such insights might take more time to produce. For example, due to the volume of data, the program might not be able to process all the data at once. Instead, it might have to analyze one chunk of it at a time.
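The chunk-at-a-time approach can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function name `total_sales_in_chunks`, the chunk size, and the made-up minute-by-minute sales figures are all assumptions chosen for the example.

```python
from itertools import islice


def total_sales_in_chunks(sales, chunk_size):
    """Sum a stream of sales amounts while holding only one chunk in memory."""
    iterator = iter(sales)
    total = 0
    chunks_processed = 0
    while True:
        # Pull the next chunk_size records; an empty list means we are done.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        total += sum(chunk)
        chunks_processed += 1
    return total, chunks_processed


# Hypothetical minute-by-minute sales for one day (1,440 minutes), in dollars.
minute_sales = (minute % 7 for minute in range(1440))
total, chunks = total_sales_in_chunks(minute_sales, chunk_size=60)
print(total, "dollars summed across", chunks, "hourly chunks")
```

The program never holds more than one hour of data at a time, which keeps memory use small. The price is that anything spanning several chunks, such as a trend across the whole day, has to be tracked separately.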
Big Data Makes Computers Smarter and More Efficient
One of the amazing things about licensed London cab drivers is that they’re able to store the entire map of London, within six miles of Charing Cross, in memory, instead of having to refer to a physical map or use a GPS.
Looking at a map wouldn’t be a problem for a London cab driver if the driver didn’t have to keep his eye on the road and hands on the steering wheel, and if he didn’t also have to make navigation decisions quickly. In a slower world, a driver could perhaps plot out a route at the start of a journey, then stop and make adjustments along the way as necessary.
The problem is that in London’s crowded streets no driver has the luxury to perform such slow calculations and recalculations. As a result, the driver has to store the whole map in memory. Computer systems that must deliver results based on processing large amounts of data do much the same thing: they store all the data in one storage system, sometimes all in memory, sometimes distributed across many different physical systems. We’ll talk more about that and other approaches to analyzing data quickly in the chapters ahead.
Fortunately, if you want a bigger brain, memorizing the London city map isn’t the only way to increase the size of your hippocampus. The good news, according to another study, is that exercise can also make your brain bigger.11
As we age, our brains shrink, leading to memory impairment. According to the authors of the study, who ran a trial with 120 older adults, exercise training increased these adults’ hippocampal volume by 2%, which was associated with improved memory function. In other words, keeping sufficient blood flowing through our brains can help prevent us from getting dumber. So if you want to stay smart, work out.
Unlike humans, however, computers can’t just go to the gym to increase the size of their memory. When it comes to computers and memory, there are three options: add more memory, swap data in and out of memory, or compress the data.
A lot of data is redundant. Just think of the last time you wrote a sentence or multiplied some large numbers together. Computers can save a lot of space by compressing repeated characters, words, or even entire phrases in much the same way that court reporters use shorthand so they don’t have to type every word.
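To see how much space redundancy can give back, here is a small sketch using `zlib`, the compression module in Python's standard library. The repeated sentence is a hypothetical stand-in for redundant data. Note that `zlib` uses the DEFLATE method rather than anything resembling a court reporter's shorthand, but the underlying idea of replacing repeats with shorter references is similar.

```python
import zlib

# Hypothetical, highly repetitive text standing in for redundant data.
text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print(len(text), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip intact:", restored == text)
```

Real-world data is rarely this repetitive, so actual savings vary widely with the kind of data being stored.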
Adding more memory is expensive, and typically the faster the memory, the more expensive it is. According to one source, Random Access Memory or RAM is 100,000 times faster than disk memory. But it is also about 100 times more expensive.12
It’s not just the memory itself that costs so much. More memory comes with other costs as well.
There are only so many memory chips that can fit in a typical computer, and each memory stick can hold a certain number of chips. Power and cooling are issues too. More electronics require more electricity and more electricity generates more heat. Heat needs to be dissipated or cooled, which in and of itself requires more electricity (and generates more heat). All of these factors together make the seemingly simple task of adding more memory a fairly complex one.
Alternatively, computers can just use the memory they have available and swap the needed information in and out. Instead of trying to look at all available data about car accidents or stock prices at once, for example, a computer can load yesterday’s data, then replace that with data from the day before, and so on. The problem with such an approach is that if you’re looking for patterns that span multiple days, weeks, or years, swapping all that data in and out takes a lot of time and makes it hard to find patterns.
In contrast to machines, human beings don’t require a lot more energy to use more brainpower. According to an article in Scientific American, the brain “continuously slurps up huge amounts of energy.”13
But all that energy is remarkably small compared to that required by computers. According to the same article, “a typical adult human brain runs on around 12 watts,” or “a fifth of the power required by a standard 60 watt light bulb.” In contrast, “IBM’s Watson, the supercomputer that defeated Jeopardy! champions, depends on ninety IBM Power 750 servers, each of which requires around one thousand watts.” What’s more, each server weighs about 120 pounds.
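The size of that gap is easy to work out from the figures just quoted. The short calculation below simply multiplies them out. The variable names are ours, and because the inputs are rounded "around" values, the results should be read as rough estimates.

```python
# Figures quoted in the text; all are approximate.
brain_watts = 12
servers = 90
watts_per_server = 1000
pounds_per_server = 120

watson_watts = servers * watts_per_server    # total power draw of the servers
watson_pounds = servers * pounds_per_server  # total weight of the servers
power_ratio = watson_watts / brain_watts     # Watson's draw relative to a brain

print(f"Watson: about {watson_watts:,} watts and {watson_pounds:,} pounds")
print(f"Roughly {power_ratio:,.0f} times the power of a human brain")
```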
When it comes to Big Data, one challenge is to make computers smarter. But another challenge is to make them more efficient.
On February 16, 2011, a computer created by IBM known as Watson beat two Jeopardy! champions to win $77,147. Actually, Watson took home $1 million in prize money for winning the epic man versus machine battle. But was Watson really smart in the way that the other two contestants on the show were? Can Watson think for itself?
With an estimated $30 million in research and development investment, 200 million pages of stored content, and some 2,800 processor cores, there’s no doubt that Watson is very good at answering Jeopardy! questions.
But it’s difficult to argue that Watson is intelligent in the way that, say, HAL was in the movie 2001: A Space Odyssey. And Watson isn’t likely to express its dry humor like one of the show’s other contestants, Ken Jennings, who wrote “I for one welcome our new computer overlords,” alongside his final Jeopardy! answer. What’s more, Watson can’t understand human speech; rather, the computer is restricted to processing Jeopardy! answers in the form of written text.
Why can’t Watson understand speech? Watson’s designers felt that creating a computer system that could come up with correct Jeopardy! questions was hard enough. Introducing the problem of understanding human speech would have added an extra layer of complexity. And that layer is a very complex one indeed.
Although there have been significant advances in understanding human speech, the solution is nowhere near flawless. That’s because, as Markus Forsberg at the Chalmers University of Technology points out, understanding human speech is no simple matter.14
Speech would seem to fit at least some of the requirements for Big Data. There’s a lot of it and by analyzing it, computers should be able to create patterns for recognizing it when they see it again. But computers face many challenges in trying to understand speech.
As Forsberg points out, we not only use the actual sound of speech to understand it but also an immense amount of contextual knowledge. Although the words “two” and “too” sound alike, they have very different meanings. This is just the start of the complexity of understanding speech. Other issues are the variable speeds at which we speak, accents, background noise, and the continuous nature of speech—we don’t pause between each word, so trying to convert individual words into text is an insufficient approach to the speech recognition problem.
Even trying to group words together can be difficult. Consider the following examples cited by Forsberg:
· It’s not easy to wreck a nice beach.
· It’s not easy to recognize speech.
· It’s not easy to wreck an ice beach.
Such sentences sound very similar, yet they mean very different things.
But computers are making gains, thanks to the power and speed of modern computers combined with advanced new pattern-recognition approaches. The head of Microsoft’s research and development organization stated that the company’s most recent speech recognition technology is 30% more accurate than the previous version, meaning that instead of getting one out of every four or five words wrong, the software gets only one out of every seven or eight incorrect.15 Pattern recognition is also being used for tasks like machine-based translation, but as users of Google Translate will attest, these technologies still have a long way to go.
Likewise, computers are still far off from being able to create original works of content, although, somewhat amusingly, people have tried to get them to do so. In one recent experiment, a programmer created a series of virtual programs to simulate monkeys typing randomly on keyboards, with the goal of answering the classic question of whether monkeys could recreate the works of William Shakespeare.16 The effort failed, of course.
But computers are getting smarter. So smart, in fact, that they can now drive themselves.
How Big Data Helps Cars Drive Themselves
If you’ve used the Internet, you’ve probably used Google Maps. The company, well known for its market-dominating search engine, has accumulated more than 20 petabytes of data for Google Maps. To put that in perspective, it would take roughly 82,000 of the 256 GB drives found in a typical Apple MacBook Pro to store all that data.17
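The drive count can be checked with a quick calculation. The sketch below assumes exactly 20 petabytes and shows both binary and decimal unit conventions, since the text does not say which one it uses. The variable names are illustrative.

```python
# Figures from the text: about 20 petabytes of map data, 256 GB per drive.
PETABYTES = 20
DRIVE_GB = 256

# Assumption: binary units, where 1 PB = 1024 TB and 1 TB = 1024 GB.
drives_binary = PETABYTES * 1024 * 1024 / DRIVE_GB

# Alternative assumption: decimal units, where 1 PB = 1,000,000 GB.
drives_decimal = PETABYTES * 1_000_000 / DRIVE_GB

print(f"{drives_binary:,.0f} drives (binary units)")
print(f"{drives_decimal:,.0f} drives (decimal units)")
```

Either way, the answer lands near 80,000 drives, which is the scale the comparison is meant to convey.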
But does all that data really translate into cars that can drive themselves? In fact, it does. In an audacious project to build self-driving cars, Google combines a variety of mapping data with information from a real-time laser detection system, multiple radars, GPS, and other devices that allow the system to “see” traffic, traffic lights, and roads, according to Sebastian Thrun, a Stanford University professor who leads the project at Google.18
Self-driving cars not only hold the promise of making roads safer, but also of making them more efficient by better utilizing the vast amount of empty space between cars on the road. According to one source, some 43,000 people in the United States die each year from car accidents and there are some five and a quarter million accidents per year in total.19
Google Cars can’t think for themselves, per se, but they can do a great job at pattern matching. By combining existing data from maps with real-time data from a car’s sensors, the cars can make driving decisions. For example, by matching against a database of what different traffic lights look like, self-driving cars can determine when to start and stop.
All of this would not be possible, of course, without three key elements that are a common theme of Big Data. First, the computer systems in the cars have access to an enormous amount of data. Second, the cars make use of sensors that take in all kinds of real-time information about the position of other cars, obstacles, traffic lights, and terrain. These sensors are expensive today (the equipment for a self-driving car costs approximately $150,000 in total), but they are expected to decrease in cost rapidly.
Finally, the cars can process all that data at a very high speed and make corresponding real-time decisions about what to do next as a result—all with a little computer equipment and a lot of software in the back seat.
To put that in perspective, consider that just a little over 60 years ago, the UNIVAC computer, known for successfully predicting the results of the Eisenhower presidential election, took up as much space as a single-car garage.20
How Big Data Enables Computers to Detect Fraud
All of this goes to show that computers are very good at performing high-speed pattern matching. That’s a very useful ability not just on the road but off the road as well. When it comes to detecting fraud, fast pattern matching is critical.
We’ve all gotten that dreaded call from the fraud-prevention department of our credit card company. The news is never good—the company believes our credit card information has been stolen and that someone else is buying things at the local hardware store in our name. The only problem is that the local hardware store in question is 5,000 miles away.
Computers that can process greater amounts of data at the same time can make better decisions, decisions that have an impact on our daily lives. Consider the last time you bought something with your credit card online, for example.
When you clicked that Submit button, the action of the web site charging your card triggered a series of events. The proposed transaction was sent to computers running a complex set of algorithms used to determine whether you were you or whether someone was trying to use your credit card fraudulently.
The trouble is that figuring out whether someone is a fraudster or really is who they claim to be is a hard problem. With so many data breaches and so much personal information available online, it’s often the case that fraudsters know almost as much about you as you do.
Computer systems detect whether you are who you say you are in a few basic ways. They verify information. When you call into your bank and they ask for your name, address, and mother’s maiden name, they compare the information you give them with the information they have on file. They may also look at the number you’re calling from and see if it matches the number they have for you on file. If those pieces of information match, it’s likely that you are who you say you are.
Computer systems also evaluate a set of data points about you to see if those seem to verify you are who you say you are or reduce that likelihood. The systems produce a confidence score based on the data points.
For example, if you live in Los Angeles and you’re calling in from Los Angeles, that might increase the confidence score. However, if you reside in Los Angeles and are calling from Toronto, that might reduce the score.
More advanced scoring mechanisms (called algorithms) compare data about you to data about fraudsters. If a caller has a lot of data points in common with fraudsters, that might indicate that the caller is a fraudster.
If the user of a web site is connecting from a computer other than the one they’ve connected from in the past, is in an out-of-country location (say Russia when they typically log in from the United States), and has attempted a few different passwords, that could be indicative of a fraudster. The computer system compares all of these identifiers to common patterns of behavior for fraudsters and common patterns of behavior for you, the user, to see whether the identity confidence score should go up or down.
Lots of matches with fraudster patterns or differences from your usual behavior and the score goes down. Lots of matches with your usual behavior and the score goes up.
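A toy version of such a confidence score might look like the following sketch. Everything in it is an assumption made for illustration: the `LoginAttempt` fields, the starting score of 50, and the point values are invented, and real fraud-detection systems derive their weights from large volumes of historical data rather than hand-picked numbers.

```python
from dataclasses import dataclass


@dataclass
class LoginAttempt:
    known_device: bool      # same computer the user has connected from before
    country: str            # country the attempt is coming from
    failed_passwords: int   # wrong passwords tried before this attempt


def confidence_score(attempt, home_country, base=50):
    """Return a 0-100 score; higher means more likely to be the real user."""
    score = base
    # Matches with the user's usual behavior raise the score; mismatches lower it.
    score += 20 if attempt.known_device else -20
    score += 20 if attempt.country == home_country else -20
    # Each failed password lowers the score, capped at 30 points.
    score -= min(attempt.failed_passwords * 10, 30)
    # Keep the result within the 0-100 range.
    return max(0, min(100, score))


usual = LoginAttempt(known_device=True, country="US", failed_passwords=0)
suspicious = LoginAttempt(known_device=False, country="RU", failed_passwords=3)

print("usual behavior:", confidence_score(usual, "US"))
print("suspicious behavior:", confidence_score(suspicious, "US"))
```

A system built along these lines would then compare the score against a threshold to decide whether to approve the request, deny it, or ask for more verification.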
The problem for computers, however, is two-fold. First, they need a lot of data to figure out what your usual behavior is and what the behavior of a fraudster is. Second, once the computer knows those things, it has to be able to compare your behavior to these patterns while also performing that task for millions of other customers at the same time.
So when it comes to data, computers can get smarter in two ways. Their algorithms for detecting normal and abnormal behavior can improve and the amount of data they can process at the same time can increase.
What really puts both computers and cab drivers to the test, therefore, is the need to make decisions quickly. The London cab driver, like the self-driving car, has to know which way to turn and make second-by-second decisions depending on traffic and other conditions. Similarly, the fraud-detection program has to decide whether to approve or deny your transaction in a matter of seconds.
As Robin Gilthorpe, former CEO of Terracotta, a technology company, put it, “no one wants to be the source of a ‘no,’ especially when it comes to e-commerce.”21 A denied transaction to a legitimate customer means not only a lost sale but an unhappy customer. And yet denying fraudulent transactions is the key to making non-fraudulent transactions work.
Peer-to-peer payments company PayPal found that out firsthand when the company had to build technology early on to combat fraudsters, as early PayPal analytics expert Mike Greenfield has pointed out. Without such technology, the company would not have survived and people wouldn’t have been able to make purchases and send money to each other as easily as they were able to.22
Better Decisions Through Big Data
As with any new technology, Big Data is not without its risks. Data in the wrong hands can be used for malicious purposes, and bad data can lead to bad decisions. As we continue to generate more data and as the software we use to analyze that data becomes more sophisticated, we must also become more sophisticated in how we manage and use the data and the insights we generate. Big Data is no substitute for good judgment.
When it comes to Big Data, human beings can still make bad decisions—such as running a red light, taking a wrong turn, or drawing a bad conclusion. But as we’ve seen here, we have the potential, through behavioral changes, to make ourselves smarter. We’ve also seen that technology can help us be more efficient and make fewer mistakes—the self-driving car, for example, can help us avoid driving through that red light or taking a wrong turn. In fact, over the next few decades, such technology has the potential to transform the entire transportation industry.
When it comes to making computers smarter, that is, enabling computers to make better decisions and predictions, what we’ve seen is that there are three main factors that come into play: data, algorithms, and speed.
Without enough data, it’s hard to recognize patterns. Enough data doesn’t just mean having all the data. It means being able to run analysis on enough of that data at the same time to create algorithms that can detect patterns. It means being able to test the results of the analysis to see if our conclusions are correct. Sampling one day of data might be useless, but sampling 10 years of data might produce results.
At the same time, all the data in the world doesn’t mean anything if we can’t process it fast enough. If you have to wait 10 minutes while standing in the grocery line for a fraud-detection algorithm to determine whether you can use your credit card, you’re not likely to use that credit card for much longer. Similarly, if self-driving cars can only go at a snail’s pace because they need more time to figure out whether to stop or move forward, no one will adopt self-driving cars. So speed is a critical factor as well when it comes to Big Data.
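The arithmetic behind that grocery-line wait is simple enough to sketch in Python. The figures here are assumptions chosen only for illustration (one billion past transactions to compare against, two hypothetical processing rates, and a two-second patience budget at the checkout), and the function names are made up for this example.

```python
def seconds_to_score(transactions_to_compare, comparisons_per_second):
    """Time needed to check one purchase against a history of past transactions."""
    return transactions_to_compare / comparisons_per_second

def fits_checkout_budget(seconds, budget_seconds=2.0):
    """True if the answer arrives within the assumed checkout patience budget."""
    return seconds <= budget_seconds

history = 1_000_000_000  # hypothetical: one billion past transactions

slow = seconds_to_score(history, 1_000_000)       # hypothetical slow system
fast = seconds_to_score(history, 10_000_000_000)  # hypothetical fast system

print(slow, fits_checkout_budget(slow))  # 1000.0 False (roughly 17 minutes)
print(fast, fits_checkout_budget(fast))  # 0.1 True
```

The same data and the same question produce a useless answer in one case and a useful one in the other. The only difference is how fast the comparisons run.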
We’ve also seen that computers are incredibly efficient at some tasks, such as detecting fraud by rapidly analyzing vast quantities of similar transactions. But they are still inefficient relative to human beings at other tasks, such as trying to convert the spoken word into text. That, as we’ll explore in the chapters ahead, constitutes one of the biggest opportunities in Big Data, an area called unstructured data.
Roadmap of the Book
In Big Data Bootcamp, we’ll explore a range of different topics related to Big Data. In Chapter 1, we’ll look at what Big Data is and how big companies like Amazon, Facebook, and Google are putting Big Data to work. We’ll explore the dramatic shift in information technology, in which competitive advantage is coming less and less from technology itself than from information that is enabled by technology. We’ll also dive into Big Data Applications (BDAs) and see how companies no longer need to build as much themselves and can instead rely on off-the-shelf applications to meet their Big Data needs, while they focus on the business problems they want to solve.
In Chapter 2, we’ll look at the Big Data Landscape in detail. Originally a way for me to map out the Big Data space, the Big Data Landscape has become an entity in its own right, now used as an industry and government reference. We’ll look at where venture capital investments are going and where exciting new companies are emerging to make Big Data ever more accessible to a wider audience.
Chapters 3, 4, and 5 explore Big Data from a few different angles. First, we’ll lay the groundwork in Chapter 3 as we cover how to create your own Big Data roadmap. We’ll look at how to choose new technologies and how to work with the ones you’ve already got—as well as at the emerging role of the chief data officer.
In Chapter 4 we’ll explore the intersection of Big Data and design and how leading companies like Apple and Facebook find the right balance between relying on data and intuition in designing new products. In Chapter 5, we’ll cover data visualization and the powerful ways in which it can make complex data sets easy to understand. We’ll also cover some popular tools, readily available public data sets, and how you can get started creating your own visualizations in the cloud or on your desktop.
Starting in Chapter 6, we look at the all-important intersection of Big Data, mobile, and cloud computing and how these technologies are coming together to disrupt multiple billion-dollar industries. You’ll learn what you need to know to transform your own business with cloud, mobile, and Big Data capabilities.
In Chapter 7, we’ll go into detail about how to do your own Big Data project. We’ll cover the resources you need, the cloud technologies available, and who you’ll need on your team to accomplish your Big Data goals. We’ll cover three real-world case studies: churn reduction, marketing analytics, and the connected car. These critical lessons can be applied to nearly any Big Data business problem.
Building on everything we’ve learned about Big Data, we’ll jump back into the business of Big Data in Chapter 8, where we explore opportunities for new businesses that take advantage of the Big Data opportunity. We’ll also look at the disruptive subscription and cloud-based delivery models of Software as a Service (SaaS) and how to apply them to your Big Data endeavors. In Chapter 9, we’ll look at Big Data from the marketing perspective: how you can apply Big Data to reach and interact with customers more effectively.
Finally, in Chapters 10, 11, and 12 we’ll explore how Big Data touches not just our business lives but our personal lives as well, in the areas of health and well-being, education, and relationships. We’ll cover not only some of the exciting new Big Data applications in these areas but also the many opportunities to create new businesses, applications, and products.
I look forward to joining you on the journey as we explore the fascinating topic of Big Data together. I hope you will enjoy reading about the tremendous Big Data opportunities available to you as much as I enjoy writing about them.
____________________
1. http://www.gps.caltech.edu/uploads/File/People/kanamori/HKjgr79d.pdf
2. http://www.dnr.wa.gov/Publications/ger_washington_geology_2001_v28_no3.pdf
3. http://www.planet-science.com/categories/over-11s/natural-world/2011/03/can-we-predict-earthquakes.aspx
4. http://ajw.asahi.com/article/globe/feature/earthquake/AJ201207220049
5. http://news.stanford.edu/news/2012/march/quake-catcher-warning-030612.html
6. This definition was first proposed by industry analyst Doug Laney in 2001.
7. http://www.tfl.gov.uk/businessandpartners/taxisandprivatehire/1412.aspx
8. http://www.scientificamerican.com/article.cfm?id=london-taxi-memory
9. http://www.tfl.gov.uk/corporate/modesoftransport/7311.aspx
10. http://www.ncbi.nlm.nih.gov/pubmed/19171158
11. http://www.pnas.org/content/early/2011/01/25/1015950108.full.pdf
12. http://research.microsoft.com/pubs/68636/ms_tr_99_100_rules_of_thumb_in_data_engineering.pdf
13. http://www.scientificamerican.com/article.cfm?id=thinking-hard-calories
14. http://www.speech.kth.se/~rolf/gslt_papers/MarkusForsberg.pdf
15. http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?pagewanted=2&_r=0
16. http://www.bbc.co.uk/news/technology-15060310
17. http://mashable.com/2012/08/22/google-maps-facts/
18. http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/how-google-self-driving-car-works
19. http://www.usacoverage.com/auto-insurance/how-many-driving-accidents-occur-each-year.html
20. http://ed-thelen.org/comp-hist/UNIVAC-I.html
21. Briefing with Robin Gilthorpe, October 30, 2012.
22. http://numeratechoir.com/2012/05/