Time Series Data in Practical Machine Learning - Time Series Databases: New Ways to Store and Access Data (2014)

Chapter 6. Time Series Data in Practical Machine Learning

With the increasing availability of large-scale data, machine learning is becoming a common tool that businesses use to unlock the potential value in their data. There are several factors at work to make machine learning more accessible, including the development of new technologies and practical approaches.

Many machine-learning approaches are available for application to time series data. We’ve already alluded to some in this book and in Practical Machine Learning: A New Look at Anomaly Detection, an earlier short book published by O’Reilly. In that book, we talked about how to address basic questions in anomaly detection, especially how to determine what normal looks like and how to detect deviations from normal.

Keep in mind that with anomaly detection, the machine-learning model is trained offline to learn what normal is and to set an adaptive threshold for anomaly alerts. Then new data, such as sensor data, can be assessed to determine how similar it is to what the model expects. The degree of mismatch with the model’s expectations can be used to trigger an alert that signals apparent faults or discrepancies as they occur. Sensor data is a natural fit for collection and storage in a time series database. Sensors on equipment or system logs for servers can generate an enormous amount of time-based data, and with new technologies such as the Apache Hadoop–based NoSQL systems described in this book, it is now feasible to save months or even years of such data in time series databases.
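The train-offline, score-online pattern described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the method used in the earlier book: the model here is a plain mean and standard deviation, the threshold is an empirical quantile of the training scores, and the function names (`fit_normal_model`, `anomaly_score`, `find_anomalies`) and sample readings are hypothetical.

```python
from statistics import mean, pstdev

def fit_normal_model(training_readings, quantile=0.99):
    """Offline step: learn a baseline and an alert threshold from
    readings that are assumed to represent normal operation."""
    mu = mean(training_readings)
    sigma = pstdev(training_readings) or 1.0  # guard against constant data
    # Score every training reading by its distance from the baseline.
    scores = sorted(abs(x - mu) / sigma for x in training_readings)
    # The threshold is derived from the data (an empirical quantile),
    # not hard-coded, so it adapts to how noisy "normal" is.
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return {"mu": mu, "sigma": sigma, "threshold": scores[idx]}

def anomaly_score(model, reading):
    """Degree of mismatch between a reading and what the model expects."""
    return abs(reading - model["mu"]) / model["sigma"]

def find_anomalies(model, new_readings):
    """Online step: return the (timestamp, value) pairs that exceed the threshold."""
    return [(t, x) for t, x in new_readings
            if anomaly_score(model, x) > model["threshold"]]

# Made-up temperature readings, for illustration only.
normal_history = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 19.9, 20.3, 19.7, 20.0]
model = fit_normal_model(normal_history)
alerts = find_anomalies(model, [(0, 20.1), (1, 25.0), (2, 19.9)])
```

With these toy values, only the reading of 25.0 at timestamp 1 is flagged. A production system would normally use a richer model of normal behavior, but the division of work between offline training and online scoring stays the same.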

But is it worthwhile to do so?

Predictive Maintenance Scheduling

Let’s consider a straightforward but very important example to answer this question. Suppose a particular piece of critically important equipment is about to fail. You would like to be able to replace the part before a costly disaster.



Sorry to interrupt the festivities, Dave, but I think we’ve got a problem.

What is it, Hal?

My F.P.C. shows an impending failure of the antenna orientation unit.

The A.O. unit should be replaced within the next 72 hours.

It would be good if you could see signs leading up to the failure so that you could do preventive maintenance. If the piece is something such as a wind turbine, a pump in a drilling rig, or a component of a jet engine such as the one shown in Figure 6-1, the consequences of the failure can be dire. Part of the problem is that you may not know what to watch for. That’s where a retrospective study can help.

Figure 6-1. Predictive maintenance scheduling—replacing parts before a serious problem occurs—is a huge benefit in systems with expensive and highly critical equipment such as this turbine inside a jet engine.

If you keep good, detailed long-term histories of maintenance on essential components of equipment down to the level of the part number, location, dates it went into use, notes on wear, and the dates of any failures, you may be able to reconstruct the events or conditions that led up to failures and thus build a model for how products wear out, or you may find predictive signs or even the cause of impending trouble. This type of precise, long-term maintenance history is not a time series, but coupled with a time series database of sensor data that records operating conditions, you have a powerful combination to unlock the insights you need. You can correlate the observations your sensors have made for a variety of parameters during the days, weeks, or months leading up to the part failure or up to an observed level of wear that is disturbing. This pattern of retrospective machine learning analysis on the combination of a detailed maintenance history and a long-term time series database has widespread applicability in transportation, manufacturing, health care, and more.
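As a rough sketch of how the two kinds of data can be combined, the following example takes one failure from a maintenance record and pulls the sensor readings for that part from a lookback window before the failure. Everything here is an assumption made for illustration: the record layout, the field names (`part_id`, `failed_at`, `ts`, `temp_c`), the helper functions, and the sample values are hypothetical, and a real system would query a time series database rather than filter an in-memory list.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical maintenance record (not a time series): one failure event.
failure = {"part_id": "P-100", "failed_at": datetime(2014, 6, 30)}

# Hypothetical sensor time series: timestamped readings per part.
sensor_rows = [
    {"part_id": "P-100", "ts": datetime(2014, 4, 20), "temp_c": 71.0},
    {"part_id": "P-100", "ts": datetime(2014, 5, 5), "temp_c": 96.0},   # brief excursion
    {"part_id": "P-100", "ts": datetime(2014, 6, 15), "temp_c": 72.0},
    {"part_id": "P-200", "ts": datetime(2014, 6, 15), "temp_c": 70.0},  # a different part
]

def window_before(rows, event, lookback):
    """Readings for the failed part taken within `lookback` before the failure."""
    start = event["failed_at"] - lookback
    return [r for r in rows
            if r["part_id"] == event["part_id"]
            and start <= r["ts"] < event["failed_at"]]

def summarize(rows, field):
    """Simple per-window features that a model could be trained on."""
    values = [r[field] for r in rows]
    if not values:
        return None
    return {"count": len(values), "mean": mean(values), "max": max(values)}

# Summarize operating conditions in the 90 days leading up to the failure.
summary = summarize(window_before(sensor_rows, failure, timedelta(days=90)), "temp_c")
```

Repeating this for every failure in the maintenance history, and for comparable windows on parts that did not fail, would give labeled examples from which predictive signs could be learned.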

Why do you need to go to the trouble of saving the huge amount of time series sensor data for long time ranges, such as years, rather than perhaps just a month? It depends, of course, on your particular situation and on the opportunity cost of not being able to do this style of predictive maintenance. But part of the question to ask yourself is: what happens if you only save a month of sensor data at a time, but the critical events leading up to a catastrophic part failure happened six weeks or more before the event? Maybe temperatures exceeded a safe range, or an outside situation caused an unusual level of vibration in the component for a short time two months earlier. When you try to reconstruct events before the failure or accident, you may no longer have the relevant data available. The problem is even more acute if you need to look back over years of performance records to understand what happened in similar situations in the past.
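The retention argument comes down to simple date arithmetic, sketched below. The function name `still_retained`, the dates, and the approximation of "two months" as eight weeks are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta

def still_retained(event_ts, now, retention):
    """True if a reading taken at event_ts is still inside a store
    that keeps only the most recent `retention` of history."""
    return now - event_ts <= retention

# Hypothetical timeline: the part fails eight weeks after a short excursion.
failure_time = datetime(2014, 6, 30)
excursion_time = failure_time - timedelta(weeks=8)

one_month = still_retained(excursion_time, failure_time, timedelta(days=30))
one_year = still_retained(excursion_time, failure_time, timedelta(days=365))
```

Here `one_month` is `False` and `one_year` is `True`: with a one-month window the excursion has already been discarded by the time anyone goes looking for it.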

The better alternative is to make use of the tools described in this report so that it is practical to keep much longer time spans for your sensor data along with careful maintenance histories. In the case of equipment used in jet aircraft, for instance, it is not only the airline that cares about how equipment performs at different points in time and what the signs of wear or damage are. Some manufacturers of important equipment also monitor ongoing life histories of the parts they produce in order to improve their own design choices and to maintain quality.

Manufacturers are not only concerned with collecting sensor data to monitor how their equipment performs in factories during production; they also want to manufacture smart equipment that reports on its own condition as it is being used by the customer. The manufacturer can include a service to monitor and report on the status of a component in order to help the customer optimize function through tuning. This might involve better fuel consumption, for example. These “smart parts” are of more value than mute equipment, so they may give the manufacturer a competitive edge in the marketplace, not to mention the benefits they provide the customer who purchases them.

The benefits of this powerful combination of detailed maintenance histories plus long-term time series databases of sensor data for machine learning models can, in certain industries, be enormous.