Data Science For Dummies (2016)

Part 2

Using Data Science to Extract Meaning from Your Data

Chapter 8

Building Models That Operate Internet-of-Things Devices

IN THIS CHAPTER

· Grasping IoT vocabulary and technology components

· Seeing how data science supports the IoT

· Grasping the powerful combination of artificial intelligence and IoT

The Internet of Things (IoT) is a network of connected devices that use the Internet to communicate with one another. That sounds sort of scary, right? It calls to mind the movie Her, where machine-to-machine communications allow machines to begin thinking and acting autonomously. But actually, IoT represents the next level of insight, efficiency, and virtual assistance: the stuff we modern humans love and crave.

The rise of IoT has been facilitated by three major factors:

· An increased adoption of machine learning practices

· Increased deployment of sensor technologies

· Improved real-time data processing capabilities

The good news for data scientists is that data science is at the root of each of these three factors, making the IoT an ideal area for data scientists to develop expertise.

Just like data science, IoT itself is not the endgame. What’s most inspiring and impressive about IoT is how it’s deployed within different vertical markets — niche areas of commercial and industrial application (for example, manufacturing, oil and gas, retail, banking, and so on). For some examples, consider the following types of emerging technologies:

· Industrial processing: Real-time processing of vibration sensor data makes early detection of equipment failure possible.

· Environmental: Sensor-enabled urban monitoring and recommendations are built from real-time readings from devices that measure urban air quality, visibility, traffic patterns, water quality, fire hazards, and noise pollution (ambient or otherwise).

· Fitness: Real-time processing and analysis of 3-dimensional motion sensor data make live fitness tracking and exercise recommendations possible.

Read on to learn how IoT works, the technologies that support it, and the advancements it promises to foster.

Overviewing the Vocabulary and Technologies

The Internet of Things is its own class of technology. It has its own vocabulary and its own set of underlying technologies. Before getting into IoT data science, take a moment to familiarize yourself with them in the next four sections.

Learning the lingo

Before delving into the data science and innovation that’s related to IoT, you need a grasp of the fundamental vocabulary. The fog (or IoT cloud) is a network of cloud services that connect to IoT-enabled devices. These IoT cloud services handle the big data processing and analytics that the IoT requires, facilitating intelligent, adaptive, and autonomous device operations.

Edge devices are the IoT-enabled devices that are connected to the IoT cloud. Besides being connected to the fog, these devices all share one thing in common: They generate data through any number of appliances, including sensors, odometers, cameras, contact sensors, pressure sensors, laser scanners, thermometers, smoke detectors, microphones, electric meters, gas meters, water flow meters, and much more.

Warning: IoT-connected devices generate too much data! Some estimates put data generation by these devices at more than 2.5 million terabytes per day. Without proper filtering and treatment, the volume and velocity of this data could render it useless. To optimize device operations using the data that these devices generate, data storage and processing must be strategic.

The good news is that not all the data that’s produced on edge devices needs to be moved to the cloud for processing, storage, and analytics. In fact, most edge devices come equipped with device-embedded applications that are capable of processing and deriving insights locally, using the data that’s created by device appliances in real time. Local data processing and analytic deployment is called edge processing, and it helps save resources by

· Detecting data that is useful to the analytics operations running on the device and discarding the rest: This lowers the data transfer and storage overhead.

· Handling analytic deployments locally, doing away with the need to transfer data to and from the cloud: A side benefit of these device-embedded analytics applications is that they return results faster than if the data is processed in the cloud.
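
To make the first point in the preceding list concrete, here is a minimal Python sketch of edge filtering. The function name filter_at_edge, the vibration readings, and the 0.0 to 5.0 "routine" range are all made up for illustration; a real device would choose its own rules for what counts as useful.

```python
from statistics import mean

def filter_at_edge(readings, low, high):
    """Split raw readings into values worth sending and a local summary.

    Readings inside [low, high] are treated as routine and stay on the
    device; only out-of-range readings are queued for the cloud.
    """
    to_cloud = [r for r in readings if r < low or r > high]
    summary = {
        "count": len(readings),
        "mean": mean(readings) if readings else None,
        "sent": len(to_cloud),
    }
    return to_cloud, summary

# Hypothetical vibration amplitudes (mm/s) sampled on a device.
vibration = [2.1, 2.3, 2.2, 9.8, 2.0, 2.4, 11.2, 2.2]
to_cloud, summary = filter_at_edge(vibration, low=0.0, high=5.0)
print(to_cloud)  # [9.8, 11.2]
print(summary["sent"], "of", summary["count"], "readings leave the device")
```

In this toy run, only two of the eight readings would travel over the network, which is the transfer and storage saving that edge processing is after.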

Whether processing happens locally or on the cloud, IoT analytic applications that implement adaptive machine learning algorithms are called adaptive IoT applications. These adaptive IoT applications enable devices to adjust and adapt to the local conditions in which the device is operating. Later in this chapter, you can see an overview of popular machine learning methods for data science in IoT. Figure 8-1 illustrates some of these components to help pull them together into a conceptual schematic.

FIGURE 8-1: Conceptual schematic of the IoT network.

IoT professionals, like most other things related to the IoT, are a breed of their own. IoT cloud application developers are data scientists and engineers who focus exclusively on building adaptive IoT applications for deployment on local devices. The more general IoT developer, on the other hand, is responsible for building products and systems that serve the greater needs of the IoT cloud at large, including all its connected IoT devices, data sources, and cloud computing environments.

Procuring IoT platforms

IoT platforms are broken into hardware platforms and software platforms. IoT hardware platforms are hardware components that you can use to connect devices to the IoT cloud, to stream data, and to manage device operations locally. Each platform offers its own set of core features, so you’ll need to do some research into which meets your specific needs; some popular IoT hardware platforms are Raspberry Pi, Intel Edison, and Arduino products. IoT software platforms offer services such as device management, integration, security, data collection protocols, analytics, and limited data visualization. Again, each solution offers its own unique blend of features, so do the research; major offerings include the AWS IoT platform and the IBM IoT Foundation Device Cloud.

Spark streaming for the IoT

Spark is an ideal framework for integrated real-time big data processing and analysis. With respect to the IoT, each IoT sensor stream can be transformed into Spark DStreams: discretized streams that are the fundamental data abstraction in the Spark Streaming module (the module where stream processing is carried out). After you have your data in DStreams, it’s then quite simple to construct automated analytical operations that filter, process, and detect based on DStream content. Depending on what’s detected, real-time notifications and alerts are issued back to IoT applications regarding mission-critical insights. You can use the Spark Streaming window operations on DStream sources to quickly and easily aggregate processing and alerting over any regular time intervals of your choosing. Lastly, for comparative analytics, you can use Spark’s Resilient Distributed Datasets (RDDs), which are immutable collections of objects and a fundamental Spark data structure, to store any relevant historical datasets in-memory.

Warning: Edge devices often experience data transmission delays due to things like network congestion and intermittent network connectivity. High latency is not uncommon. To get around this, make sure to analyze data by the timestamp from when the machine data was generated, not from when it arrives back at the IoT cloud.
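
The idea of windowing by generation time rather than arrival time can be sketched in plain Python. This is not the Spark API; it is a small stand-in that shows the logic only. The record layout (event_time, arrival_time, value) and the function name window_by_event_time are assumptions made for this example.

```python
from collections import defaultdict

def window_by_event_time(records, window_seconds):
    """Average readings in fixed windows keyed by when they were generated.

    Each record is (event_time, arrival_time, value), with times in seconds.
    arrival_time is deliberately ignored, so a late record still lands in
    the window in which it was measured.
    """
    windows = defaultdict(list)
    for event_time, _arrival_time, value in records:
        window_start = (event_time // window_seconds) * window_seconds
        windows[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}

records = [
    (0, 1, 20.0),
    (30, 31, 22.0),
    (45, 130, 24.0),  # generated in the first minute, arrived 85 seconds late
    (70, 71, 30.0),
]
averages = window_by_event_time(records, window_seconds=60)
print(averages)  # {0: 22.0, 60: 30.0}
```

Had the late record been bucketed by its arrival time, it would have been averaged into a window it never belonged to, skewing both windows.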

Getting context-aware with sensor fusion

Major IoT advancements are being made in contextual awareness, where sensors generate data that can be used for real-time context-aware services rendered by the device that’s generating the data. This context awareness is facilitated by a technology called sensor fusion, where data from several different sensors is fused by a microcontroller to produce a broader, more detailed view of what’s happening in a local environment. Vendors and technologies that support sensor fusion include EM Microelectronic, NXP, and even Apache Flink.
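
Sensor fusion comes in many flavors. One of the simplest, shown here purely as an illustration, is inverse-variance weighting: Two or more sensors measure the same quantity, and the less noisy sensor gets more say in the fused result. The sketch assumes that the sensors' errors are independent and unbiased; the sensor names and variances are invented.

```python
def fuse(estimates):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Sensors with smaller variance (less noise) receive larger weights.
    Returns the fused value and the variance of the fused estimate.
    """
    weights = [1.0 / var for _value, var in estimates]
    total = sum(weights)
    fused_value = sum(w * value for w, (value, _var) in zip(weights, estimates)) / total
    return fused_value, 1.0 / total

# Hypothetical temperature readings: (degrees C, variance).
thermistor = (21.0, 1.0)
infrared = (23.0, 4.0)  # the noisier of the two sensors
value, variance = fuse([thermistor, infrared])
print(round(value, 2), round(variance, 2))  # 21.4 0.8
```

Notice that the fused variance (0.8) is lower than that of either sensor alone, which is the "broader, more detailed view" in numerical form.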

Digging into the Data Science Approaches

If you want to build predictive IoT models and applications, you need to know Python and SQL, covered in Chapter 14 and Chapter 16, respectively. You can use Python for data wrangling, visualization, time series analysis, and machine learning. Knowing SQL is useful for querying data from traditional databases or from the Hadoop Distributed File System. (I tell you more about this topic in Chapter 2.) Read on to learn more about specific analytical methods as they relate to IoT data science.

Warning: An issue with IoT data, and sensor data specifically, is that it’s often sparse: Most of its values are empty, or “NaN” (“not a number”). Be prepared to impute a lot of missing values (that is, to replace missing values with approximations) when you’re preprocessing data for analysis.
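
As one possible approach to imputation, the following sketch fills NaN gaps by linear interpolation between the nearest known readings. It uses only the Python standard library; the function name impute_linear and the humidity values are hypothetical, and other strategies (such as carrying the last value forward) may suit your data better.

```python
import math

def impute_linear(values):
    """Fill NaN gaps by linear interpolation between known neighbors.

    Gaps at the very start or end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("cannot impute a series with no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            span = values[right] - values[left]
            filled[i] = values[left] + span * (i - left) / (right - left)
    return filled

nan = math.nan
humidity = [nan, 40.0, nan, nan, 46.0, nan]
imputed = impute_linear(humidity)
print(imputed)  # [40.0, 40.0, 42.0, 44.0, 46.0, 46.0]
```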

Taking on time series

Most IoT sensor data is composed of time series, so you should be adept at building and using time series models. One way that time series models are useful in the IoT is for decreasing the data transmission overhead for a wireless sensor network. (You’ll understand why after you read the following list.) These two time series models are important for IoT data science:

· Moving average models: Moving average models make forecasts based on the average of past observations. These models update the forecast whenever any significant deviations from predicted values are detected. Moving average models are important in IoT because of their automated model update feature, as explained further in the following bullet.

· Autoregressive integrated moving average models (ARIMA): ARIMA combines the autoregressive moving average (ARMA) class of forecasting methods (which I tell you about in Chapter 5) with the process of differencing — making a time series stationary by calculating the difference between consecutive observations. By deploying the ARIMA model on a sensor node, you can significantly decrease the amount of data that is transmitted back to the IoT cloud for analysis. That’s because only the data that falls outside of the prediction error range will get sent, and because the model continually updates with significant changes in the sensor readings.
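
The transmission-saving idea in the preceding list can be sketched without a full ARIMA implementation. The example below swaps in a simple moving average as the forecaster (fitting an ARIMA model would normally call for a dedicated library) and assumes that the sensor node and the cloud run the same model on the same history. The window size, tolerance, and temperature readings are invented for illustration.

```python
from collections import deque

def readings_to_transmit(readings, window=3, tolerance=0.5):
    """Return the (index, value) pairs a sensor node would send to the cloud.

    The node forecasts each reading as the mean of the last `window` values
    in a shared history. A reading is sent only when it misses the forecast
    by more than `tolerance`; otherwise the forecast itself is stored, so
    the node and the cloud keep identical histories.
    """
    history = deque(maxlen=window)
    sent = []
    for i, value in enumerate(readings):
        if len(history) < window:
            # Not enough history to forecast yet, so always send.
            history.append(value)
            sent.append((i, value))
            continue
        forecast = sum(history) / len(history)
        if abs(value - forecast) > tolerance:
            history.append(value)     # the model updates with the surprise
            sent.append((i, value))
        else:
            history.append(forecast)  # the cloud can reproduce this itself
    return sent

temps = [20.0, 20.0, 20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 25.0, 20.1]
sent = readings_to_transmit(temps)
print([i for i, _ in sent])  # [0, 1, 2, 8, 9]
```

Here only five of ten readings are transmitted: the three needed to start the model, the spike, and the reading right after it, which the spike-adjusted forecast misses.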

Geospatial analysis

Just as sensor nodes create data that’s labeled with a timestamp, they also produce data that’s labeled with a geospatial location stamp. Each observation occurs at its given time and place, so location is a big deal when it comes to the IoT. Many IoT applications consider an edge device’s location, and nearness, with respect to other connected devices. All of this requires multidimensional geospatial data processing and analytics capabilities, which a GIS application (a geographic information system application) is designed to offer. GIS, coupled with IoT network and data technologies, facilitates real-time geo-space-time analytics, enabling geo-insights to be delivered at the right time and place, precisely when these insights are actionable. Real-time geospatial analytics generate serious, sometimes life-saving, results when you use them to do things like this:

· Identify and engage local customers while they are in your vicinity.

· Monitor field assets for early-warning signs of equipment failure.

· Trigger real-time, situationally aware emergency response.
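
A full GIS does far more than this, but the basic "nearness" question behind these use cases can be sketched with the haversine great-circle distance. The device names and coordinates below are hypothetical, and the sketch treats the Earth as a sphere with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def devices_within(origin, devices, radius_km):
    """Names of devices no farther than radius_km from origin."""
    lat0, lon0 = origin
    return [name for name, (lat, lon) in devices.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]

# Hypothetical device positions (latitude, longitude in degrees).
devices = {
    "pump-a": (40.7130, -74.0060),
    "pump-b": (40.7200, -74.0000),
    "pump-c": (41.5000, -74.0000),
}
nearby = devices_within((40.7128, -74.0060), devices, radius_km=2.0)
print(nearby)  # ['pump-a', 'pump-b']
```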

Dabbling in deep learning

Deep learning is an exciting development within IoT. That’s because deep learning enables adaptive autonomous operations of the machine network. As you may recall from Chapter 4, deep learning is a machine learning method that deploys layers of hierarchical neural networks to learn from data in an iterative and adaptive way. Similar to how the moving average and ARIMA models update on their own, deep learning models are able to adjust to and learn from data, despite changes and irregularities present in incoming sensor data.

I’ve listed some of the requirements that a deep learning model will face when deployed in the IoT environment:

· The model must autonomously handle sparse dataset pre-processing requirements.

· The model must learn from unlabeled data.

· The model should self-evaluate its results and optimize parameters, if necessary.

· The model should not be prone to overfitting.

Technical Stuff: Reinforcement learning can also be useful in generating analytics from IoT data. Reinforcement learning, a machine learning paradigm that’s distinct from both supervised and unsupervised learning, is an up-and-coming method that trains models via a reward system logic that closely resembles how humans learn. A reinforcement learning agent self-learns by interacting with its environment and choosing actions that will reap it as much reward as possible. You set the rules for how rewards are given out, and the agent learns to take the actions that maximize the total reward it receives.
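
To see the reward logic in miniature, consider this sketch of an epsilon-greedy agent, one of the simplest reinforcement learning setups. The action names and reward values are invented, and the rewards are kept fixed (real rewards are usually noisy) so that the behavior is easy to follow.

```python
import random

def train_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns the value of each action.

    `rewards` maps an action name to the reward it pays out. The agent
    tries every action once, then mostly exploits its best estimate while
    exploring at random with probability `epsilon`.
    """
    rng = random.Random(seed)
    actions = list(rewards)
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                    # initial sweep
        elif rng.random() < epsilon:
            action = rng.choice(actions)              # explore
        else:
            action = max(actions, key=estimates.get)  # exploit
        reward = rewards[action]                      # feedback from the environment
        counts[action] += 1
        # Running average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical reward rules that you set: comfort gained minus energy cost.
rewards = {"fan_low": 1.0, "fan_medium": 3.0, "fan_high": 2.0}
estimates = train_agent(rewards)
best = max(estimates, key=estimates.get)
print(best)  # fan_medium
```

The agent is never told which action is best; it discovers that by acting, collecting rewards, and updating its estimates.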

Advancing Artificial Intelligence Innovation

To understand artificial intelligence and its place in the IoT, you first need to grasp some key differences between the terms artificial intelligence, machine learning, and IoT. The term artificial intelligence (AI) refers to built systems that mimic human behavior by making insightful decisions that are derived from artificial neural network model outputs. Many AI technologies implement deep learning or reinforcement learning, but, traditionally, the driving intelligence behind AI was artificial neural networks. As I explain in Chapter 4, neural nets are one type of machine learning method, among many. So, to be clear, machine learning is not AI, but it encompasses a few methods that drive the decisions that are made by AI technologies. In itself, machine learning is simply the practice of applying algorithmic models to data in order to discover hidden patterns or trends that you can use to make predictions.

The IoT is a network of connected, smart devices, many of which depend on output from machine learning models to direct and manage device operations. In this sense, some IoT devices are considered a form of artificial intelligence. But not all devices that are connected to the IoT are AI technologies. Some connected devices are managed by traditional control systems that don’t include machine learning or advanced analytics, like SCADA — Supervisory Control and Data Acquisition. These devices would still be IoT devices, but they would not be considered AI-driven technologies.

Artificial intelligence has been around a while (since the 1940s, in fact). Some of the more recent AI-driven innovations include the following:

· Self-driving cars: These cars deploy machine learning to make the decisions that are required in order to operate and drive themselves. Human supervision is still required in order to ensure passenger safety.

· Military robotics: These armed military robots deploy machine learning to act autonomously in hostile conflict environments.

· AlphaGo: This Google DeepMind program used deep learning to beat Lee Sedol at Go, a board game that originated in China, winning a $1 million prize.

IoT is ushering in its own breed of AI advancements, though. One type of innovation that is already available is the smart home. To understand how IoT combines with AI to produce a smart home, imagine that it’s summertime and it’s very hot outside. When you leave for work, you always turn off your air conditioning, and then when you get home at 5 p.m., it takes a long time to cool the house. Well, with IoT and AI advancements, you can connect your phone’s GPS, an outdoor temperature sensor, and the air conditioner. The network can learn what features indicate your impending arrival (like departure from work, time of departure, and direction of travel) to predict that you will arrive by a certain time. The network can use the outdoor temperature reading to learn how long the air conditioner should run and at what temperature, to bring the room temperature down to the temperature setting you’ve selected. So, when you arrive home, your house will be the perfect temperature without your having to wait or to turn the systems on or off. The devices act autonomously, based on what they’ve learned from the various connected devices, and based on the parameters you set for them.
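
The cooling part of this scenario can be sketched with a very small model. The example assumes, purely for illustration, that the time needed to cool the house grows linearly with the outdoor temperature, and it learns that relationship from a made-up history by using ordinary least squares. A real smart home would likely use more features and a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def minutes_until_start(eta_minutes, outdoor_temp, slope, intercept):
    """How long to wait before switching on so that cooling ends at arrival."""
    cooling_minutes = slope * outdoor_temp + intercept
    return max(0.0, eta_minutes - cooling_minutes)

# Hypothetical history: outdoor temperature (deg C) vs. minutes needed to cool.
outdoor = [28.0, 30.0, 32.0, 34.0, 36.0]
cooling = [20.0, 26.0, 32.0, 38.0, 44.0]
slope, intercept = fit_line(outdoor, cooling)

# The phone's GPS predicts arrival in 50 minutes; it is 35 deg C outside.
wait = minutes_until_start(50.0, 35.0, slope, intercept)
print(wait)  # 9.0
```

With this toy data, the model expects 41 minutes of cooling at 35 degrees, so the air conditioner would switch on 9 minutes from now and finish just as you walk in the door.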