Data Science For Dummies (2016)

Part 1

Getting Started with Data Science

Chapter 3

Applying Data-Driven Insights to Business and Industry

IN THIS CHAPTER

· Seeing the benefits of business-centric data science

· Knowing business intelligence from business-centric data science

· Finding the expert to call when you want the job done right

· Seeing how a real-world business put data science to good use

To the nerds and geeks out there, data science is interesting in its own right, but to most people, it’s interesting only because of the benefits it can generate. Most business managers and organizational leaders couldn’t care less about coding and complex statistical algorithms. They are, on the other hand, extremely interested in finding new ways to increase business profits by increasing sales rates and decreasing inefficiencies. In this chapter, I introduce the concept of business-centric data science, discuss how it differs from traditional business intelligence, and talk about how you can use data-derived business insights to increase your business’s bottom line.

The modern business world is absolutely deluged with data. That’s because every line of business, every electronic system, every desktop computer, every laptop, every company-owned cellphone, and every employee is continually creating new business-related data as a natural and organic output of their work. This data may be structured or unstructured; some of it is big and some of it is small, some fast and some slow; maybe it’s tabular data, or video data, or spatial data, or data that no one has come up with a name for yet. But though the datasets produced come in many varieties, the challenge is always the same: to extract data insights that add value to the organization when acted upon. In this chapter, I walk you through the challenges involved in deriving value from actionable insights that are generated from raw business data.

Benefiting from Business-Centric Data Science

Business is complex. Data science is complex. At times, it’s easy to get so caught up looking at the trees that you forget to look for a way out of the forest. That’s why, in all areas of business, it’s extremely important to stay focused on the end goal. Ultimately, no matter what line of business you’re in, true north is always the same: business profit growth. Whether you achieve that by creating greater efficiencies or by increasing sales rates and customer loyalty, the end goal is to create a more stable, solid profit-growth rate for your business. The following list describes some of the ways that you can use business-centric data science and business intelligence to help increase profits:

· Decrease financial risks. A business-centric data scientist can decrease financial risk in e-commerce business by using time series anomaly-detection methods for real-time fraud detection — to decrease Card-Not-Present fraud and to decrease the incidence of account takeovers, to take two examples.

· Increase the efficiencies of systems and processes. This is a business systems optimization function that’s performed by both the business-centric data scientist and the business analyst. Both use analytics to optimize business processes, structures, and systems, but their methods and data sources differ. The end goal here should be to decrease needless resource expenditures and to increase return on investment for justified expenditures.

· Increase sales rates. To increase sales rates for your offerings, you can employ a business-centric data scientist to help you find the best ways to upsell and cross-sell, increase customer loyalty, increase conversions in each layer of the funnel, and exact-target your advertising and discounts. It’s likely that your business is already employing many of these tactics, but a business-centric data scientist can look at all data related to the business and, from that, derive insights that supercharge these efforts.
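As a rough illustration of the anomaly-detection idea in the first bullet above, the following Python sketch flags transactions whose amounts deviate sharply from recent history. The function name `flag_anomalies`, the window size, the threshold, and the sample amounts are all assumptions made for illustration; real fraud-detection systems rely on far richer features and models.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, window=5, threshold=3.0):
    """Return indexes of values far from the mean of the preceding window."""
    flagged = []
    for i in range(window, len(amounts)):
        history = amounts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip windows with no variation to avoid dividing by zero.
        if sigma > 0 and abs(amounts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical card transaction amounts; the 950.0 charge is a planted outlier.
amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 950.0, 43.0, 37.5]
suspicious = flag_anomalies(amounts)
print(suspicious)
```

A rolling window is used here so that each transaction is judged only against what came before it, which is what a real-time check would have available.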

Converting Raw Data into Actionable Insights with Data Analytics

Turning your raw data into actionable insights is the first step in the progression from the data you’ve collected to something that actually benefits you. Business-centric data scientists use data analytics to generate insights from raw data.

Types of analytics

Listed here, in order of increasing complexity, are the four types of data analytics you’ll most likely encounter:

· Descriptive analytics: This type of analytics answers the question, “What happened?” Descriptive analytics are based on historical and current data. A business analyst or a business-centric data scientist bases modern-day business intelligence on descriptive analytics.

· Diagnostic analytics: You use this type of analytics to find answers to the question, “Why did this particular something happen?” or “What went wrong?” Diagnostic analytics are useful for deducing and inferring the success or failure of subcomponents of any data-driven initiative.

· Predictive analytics: Although this type of analytics is based on historical and current data, predictive analytics go one step further than descriptive analytics. Predictive analytics involve complex model-building and analysis in order to predict a future event or trend. In a business context, these analyses would be performed by the business-centric data scientist.

· Prescriptive analytics: This type of analytics aims to optimize processes, structures, and systems through informed action that’s based on predictive analytics — essentially telling you what you should do based on an informed estimation of what will happen. Both business analysts and business-centric data scientists can generate prescriptive analytics, but their methods and data sources differ.

Remember: Ideally, a business should engage in all four types of data analytics, but prescriptive analytics is the most direct and effective means by which to generate value from data insights.
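To make the difference between descriptive and predictive analytics concrete, here is a minimal Python sketch that applies both to the same data. The monthly sales figures are hypothetical and deliberately linear so that the arithmetic is easy to check; the least-squares trend line stands in for the far more complex models a data scientist would normally build.

```python
from statistics import mean

# Hypothetical monthly sales (in thousands) for six consecutive months.
sales = [100.0, 104.0, 108.0, 112.0, 116.0, 120.0]
months = list(range(1, len(sales) + 1))

# Descriptive analytics: summarize what happened.
average_sales = mean(sales)
best_month = months[sales.index(max(sales))]

# Predictive analytics: fit a least-squares trend line and extrapolate.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_month_7 = intercept + slope * 7

print(average_sales, best_month, round(forecast_month_7, 1))
```

A prescriptive step would go one stage further and recommend an action, such as adjusting inventory, based on the forecast.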

Common challenges in analytics

Analytics commonly pose at least two challenges in the business enterprise. First, organizations often have difficulty finding new hires with specific skill sets that include analytics. Second, even skilled analysts often have difficulty communicating complex insights in a way that’s understandable to management decision makers.

To overcome these challenges, the organization must create and nurture a culture that values and accepts analytics products. The business must work to educate all levels of the organization so that management has a basic concept of analytics and the success that can be achieved by implementing them. Conversely, business-centric data scientists must have a solid working knowledge about business in general and, in particular, a solid understanding of the business at hand. A strong business knowledge is one of the three main requirements of any business-centric data scientist; the other two are a strong coding acumen and strong quantitative analysis skills via math and statistical modeling.

Data wrangling

Data wrangling is another important portion of the work that’s required in order to convert data to insights. To build analytics from raw data, you’ll almost always need to use data wrangling — the processes and procedures that you use to clean and convert data from one format and structure to another so that the data is accurate and in the format that analytics tools and scripts require for consumption. The following list highlights a few of the practices and issues I consider most relevant to data wrangling:

· Data extraction: The business-centric data scientist must first identify which datasets are relevant to the problem at hand and then extract sufficient quantities of the data that’s required to solve the problem. (This extraction process is commonly referred to as data mining.)

· Data preparation: Data preparation involves cleaning the raw data extracted through data mining and then converting it into a format that allows for a more convenient consumption of the data. Six steps are involved; they are listed at the end of this section.

· Data governance: Data governance standards are used as a quality control measure to ensure that manual and automated data sources conform to the data standards of the model at hand. Data governance standards must be applied so that the data is at the right granularity when it’s stored and made ready for use.

Remember: Granularity is a measure of a dataset’s level of detail. Data granularity is determined by the relative size of the subgroupings into which the data is divided.

· Data architecture: IT architecture is the key. If your data is isolated in separate, fixed repositories — those infamous data silos everybody complains about — then it’s available to only a few people within a particular line of business. Siloed data structures result in scenarios where a majority of an organization’s data is simply unavailable for use by the organization at large. (Needless to say, siloed data structures are incredibly wasteful and inefficient.)
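Granularity, as defined in the note above, is easiest to see by changing it. The short Python sketch below coarsens hypothetical daily sales records into monthly totals; the dates and amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, amount).
daily_sales = [
    ("2016-01-04", 120.0),
    ("2016-01-18", 80.0),
    ("2016-02-02", 200.0),
    ("2016-02-20", 50.0),
]

# Coarsen the granularity from day to month by grouping on the "YYYY-MM" prefix.
monthly_sales = defaultdict(float)
for date, amount in daily_sales:
    monthly_sales[date[:7]] += amount

monthly_sales = dict(monthly_sales)
print(monthly_sales)
```

Note that the roll-up runs in one direction only: monthly totals cannot be split back into daily figures, which is one reason to store data at the finest granularity that later analyses may need.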

Tip: When preparing to analyze data, follow this six-step process for data preparation:

1. Import. Read relevant datasets into your application.

2. Clean. Remove strays, duplicates, and out-of-range records, and standardize casing.

3. Transform. In this step, you treat missing values, deal with outliers, and scale your variables.

4. Process. Processing your data involves data parsing, recoding of variables, concatenation, and other methods of reformatting your dataset to prepare it for analysis.

5. Log. In this step, you simply create a record that describes your dataset. This record should include descriptive statistics, information on variable formats, data source, collection methods, and more. Once you generate this log, make sure to store it in a place you’ll remember, in case you need to share these details with other users of the processed dataset.

6. Back up. The last data preparation step is to store a backup of this processed dataset so that you have a clean, fresh version — no matter what.
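The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV text, the column names, the rule that negative spend is out of range, and the region codes are all hypothetical, and outlier handling and scaling are omitted for brevity.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical raw export; in practice this would come from a file or database.
RAW_CSV = """customer,region,monthly_spend
Alice,north,120
alice,North,120
Bob,SOUTH,
Cara,south,95
Dan,north,-40
"""

# 1. Import: read the dataset into the application.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Clean: standardize casing, then drop duplicates and out-of-range records.
cleaned, seen = [], set()
for row in rows:
    name = row["customer"].strip().title()
    region = row["region"].strip().lower()
    spend = float(row["monthly_spend"]) if row["monthly_spend"] else None
    if (name, region) in seen:
        continue  # duplicate record
    if spend is not None and spend < 0:
        continue  # negative spend is treated as out of range
    seen.add((name, region))
    cleaned.append({"customer": name, "region": region, "monthly_spend": spend})

# 3. Transform: fill missing values with the mean of the observed values.
observed = [r["monthly_spend"] for r in cleaned if r["monthly_spend"] is not None]
fill_value = mean(observed)
for r in cleaned:
    if r["monthly_spend"] is None:
        r["monthly_spend"] = fill_value

# 4. Process: recode a variable (region name to a numeric code).
region_codes = {"north": 1, "south": 2}
for r in cleaned:
    r["region_code"] = region_codes[r["region"]]

# 5. Log: record descriptive statistics and provenance for the dataset.
data_log = {
    "source": "hypothetical CRM export",
    "row_count": len(cleaned),
    "mean_monthly_spend": mean(r["monthly_spend"] for r in cleaned),
}

# 6. Back up: serialize the processed dataset so a clean copy always exists.
backup = json.dumps(cleaned)
print(data_log)
```

In a real project, the log and the backup would be written to durable storage rather than kept in memory.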

Taking Action on Business Insights

After wrangling your data down to actionable insights, the second step in the progression from raw data to added value is to take decisive actions based on those insights. In business, the only justifiable purpose for spending time deriving insights from raw data is to take actions that lead to an increase in business profits. Failure to take action on data-driven insights results in a complete and total loss of the resources that were spent deriving them, at no benefit whatsoever to the organization. An organization absolutely must be ready and equipped to change, evolve, and progress when new business insights become available.

Remember: What I like to call the insight-to-action arc (the process of taking decisive actions based on data insights) should be formalized in a written action plan and then rigorously exercised to effect continuous and iterative improvements to your organization. The improvements are iterative because they involve successive rounds of deployment and testing to optimize all areas of business based on actionable insights that are generated from organizational data. This action plan is not something that should be tacked loosely on the side of your organization and then never looked at again.

To best prepare your organization to take action on insights derived from business data, make sure you have the following people and systems in place and ready to go:

· Right data, right time, right place: This part isn’t complicated: You just have to have the right data, collected and made available at the right places and the right times, when it’s needed the most.

· Business-centric data scientists and business analysts: Have business-centric data scientists and business analysts in place and ready to tackle problems when they arise.

· Educated and enthusiastic management: Educate and encourage your organization’s leaders so that you have a management team that understands, values, and makes effective use of business insights gleaned from analytics.

· Informed and enthusiastic organizational culture: If the culture of your organization reflects a naïveté or lack of understanding about the value of data, begin fostering a corporate culture that values data insights and analytics. Consider using training, workshops, and events.

· Written procedures with clearly designated chains of responsibility: Have documented processes in place and interwoven into your organization so that when the time comes, the organization is prepared to respond. New insights are generated all the time, but growth is achieved only through iterative adjustments and actions based on constantly evolving data insights. The organization needs to have clearly defined procedures ready to accommodate these changes as necessary.

· Advancement in technology: Your enterprise absolutely must keep up-to-date with rapidly changing technological developments. The analytics space is changing fast — very fast! There are many ways to keep up. If you keep in-house experts, you can assign them the ongoing responsibility of monitoring industry advancements and then suggesting changes that are needed to keep your organization current. An alternative way to keep current is to purchase cloud-based Software-as-a-Service (SaaS) subscriptions and then rely on SaaS platform upgrades to keep you up to speed on the most innovative and cutting-edge technologies.

Warning: When relying on SaaS platforms to keep you current, you’re taking a leap of faith that the vendor is working hard to keep on top of industry advancements and not just letting things slide. Ensure that the vendor has a long-standing history of maintaining up-to-date and reliable services over time. Although you could try to follow the industry yourself and then check back with the vendor on updates as new technologies emerge, that is putting the onus on you. Unless you’re a data technology expert with a lot of free time to research and inquire about advancements in industry standards, it’s better to choose a reliable vendor that has an excellent reputation for delivering up-to-date, cutting-edge technologies to customers.

Distinguishing between Business Intelligence and Data Science

Business-centric data scientists and business analysts who do business intelligence are like cousins: They both use data to work toward the same business goal, but their approach, technology, and function differ by measurable degrees. In the following sections, I define, compare, and distinguish between business intelligence and business-centric data science.

Business intelligence, defined

The purpose of business intelligence is to convert raw data into business insights that business leaders and managers can use to make data-informed decisions. Business analysts use business intelligence tools to create decision-support products for business management decision making. If you want to build decision-support dashboards, visualizations, or reports from complete medium-size sets of structured business data, you can use business intelligence tools and methods to help you.

Business intelligence (BI) is composed of

· Mostly internal datasets: By internal, I mean business data and information that’s supplied by your organization’s own managers and stakeholders.

· Tools, technologies, and skillsets: Examples here include online analytical processing, ETL (extracting, transforming, and loading data from one database into another), data warehousing, and information technology for business applications.

The kinds of data used in business intelligence

Insights that are generated in business intelligence (BI) are derived from standard-size sets of structured business data. BI solutions are mostly built off of transactional data — data that’s generated during the course of a transaction event, like data generated during a sale or during a money transfer between bank accounts, for example. Transactional data is a natural byproduct of business activities that occur across an organization, and all sorts of inferences can be derived from it. The following list describes the possible questions you can answer by using BI to derive insights from these types of data:

· Customer service: “What areas of business are causing the largest customer wait times?”

· Sales and marketing: “Which marketing tactics are most effective and why?”

· Operations: “How efficiently is the help desk operating? Are there any immediate actions that must be taken to remedy a problem there?”

· Employee performance: “Which employees are the most productive? Which are the least?”

Technologies and skillsets that are useful in business intelligence

To streamline BI functions, make sure that your data is organized for optimal ease of access and presentation. You can use multidimensional databases to help you. Unlike relational databases, which store data in two-dimensional tables, multidimensional databases organize data into cubes that are stored as multidimensional arrays. If you want your BI staff to be able to work with source data as quickly and easily as possible, you can use multidimensional databases to store data in a cube rather than store the data across several relational databases that may or may not be compatible with one another.

This cubic data structure enables Online Analytical Processing (OLAP) — a technology through which you can quickly and easily access and use your data for all sorts of different operations and analyses. To illustrate the concept of OLAP, imagine that you have a cube of sales data that has three dimensions: time, region, and business unit. You can slice the data to view only one rectangle — to view one sales region, for instance. You can dice the data to view a smaller cube made up of some subset of time, region(s), and business unit(s). You can drill down or drill up to view either highly detailed or highly summarized data, respectively. And you can roll up, or total, the numbers along one dimension — to total business unit numbers, for example, or to view sales across time and region only.
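The slice, dice, and roll-up operations just described can be imitated on a tiny in-memory cube. The following Python sketch is an illustration only: the dictionary-based cube, the function names, and the sales figures are assumptions, not how an OLAP engine is actually implemented.

```python
# A tiny sales "cube" keyed by (quarter, region, business_unit); values are
# hypothetical sales figures.
cube = {
    ("Q1", "East", "Retail"): 100,
    ("Q1", "East", "Online"): 60,
    ("Q1", "West", "Retail"): 80,
    ("Q2", "East", "Retail"): 110,
    ("Q2", "West", "Retail"): 90,
    ("Q2", "West", "Online"): 70,
}

def slice_region(cube, region):
    """Slice: fix one dimension (region) and keep the other two."""
    return {(q, u): v for (q, r, u), v in cube.items() if r == region}

def dice(cube, quarters, regions, units):
    """Dice: keep a smaller cube defined by subsets of every dimension."""
    return {
        key: v
        for key, v in cube.items()
        if key[0] in quarters and key[1] in regions and key[2] in units
    }

def roll_up_units(cube):
    """Roll up: total over business units, leaving time and region only."""
    totals = {}
    for (q, r, _unit), v in cube.items():
        totals[(q, r)] = totals.get((q, r), 0) + v
    return totals

east = slice_region(cube, "East")
small_cube = dice(cube, {"Q1"}, {"East", "West"}, {"Retail"})
by_quarter_region = roll_up_units(cube)
print(by_quarter_region)
```

Drilling down is the reverse of rolling up: you return from the `(quarter, region)` totals to the individual business-unit cells that produced them.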

OLAP typically runs on top of a data warehouse, a centralized data repository that you can use to store and access your data. A related storage system commonly employed in business intelligence solutions is a data mart, which stores one particular focus area of data belonging to only one line of business in the enterprise. Extract, transform, and load (ETL) is the process that you’d use to extract data from its source systems, transform it, and load it into your database or data warehouse. Business analysts generally have strong backgrounds and training in business and information technology. As a discipline, BI relies on traditional IT technologies and skills.
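For readers who want to see ETL in miniature, here is a hedged Python sketch that uses the standard library’s `sqlite3` module with an in-memory database standing in for a warehouse. The source rows, the `sales` table, and the `transform` helper are hypothetical; production ETL jobs run against real source systems with dedicated tooling.

```python
import sqlite3

# Extract: hypothetical rows pulled from a source system as raw strings.
source_rows = [
    ("2016-03-01", " east ", "1,200.50"),
    ("2016-03-01", "WEST", "980.00"),
    ("2016-03-02", "East", "1,050.25"),
]

def transform(row):
    """Normalize the region label and convert the amount to a float."""
    sale_date, region, amount = row
    return sale_date, region.strip().title(), float(amount.replace(",", ""))

# Load: write the transformed rows into an in-memory warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [transform(r) for r in source_rows]
)
warehouse.commit()

# A BI-style query against the loaded data: total sales per region.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The transform step matters because the three source rows spell the same region three different ways; without normalization, the final query would report them as separate regions.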

Defining Business-Centric Data Science

Within the business enterprise, data science serves the same purpose that business intelligence does — to convert raw data into business insights that business leaders and managers can use to make data-informed decisions. If you have large sets of structured and unstructured data sources that may or may not be complete and you want to convert those sources into valuable insights for decision support across the enterprise, call on a data scientist. Business-centric data science is multidisciplinary and incorporates the following elements:

· Quantitative analysis: Can be in the form of mathematical modeling, multivariate statistical analysis, forecasting, and/or simulations.

Remember: The term multivariate refers to more than one variable. A multivariate statistical analysis examines more than one variable simultaneously.

· Programming skills: You need the necessary programming skills to analyze raw data and to make this data accessible to business users.

· Business knowledge: You need knowledge of the business and its environment so that you can better understand the relevancy of your findings.

Data science is a pioneering discipline. Data scientists often employ the scientific method for data exploration, hypotheses formation, and hypothesis testing (through simulation and statistical modeling). Business-centric data scientists generate valuable data insights, often by exploring patterns and anomalies in business data. Data science in a business context is commonly composed of

· Internal and external datasets: Data science is flexible. You can create business data mash-ups from internal and external sources of structured and unstructured data fairly easily. (A data mash-up is a combination of two or more data sources that are then analyzed together in order to provide users with a more complete view of the situation at hand.)

· Tools, technologies, and skillsets: Examples here could involve using cloud-based platforms, statistical and mathematical programming, machine learning, data analysis using Python and R, and advanced data visualization.
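A data mash-up, as described in the first bullet above, boils down to joining sources on a shared key. The Python sketch below combines a hypothetical internal sales table with a hypothetical external weather feed; the city names and numbers are invented for illustration.

```python
# Internal data: hypothetical weekly store sales.
internal_sales = {"Austin": 5200, "Boston": 4100, "Denver": 3900}

# External data: hypothetical average weekly temperature (deg F) per city,
# as might come from a public weather feed.
external_weather = {"Austin": 88, "Boston": 61, "Chicago": 58}

# Mash-up: join the two sources on the shared key (city). Cities present in
# only one source are dropped, which makes this an inner join.
mashup = {
    city: {"sales": internal_sales[city], "avg_temp_f": external_weather[city]}
    for city in sorted(internal_sales.keys() & external_weather.keys())
}
print(mashup)
```

Once the sources are joined, you can ask questions that neither dataset answers alone, such as whether sales move with temperature.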

Like business analysts, business-centric data scientists produce decision-support products for business managers and organizational leaders to use. These products include analytics dashboards and data visualizations, but generally not tabular data reports and tables.

Kinds of data that are useful in business-centric data science

You can use data science to derive business insights from standard-size sets of structured business data (just like BI) or from structured, semi-structured, and unstructured sets of big data. Data science solutions are not confined to transactional data that sits in a relational database; you can use data science to create valuable insights from all available data sources. These data sources include

· Transactional business data: A tried-and-true data source, transactional business data is the type of structured data used in traditional BI and it includes management data, customer service data, sales and marketing data, operational data, and employee performance data.

· Social data related to the brand or business: A more recent phenomenon, the data covered by this rubric includes the unstructured data generated through emails, instant messaging, and social networks such as Twitter, Facebook, LinkedIn, Pinterest, and Instagram.

· Machine data from business operations: Machines automatically generate this unstructured data, such as SCADA data or sensor data.

Technical stuff: The acronym SCADA refers to Supervisory Control and Data Acquisition. SCADA systems are used to control remotely operating mechanical systems and equipment. They generate data that is used to monitor the operations of machines and equipment.

· Audio, video, image, and PDF file data: These well-established formats are all sources of unstructured data.

Tip: You may have heard of dark data: operational data that most organizations collect and store but then never use. Storing this data and then not using it is a pure detriment to a business. On the other hand, with a few sharp data scientists and data engineers on staff, the same organization could use this data resource to optimize security, marketing, business processes, and more. If your organization has dark data, someone should go ahead and turn the light on.

Technologies and skillsets that are useful in business-centric data science

Since the products of data science are often generated from big data, cloud-based data platform solutions are common in the field. Data that’s used in data science is often derived from data-engineered big data solutions, like Hadoop, MapReduce, Spark, and massively parallel processing (MPP) platforms. (For more on these technologies, check out Chapter 2.) Data scientists are innovative forward-thinkers who must often think outside the box in order to exact solutions to the problems they solve. Many data scientists tend toward open-source solutions, when available. From a cost perspective, this approach benefits the organizations that employ these scientists.

Business-centric data scientists often use machine learning techniques to find patterns in (and derive predictive insights from) huge datasets that are related to a line of business or the business at large. They’re skilled in math, statistics, and programming, and they often use these skills to generate predictive models. They generally know how to program in Python or R. Most of them know how to use SQL to query relevant data from structured databases. They are usually skilled at communicating data insights to end users; in business-centric data science, end users are business managers and organizational leaders. Data scientists must be skillful at using written, oral, and visual means to communicate valuable data insights.

Remember: Although business-centric data scientists serve a decision-support role in the enterprise, they’re different from the business analyst in that they usually have strong academic and professional backgrounds in math, science, or engineering, or all of the above. That said, business-centric data scientists also have a strong substantive knowledge of business management.

Making business value from machine learning methods

A discussion of data science in business would be incomplete without a description of the popular machine learning methods being used to generate business value, as described in this list:

· Linear regression: You can use linear regression to make predictions for sales forecasts, pricing optimization, marketing optimization, and financial risk assessment.

· Logistic regression: Use logistic regression to predict customer churn, to predict response-versus-ad spending, to predict the lifetime value of a customer, and to monitor how business decisions affect predicted churn rates.

· Naïve Bayes: If you want to build a spam detector, analyze customer sentiment, or automatically categorize products, customers, or competitors, you can do that using a Naïve Bayes classifier.

· K-means clustering: K-means clustering is useful for cost modeling and customer segmentation (for marketing optimization purposes).

· Hierarchical clustering: If you want to model business processes, or to segment customers based on survey responses, hierarchical clustering will probably come in handy.

· k-nearest neighbor classification: k-nearest neighbor is a type of instance-based learning. You can use it for text document classification, financial distress prediction modeling, and competitor analysis and classification.

· Principal component analysis: Principal component analysis is a dimensionality reduction method that you can use for detecting fraud, for speech recognition, and for spam detection.

Tip: If you want to know more about how these machine learning algorithms work, keep reading! They’re explained in detail in Part 2 of this book.
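As a small taste of one method from the list above, here is a plain-Python sketch of k-means clustering applied to customer segmentation. The customer data and the fixed starting centroids are assumptions chosen to keep the run reproducible; a real analysis would scale the features first and would normally use a tested library rather than hand-rolled code.

```python
from math import dist
from statistics import mean

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (store visits per month, average basket in dollars).
customers = [(2, 20), (3, 25), (2, 22), (10, 90), (12, 95), (11, 100)]
# Starting centroids are fixed so the run is reproducible.
centroids, clusters = k_means(customers, centroids=[(2, 20), (10, 90)])
print(centroids)
```

The two resulting groups could be read as occasional low-spend shoppers and frequent high-spend shoppers, each of which might warrant a different marketing approach.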

Differentiating between Business Intelligence and Business-Centric Data Science

The similarities between BI and business-centric data science are glaringly obvious; it’s the differences that most people have a hard time discerning. The purpose of both BI and business-centric data science is to convert raw data into actionable insights that managers and leaders can use for support when making business decisions.

BI and business-centric data science differ with respect to approach. Although BI can use forward-looking methods like forecasting, these forecasts are generated by making simple inferences from historical or current data. In this way, BI extrapolates from the past and present to infer predictions about the future. It looks to present or past data for relevant information to help monitor business operations and to aid managers in short- to medium-term decision making.

In contrast, business-centric data science practitioners seek to make new discoveries by using advanced mathematical or statistical methods to analyze and generate predictions from vast amounts of business data. These predictive insights are generally relevant to the long-term future of the business. The business-centric data scientist attempts to discover new paradigms and new ways of looking at the data to provide a new perspective on the organization, its operations, and its relations with customers, suppliers, and competitors. Therefore, the business-centric data scientist must know the business and its environment. She must have business knowledge to determine how a discovery is relevant to a line of business or to the organization at large.

Other prime differences between BI and business-centric data science are

· Data sources: BI uses only structured data from relational databases, whereas business-centric data science may use structured data and unstructured data, like that generated by machines or in social media conversations.

· Outputs: BI products include reports, data tables, and decision-support dashboards, whereas business-centric data science products involve either dashboard analytics or another type of advanced data visualization, but rarely tabular data reports. Data scientists generally communicate their findings through words or data visualizations, but not tables and reports. That’s because the source datasets from which data scientists work are generally more complex than a typical business manager would be able to understand.

· Technology: BI runs off of relational databases, data warehouses, OLAP, and ETL technologies, whereas business-centric data science often runs off of data from data-engineered systems that use Hadoop, MapReduce, or massively parallel processing.

· Expertise: BI relies heavily on IT and business technology expertise, whereas business-centric data science relies on expertise in statistics, math, programming, and business.

Knowing Whom to Call to Get the Job Done Right

Since most business managers don’t know how to do advanced data work themselves, it’s definitely beneficial to at least know which types of problems are best suited for a business analyst and which problems should be handled by a data scientist instead.

If you want to use enterprise data insights to streamline your business so that its processes function more efficiently and effectively, bring in a business analyst. Organizations employ business analysts so that they have someone to cover the responsibilities associated with requirements management, business process analysis, and improvements-planning for business processes, IT systems, organizational structures, and business strategies. Business analysts look at enterprise data and identify what processes need improvement. They then create written specifications that detail exactly what changes should be made for improved results. They produce interactive dashboards and tabular data reports to supplement their recommendations and to help business managers better understand what is happening in the business. Ultimately, business analysts use business data to further the organization’s strategic goals and to provide guidance on any procedural improvements that need to be made.

In contrast, if you want to obtain answers to very specific questions on your data, and you can obtain those answers only via advanced analysis and modeling of business data, bring in a business-centric data scientist. Many times, a data scientist may support the work of a business analyst. In such cases, the data scientist might be asked to analyze very specific data-related problems and then report the results back to the business analyst to support him in making recommendations. Business analysts can use the findings of business-centric data scientists to help them determine how to best fulfill a requirement or build a business solution.

Exploring Data Science in Business: A Data-Driven Business Success Story

Southeast Telecommunications Company was losing many of its customers to churn: the customers were simply moving to other telecom service providers. Because it’s significantly more expensive to acquire new customers than it is to retain existing customers, Southeast’s management wanted to find a way to decrease the churn rates. So, Southeast Telecommunications engaged Analytic Solutions, Inc. (ASI), a business-analysis company. ASI interviewed Southeast’s personnel, including regional managers, supervisors, frontline employees, and help desk employees. After consulting with personnel, ASI collected business data that was relevant to customer retention.

ASI began examining several years’ worth of Southeast’s customer data to develop a better understanding of customer behavior and why some people left after years of loyalty while others continued to stay on. The customer datasets contained records for the number of times a customer had contacted Southeast’s help desk, the number of customer complaints, and the number of minutes and megabytes of data each customer used per month. ASI also had demographic and personal data (credit score, age, and region, for example) that was contextually relevant to the evaluation.

By looking at this customer data, ASI discovered the following patterns in the year before customers switched service providers:

· Eighty-four percent of customers who left Southeast had placed two or more calls into its help desk in the nine months before switching providers.

· Sixty percent of customers who switched showed drastic usage drops in the six months before switching.

· Forty-four percent of customers who switched had made at least one complaint to Southeast in the six months before switching. (The data showed significant overlap between these customers and those who had called into the help desk.)

Based on these results, ASI fitted a logistic regression model to the historical data in order to identify the customers who were most likely to churn. With the aid of this model, Southeast could identify and direct retention efforts at the customers that it was most likely to lose. These efforts helped Southeast improve its services by identifying sources of dissatisfaction; increase returns on investment by restricting retention efforts to only those customers at risk of churn (rather than all customers); and, most importantly, decrease overall customer churn, thus preserving the profitability of the business at large.
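To show roughly what fitting a logistic regression model for churn involves, here is a minimal plain-Python sketch. It is not ASI’s model: the training data is invented, the three features (help desk calls, complaints, and a usage-drop flag) merely echo the findings above, and the learning rate and epoch count are arbitrary assumptions. In practice, you would use a statistics or machine learning library.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            row = [1.0] + list(x)
            error = sigmoid(sum(w * v for w, v in zip(weights, row))) - y
            for j, v in enumerate(row):
                gradient[j] += error * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def churn_probability(weights, x):
    """Predicted probability that a customer with features x will churn."""
    return sigmoid(sum(w * v for w, v in zip(weights, [1.0] + list(x))))

# Invented training data: (help desk calls, complaints, usage dropped? 0/1).
features = [(0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 1, 0),
            (3, 1, 1), (2, 0, 1), (4, 2, 1), (2, 1, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer churned

weights = fit_logistic(features, labels)
at_risk = churn_probability(weights, (3, 1, 1))
loyal = churn_probability(weights, (0, 0, 0))
print(round(at_risk, 2), round(loyal, 2))
```

Ranking customers by this predicted probability is what lets a company aim its retention efforts at the customers it is most likely to lose rather than at everyone.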

What’s more, Southeast didn’t make these retention efforts a one-time event: The company incorporated churn analysis into its regular operating procedures. By the end of that year, and in the years since, it has seen a dramatic reduction in overall customer churn rates.