Thinking with Data (2014)

Preface

Working with data is about producing knowledge. Whether that knowledge is consumed by a person or acted on by a machine, our goal as professionals working with data is to use observations to learn about how the world works. We want to turn information into insights, and asking the right questions ensures that we’re creating insights about the right things. The purpose of this book is to help us understand that these are our goals and that we are not alone in this pursuit.

I work as a data strategy consultant. I help people figure out what problems they are trying to solve, how to solve them, and what to do with them once the problems are “solved.” This book grew out of the recognition that the problem of asking good questions and knowing how to put the answers together is not a new one. This problem—the problem of turning observations into knowledge—is one that has been worked on again and again and again by experts in a variety of disciplines. We have much to learn from them.

People use data to make knowledge to accomplish a wide variety of things. There is no one goal of all data work, just as there is no one job description that encapsulates it. Consider this incomplete list of things that can be made better with data:

• Answering a factual question
• Telling a story
• Exploring a relationship
• Discovering a pattern
• Making a case for a decision
• Automating a process
• Judging an experiment

Doing each of these well in a data-driven way draws on different strengths and skills. The most obvious are what you might call the “hard skills” of working with data: data cleaning, mathematical modeling, visualization, model or graph interpretation, and so on.[1]

What is missing from most conversations is how important the “soft skills” are for making data useful. Determining what problem one is actually trying to solve, organizing results into something useful, translating vague problems or questions into precisely answerable ones, trying to figure out what may have been left out of an analysis, combining multiple lines of argument into one useful result: the list could go on. These are the skills that separate the data scientist who can take direction from the data scientist who can give it, as much as knowledge of the latest tools or newest algorithms.

Some of this is clearly experience: experience working within an organization, experience solving problems, experience presenting the results. But these are also skills that have been taught before, by many other disciplines. We are not alone in needing them. Just as data scientists did not invent statistics or computer science, we do not need to invent techniques for how to ask good questions or organize complex results. We can draw inspiration from other fields and adapt their techniques to the problems we face. Design, argument studies, critical thinking, national intelligence, problem-solving heuristics, education theory, program evaluation, and various parts of the humanities each have insights that data science can learn from.

Data science is already a field of bricolage. Swaths of engineering, statistics, machine learning, and graphic communication are already fundamental parts of the data science canon. They are necessary, but they are not sufficient. If we look further afield and incorporate ideas from the “softer” intellectual disciplines, we can make data science successful and help it be more than just this decade’s fad.

A focus on why rather than how already pervades the work of the best data professionals. The broader principles outlined here may not be new to them, though the specifics likely will be.

This book consists of six chapters. Chapter 1 covers a framework for scoping data projects. Chapter 2 discusses how to pin down the details of an idea, receive feedback, and begin prototyping. Chapter 3 covers the tools of arguments, making it easier to ask good questions, build projects in stages, and communicate results. Chapter 4 covers data-specific patterns of reasoning, to make it easier to figure out what to focus on and how to build out more useful arguments. Chapter 5 takes a big family of argument patterns (causal reasoning) and gives it a longer treatment. Chapter 6 provides some longer examples, tying together the material in the previous chapters. Finally, there is a list of further reading in Appendix A, to give you places to go from here.

Conventions Used in This Book

The following typographical convention is used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

[1] See “A Taxonomy of Data Science” by Hilary Mason and Chris Wiggins (http://www.dataists.com/2010/09/a-taxonomy-of-data-science/) and “From Data Mining to Knowledge Discovery in Databases” by Usama Fayyad et al. (AI Magazine, Fall 1996).