Statistics Done Wrong: The Woefully Complete Guide (2015)
A few years ago I was an undergraduate physics major at the University of Texas at Austin. I was in a seminar course, trying to choose a topic for the 25-minute presentation all students were required to give. “Something about conspiracy theories,” I told Dr. Brent Iverson, but he wasn’t satisfied with that answer. It was too broad, he said, and an engaging presentation needs to be focused and detailed. I studied the sheet of suggested topics in front of me. “How about scientific fraud and abuse?” he asked, and I agreed.
In retrospect, I’m not sure how scientific fraud and abuse is a narrower subject than conspiracy theories, but it didn’t matter. After several slightly obsessive hours of research, I realized that scientific fraud isn’t terribly interesting, at least not compared to all the errors scientists commit unintentionally.
Woefully underqualified to discuss statistics, I nonetheless dug up several dozen research papers reporting on the numerous statistical errors routinely committed by scientists, read and outlined them, and devised a presentation that satisfied Dr. Iverson. I decided that as a future scientist (and now a self-designated statistical pundit), I should take a course in statistics.
Two years and two statistics courses later, I enrolled as a graduate student in statistics at Carnegie Mellon University. I still take obsessive pleasure in finding ways to do statistics wrong.
Statistics Done Wrong is a guide to the more egregious statistical fallacies regularly committed in the name of science. Because many scientists receive no formal statistical training—and because I do not want to limit my audience to the statistically initiated—this book assumes no formal statistical training. Some readers may easily skip through the first chapter, but I suggest at least skimming it to become familiar with my explanatory style.
My goal is not just to teach you the names of common errors and provide examples to laugh at. As much as is possible without detailed mathematics, I’ve explained why the statistical errors are errors, and I’ve included surveys showing how common most of these errors are. This makes for harder reading, but I think the depth is worth it. A firm understanding of basic statistics is essential for everyone in science.
For those who perform statistical analyses for their day jobs, there are Tips at the end of most chapters to explain what statistical techniques you might use to avoid common pitfalls. But this is not a textbook, so I will not teach you how to use these techniques in any technical detail. I hope only to make you aware of the most common problems so you are able to pick the statistical technique best suited to your question.
In case I pique your curiosity about a topic, a comprehensive bibliography is included, and every statistical misconception is accompanied by references. I omitted a great deal of mathematics in this guide in favor of conceptual understanding, but if you prefer a more rigorous treatment, I encourage you to read the original papers.
I must caution you before you read this book. Whenever we understand something that few others do, it is tempting to find every opportunity to prove it. Should Statistics Done Wrong miraculously become a New York Times best seller, I expect to see what Paul Graham calls “middlebrow dismissals” in response to any science news in the popular press. Rather than taking the time to understand the interesting parts of scientific research, armchair statisticians snipe at news articles, using the vague description of the study regurgitated from some overenthusiastic university press release to criticize the statistical design of the research.
This already happens on most websites that discuss science news, and it would annoy me endlessly to see this book used to justify it. The first comments on a news article are always complaints about how “they didn’t control for this variable” and “the sample size is too small,” and 9 times out of 10, the commenter never read the scientific paper to notice that their complaint was addressed in the third paragraph.
This is stupid. A little knowledge of statistics is not an excuse to reject all of modern science. A research paper’s statistical methods can be judged only in detail and in context with the rest of its methods: study design, measurement techniques, cost constraints, and goals. Use your statistical knowledge to better understand the strengths, limitations, and potential biases of research, not to shoot down any paper that seems to misuse a p value or contradict your personal beliefs. Also, remember that a conclusion supported by poor statistics can still be correct—statistical and logical errors do not make a conclusion wrong, but merely unsupported.
In short, please practice statistics responsibly. I hope you’ll join me in a quest to improve the science we all rely on.