Introduction - Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)


The man who is denied the opportunities of taking decisions of importance begins to regard as important the decisions he is allowed to take. He becomes fussy about filing, keen on seeing that pencils are sharpened, eager to ensure that the windows are open (or shut) and apt to use two or three different-coloured inks.

—C. Northcote Parkinson

Statistics are not popular. One might even say they are disliked. Not by statisticians, of course, but by the millions who have to cope with the steady flow of statistics supporting all kinds of assertions, opinions, and theories. Received wisdom harrumphs, “You can prove anything by statistics”—and then sneers, “Lies, damned lies, and statistics.” My sympathies do not lie with these sentiments, which, I believe, have their origins in the misuse of statistics. I believe that statisticians are skilled in their work and act professionally, sincerely desiring their results to be interpreted and used correctly. The misuse arises when statements by those who have limited understanding of the subject are claimed to be justified by statistics.

The misuse is frequently due to misunderstanding. Results of statistical investigations often have to be worded with many qualifications and precise definitions, and this does not ease the understanding of the casual reader. Misguided attempts to summarize or simplify statistical findings are another cause of distortion. And undoubtedly an element of intentional misrepresentation is sometimes involved. Often, the misuse arises from a desperate attempt to justify a viewpoint with what is seen to be a scientific statement. Hence the suggestion that statistics are sometimes used as a drunk uses a lamp post: more for support than illumination.

This book is not for practitioners or would-be practitioners of statistics: it is, as the title implies, for those who have to make decisions on the basis of statistics. Most of us, at one time or another, make use of statistics. The use may be to make a trivial decision, such as buying a tube of toothpaste in the face of claims that nine out of ten dentists recommend it; or it may be to commit a large sum of money to a building project on the basis of an anticipated increase in sales. We are decision makers in our work and in our domestic affairs, and our decisions are frequently based on or influenced by statistical considerations.

My aim in writing this book is to help decision makers to appreciate what the statistics are saying and what they are not saying. In order to have this appreciation, it is not necessary to understand in detail how the statistics have been processed. The key is to understand the underlying perspective that is the foundation of the various procedures used and thereby understand the characteristic features of results from statistical investigations. This is the understanding that this book is intended to provide, by means of easy-to-follow explanations of basic methods and overviews of more complicated methods.

The decision makers I have primarily in mind are managers in business and industry. Business decisions are frequently taken on the basis of statistics. Whether to expand, whether to move into new areas, or whether to cut back on investment can make a big difference to the fortunes of a company. The building of houses, new roads, and new facilities of various kinds affects large numbers of people, and getting it wrong can be economically and socially disastrous for years ahead. Those who have to make such decisions are rarely statisticians, but the evidence on which they have to operate, whether in-house or from consultants, is frequently based on statistics. These people—the executives, planners, and project managers in all kinds of business—I aim to address, in the belief that, while the methods of statistics can be complicated, the meaning of statistics is not.

A better appreciation of statistics not only helps the decision makers in assessing what the statisticians have concluded, but also allows a more reliable judgment at the outset of what they should be asked to provide—recognizing what is possible, what the limitations are, and with what levels of uncertainty the answers are likely to be qualified. This is particularly important when consultants are to be involved, their fees being not insignificant.

I also have in mind students, the managers of the future. I do not mean students who are studying statistics, as there are many excellent textbooks that they will know of and will be using (though some beginners might welcome a friendly introduction to the subject). The students who, I believe, will find this book useful are those who need to have an understanding of statistics without being involved directly in applying statistical methods. Many students of medicine, engineering, social sciences, and business studies, for example, fall into this category.

As I mentioned previously, we are all subjected to a regular deluge of statistics in our domestic affairs, and I therefore believe that interested non-professionals would find the book useful in helping them to adopt a more informed and critical view. Readers of newspapers and viewers of television, and that includes most of us, have a daily dose of statistics. We are told that sixty percent of the population think the government is doing a poor job, that there is more chance of being murdered than of winning a million dollars in the lottery, that there are more chickens in the country than people, and so on. Shoppers are faced with claims regarding price differentials and value for money. Advertisements constantly make claims for products based on statistical evidence: “Ninety percent of women looked younger after using Formula 39,” and so on. If this book encourages just a few people to understand statistics a little better and thereby question statistics sensibly, rather than simply dismissing all statistics as rubbish, it will have been worthwhile.

In its most restricted meaning, statistics (plural) are systematically collected related facts expressed numerically or descriptively, such as lists of prices, weights, birthdays or whatever. Statistics (singular) is a science involving the processing of the facts—the raw data—to produce useful conclusions. In total, we have a procedure that starts with facts and moves by mathematical processing through to final statements, which, although factual, involve probability and uncertainty.

We will encounter areas where it is easy to be misled. We will see that we are sometimes misled because the conclusions we are faced with are not giving the whole story. But we shall also see that we can be misled by our own misunderstanding of what we are being told. We are, after all, not statisticians, but we need to understand what the statisticians are saying. Our task is to reach that necessary level of understanding without having to become proficient in the mathematical procedures involved.

The chapters of the book progress in a logical sequence, though it is not the sequence usually adopted in books aimed at the teaching of statistics. It is a sequence that lets the reader readily find the section appropriate for his or her immediate needs. Most of the chapters are well subdivided, which assists further in this respect.

Part I shows why statistics involves uncertainties. This leads to explanations of the basics of probability. Of particular interest are examples of how misuse of probability leads to numerous errors in the media and even in legal proceedings.

Part II concerns raw data—how data can be obtained and the various methods for sampling it. Data may be descriptive, such as geographical location or eye color, or numerical. The various ways that data can be presented and how different impressions of the meaning of the data can arise are discussed.

Part III examines how data samples are summarized and characterized. A sample can give us information relating to the much larger pool of data from which the sample was obtained. By calculating confidence intervals, we see how the concept of reliability of our conclusions arises.

Part IV investigates comparisons that can be made using the characteristics of our samples. We need to search for similarities and differences, and to recognize whether they are real or imaginary.

Part V moves to the question of whether there are relationships between two or more different features. As the number of features represented in the data increases, the examination of relationships becomes more involved and is usually undertaken with the help of computer packages. For such methods, I have given an overview of what is being done and what can be achieved.

Part VI deals with forecasting. Practical examples are worked through to illustrate the appropriate methods and the variety of situations that can be dealt with.

The final part, Part VII, is devoted to big data. This is the most important development in the application of statistics that has arisen in recent times. Big, in this context, means enormous—so much so that it has affected our basic concepts in statistical thinking.

Where examples of data and collections of data are given, they are realistic insofar as they illustrate what needs to be explained. But there the realism ends. I have used simple numbers—often small discrete numbers—for the sake of clarity. The samples that I have shown are small—too small to be considered adequate. In real investigations, samples need to be as large as can be reasonably obtained, but my use of small samples makes the explanation of the processing easier to follow.

The examples I have included have been kept to a minimum for the sake of brevity. I have taken the view that one example explained clearly, and perhaps at length, is better than half a dozen all of which might confuse in the same way.

To clarify the calculations, I have retained them within the main text rather than relegating them to appendices with formal mathematical presentation. This allows me to add explanatory comments as the calculations proceed and allows the reader to skip the arithmetic while following the procedure.

In describing procedures and calculations, I have adopted the stance that we—that is to say you, the reader, and I—are doing the calculations. It would have been messy to repeatedly refer to some third person, even though I realize that you may be predominantly concerned with having to examine and assess procedures and calculations carried out by someone else.

I have given references by quoting author and year in the main text, the details being listed at the end of the book.

If you have read this far, I hope I have encouraged you to overcome any prejudices you might entertain against the elegant pastime of statistics and read on. Believe it or not, statistics is a fascinating subject. Once you get into the appropriate way of thinking, it can be as addictive as crossword puzzles or Sudoku. As a branch of mathematics, it is unique in requiring only simple arithmetic: the clever bit is getting your head around what is really required.

If you have read this far and happen to be a statistician, it must be because you are curious to see if I have got everything right. Being a statistician, you will appreciate that certainty is difficult if not impossible to achieve, so please let me know of any mistakes you find.