Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)
Part I. Uncertainties
Chapter 3. Probability
How Bad Statistics Can Put You in Jail
To appreciate statistical analysis it is necessary to have some understanding of probability. Surprisingly, perhaps, not very much is required. Knowing how several different probabilities work together in combination, and how the probability of an event is affected by an overriding condition, is all that is needed for most purposes.
Because of the uncertainties discussed in the preceding chapter, statistical results are quoted together with an indication of the probability of the results being correct. Thus it is necessary to have an understanding of basic probability, which fortunately is not difficult to achieve. Probability is defined as the ratio of the number of favorable outcomes to the number of all possible outcomes, where every outcome is equally likely. It is usually expressed as a fraction or a decimal and must lie between zero, denoting impossibility, and unity, denoting certainty. Thus if we throw a die, there are 6 possible equally likely outcomes. The probability of throwing a 2 is 1/6, as there is only one favorable outcome. The probability of throwing an odd number is 3/6 (i.e., a half), as there are three favorable outcomes. The probability of throwing a 7 is zero (i.e., impossible), and the probability of throwing a number less than 10 is one (i.e., certain).
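The die examples can be checked by counting outcomes directly. The following is a minimal Python sketch; the helper name `probability` is chosen here for illustration and is not from the text.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # the six equally likely outcomes of one fair die

def probability(is_favorable):
    """Number of favorable faces divided by the number of possible faces."""
    favorable = [face for face in faces if is_favorable(face)]
    return Fraction(len(favorable), len(faces))

p_two = probability(lambda face: face == 2)        # one favorable outcome
p_odd = probability(lambda face: face % 2 == 1)    # three favorable outcomes
p_seven = probability(lambda face: face == 7)      # no favorable outcome
p_under_ten = probability(lambda face: face < 10)  # every outcome is favorable

print(p_two, p_odd, p_seven, p_under_ten)  # 1/6 1/2 0 1
```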
When interpreting probability results it is important to recognize that, simply because an event has a low probability of occurring, we must not conclude that we will never encounter it. After all, something has to occur, and most things that occur have only a small probability of occurring, because there are always many more things that could occur.
To take a fairly inconsequential example, if we deal a pack of cards to give 4 hands each of 13 cards, we would be surprised to find that each hand consisted of a complete suit. The probability of this happening is about 1 in 5 × 10^28 (5 followed by 28 zeros). However, each time we deal the cards, the probability of the particular hands we find we have dealt, whatever the distribution of the cards, is exactly the same: about 1 in 5 × 10^28. So, an event with this low probability of happening happens every time we deal a pack of cards.
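The size of this figure can be checked by counting the ways of dealing the pack. The sketch below assumes that the four hands are distinguishable (each player's hand counts separately) and that the order of cards within a hand does not matter.

```python
from math import factorial

# Ways to split 52 distinct cards into four distinguishable hands of 13 cards:
# 52! / (13! * 13! * 13! * 13!)
deals = factorial(52) // factorial(13) ** 4

# Any one particular deal therefore has a probability of 1 in `deals`.
print(f"{deals:.3e}")  # roughly 5.36e+28
```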
Each day of our lives, we encounter a series of events (a letter from the bank, a cut finger, a favorite song on the radio, and so on), each of which has a probability of happening. Taken together, and considering only independent events, each day’s particular sequence of events was extremely unlikely to happen, and yet it did!
It would be out of place and unnecessary to give an extensive account of probability theory, but it is important to appreciate the basic rules used to manipulate probabilities in drawing conclusions. The next two sections are concerned with these rules.
Combining several probabilities is a simple process, but it needs care to do it correctly. If we know the probability of each of two events, we can calculate the probability of both events occurring. Suppose we toss a coin and throw a die. The probability of getting a head is 1/2 and the probability of getting a 2 on the die is 1/6. The probability of both events, a head and a 2, is obtained by multiplying the two probabilities together. The answer is 1/2 x 1/6 = 1/12 or one in twelve, as can be seen by listing all twelve equally likely possibilities: a head with a 1, 2, 3, 4, 5, or 6, and a tail with a 1, 2, 3, 4, 5, or 6. Only one of the twelve is a head with a 2.
The procedure can be extended to any number of events, the individual probabilities being multiplied together. However, it is important to note that this is a valid process only if the events are independent, that is, only if their occurrences are not linked in some way.
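For independent events, the multiplication rule can be verified by counting. The sketch below does this for the coin and the die, and then for three events by adding a second coin. The second coin is an illustrative assumption and does not appear in the text.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Two independent events: a head and a 2.
pairs = list(product(coin, die))  # all 12 equally likely combinations
p_counted = Fraction(pairs.count(("H", 2)), len(pairs))
p_multiplied = Fraction(1, 2) * Fraction(1, 6)
print(p_counted, p_multiplied)  # 1/12 1/12

# Three independent events: a head, a 2, and a head on a second coin.
triples = list(product(coin, die, coin))  # 24 equally likely combinations
p_counted_3 = Fraction(triples.count(("H", 2, "H")), len(triples))
p_multiplied_3 = Fraction(1, 2) * Fraction(1, 6) * Fraction(1, 2)
print(p_counted_3, p_multiplied_3)  # 1/24 1/24
```

Counting works here because every combination is equally likely. If the events were linked, the combinations would not be equally likely and the product would no longer match.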
The need for independence can be illustrated by a different example. The probability of me being late for work on any particular day is 1/100, say. The probability of my colleague being late is 1/80. Multiplication of the probabilities gives 1/8,000 as the probability of us both being late on the same day. This is clearly wrong. Many of the circumstances that make him late also make me late. If the weather is foggy or icy, we are both likely to be late. We may even travel on the same train, so a late train makes us both late.
An example of a serious error caused by the unjustified multiplication of probabilities was publicized some years ago. Two children in the same family died, apparently suffering crib deaths. The mother, Sally Clark, a British solicitor, was charged with murder in 1999. An expert witness for the prosecution suggested that the chance of one crib death in a family as affluent as the one in question was one in 8,500. By squaring this probability (i.e., by multiplying 1/8,500 by 1/8,500), he obtained an estimate of 1 in 73 million for the chance of two crib deaths occurring in the family. The figure was not challenged by the defense, and the mother was found guilty and jailed. She won her second appeal in 2003. Clearly, it is possible that the likelihood of crib deaths could run in families for genetic reasons, and the two crib deaths could not be assumed to be independent events. The multiplication of the two (equal) probabilities was unjustified. As a result of the Sally Clark case, other similar cases were reviewed and two other mothers convicted of murder had their convictions overturned.
In 2003 in the Netherlands, a nurse, Lucia de Berk, was sentenced to life imprisonment for the murder of four patients and the attempted murder of three others. Part of the evidence was a statistical calculation provided by a law psychologist. It was claimed that the chance of a nurse working at the three hospitals being present at so many unexplained deaths and resuscitations was one in 342 million, the result being arrived at by a multiplication of probabilities. In the following years, many reputable statisticians criticized the simplistic calculation, and a petition to reopen the case was started. Eventually, in 2010, after lengthy legal processes, a retrial delivered a not-guilty verdict. There were, of course, many considerations other than the statistical calculation, but it is evident from the proceedings that the calculation carried weight in the original conviction.
The rule of multiplication of probabilities for independent events is often referred to as the “and” rule, because it expresses the probability of event A, and event B, and event C, and so on. A second rule—the “or” rule—is used to combine probabilities when we wish to know the probability of event A, or event B, or event C, etc. Here, we add the probabilities. As with the previous rule, this rule also carries an important condition: that the events must be mutually exclusive. That means that only one of the events is possible at any one time. To illustrate, if we throw a die, the probability of a 2 is 1/6 and the probability of a 3 is 1/6. The probability of a 2 or a 3 is 1/6 + 1/6 = 1/3. The two events are mutually exclusive in that it is impossible in throwing the die to get both a 2 and a 3 at the same time. If we extend the example to clarify further, the probability of getting a 1, or a 2, or a 3, or a 4, or a 5, or a 6, is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1 (i.e., a certainty).
Because the sum of the probabilities of all possible mutually exclusive outcomes equals unity (a certainty), it follows that the probability of something not happening is equal to one minus the probability of it happening.
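The “or” rule for mutually exclusive events, and the rule for an event not happening, can be written out as a short Python sketch using the die figures above. The variable names are illustrative only.

```python
from fractions import Fraction

p_face = Fraction(1, 6)  # probability of any one particular face of a fair die

# "Or" rule: a 2 and a 3 cannot both occur on one throw, so the probabilities add.
p_two_or_three = p_face + p_face

# All six faces are mutually exclusive and cover every possibility.
p_any_face = sum([p_face] * 6)

# Probability of an event not happening: one minus the probability of it happening.
p_not_two = 1 - p_face

print(p_two_or_three, p_any_face, p_not_two)  # 1/3 1 5/6
```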
To illustrate the misuse of the “or” rule we can return to our tossing of a coin and throwing of a die together. The separate probabilities of a head and a 2 are respectively 1/2 and 1/6. If we added these together we would conclude that the probability of getting a head or a 2 is 1/2 + 1/6 = 2/3, which is quite wrong. Getting a head and getting a 2 are not mutually exclusive events since both can occur. A proper analysis of this situation shows that:
Probability of heads and a 2 = 1/12
Probability of either, but not both = 6/12 = 1/2
Probability of neither = 5/12
Probability of both, or either, or neither = 1/12 + 6/12 + 5/12 = 1
The final statement is a correct use of the “or” rule, since “both”, “either”, and “neither” constitute a set of mutually exclusive events. These results can be checked by viewing the full list of possibilities shown above.
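The four figures can be reproduced by counting over the twelve coin and die combinations. This is a minimal sketch; the helper name `share` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(["H", "T"], range(1, 7)))  # 12 equally likely pairs

def share(predicate):
    """Fraction of the equally likely outcomes that satisfy the predicate."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

both = share(lambda o: o[0] == "H" and o[1] == 2)
either_not_both = share(lambda o: (o[0] == "H") != (o[1] == 2))
neither = share(lambda o: o[0] != "H" and o[1] != 2)

print(both, either_not_both, neither)    # 1/12 1/2 5/12
print(both + either_not_both + neither)  # 1

# The correct probability of a head or a 2 (or both) is 7/12, not 2/3.
head_or_two = both + either_not_both
print(head_or_two)  # 7/12
```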
The results are also shown in Figure 3-1 in the form of a tree diagram. The difference between the “and” rule and the “or” rule is made clear. Following a sequence of events horizontally across the diagram involves a coin event followed by a die event. The two “and” probabilities are multiplied together. The “or” alternatives are seen in the vertical listing of the final combined probabilities. This tree diagram is a rather trivial example, but you will encounter tree diagrams again in examples of more practical situations. It is worth pointing out here that although a tree diagram can be replaced by a quicker calculation, it is nevertheless an excellent means of clarifying or checking the logic behind the calculation.
Figure 3-1. Tree diagram of the various outcomes of tossing a coin and throwing a die
Note that when two probabilities are multiplied together, the result is smaller than either of the original probabilities (unless one of them is a certainty). Thus, application of the “and” rule leads to a decrease in probability. This is as we would expect: the probability of predicting the winner and the second in a horse race is less than the probability of predicting just one of the results. On the other hand, adding probabilities together increases the probability. Thus, application of the “or” rule increases the probability. Predicting the winner or the second in a horse race is more likely than predicting just one of the results.
Combinations of probabilities appear extensively in studies of reliability of systems, as you will see later in more detail. When systems consist of many components, the overall probability of failure depends on the individual probabilities of failure of the components and the way in which they combine. Suppose we have a simple smoke alarm consisting of a sensor connected to a siren. Failure of the system occurs if the sensor or the siren fails, or both fail (the “or” rule). If we install a duplicate system, failure occurs only if the first system fails and the second system fails (the “and” rule).
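The smoke alarm can be worked through as a rough sketch. The failure probabilities below (1 in 100 for the sensor, 2 in 100 for the siren) are invented for illustration, and the components and the two duplicate systems are assumed to fail independently. Because sensor failure and siren failure are not mutually exclusive, the sketch obtains "sensor or siren or both" as one minus the probability that neither fails, rather than by simple addition.

```python
from fractions import Fraction

# Hypothetical failure probabilities, not taken from the text.
p_sensor_fails = Fraction(1, 100)
p_siren_fails = Fraction(2, 100)

# One system fails if the sensor or the siren (or both) fails.
# Assuming independence: 1 - P(sensor works and siren works).
p_system_fails = 1 - (1 - p_sensor_fails) * (1 - p_siren_fails)

# With a duplicate system, protection is lost only if both systems fail
# (the "and" rule, again assuming independence).
p_both_systems_fail = p_system_fails * p_system_fails

print(float(p_system_fails))       # 0.0298
print(float(p_both_systems_fail))  # about 0.00089
```

Simple addition would give 0.03 for the single system, slightly too high because it counts the case where both components fail twice.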
Since such analyses are concerned with failures that have to be avoided as much as possible, the values of probability that are quoted are often very small. We are all more at home with probabilities in the range of tenths or perhaps hundredths; but when probabilities of 0.0001 (one in ten thousand) or 0.000001 (one in a million) are presented, we have difficulty not only in recognizing their significance but also in taking them seriously. A chance of a disastrous fire might be one in a million, and some safety procedure we introduce might reduce it to one in two million. This would halve the chance of a fire, a very significant reduction, but a comparison of the two values, 0.000001 and 0.0000005, does not carry the same impact.
Probability calculations can become complicated when the required probability is conditional on some other event happening. You need not worry about these complications, but you do need to appreciate how false conclusions can be drawn in such situations. The conclusions, whether accidental or intentional, are particularly dangerous because they appear at first sight to be perfectly valid.
To see what is meant by conditional probability, think of two dice being rolled one after the other. What is the probability that the total score is 5? There are four ways of getting a score of 5—1+4, 2+3, 3+2, and 4+1—out of a possible 36 combinations of the two scores. So the probability is 4/36 or 1/9. If we introduce a condition—for example, that the first die is showing a 2—the probability of getting a total of 5 becomes 1/6 because the second die must show a 3 and there is a 1 in 6 chance of this happening.
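The two-dice example can be checked by enumeration. In this minimal sketch, the condition is applied by restricting the list of possible rolls before counting.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

# Unconditional: probability that the total is 5.
p_total_five = Fraction(sum(1 for r in rolls if sum(r) == 5), len(rolls))

# Conditional: keep only the rolls where the first die shows a 2, then count.
first_is_two = [r for r in rolls if r[0] == 2]
p_five_given_first_two = Fraction(
    sum(1 for r in first_is_two if sum(r) == 5), len(first_is_two)
)

print(p_total_five, p_five_given_first_two)  # 1/9 1/6
```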
Now consider a situation in which we have a bag of coins, of which 100 are forgeries, as illustrated in Figure 3-2.
Figure 3-2. Conditional probability illustrated by counterfeit coins
Ten of the coins are gold, and two of these are forgeries. We draw one coin from the bag and see that it is gold (the condition). The probability that it is a forgery is 2 out of 10, or 1/5. Alternatively, when we draw the coin out of the bag we may find it to be a forgery (the condition). The probability that it is gold is 2 out of 100 (i.e., 1/50). This illustrates the fact that the probability of event A, given event B, is generally not the same as the probability of event B, given event A. The two conditional probabilities are generally different and can be very different.
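The two conditional probabilities share the same numerator (the two gold forgeries) but have different denominators, as this short sketch using the counts quoted above shows.

```python
from fractions import Fraction

gold_coins = 10       # coins that are gold
forged_coins = 100    # coins that are forgeries
gold_and_forged = 2   # coins that are both gold and forged

# Given that the coin is gold, the possibilities are the 10 gold coins.
p_forged_given_gold = Fraction(gold_and_forged, gold_coins)

# Given that the coin is forged, the possibilities are the 100 forgeries.
p_gold_given_forged = Fraction(gold_and_forged, forged_coins)

print(p_forged_given_gold, p_gold_given_forged)  # 1/5 1/50
```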
The so-called prosecutor’s fallacy arises from the use of the wrong conditional probability. Suppose a suspect is found to have DNA characteristics matching those of the unknown perpetrator of a crime. Only one person in 10,000 would be expected to have a similar match. The prosecution argues, therefore, that there is only one chance in 10,000 that the suspect is innocent. But the 1/10,000 probability is the probability of a DNA match given the condition that the suspect is innocent. This is not the appropriate probability to use. The relevant probability is the probability that the suspect is innocent given the condition that there is a DNA match. We cannot evaluate this probability without knowing how many other possible suspects there might be who are equally likely to be guilty. (This would be like trying to solve the bag of coins example without knowledge of the total number of forgeries.) But the figure could be very much greater than 1/10,000. In a population of 100,000, say, there would be on average 10 people with the DNA match. Assuming that two of the others, say, are just as suspect as ours, our suspect is one of three equally likely candidates and has a probability of 2/3 of being innocent.
As one might expect, there is also the defender’s fallacy. It arises from the supposition of a large population of equally suspected people. Following on from the previous example, if the population was taken to be 1,000,000 there would be 100 with the appropriate DNA match; so, the defender would argue, our suspect has a 99/100 probability of being innocent. Raising the supposed population to 10 million increases the probability of innocence to 999/1000. The fallacy lies in the assumption that everyone in the population is equally suspect.
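The figures quoted for the two fallacies can be reproduced in a small sketch. The function names are illustrative, and the key assumption, stated in the text, is that exactly one of the equally suspect people with a DNA match is guilty.

```python
from fractions import Fraction

MATCH_RATE = Fraction(1, 10_000)  # one person in 10,000 has a matching profile

def expected_matches(population):
    """Average number of people in the population with a DNA match."""
    return int(population * MATCH_RATE)

def p_innocent(equally_suspect):
    """Probability our suspect is innocent when exactly one of
    `equally_suspect` equally likely people is guilty."""
    return 1 - Fraction(1, equally_suspect)

print(expected_matches(100_000))  # 10
print(p_innocent(3))              # 2/3 (our suspect plus two others)

# Defender's fallacy: treat everyone with a match as equally suspect.
print(p_innocent(expected_matches(1_000_000)))   # 99/100
print(p_innocent(expected_matches(10_000_000)))  # 999/1000
```

The answer depends entirely on the number of people taken to be equally suspect, which is why neither the prosecutor's 1/10,000 nor the defender's inflated population can be accepted without that information.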
Haigh (2003) and Seife (2010) give useful accounts of how the misuse of probability can result in errors in legal decisions. Many of the examples are taken from actual cases.
It is not only in legal arguments that errors of this sort arise. They are commonly encountered in political debates and in advertising. Have a look at the following examples.
“Of those dying of lung cancer each year, 75% are smokers. This shows that smokers have a 75% chance of dying of lung cancer.” No it doesn’t! We need to know the probability of someone dying of lung cancer, given that he or she is a smoker, not the probability of the person having been a smoker, given that he or she dies of lung cancer. The following data helps to show the fallacy.
Of the 300 smokers who died, 75 (i.e., 25%) died of lung cancer. This is very different from the quoted 75% of deaths from lung cancer associated with smoking. Take note that these are invented figures and must not be used to draw any medical conclusions!
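The two percentages can be set side by side in a short sketch. It uses only the invented figures quoted above; the total of 100 lung cancer deaths is not stated directly but is inferred here from the 75 smokers making up 75% of those deaths.

```python
from fractions import Fraction

smoker_deaths = 300              # smokers who died (invented figure from the text)
smoker_lung_cancer_deaths = 75   # of whom this many died of lung cancer
share_smokers_among_lung_cancer_deaths = Fraction(75, 100)  # the quoted 75%

# Inferred, not stated in the text: total deaths from lung cancer.
lung_cancer_deaths = smoker_lung_cancer_deaths / share_smokers_among_lung_cancer_deaths

# Probability of having been a smoker, given death from lung cancer.
p_smoker_given_lung_cancer = Fraction(smoker_lung_cancer_deaths) / lung_cancer_deaths

# Probability of dying of lung cancer, given that the person who died was a smoker.
p_lung_cancer_given_smoker = Fraction(smoker_lung_cancer_deaths, smoker_deaths)

print(lung_cancer_deaths)          # 100
print(p_smoker_given_lung_cancer)  # 3/4
print(p_lung_cancer_given_smoker)  # 1/4
```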
“Of dental patients who were found to have had no fillings in ten years, 90% had brushed regularly with Toothglo.” But what we would really like to know is what percentage of those who brushed regularly with Toothglo had had no fillings in ten years.
“Eighty percent of winning horses in yesterday’s races were tipped by our racing correspondent.” Maybe, but what percentage of his tips predicted the winning horse?
The atmosphere was becoming tense. Rod Craig, representative of Jenson’s Switches, was in the manager’s office at Boilfast, a manufacturer of electric kettles. Boilfast fitted most of its kettles with switches supplied by Jenson’s, and it was the switches that were being discussed.
The manager of Boilfast, Tom Richards, was concerned about the number of kettles he was having to repair under guarantee because of a problem with the on-off switch.
He quoted some figures from a sheet of paper he was holding.
“Over the past two years, of the number of kettles returned because of a faulty on-off switch, sixty-seven percent were fitted with your switches. And I don’t think that is acceptable.”
Rod could hardly do anything but apologize and assure the manager, who was now leaning forward in a somewhat threatening manner, that he would refer the matter back to his technical department.
On the way back to Jenson’s, Rod had a chance to think through the situation. His company supplied most of the switches that Boilfast fitted to its kettles, so Boilfast was a customer they would not want to lose. But how meaningful was the complaint? Rod began to see the light and, by the time he arrived in his office, he had a smile on his face.
He picked up the phone and dialed.
“Tom, the issue is not the one you described.”
“You are saying that of the kettles returned because of a faulty switch, 67% were fitted with our switches. The real issue is, given that the kettle has our switch fitted, what percent are returned because of a faulty switch? Perhaps you should look at your figures more closely.”
Tom was thrown off balance and felt slightly confused.
“I’ll get back to you,” he said.
He did look at the figures. Of the number of kettles with Jenson’s switches returned for any reason, 22% had a faulty switch. This was similar to the figures relating to kettles fitted with switches from other suppliers, the corresponding percentage being 19%. Because the majority of Boilfast kettles were fitted with Jenson’s switches, the predominance of their switch failures that was troubling Tom was readily explained.