Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)
Part V. Relationships
It is the function of creative men to perceive the relations between thoughts, or things, or forms of expression that may seem utterly different, and to be able to combine them into some new forms—the power to connect the seemingly unconnected.
—William Plomer
We now progress from considering a single variable to considering whether two quite different variables are related in some way. In the proper terminology, we are moving from univariate data to bivariate data and searching for relationships between two variables. We shall also consider relationships between more than two variables.
Chapter 13. Cause and Effect
Storks and Birth Rates
We human beings seem to have an inbuilt desire to seek out relationships between different observed effects, and deduce a cause-and-effect association. I suppose that survival depends to some extent on recognizing relationships and assuming that one effect causes another. As youngsters we learn of danger by relating climbing to the risk of falling. Crossing the road without looking is related to the possibility of being struck by a vehicle, and so on. However, we are inclined to imagine relationships where none exist, and, worse still, to imagine that these relationships imply cause and effect. The extreme situation is in the area of superstition: a remarkably high percentage of the population avoid the number thirteen or carry lucky charms. Astrology, which claims that events in our lives are affected by the positions of the planets, has a large following.
Of course, relationships can be a first step in indicating the presence of cause and effect. Science and technology have advanced, and still advance, by studying relationships. Meteorologists establish relationships between the features of air movements and the resulting weather. Chemists establish relationships between the constituents of substances and their properties.
In scientific investigations carried out under controlled conditions in a laboratory, a cause-and-effect link can be established beyond reasonable doubt. The same experiment can be repeated many times. Our chemist can assure us that he can predict a specific reaction if he knows the conditions that are being maintained. The meteorologist is on less certain ground, having to observe the effects without the ability to control any one of them or to remove unwanted effects that may play a part. Nevertheless, repeated observations can build up confidence that the relationships are causal, particularly if theories are available to explain them. Indeed, theories, starting from hypotheses, develop from the confirmation of cause and effect and may progress to the status of laws.
The use of control groups is a common way of establishing a causal relationship, typically in trials of new drugs. The drug is administered to one group of patients while patients in a second group, the control group, are given placebos. The patients are not made aware of which group they are in. The validity of the results, of course, depends on the overall similarity of the two groups, which therefore need to be constituted by a randomizing procedure.
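The randomizing procedure mentioned above can be sketched in a few lines of Python. This is only an illustration, not a description of how any particular trial is run: the function name randomize_groups, the twenty patient numbers, and the seed are all hypothetical choices made for the example.

```python
import random


def randomize_groups(patient_ids, seed=None):
    """Shuffle the patients and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seeded generator makes the allocation repeatable
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical trial with patients numbered 1 to 20.
treatment, control = randomize_groups(range(1, 21), seed=42)
print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Because chance alone decides who goes where, any characteristic of the patients, whether measured or not, is expected to be spread evenly between the two groups on average.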
In general, unless we have evidence that changing one factor brings about a consistent change in another, we cannot assume a cause-and-effect connection. It is not sufficient to establish that the two factors are related. An example of correlation without a causal relationship is that the number of births in Copenhagen in the post-WWII period correlated with the number of storks nesting on the roofs of buildings. The correlation is consistent with the theory that storks deliver human babies but does not prove it. A more plausible explanation is that the growth in the city’s human population led to an increase in building, which provided more nesting opportunities for storks. Similar correlations between storks and births have been reported from Germany and the Netherlands. Some correlations may be due not simply to a third common cause, as in these examples, but to a series of interconnected factors.
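A small simulation shows how a third common cause can produce a strong correlation between two quantities that have no direct link. The figures below are invented purely for illustration and are not the Copenhagen data: a hypothetical population grows over twenty years, and both births and stork nests are generated from the population alone, each with its own independent random scatter.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(0)

# Hypothetical population (in thousands) growing steadily over 20 years.
population = [500 + 10 * year + rng.gauss(0, 5) for year in range(20)]

# Births and stork nests each depend on the population only.
# Neither series is computed from the other.
births = [0.02 * p + rng.gauss(0, 0.3) for p in population]
storks = [0.1 * p + rng.gauss(0, 2) for p in population]

r = pearson(births, storks)
print(f"Correlation between births and stork nests: {r:.2f}")
```

The correlation comes out high even though, by construction, storks play no part in the births. Both series simply follow the growing population.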
Sometimes a correlation may arise in a more subtle way. Suppose we suspect that a particular medical treatment is triggering an unpleasant side effect in patients. This could be based on an observed correlation between the use of the treatment and occurrence of the side effect. However, it may be that the side effect is not really a side effect but rather is a result of the ailment that the treatment is being used to relieve.
Blastland and Dilnot (2007: 163-174) provide a thought-provoking chapter describing situations in which correlation has been taken to imply causation. An example involves longevity and being overweight. Data from America showed overweight people living slightly longer than thin people. However, a factor not taken into account was that very ill people tend to be very thin. The inclusion of data from this category influences the overall picture, suggesting that being overweight leads to a longer life. The authors also point out that, because of the many false claims of causality, there arises among some of us an unfortunate condemnation of all claims, regardless of their validity.
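The effect of including the very ill in the thin category can be seen with some simple arithmetic. The group sizes and average lifespans below are made up for the purpose of the illustration and are not taken from the American data or from Blastland and Dilnot.

```python
def weighted_mean(groups):
    """Overall mean for a list of (number_of_people, group_mean) pairs."""
    total_people = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_people


# Hypothetical figures: (number of people, average lifespan in years).
thin_healthy = (900, 80.0)
thin_very_ill = (100, 60.0)   # very ill people tend to be very thin
overweight = (1000, 79.0)

thin_all = weighted_mean([thin_healthy, thin_very_ill])

print("Thin, healthy only:     ", thin_healthy[1])
print("Thin, including the ill:", thin_all)
print("Overweight:             ", overweight[1])
```

In this invented example the healthy thin people outlive the overweight people, yet the thin category as a whole shows a shorter average life once the very ill are counted in it.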
Some proposed causal relationships are not easy to prove because we have no direct control over the effects involved. What do you make of the following, for example? Richard Wiseman (2007: 27-31) describes an experiment involving 40,000 people. Each was asked to rate himself or herself as lucky or unlucky. The results were found to correlate with the month of birth. The self-described lucky ones tended to be born in summer months, and the unlucky ones in winter months.
The experiment was repeated in the southern hemisphere (New Zealand), though with only 2,000 subjects, and it was found that the birth rate for lucky people peaked in December, which is summer in the southern hemisphere. It was suggested that the temperature at the time of birth might influence the way the baby is looked after in its early months, or perhaps the mother’s diet varies at different times of the year according to the climate. On the other hand, I suspect that many statisticians would like to see the results in greater detail before expressing an opinion. They might also want to know if the subjects in New Zealand knew of the UK finding before they took part.
Amazing coincidences occur regularly. We read of them in the newspapers every week. It is not too surprising, when we consider the enormous number of events that take place in the world and the large number of people there are to experience them. We must always remember that a relationship between events is not sufficient to demonstrate cause and effect. Correlation is a necessary condition for cause and effect, but it is not a sufficient condition. Statistics can demonstrate relationships within a specified level of reliability. But that is as far as it can go. Statistics alone can never prove a causal relationship.
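The point about coincidences can be put into numbers. If an event has a small probability p of happening on any one occasion, and there are n independent occasions on which it could happen, the probability that it happens at least once is 1 - (1 - p)^n. The sketch below assumes independence, and the one-in-a-million chance and the numbers of occasions are hypothetical values chosen only to show the scale of the effect.

```python
def prob_at_least_once(p, n):
    """Probability that an event of probability p occurs at least once in n independent tries."""
    return 1 - (1 - p) ** n


one_in_a_million = 1e-6   # hypothetical chance of a given "amazing" coincidence

for occasions in (1_000, 1_000_000, 10_000_000):
    chance = prob_at_least_once(one_in_a_million, occasions)
    print(f"{occasions:>10,} occasions: probability of at least one = {chance:.3f}")
```

With a million occasions, the one-in-a-million event is more likely than not to turn up somewhere, and with ten million it is all but certain.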