Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)
Part IV. Comparisons
Reason respects differences, and imagination the similitudes of things.
—Percy Bysshe Shelley
We are now in a position to consider situations in which comparisons are made between the features of samples and populations, in order to decide whether they are different or could simply represent likely variations of the same underlying data.
Chapter 8. Levels of Significance
What Odds Are You Giving?
When we obtain two or more samples, we may expect them to be from the same population. Thus we may sample goods produced on two production lines in the same factory, or we could be comparing the same product from two different suppliers. If we find samples to be from the same population, we can pool them to create a larger sample and summarize the data more succinctly. If we find the samples to be from different populations, we are in a position to draw important conclusions. We might change our supplier, for example.
In making comparisons, statisticians propose at the outset that there is a difference or there is not a difference. These proposals are referred to as hypotheses. Hypothesis testing describes the process involved. The correctness of a hypothesis cannot be determined with certainty. There is always a degree of uncertainty, which is expressed in terms of a level of significance.
The null hypothesis, H0, is the hypothesis whose correctness is being tested. H1 is the alternative hypothesis, which is accepted if the null hypothesis cannot be accepted.
Thus we might have a null hypothesis that the average income in Midtown is no different from that in the rest of the county. The alternative hypothesis is that the average income in Midtown is different from that in the rest of the county. Acceptance of the null hypothesis would be expressed by stating that the average income in Midtown is found to be not significantly different from that in the rest of the county. A level of significance is attached to the conclusion. A level of 5%, say, means that we are accepting a 1 in 20 chance of declaring a difference when none really exists.
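The Midtown comparison can be sketched in a few lines of Python. The figures below are invented purely for illustration, and the test used (a two-tailed z-test against a county mean and standard deviation assumed to be known) is a simplifying assumption, not necessarily the procedure described in the later chapters.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Test H0: the sample comes from a population with mean pop_mean.

    Returns the z statistic, the two-tailed p-value, and whether H0
    is rejected at significance level alpha.
    """
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this far from the mean, in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical figures: 100 Midtown households averaging 41,500,
# against a county mean of 40,000 with standard deviation 8,000.
z, p_value, reject_h0 = two_tailed_z_test(41_500, 40_000, 8_000, 100)
print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0 at 5%: {reject_h0}")
```

With these made-up figures the p-value comes out a little above 5%, so the null hypothesis is retained and Midtown incomes would be reported as not significantly different at the 5% level.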
There is a similarity between the significance level and the confidence limit described in the final section of the preceding chapter. There we used percentages close to 100% to express the level of confidence in our conclusions. Here, our significance levels are close to zero, expressing the risk we run of reporting a difference when none exists. You will see later that the similarity extends to the manner in which the confidence limits and the significance levels are calculated.
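The link between the two ideas can be made concrete. As a minimal sketch, assuming a normal distribution and a two-tailed test, a significance level corresponds to a confidence level of 100% minus that figure, and both lead to the same cut-off measured in standard deviations from the mean. The function name `critical_z` is chosen here for illustration only.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Number of standard deviations from the mean that leaves a total
    probability of alpha in the two tails of a normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01, 0.001):
    print(
        f"significance {alpha:.1%} <-> confidence {1 - alpha:.1%}: "
        f"cut-off at {critical_z(alpha):.2f} standard deviations"
    )
```

The 5% level gives the familiar cut-off of about 1.96 standard deviations, the same figure that bounds a 95% confidence interval.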
The null hypothesis is usually worded in such a way that if it is accepted, then there is no change to the situation, the use of the word “null” implying this approach. If the null hypothesis in the Midtown example had been accepted, we would have discovered nothing special about Midtown and the situation would have been in effect unchanged.
This may seem a rather pedantic convention. After all, why not adopt a null hypothesis stating that the average income in Midtown is different from that in the rest of the county? The calculation procedure would remain the same and the result obtained would be identical. However, as will be seen in Chapter 12, the convention does result in improved clarity when we consider the errors that are possible within our degrees of uncertainty.
Tests may be stated to be one-tailed or two-tailed. The test just described is a two-tailed test in that we are asking whether Midtown incomes differ from the others, either by being less or greater than those in the rest of the county. If we tested only whether the Midtown incomes are lower than the rest, or only whether they are greater, we would have in each case a one-tailed test. The tail referred to is the part of the distribution extending away from the mean, that is, toward values lying a larger number of standard deviations from the mean and therefore less likely to be observed.
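The difference between the two kinds of test shows up directly in the probabilities. The sketch below assumes a normally distributed test statistic and uses a hypothetical value of 1.875 standard deviations above the mean; the function and key names are illustrative only.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return one-tailed and two-tailed probabilities for a z statistic."""
    standard_normal = NormalDist()
    lower = standard_normal.cdf(z)        # chance of a value this low or lower
    upper = 1 - standard_normal.cdf(z)    # chance of a value this high or higher
    two_tailed = 2 * min(lower, upper)    # chance of a value this extreme either way
    return {"lower": lower, "upper": upper, "two_tailed": two_tailed}


result = tail_probabilities(1.875)  # hypothetical test statistic
for name, probability in result.items():
    print(f"{name}: {probability:.4f}")
```

For this hypothetical value, the one-tailed test for "greater than" falls below 5% while the two-tailed test does not, which is why the choice between the two should be made before the data are examined.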
A level greater than 5% is generally not considered significant, as the probability of the result being simply an odd one is too great. For many purposes, even 5% is not considered good enough and a level of 1% may be required. The chance of wrongly reporting a difference is then 1 in 100, and the result may be termed very significant. Of course, for life-or-death situations in medical activities or health and safety applications, even this level may be inadequate and significance levels of 0.1% or lower may be called for.
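These conventional levels can be expressed as a simple rule of thumb. The sketch below is illustrative: the function name is hypothetical, the wording of the 0.1% label is an assumption, and treating a probability exactly equal to a threshold as meeting it is a choice on which practice varies.

```python
def describe_significance(p_value):
    """Label a probability using the conventional 5%, 1% and 0.1% levels."""
    if not 0 <= p_value <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    # Assumption: a value exactly on a threshold counts as meeting it.
    if p_value <= 0.001:
        return "significant at the 0.1% level"
    if p_value <= 0.01:
        return "very significant (1% level)"
    if p_value <= 0.05:
        return "significant (5% level)"
    return "not significant"


for p in (0.20, 0.03, 0.004, 0.0002):
    print(f"p = {p}: {describe_significance(p)}")
```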
When results are quoted with their levels of significance, the number of data values in the sample or samples may also be quoted. There may also be references to degrees of freedom, which were explained in Chapter 7.
In the following four chapters, I will describe various ways of testing hypotheses. The emphasis will be on giving you an understanding of what the statistician is saying and the language she is using. I will not get involved in any complicated mathematics but will outline and illustrate the steps involved. In any case, the mathematical processing is generally carried out by calculator or computer programs rather than by hand. When reference is made to a small or a large sample, the dividing line is around 30 data values.