Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)
Part IV. Comparisons
Chapter 11. Comparisons with Descriptive Data
Is Your Staff Female/Male Ratio OK?
Chapter 6 explained that descriptive data can be rendered numerical by expressing the numbers of items in various categories as proportions—thus enabling further analysis to be carried out on the data. In this chapter, a single sample proportion will be compared with a known population proportion, and two sample proportions will be compared with each other. If the data is ordinal—that is to say, it can be listed in logical order—then ranking tests, which will be introduced, can be applied to achieve comparisons between pairs of ranks.
There is a particular advantage in having large samples of descriptive data, because several of the procedures described below then allow the data to be treated as normally distributed.
A sample consisting of yes/no data will provide an example of dealing with a proportion. Suppose we know from previous investigations that the proportion of inhabitants of Newtown who were born in Newtown is 0.7. We can use this information to decide whether a sample of size 100, say, obtained in one area of the town is representative of the town or whether the sample shows a significant difference.
The null hypothesis is that the sample proportion, 0.8 say, is not significantly different from the overall proportion of 0.7 for the town. Our sample of 100 consists of 80 who were born in Newtown, whereas we would have expected 70 on the basis of the known results for the whole town. The procedure follows a pattern similar to that used in the first section of Chapter 10, where we asked whether a single value was likely to have been drawn from a population of known mean value. A Z-score was calculated by dividing the difference between the single observed value and the population mean by the square root of the variance. This gave us a measure of the difference in units of standard deviations.
Proportion is a binary measure: each person in our sample was either born in Newtown or not. So the proper distribution to use is the binomial, which we will look at in a moment. However, if the sample is large and the population proportion is not excessively large or small, the normal distribution can be assumed to be relevant. The variance of binomially distributed data is np(1 – p), where p is the population proportion and n is the number of items in the sample. The Z-score is therefore

Z = (80 – 70)/√(100 × 0.7 × 0.3) = 10/√21 = 2.18.
This value shows that our sample differs significantly from the population at the 5% level. (See the selection of values for the normal distribution in the first section of Chapter 10.)
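The arithmetic can be checked with a short sketch in Python (not part of the original text; the figures are those of the Newtown example):

```python
import math

# Normal approximation to the binomial for a single proportion.
# Newtown example: n = 100, population proportion p = 0.7,
# observed count 80 (sample proportion 0.8).
n, p = 100, 0.7
observed = 80
expected = n * p                    # 70
variance = n * p * (1 - p)          # np(1 - p) = 21

z = (observed - expected) / math.sqrt(variance)
print(round(z, 2))                  # 2.18, beyond 1.96 (5% level, two-tail)
```

Since 2.18 exceeds the two-tail 5% critical value of 1.96 but not the 1% value of 2.58, the difference is significant at the 5% level only.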
The binomial distribution differs from the normal distribution when the sample size is small, but unfortunately tables of the binomial distribution are not convenient to use. The values of probability vary with both the number of data in the sample and the population proportion, so there has to be a separate table for each size of sample and each value of population proportion. Furthermore, the values listed are cumulative probabilities. Figure 11-1 gives a more easily appreciated view of the binomial distribution by providing a selection of plotted values for a number of sample sizes and different proportions of the property of interest in the population. When the population proportion is small, the distribution is skewed, but it becomes symmetrical when the proportion is 0.5. As the sample size increases, the distribution approaches the normal distribution, as has already been stated.
Suppose we have a firm with just 10 employees, only two of whom are female. Does this provide evidence that the firm is discriminating against female employees? The expected number of females, assuming no discrimination, is 5, so the null hypothesis is that a sample of 10 employees containing 2 or fewer females could have been drawn from a population having a proportion of females of 0.5.
Entering the values from our example in tables of the binomial distribution gives a probability of occurrence of 0.0547—i.e., just over 5%. We would have to conclude that there was no evidence at the 5% level of discrimination. Had there been just one female employee, the probability would be lower—0.0107, just over 1%— and we would consider that there was evidence of discrimination. With zero female employees, the probability would be even lower, but we would have to be careful. It would be quite likely that there was an underlying reason why the work was not suitable for or attractive to female employees.
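The tabulated probabilities quoted above can be reproduced exactly with a minimal Python sketch of the cumulative binomial (illustrative code, not part of the original text):

```python
from math import comb

# Exact binomial tail probability for the staffing example:
# P(X <= k) for a sample of n = 10 and population proportion p = 0.5.
def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(round(binom_cdf(2, 10, 0.5), 4))  # 0.0547 -> not significant at 5%
print(round(binom_cdf(1, 10, 0.5), 4))  # 0.0107 -> significant at 5%
```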
In Figure 11-1(b), you can see these results diagrammatically. The bottom distribution is appropriate for a sample size of 10 and a population proportion of 0.5. The requirement for 1% significance is shown as zero occurrences and the requirement for 5% significance is shown as less than 2—i.e., 0 or 1.
Figure 11-1. The binomial distribution showing the probability of a number of specified events in a sample when the proportion in the population is p, for a range of p values and for a sample size of (a) 5, (b) 10, (c) 20, and (d) 30
Difference between Proportions
It may be that we have two samples and we wish to examine the difference between them. The null hypothesis is that the two samples could have been drawn from the same population. If the samples are large we can again use the normal distribution and deal with the data as for numerical data described in the “Difference between Means” section of Chapter 10. For samples of equal size, the variances of the two samples are added, and the Z-score is the difference between the numbers of occurrences in each sample divided by the square root of the combined variance. If the samples are of unequal size, the difference has to be the difference between the two proportions, and the expression for the combined variance has to be adjusted accordingly. Thus the Z-score takes a more complicated appearance,

Z = (p1 – p2)/√(p(1 – p)(1/n1 + 1/n2)),
where p1 and p2 are the two proportions in the samples, n1 and n2 are the two sample sizes, and p is the population proportion. If the population proportion is not known, a weighted mean of the two sample proportions is used.
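A minimal Python sketch of this calculation follows; the counts used in the usage line are invented for illustration and do not come from the text:

```python
import math

# Z-score for the difference between two sample proportions, using a
# weighted (pooled) estimate of the population proportion when it is
# not known. x1 and x2 are the counts of occurrences in each sample.
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                # pooled (weighted mean) proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical samples: 80 of 100 versus 90 of 150 (0.8 vs 0.6).
z = two_proportion_z(80, 100, 90, 150)
print(round(z, 2))
```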
Ranking Tests
Ordinal data, which is descriptive data that can be placed in a logical order, can be compared by ranking tests. These are nonparametric—meaning that no particular distribution is assumed.
Suppose we have two categories that we wish to compare, and our sample data consists of an overall ranking of representatives of both categories. For example, we could have a list of singers ranked in order of preference by a panel of voters, and we wish to see if there is a significant preference for male or female singers. The list might look like this:
M F M M M F M F F F F F M.
An appropriate test would be the Mann-Whitney U-test. An equivalent test, with a slight difference, is the Wilcoxon rank-sum test.
To take an example which we may follow through in greater detail, consider two teams of runners competing in a race: six runners from the A team and seven runners from the B team. Our data consists of the order in which the runners finish, and our null hypothesis is that there is no significant difference between the two teams. The runners in order of finishing are
A A B B A B B A A B B B A.
Each data item is given its rank value and the values are totaled for each group, as follows:
A Team: ranks 1, 2, 5, 8, 9, 13; number nA = 6; rank total RA = 38
B Team: ranks 3, 4, 6, 7, 10, 11, 12; number nB = 7; rank total RB = 53
Two U values are calculated,
UA = nAnB + nA(nA+1)/2 – RA and
UB = nAnB + nB(nB+1)/2 – RB
The statistic U is the smaller of UA and UB and is referred to the tables of critical values for the Mann-Whitney U-test. Use of the values above gives UA = 25 and UB = 17, so U = 17. The value needs to be equal to or less than the tabulated critical value to indicate a difference between the two sets, A and B, at the indicated significance level. Below is a selection of values from the tables:
A two-tail test is appropriate because we are testing for no difference, rather than a difference in favor of A or B. It can be seen that our U value is too large to indicate any significant difference between the two sets of runners.
For large values of n, the normal distribution can be used. The appropriate mean value is nAnB/2, and the variance is nAnB(nA+nB+1)/12. Thus a Z-score can be calculated from the value of U and referred to tables of the normal distribution as shown in the first section of Chapter 10.
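Both the U statistics and the large-sample Z-score can be traced through in Python, using the finishing order given above (the code itself is an illustrative sketch, not part of the original text):

```python
import math

# Mann-Whitney U from the finishing order in the text:
# A A B B A B B A A B B B A  ->  ranks 1..13 in finishing order.
order = list("AABBABBAABBBA")
ranks_A = [i + 1 for i, t in enumerate(order) if t == "A"]
ranks_B = [i + 1 for i, t in enumerate(order) if t == "B"]
nA, nB = len(ranks_A), len(ranks_B)        # 6 and 7
RA, RB = sum(ranks_A), sum(ranks_B)        # 38 and 53

UA = nA * nB + nA * (nA + 1) // 2 - RA     # 25
UB = nA * nB + nB * (nB + 1) // 2 - RB     # 17
U = min(UA, UB)                            # U = 17

# Normal approximation (strictly for larger samples; shown here only
# to illustrate the mean and variance quoted in the text).
mean = nA * nB / 2                         # 21
var = nA * nB * (nA + nB + 1) / 12         # 49
z = (U - mean) / math.sqrt(var)
print(UA, UB, U, round(z, 2))
```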
If the Wilcoxon rank-sum test is used, the rank sum from the smaller group—RA, in this example 38—is the statistic to be referred to tables of critical values for the Wilcoxon rank-sum test to obtain the significance level. If the groups are equal in size, the smaller total is used. If the samples are large, the normal distribution can again be used. For the smaller group of size nA, the appropriate mean is nAnB/2 + nA(nA + 1)/2 and the variance is nAnB(nA + nB + 1)/12.
The Kruskal-Wallis test is an extension of the Mann-Whitney test to cater to three or more samples. The test statistic has a complicated formula describing essentially the variance of the ranks. It is referred to tables of the chi-squared distribution, which was described in Chapter 7, to obtain the significance level. However, if the groups are too small (less than about 5), the statistic departs from the chi-squared distribution.
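The "complicated formula" reduces to a few lines once the data are ranked. The sketch below uses invented groups of ranks purely for illustration:

```python
# Kruskal-Wallis H statistic for several groups of ranks.
# H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1),
# where R_j is the rank sum of group j and N the total sample size.
def kruskal_h(groups):
    N = sum(len(g) for g in groups)
    s = sum(sum(g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * s - 3 * (N + 1)

# Three hypothetical groups sharing ranks 1..12 (no ties).
groups = [[1, 4, 6, 10], [2, 5, 8, 12], [3, 7, 9, 11]]
H = kruskal_h(groups)
print(round(H, 3))  # refer to chi-squared with k - 1 = 2 degrees of freedom
```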
Ranks of Paired Data
If the two samples to be compared consist of paired values, the Wilcoxon matched-pairs rank-sum test can be used. Suppose we wish to compare a student’s position in class in a range of subjects for two consecutive years. We are investigating whether there is an overall improvement in Year 2 compared with Year 1. The positions in class are as follows:
The sum of the negative ranks, 5 in this example, is the statistic W, which must be equal to or less than the value in the tables of the Wilcoxon matched-pairs test. The number of pairs, n, is entered as 8. A small extract from the tables is shown below:
The one-tail test is relevant because we are testing for a significant improvement rather than a significant difference, and the value of 5 indicates a significant improvement at the 5% level.
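The mechanics of the matched-pairs test can be sketched in Python. The class positions below are invented (the book's own table is not reproduced here) but are chosen so that the negative-rank sum matches the chapter's W = 5 with n = 8 pairs:

```python
# Wilcoxon matched-pairs signed-rank statistic for paired positions.
# A positive difference (Year 1 position minus Year 2 position) means
# the student moved up the class in Year 2.
year1 = [10, 8, 4, 12, 6, 3, 11, 9]   # hypothetical Year 1 positions
year2 = [7, 3, 5, 5, 4, 7, 3, 3]      # hypothetical Year 2 positions

diffs = [a - b for a, b in zip(year1, year2)]   # positive = improvement
# Rank the absolute differences (no ties in this illustration).
ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
rank_of = {i: r + 1 for r, i in enumerate(ranked)}

W = sum(rank_of[i] for i, d in enumerate(diffs) if d < 0)
print(W)  # 5 -> compare with the tabulated critical value for n = 8
```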
If we have two separate rankings of the same items, there are a number of ranking methods that can be used. One of these employs the Spearman rank correlation coefficient, ρ (Greek letter rho) or rs. I will illustrate the method by imagining seven different restaurants which are compared by two judges. We wish to know whether there is significant agreement between the opinions of the two judges. The null hypothesis is that there is no association between the two orderings; rejecting it would indicate that the judges have similar opinions of the restaurants. The orderings might appear thus:
If two or more ranks were equal within a judge’s ordering, the mean value, allowing fractions, would be substituted for each; but too many equal ranks render the analysis inappropriate.
The differences between the ranks from the two judges are squared and from the sum of the squares the correlation coefficient, ρ, is calculated. The value of ρ ranges between +1 and –1, with +1 indicating perfect agreement between the two rankings, and –1 indicating exactly opposite rankings.
The coefficient is calculated by
ρ = 1 – 6 × (sum of d²)/(n(n² – 1)),

where n is the number of items that are ranked. In our example,

ρ = 1 – 6 × 14/(7 × (49 – 1)) = 0.75.
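This calculation is easily reproduced in Python. The two orderings below are hypothetical, but they are chosen so that the sum of squared rank differences is 14, matching the worked example:

```python
# Spearman rank correlation from two rankings of the same seven items.
judge1 = [1, 2, 3, 4, 5, 6, 7]   # hypothetical ranking by Judge 1
judge2 = [3, 1, 2, 4, 7, 6, 5]   # hypothetical ranking by Judge 2

n = len(judge1)
d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))   # sum of d^2 = 14
rho = 1 - 6 * d2 / (n * (n**2 - 1))
print(d2, rho)  # 14, 0.75
```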
This value is referred to published tables of ρ to obtain the significance level. To give an idea of the required levels of ρ, the following table shows the values for a selection of n values and two significance levels:
Our value of 0.75 can be seen to exceed the 5% significance level for a one-tail test but not for a two-tail test. In this example, a one-tail test is appropriate because we are investigating whether our two judges have ranked the restaurants in the same order. The second tail of the distribution is concerned with rankings that correlate but are in the reverse order to each other. We conclude, therefore, that there is evidence, at the 5% level, of agreement between the two rankings.
If n is greater than about 40, a Z-score can be calculated (as shown in Chapter 10) and tables of the normal distribution used to obtain the level of significance. The appropriate normal distribution has a mean of zero and a variance of 1/(n – 1).
There are several other rank correlation coefficients, including the Kendall rank correlation coefficient, τ (Greek letter tau). These are calculated differently but yield coefficients that are interpreted in the same way as Spearman's, and levels of significance can likewise be attributed to them.
The word correlation in a strict sense means a linear relationship between two variables, and these ranking methods are also used to examine relationships. Here we have simply used the rank correlation coefficients to compare two samples that could be from the same population. In a sense, there is a relationship between the two rankings: we could plot a graph of the rankings of Judge 1 against those of Judge 2. Perfect agreement between the two rankings would give a straight line with a rising slope of unity, whereas exactly opposite rankings would give a straight line with a descending slope of unity. In Part V, we shall deal with relationships and meet ranking again.