# Excel Data Analysis For Dummies, 2nd Edition (2014)

### Part IV. The Part of Tens

Visit *www.dummies.com/extras/exceldataanalysis* for ten tips on better big-data analysis.

*In this part …*

· Buff up your basic statistics skills so you’re more easily and more comfortably doing data analysis with Excel.

· Boost your effectiveness in analyzing data and communicating the results with clever tricks and techniques.

· Get secrets for visually analyzing and presenting your data.

### Chapter 13. Ten Things You Ought to Know about Statistics

*In This Chapter*

Descriptive statistics are straightforward

Averages aren’t so simple sometimes

Standard deviations describe dispersion

Probability distribution functions aren’t always confusing

Parameters aren’t so complicated

Skewness and kurtosis describe a probability distribution’s shape

An observation is an observation

A sample is a subset of values

Inferential statistics are cool but complicated

Confidence intervals are super-useful

In as much that I discuss how to use Excel for statistical analysis in a number of chapters in this book, I thought it might make sense to cover some of the basics.

Don't worry. I’m not going to launch into some college-level lecture about things like chi-square or covariance calculations. You'll see no Greek symbols in this chapter.

If you've never been exposed to statistics in school or it’s been a decade or two since you were, let this chapter to help you use (comfortably) some of the statistical tools that Excel provides.

*Descriptive Statistics Are Straightforward*

The first thing that you ought to know is that some statistical analysis and some statistical measures are pretty darn straightforward. Descriptive statistics, which include things such as the pivot table cross-tabulations (that I present in Chapters *3* and *4*), as well as some of the statistical functions, make sense even to somebody who’s not all that quantitative.

For example, if you sum a set of values, you get a sum. Pretty easy, right? And if you find the biggest value or the smallest value in a set of numbers, that's pretty straightforward, too.

I mention this point about descriptive statistics because a lot of times people freak out when they hear the word *statistics.* That’s too bad because many of the most useful statistical tools available to you are simple, easy-to-understand descriptive statistics.

*Averages Aren’t So Simple Sometimes*

Here’s a weird thing that you might remember if you ever took a statistics class. When someone uses the term *average,* what he usually refers to is the most common average measurement, which is a *mean.* But you ought to know that several other commonly accepted average measurements exist, including mode, median, and some special mean measurements such as the geometric mean and harmonic mean.

I want to quickly cover some of these … not because you need to know all this stuff, but because understanding that the term *average* is imprecise makes some of the discussions in this book and much of Excel's statistical functionality more comprehensible.

To make this discussion more concrete, assume that you’re looking at a small set of values: 1, 2, 3, 4, and 5. As you might know, or be able to intuitively guess, the mean in this small set of values is 3. You can calculate the mean by adding together all the numbers in the set (1+2+3+4+5) and then dividing this sum (15) by the total number of values in the set (5).

Two other common average measurements are mode and median. I start with the discussion of the median measurement first because it’s easy to understand using the data set that I introduce in the preceding paragraph. The *median value* is the value that separates the largest values from the smallest values. In the data set 1, 2, 3, 4, and 5, the median is 3. The value 3 separates the largest values (4 and 5) from the smallest values (1 and 2). In other words, the median shows the middle point in the data. Half of the data set values are larger than the median value, and half of the data set values are smaller than the median value.

When you have an even number of values in your data set, you calculate the median by averaging the two middle values. For example, the data set 1, 2, 3, and 4 has no middle value. Add the two middle values — 2 and 3 — and then divide by 2. This calculation produces a median value of 2.5. With the median value of 2.5, half of the values in the data set are above the median value, and half of the values in the data set are below the median value.

The mode measurement is a third common average. The *mode* is the most common value in the data set. To show you an example of this, I need to introduce a new data set. With the data set 1, 2, 3, 5, and 5, the mode is 5 because the value *5* occurs twice in the set. Every other value occurs only once.

As I mention earlier, other common statistical measures of the average exist. The mean measurement that I refer to earlier in this discussion is actually an arithmetic mean because the values in the data set get added together arithmetically as part of the calculation. You can, however, combine the values in other ways. Financial analysts and scientists sometimes use a geometric mean, for example. There is also something called a harmonic mean.

You don't need to understand all these other different average measurements, but you should remember that the term *average* is pretty imprecise. And what people usually imply when they refer to an average is the *mean.*

*Standard Deviations Describe Dispersion*

Have you ever heard the term *standard deviation?* You probably have. Any statistical report usually includes some vague or scary reference to either standard deviation or its close relative, the variance. Although the formula for standard deviation is terrifying to look at — at least if you're not comfortable with the Greek alphabet — intuitively, the formula and the logic are pretty easy to understand.

A *standard deviation* describes how values in a data set vary around the mean. Another way to say this same thing is that a standard deviation describes how far away from the mean the average value is. In fact, you can almost think of a standard deviation as being equal to the average distance from the mean. This isn't quite right, but it’s darn close.

Suppose you’re working with a data set, and its mean equals 20. If the data set standard deviation is 5, you can sort of think about the average data set value as being 5 units away from the mean of 20. In other words, for values less than the mean of 20, the average is sort of 15. And for values that are larger than the mean, the average value is kind of 25.

The standard deviation isn’t really the same thing as the average deviation, but it's pretty darn close in some cases. And thinking about the standard deviation as akin to the average deviation — or average difference from the mean — is a good way to tune into the logic.

The neat thing about all this is that with statistical measures like a mean and a standard deviation, you often gain real insights into the characteristics of the data that you’re looking at. Another thing is that with these two bits of data, you can often draw inferences about data by looking at samples.

I should tell you one other thing about the standard deviation. The statistical terms *variance* and *standard deviation* are related. A *standard deviation* equals the square root of a variance. Another way to say this same thing is that a *variance* equals the square of a standard deviation.

It turns out that when you calculate things such as variances and standard deviations, you actually arrive at the variance value first. In other words, you calculate the variance before you calculate the standard deviation. For this reason, you'll often hear people talk about variances rather than standard deviations. Really, however, standard deviations and variances are almost the same thing. In one case, you're working with a square root. In another case you are working with a square.

It’s six of one, half a dozen of the other … sort of.

*An Observation Is an Observation*

*Observation* is one of the terms that you’ll encounter if you read anything about statistics in this book or in the Excel online Help. An observation is just an observation. That sounds circular, but bear with me. Suppose that you're constructing a data set that shows daily high temperatures in your neighborhood. When you go out and observe that the temperature some fine July afternoon is 87° F, that measurement (87°) is your first observation. If you go out and observe that the high temperature the next day is 88° F, that measurement is your second observation.

Another way to define the term observation is like this: Whenever you actually assign a value to one of your random variables, you create an observation. For example, if you’re building a data set of daily high temperatures in your neighborhood, every time that you go out and assign a new temperature value (87° one day, 88° the next day, and so on) you're creating an observation.

*A Sample Is a Subset of Values*

A *sample* is a collection of observations from a population. For example, if you create a data set that records the daily high temperature in your neighborhood, your little collection of observations is a sample.

In comparison, a sample is not a population. A *population* includes all the possible observations. In the case of collecting your neighborhood’s high temperatures, the population includes all the daily high temperatures — since the beginning of the neighborhood’s existence.

*Inferential Statistics Are Cool but Complicated*

As I note earlier in this chapter, some statistics are pretty simple. Finding the biggest value in a set of numbers is a *statistical measurement.* But it’s really pretty simple. Those simple descriptive statistical measures are called, cleverly, *descriptive statistics.*

Another more complicated but equally useful branch of statistics is *inferential statistics.* Inferential statistics are based on this very useful, intuitively obvious idea. If you look at a sample of values from a population and the sample is representative and large enough, you can draw conclusions about the population based on characteristics of the sample.

For example, for every presidential election in the United States, the major television networks (usually contrary to their earlier promises) predict the winner after only a relatively small number of votes have been calculated or counted. How do they do this? Well, they sample the population. Specifically, they stand outside polling places and ask exiting voters how they voted. If you ask a large sample of voters whether they voted for the one guy or the other guy, you can make an inference about how all the voters voted. And then you can predict who has won the election.

Inferential statistics, although very powerful, possess two qualities that I need to mention:

· **Accuracy issues:** When you make a statistical inference, you can never be 100 percent sure that your inference is correct. The possibility always exists that your sample isn't representative or that your sample doesn't return enough precision to estimate the population value.

This is partly what happened with the 2000 presidential election in the United States. Initially, some of the major news networks predicted that Al Gore had won based on exit polls. Then based on other exit polls, they predicted that George W. Bush had won. Then, perhaps finally realizing that maybe their statistics weren't good enough given the closeness of the race … or perhaps just based on their own embarrassment about bobbling the ball … they stopped predicting the race. In retrospect, it’s not surprising that they had trouble calling the race because the number of votes for the two candidates was *extremely* close.

· **Steep learning curve:** Inferential statistics quickly gets pretty complicated. When you work with inferential statistics, you immediately start encountering terms such as *probability distribution functions,* all sorts of crazy (in some cases) parameters, and lots of Greek symbols.

As a practical matter, if you haven't at least taken a statistics class — and probably more than one statistics class — you'll find it very hard to move into inferential statistics in a big way. You probably can, with a single statistics class and perhaps the information in this book, work with inferential statistics based on normal distributions and uniform distributions. However, working with inferential statistics and applying those inferential statistics to other probability distributions becomes very tricky. At least, that's my observation.

*Probability Distribution Functions Aren't Always Confusing*

One of the statistical terms that you’ll encounter a little bit in this book — and a whole bunch if you dig into the Excel Help file — is *probability distribution function.* This phrase sounds pretty tricky; in some cases, granted, maybe it is. But you can actually understand intuitively what a probability distribution function is with a couple of useful examples.

One common distribution that you hear about in statistics classes, for example, is a T distribution. A *T distribution* is essentially a normal distribution except with heavier, fatter tails. There are also distributions that are skewed (have the hump tilted) one way or the other. Each of these probability distributions, however, has a probability distribution function that describes the probability distribution chart.

Here are two probability distribution functions that you probably already understand: uniform distribution and normal distribution.

*Uniform distribution*

One common probability distribution function is a uniform distribution. In a *uniform distribution,* every event has the same probability of occurrence. As a simple example, suppose that you roll a six-sided die. Assuming that the die is fair, you have an equal chance of rolling any of the values: 1, 2, 3, 4, 5, or 6. If you roll the die 60,000 times, what you would expect to see (given the large number of observations) is that you’ll probably roll a 1 about 10,000 times. Similarly, you’ll probably also roll a 2, 3, 4, 5, or 6 about 10,000 times each. Oh sure, you can count on some variance between what you expect (10,000 occurrences of each side of the six-sided die) and what you actually experience. But your actual observations would pretty well map to your expectations.

The unique thing about this distribution is that everything is pretty darn level. You could say that the probability or the chance of rolling any one of the six sides of the die is even, or *uniform.* This is how uniform distribution gets its name. Every event has the same probability of occurrence. Figure *13-1* shows a uniform distribution function.

**Figure 13-1:** A uniform distribution function.

*Normal distribution*

Another common type of probability distribution function is the *normal distribution,* also known as a *bell curve* or a *Gaussian distribution.*

A normal distribution occurs naturally in many situations. For example, intelligence quotients (IQs) are distributed normally. If you take a large set of people, test their IQs, and then plot those IQs on a chart, you get a normal distribution. One characteristic of a normal distribution is that most of the values in the population are centered around the mean. Another characteristic of a normal distribution is that the mean, the mode, and the median all equal each other.

Do you kind of see now where this probability distribution function business is going? A probability distribution function just describes a chart that, in essence, plots probabilities. Figure *13-2* shows a normal distribution function.

**Figure 13-2:** A normal distribution function.

A probability distribution function is just a function, or equation, that describes the line of the distribution. As you might guess, not every probability distribution looks like a normal distribution or a uniform distribution.

*Parameters Aren't So Complicated*

After you grasp the concept that a probability distribution function is essentially an equation or formula that describes the line in a probability distribution chart, it’s pretty easy to understand that a *parameter* is an input to the probability distribution function. In other words, the formula or function or equation that describes a probability distribution curve needs inputs. In statistics, those inputs are called parameters.

Refer to Figure *13-2* to see its probability function. Most of those crazy Greek symbols refer to parameters.

Some probability distribution functions need only a single simple parameter. For example, to work with a uniform distribution, all you really need is the number of values in the data set. A six-sided die, for example, has only six possibilities. Because you know that only six possibilities exist, you can pretty easily calculate that there’s a 1-in-6 chance that any possibility will occur.

A normal distribution uses two parameters: the mean and the standard deviation.

Other probability distribution functions use other parameters.

*Skewness and Kurtosis Describe a Probability Distribution’s Shape*

A couple of other useful statistical terms to know are skewness and kurtosis. *Skewness* quantifies the lack of symmetry in a probability distribution. In a perfectly symmetrical distribution, like the normal distribution (refer to Figure *13-2*), the skewness equals zero. If a probability distribution leans to the right or the left, however, the skewness equals some value other than zero, and the value quantifies the lack of symmetry.

*Kurtosis* quantifies the heaviness of the tails in a distribution. In a normal distribution, kurtosis equals zero. In other words, zero is the measurement for a tail that looks like a tail in a normal distribution. The *tail* is the thing that reaches out to the left or right. However, if a tail in a distribution is heavier than a normal distribution, the kurtosis is a positive number. If the tails in a distribution are skinnier than in a normal distribution, the kurtosis is a negative number.

*Confidence Intervals Seem Complicated at First, but Are Useful*

Probabilities often confuse people, and perhaps this happens most when during the U.S. presidential elections. Pundits talk in all sorts of confusing ways about one candidate’s chances of winning (often in ways confusing to even the pundits themselves).

Say, for example, some talking head on television says “The results of a recent poll show that Barack Obama would receive 51% of the vote were the election held today; the margin of error was +/- 3% with a confidence level of 95%.”

Okay, this sounds like a mouthful, but break it down and things get a little clearer. What the survey really means is this: the pollsters took a sample of the U.S. population and asked them who they would vote for today, and 51% of the sample said they would vote for Mr. Obama.

Now here’s where this gets interesting. Largely because of the size of the sample, the pollsters can do some fancy math and infer that there’s sort-of a 95% chance (more on this later) that the real percent of people who would answer “Obama” in the entire population is between 48% and 54%. Note “margin of error” is basically just another way to describe the confidence interval.

An important thing to understand about confidence levels is that they’re linked with the margin of error. If the pollsters in the example above had wanted a range of values with a confidence level of 99%, the margin of error they calculated would be larger.

To put it another way, perhaps there’s a 95% chance (sort-of) that the real percent of people in the whole population who would answer “Obama” is between 48% and 54%, but there’s a 99% chance (again, sort-of) that the real percent of people with that answer is between 45% and 57%. The wider your range of possible values, the more confidence you have that the real data point falls within your range. Conversely, the more confident you want to be that the real data point is included in your range, the wider you have to make your range.

This is why it’s a bit of a pet peeve of mine that news organizations reporting on polls will often report the margin of error for a poll, but not the confidence level. Without knowing the confidence level the pollster used to calculate the margin of error, the information on margin of error is pretty meaningless.

Another important thing to understand about confidence levels is that the bigger you make your sample size, the smaller your margin of error will be using the same confidence level. If you sample two people on the sidewalk by asking them who they’re going to vote for, and one says “the challenger” and one says “the incumbent,” you can’t then assert with much confidence that when the whole country votes it will be a perfect 50-50 split. Data from this sample would have an enormous margin of error, unless you use an incredibly low confidence level for your calculations.

However, if you go out and randomly sample 5,000 people by asking who they’re voting for, then you have some pretty solid ground to stand on when making a prediction about who’s going to win the presidential race. Put another way, a sample of 5,000 people leads to a much smaller margin of error than a sample of 2, assuming you want for both samples the same level of confidence for your range.

At this point, I should make a slight correction: When I said that what the confidence interval means is that there’s a “95% chance the real number falls within this range,” that’s not quite accurate, although it was easier to use as an explanation when first describing the basic concept of a confidence interval. What an interval with 95% confidence really means, *technically*, is that if, hypothetically, you were to take different samples from the same population over and over and over again, and then you calculated the confidence interval for those samples in the exact same way for each new sample, about 95% of the time the confidence intervals you calculated from the samples would include the real number (because your data from each sample will be slightly different each time, and therefore the interval you calculate as well). So when I say phrases like “95% chance” or “99% chance,” that’s what I really meant. (I need to include this clarification so that my old statistics professors don’t start shaking their heads in shame if they read this book.)

And my final point is this: Predicting election results is far from the only useful thing you can do with confidence intervals. As just one example, say you had some Google Analytics data on two different web ads you’re running to promote your small business, and you want to know which ad is more effective. You can use the confidence interval formula to figure out how long your ads need to run before Google’s collected enough data for you to know which ad is really better. (In other words, the formula tells you how big your sample size needs to be to overcome the margin of error.)