Glossary of Data Analysis and Excel Terms - Excel Data Analysis For Dummies, 2nd Edition (2014)

Excel Data Analysis For Dummies, 2nd Edition (2014)

Appendix. Glossary of Data Analysis and Excel Terms

3-D pie charts: Perhaps the very worst way to share the results of your data analysis — and often inexcusable.

absolute reference: A cell address used in a formula that Excel doesn’t adjust if you copy the formula to some new location. To create an absolute cell reference, you precede the column letter and row number with a dollar sign ($).

Access: A database program developed and sold by Microsoft. Use Access to build and work with large, sophisticated, relational databases; you can easily export information from Access databases to Excel. Just choose the Access Microsoft Office menu’s Export command.

arithmetic operators: The standard operators that you use in Excel formulas. To add numbers, use the addition (+) operator. To subtract numbers, use the subtraction (–) operator. To multiply numbers, use the multiplication (*) operator. To divide numbers, use the division (/) operator. You can also perform exponential operations by using the exponential operator (^). See operator precedence.

ascending order: A sorting option that alphabetizes labels and arranges values in smallest-value-to-largest-value order. See also chronological order; descending order.

ASCII text file: A type of text file that in essence is just straight text and nothing else. See also delimited text file; importing.

AutoFilter: An Excel tool (available from the Data tab’s Filter command) that helps you produce a new table that’s a subset of your original table. For example, in the case of a grocery list table, you could use AutoFilter to create a subset that shows only those items that you’ll purchase at a particular store, in specified quantities, or that exceed a certain price.

average: Typically, the arithmetic mean for a set of values. Excel supplies several averaging functions. See also median; mode.

binomial distributions: Used to calculate probabilities in situations in which you have a limited number of independent trials, or tests, which can either succeed or fail. Success or failure of any one trial is independent of other trials.

Boolean expressions: Also known as logical expressions, these expressions describe a comparison that you want to make. For example, to compare fields with the value 1,000,000, you use a Boolean expression. To construct a Boolean expression, you use a comparison operator and then a value used in the comparison.

calculated field: Used to insert a new row or column into a pivot table and then fill the new row or column with a formula.

calculated item: An amount shown in a pivot table that you create by calculating a formula. Frankly, adding a calculated item usually doesn't make any sense. But, hey, strange things happen all the time, right?

cells: In Excel, the intersections of rows and columns. A cell location is described using the column letter and row number. For example, the cell in the upper-left corner of the workbook is labeled A1.

chart data labels: Annotate data markers with pivot table information or list information.

chart legend: Identifies the data series that you plot in your chart.

chart titles: Text that you use to label the parts of a chart.

chart type: Includes column, bar, line, pie, XY, surface, and so on. In Excel, you can create a bunch of different types of charts.

chi-square: Used to compare observed values with expected values, returning the level of significance, or probability (also called a p-value). That p-value lets you assess whether differences between the observed and expected values represent chance.

chronological order: A sorting option that arranges labels or values in chronological order such as Monday, Tuesday, Wednesday, and so on. See also ascending order; descending order.

comparison operator: A mathematical operator used in a Boolean expression. For example, the > comparison operator makes greater than comparisons. The = operator makes equal to comparisons. The <= operator makes less than or equal to comparisons. Cool, huh? See also Boolean expressions.

counting: Used for useful statistical functions for counting cells within a worksheet or list. Excel provides four counting functions: COUNT, COUNTA, COUNTBLANK, and COUNTIF. Excel also provides two useful functions for counting permutations and combinations: PERMUT and COMBIN.

cross-tabulation: An analysis technique that summarizes data in two or more ways. For example, if you run a business and summarize sales information both by customer and by product, that’s a cross-tabulation because you tabulate the information in two ways. See also pivot table.

Custom Calculations: Used to add many semi-standard calculations to a pivot table. By using Custom Calculations, for example, you can calculate the difference between two pivot table cells, percentages, and percentage differences. See also pivot table.

Data Analysis: An Excel add-in with which you perform statistical analysis.

data category: Organizes the values in a data series. That sounds complicated; however, in many charts, identifying the data category is easy. In any chart (including a pivot chart) that shows how some value changes over time, the data category is time. See also data series.

data list: Another name for an Excel table.

data series: Oh geez, this is another one of those situations where somebody’s taken a ten-cent idea and labeled it with a five-dollar word. Charts show data series. And a chart legend names the data series that a chart shows. For example, if you want to plot sales of coffee products, those coffee products are your data series. See also data category.

data validation: An Excel command with which you describe what information can be entered into a cell. The command also enables you to supply messages that give data input information and error messages to help users correct data entry errors.

database functions: A special set of functions for simple statistical analysis of information that you store in Excel tables.

delimited text file: A type of text file. Delimited text files use special characters, called delimiters, to separate fields of information in the report. For example, such files commonly use the Tab character to delimit. See also ASCII text file; importing.

descending order: A sorting order that arranges labels in reverse alphabetical order and values in largest-value-to-smallest-value order. See also ascending order; chronological order.

descriptive statistics: Describe the values in a set. For example, if you sum a set of values, that sum is a descriptive statistic. If you find the largest value or the smallest value in a set of numbers, that’s also a descriptive statistic.

exponential smoothing: Calculates the moving average but weights the values included in the moving average calculations so that more recent values have a bigger effect. See also moving average.

exporting: In the context of databases, moving information to another application. If you tell your accounting system to export a list of vendors that Excel can later read, for example, you’re exporting. Many business applications, by the way, do easily export data to Excel. See alsoimporting.

F distributions: Compare the ratio in variances of samples drawn from different populations and draw a conclusion about whether the variances in the underlying populations resemble each other.

field: In a database, stores the same sort of information. In a database that stores people’s names and addresses, for example, you’ll probably find a Street Address field. In Excel, by the way, each column shows a particular sort of information and therefore represents a field.

field settings: Determine what Excel does with a field when it’s cross-tabulated in the pivot table. See also cross-tabulation; pivot table.

formulas: Calculation instructions entered into worksheet cells. Essentially, this business about formulas going into workbook cells is the heart of Excel. Even if an Excel workbook did nothing else, it would still be an extremely valuable tool. In fact, the first spreadsheet programs did little more than calculate cell formulas. See also text labels; value.

function: A pre-built formula that you can use to more simply calculate some amount, such as an average or standard deviation.

function arguments: Needed in most functions; also called inputs. All database functions need arguments, which you include inside parentheses. If a function needs more than one argument, you separate arguments by using commas. See also database functions.

header row: A top row of field names in your table range selection that names the fields.

histogram: A chart that shows a frequency distribution.

importing: In the context of databases, grabbing information from some other application. Excel rather easily imports information from popular databases (such as Microsoft Access), other spreadsheets (such as Lotus 1-2-3), and from text files. See also exporting.

inferential statistics: Based on a very useful, intuitively obvious idea that if you look at a sample of values from a population and if the sample is representative and large enough, you can draw conclusions about the population based on characteristics of the sample.

kurtosis: A measure of the tails in a distribution of values. See also skewness.

list: Another name for table, a list is, well, a list. This definition sounds circular, I guess, but if you make a list (sorry) of the things that you want to buy at the grocery store, that’s a list. Excel lists, or tables, usually store more information than just names of items. Usually, Excel tables also store values. In the case of a grocery list, the Excel table might include prices and quantities of the items that you’re shopping for.

logarithmic scale: Used in a chart to view rates of change, rather than absolute changes, in your plotted data.

median: The middle value in a set of values. Half the values fall below the median, and half the values fall above the median. See also average; mode.

Microsoft Access: See Access.

Microsoft Query: See Query.

mode: The most common value in a set. See also average; median.

moving average: An average that’s calculated by using only a specified set of values, such as an average based on just the last three values. See also exponential smoothing.

normal distribution: The infamous bell curve. Also known as a Gaussian distribution.

objective function: The formula that you want to optimize when performing optimization modeling. In the case of a profit formula, for example, you want to maximize a function. But some objective functions should be minimized. For example, if you create an objective function that describes the cost of some advertising program or the risk of some investment program, you may logically choose to minimize your costs or risks. See also optimization modeling.

observation: Suppose that you're constructing a data set that shows daily high temperatures in your neighborhood. When you go out and observe that the temperature some fine July afternoon is 87°, that measurement is your observation.

operator precedence: Standard rules that determine the order of arithmetic operations in a formula. For example, exponential operations are performed first. Multiplication and division operations are performed second. Addition and subtraction operations are performed third. To override these standard rules, use parentheses. See also formulas.

optimization modeling: A problem-solving technique in which you look for the optimum value of an objective function while explicitly recognizing constraints. See also objective function.

parameter: An input to a probability distribution function.

phantom data marker: Some extra visual element on a chart that exaggerates the chart message or misleads the chart viewer. Usually, phantom data markers are embellishments that someone has added (hopefully, not you!) that sort of resemble the chart’s real data markers — especially to the eyes of casual chart viewers.

pivot chart: A cross-tabulation that appears in a chart. See also cross-tabulation.

pivoting and re-pivoting: The thing that gives the pivot table its name. You can continue cross-tabulating the data in the pivot table. You can pivot, and re-pivot, and re-pivot again… .

pivot table: Perhaps the most powerful analytical tool that Excel provides. Use the PivotTable command to cross-tabulate data stored in Excel lists. See also cross-tabulation.

primary key: In sorting, the field first used to sort records. See also secondary key; sort; and if you’re really interested, tertiary key.

probability distribution: A chart that plots probabilities. See also normal distribution; uniform distribution.

probability distribution function: An equation that describes the line of the probability distribution. See also probability distribution.

p-value: The level of significance, or probability.

Query: A program that comes with Excel. Use Query to extract information from a database and then place the results into an Excel workbook.

QuickBooks: The world’s most popular small business accounting program — and one of the many business applications that easily, happily, and without complaint exports information to Excel. In QuickBooks, for example, you simply click a button cleverly labeled Excel.

range: In terms of Excel data analysis, refers to two different items. A range can be a reference to a rectangle of cells in a worksheet, or a range can show the difference between the largest and smallest values in the data set.

record: A collection of related fields in a table. In Excel, each record goes into a separate row.

refreshing pivot data: Updating the information shown in a pivot table or pivot chart to reflect changes in the underlying data. You can click the Refresh data tool provided by the PivotTable toolbar button to refresh.

regression analysis: Plotting pairs of independent and dependent variables in an XY chart and then finding a linear or exponential equation that best describes the plotted data.

relational database: Essentially, a collection of tables or lists. See also table; list.

relative reference: A cell reference used in a formula that Excel adjusts if you copy the formula to a new cell location. See also absolute reference.

scatter plot: An XY chart that visually compares pairs of values. A scatter plot is often a good first step when you want to perform regression analysis. See also regression analysis.

secondary key: In sorting, the second field used to sort records. The secondary key comes into play only when the primary keys of records have the same value. See also primary key; sort.

skewness: A measure of the symmetry of a distribution of values. See also kurtosis.

solve order: The order in which calculated item formulas should be solved. See also calculated item.

Solver: An Excel add-in with which you perform optimization modeling. See also optimization modeling.

Solver variables: The variables in an optimization modeling problem. See optimization modeling.

sort: To arrange list records in some particular order, such as alphabetically by last name. Excel includes easy-to-use tools for doing this, by the way.

standard deviation: Describes dispersion about the data set’s mean. You can think of a standard deviation as an average deviation from the mean. See also average; variance.

table: In relational databases and also in Excel, where information is stored. Tables are essentially spreadsheets, or lists, that store database information.

tertiary key: In sorting, the third field used to sort records. The tertiary key comes into play only when the primary and secondary keys of records have the same value. See also primary key; secondary key; sort.

text file: A file that’s all text. Many programs export text files, by the way, because other programs (including Excel) often easily import text files.

text functions: Used to manipulate text strings in ways that enable you to easily rearrange and manipulate the data that you import into an Excel workbook. Typically, these babies are extremely useful tools for scrubbing or cleaning the data that you want to later analyze.

text labels: Includes letters and numbers that you enter into worksheet cells but that you don't want to use in calculations. For example, your name, a budget expense description, and a telephone number are all examples of text labels. None of these pieces of information get used in calculations.

time-series chart: Shows how values change over time. A chart that shows sales revenues over the last 5 years or profits over the last 12 months, for example, is a time-series chart.

Tufte, Edward: The author of a series of wonderful books about visually analyzing and visually presenting information. I recommend that you read at least one of Tufte's books.

t-value: Sort of like a poor-man’s z-value. When you’re working with small samples — fewer than 30 or 40 items — you can use what’s called a student t-value to calculate probabilities rather than the usual z-value, which is what you work with in the case of normal distributions. Not coincidentally, Excel provides three T distribution functions. See also z-value.

uniform distribution: Having the same probability of occurrence in every event. One common probability distribution function is a uniform distribution.

value: Some bit of data that you enter into a workbook cell and may want to later use in a calculation. For example, the actual amount that you budget for some expense would always be a number or value. See also formulas; text labels; workbook.

variance: Describes dispersion about the data set’s mean. The variance is the square of the standard deviation. Conversely, the standard deviation is the square root of the variance. See also average; standard deviation.

Web query: Grabbing data from a table that’s stored in a web page. Excel provides a very slick tool for doing this, by the way.

workbook: An Excel spreadsheet document or file. A spreadsheet comprises numbered rows and lettered columns. See also cells.

x-values: The independent values in a regression analysis.

y-values: The dependent values in a regression analysis.

z-value: In statistics, describes the distance between a value and the mean in terms of standard deviations. (How often does one get to include a legitimate Z entry in a glossary! Not often, but here I do.) See also average; standard deviation.