Using JMP 12 (2015)
Chapter 8. Summarize Data
The Table Summarize Command
This chapter describes how to create a summary data table, which includes summary statistics such as the mean and median, standard deviation, and minimum and maximum values.
Figure 8.1 Summary Table for Companies.jmp
Contents
Summarize Columns
Create a Summary Table
Add a Statistics Column to an Existing Summary Table
Explanation of Statistics
Example of Creating a Summary Table
Summarize Columns
The Tables > Summary command calculates various summary statistics, including the mean and median, standard deviation, minimum and maximum value, and so on.
In a summary table:
• A single row exists for each level of a grouping variable that you specify. If no grouping variable is specified, a single row exists for the full data table.
• When there are several grouping variables, the table contains rows for each combination of levels of all the grouping variables.
• In addition to one column for each grouping variable, the table contains frequency counts in a column named N Rows with counts for each grouping level.
• The summary table can be linked to its source table. When you select rows in the summary table, the corresponding rows are highlighted in its source table.
• If the source table’s column(s) contain value labels, the value labels are displayed in the new table.
• A summary table is not saved when you close it unless you select File > Save As to give it a name and location.
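As a rough illustration of the row structure described above, the following Python sketch counts source rows for each combination of grouping levels, which is the role the N Rows column plays. It is not JSL and does not reflect how JMP is implemented. The rows are invented for illustration (they are not taken from any sample data table), and the function name `n_rows` is hypothetical.

```python
from collections import Counter

# Hypothetical source rows: (Type, Size Co). Invented values, not from Companies.jmp.
rows = [
    ("Computer", "small"),
    ("Computer", "small"),
    ("Computer", "big"),
    ("Pharmaceutical", "big"),
    ("Pharmaceutical", "medium"),
]

def n_rows(rows, group_idx):
    """Count source rows for each combination of levels of the grouping columns.

    group_idx lists the positions of the grouping columns. An empty list
    means no grouping, which yields a single entry for the full table.
    """
    counts = Counter(tuple(r[i] for i in group_idx) for r in rows)
    return dict(sorted(counts.items()))

by_type = n_rows(rows, [0])          # one entry per level of Type
by_type_size = n_rows(rows, [0, 1])  # one entry per Type and Size Co combination
no_group = n_rows(rows, [])          # a single entry for the whole table
```

In this sketch, only combinations that actually occur in the rows produce an entry.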
Create a Summary Table
To create a summary table
1. Open a data table.
2. Select Tables > Summary.
3. Highlight the columns that you want to summarize.
Note: For details about the options in the red triangle menu, see “Columns Filter Menu” in the “JMP Platforms” chapter.
4. Add summary statistics, groups, subgroups, and select any options needed:
‒ “Add Summary Statistics”
‒ “Use One or More Grouping Columns”
‒ “Use Quantile Statistics”
‒ “Change the Format of the Statistics Column Name”
‒ “Link to the Original Data Table”
‒ “Keep the Summary Window Open”
‒ “Create a Two-Way Table of Summary Statistics by Adding a Subgroup Variable”
5. Name the summary table by typing a name in the box beside Output table name.
6. Click OK.
Add Summary Statistics
You can add columns that display summary statistics (such as mean, standard deviation, median, and so on) for any numeric column in the source table.
1. In the Summary window, highlight the column that you want to use in calculating the statistics.
2. Click the Statistics button.
3. Select one of the standard univariate descriptive statistics from the Statistics drop-down menu. The statistics are described in “Explanation of Statistics”.
Use One or More Grouping Columns
If you want the statistics summarized by group, highlight the column(s) that you want to use as grouping variables and click Group to move them into the grouping variables list. See “Example of Creating a Summary Table” for an example. If you add only grouping variables, the summary table shows a count for each group.
To change the order of the grouping variables
To change the sort order of a grouping variable (ascending or descending), select the variable in the grouping variable list and click the ascending or descending button. The icon beside the variable changes to indicate the sorting order.
You can also change the order of the grouping variables using the Value Ordering column property. See “Value Ordering” in the “The Column Info Window” chapter.
To include marginal statistics
To add marginal statistics (for the grouping variables) to the output columns, click the box beside Include marginal statistics. In addition to adding marginal statistics for each grouping variable, JMP adds rows at the end of the table that summarize each level of the first grouping variable. For example, proceed as follows:
1. Open the Companies.jmp sample data table.
2. Select Tables > Summary.
3. Select Profits ($M) and click Statistics.
4. Select Mean.
5. Select Type and Size Co and click Group.
6. Select Include marginal statistics.
7. Click OK (or Create). See Figure 8.2 at left.
Figure 8.2 Summary Table with and without Marginal Statistics
Compare the summary table with marginal statistics (at left) to the summary table without marginal statistics (at right). The table at left includes the added marginal statistics, along with a row showing that there are 32 Computer and Pharmaceutical companies in total.
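The idea behind marginal statistics can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of JMP's output layout: the rows and profit values are invented, the helper `summarize` is hypothetical, and the label "All" is simply a placeholder chosen here for a margin.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Type, Size Co, profit) rows. Invented values, not from Companies.jmp.
rows = [
    ("Computer", "small", 10.0),
    ("Computer", "small", 20.0),
    ("Computer", "big", 60.0),
    ("Pharmaceutical", "big", 100.0),
    ("Pharmaceutical", "medium", 50.0),
]

def summarize(rows, key):
    """Return {group key: (row count, mean profit)} for the given key function."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r[2])
    return {k: (len(v), mean(v)) for k, v in groups.items()}

# Ordinary cells: one entry per Type and Size Co combination.
cells = summarize(rows, lambda r: (r[0], r[1]))
# Marginal entries: each level of the first grouping variable, across all sizes.
by_type = summarize(rows, lambda r: (r[0], "All"))
# Grand total across every row.
overall = summarize(rows, lambda r: ("All", "All"))
```

Each marginal entry is computed from the underlying rows rather than by averaging the cell means, so groups with more rows carry more weight.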
Use Quantile Statistics
To add specific quantile statistics, follow these steps:
1. In the box under For quantile statistics, enter value (%), type the desired quantile value (%) for the first quantile (for example, 25).
2. Select the applicable column and click Statistics.
3. Select Quantiles.
4. (Optional) Repeat this process for any additional quantiles.
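To make the meaning of a quantile percentage concrete, here is a minimal Python sketch that uses one simple definition: the smallest data value with at least the requested percentage of the nonmissing data at or below it. This is an assumption made for illustration. JMP may interpolate between neighboring values, so its results can differ. The function name `quantile` is hypothetical, and `None` stands in for a missing value.

```python
import math

def quantile(values, pct):
    """Smallest nonmissing value with at least pct% of the data at or below it."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    # Rank of the first value that covers pct% of the data (1-based).
    rank = max(1, math.ceil(pct * len(data) / 100))
    return data[rank - 1]

sample = [8, 3, 5, None, 1, 7, 2, 6, 4]  # invented values, one missing
q25 = quantile(sample, 25)
q75 = quantile(sample, 75)
```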
Change the Format of the Statistics Column Name
To change the format of the statistics column name in the summary table, select from one of the formats in the statistics column name format menu. Table 8.1 illustrates the available options. Assume that you are creating a summary table of the mean profits for a company. Your original column name is Profits ($M).
Table 8.1 Statistics Column Name Format Options and Examples
Option | Example
stat (column) | Mean (Profits ($M))
column | Profits ($M)
stat of column | Mean of Profits ($M)
column stat | Profits ($M) Mean
Link to the Original Data Table
You can select whether to link the summary table to the original data table. By default, the Link to original data table option is selected. If you want to edit the data in the summary table, deselect the Link to original data table option. When the summary table is linked to the original data table, you cannot edit the data in the summary table, since that would modify and compromise the original data.
Within linked tables, if you drag columns from the summary table into the column heading of a new column in the original data table, the values are expanded as if they were matched by grouping columns.
Keep the Summary Window Open
If you select the Keep dialog open option, the Summary window remains open after you click Create. Notice that once you select this option, the OK button is replaced by a Create button.
Create a Two-Way Table of Summary Statistics by Adding a Subgroup Variable
1. Highlight the column(s) that you want to be the nested variable(s). These are your “subgroup variable(s).”
2. Click Subgroup to move the variable(s) into the subgroup list.
3. Highlight the column for which you want statistics summarized by subgroup.
4. In the Statistics list, select the specific statistic that you want.
5. Click OK.
For details about the types of statistics, see “Explanation of Statistics”.
Add a Statistics Column to an Existing Summary Table
After you have created a summary table, you can add columns of descriptive summary statistics for any numeric column in the source table. To do so, from an existing summary table, click on the upper red triangle in the data grid and select Add Statistics Column.
Example of Adding a Statistics Column to an Existing Table
Suppose that you have already created a summary table, and you want to add more statistics to the existing summary table.
1. Open the Companies.jmp sample data table.
2. Select Tables > Summary.
3. Select Type and Size Co and click Group.
4. Click OK.
5. From the red triangle menu in the upper left corner of the data table grid, select Add Statistics Column.
Figure 8.3 Creating a Summary Statistics Column from Within a Data Table
A modified version of the Summary window appears.
6. Select the column that you want, click Statistics, and select the specific statistic that you want. For this example, select profit/emp, click Statistics, and then select Mean.
7. Click OK.
Figure 8.4 Example of a Summary Table with a Summary Statistics Column
The Mean(profit/emp) column is added to the existing summary table.
Explanation of Statistics
You can add columns of descriptive summary statistics for any numeric column in the source table by clicking the Statistics button and making a selection from the menu.
The Statistics menu gives these summary statistics for numeric columns:
N
The number of nonmissing values.
Mean
The arithmetic mean of a column’s values. It is the sum of the nonmissing values (each multiplied by its weight, if a weight variable is defined) divided by Sum Wgt.
Std Dev
The sample standard deviation, computed for the nonmissing values. It is the square root of the sample variance.
Min
The smallest nonmissing value in a column.
Max
The largest nonmissing value in a column.
Range
The difference between Max and Min.
% of Total
The percent of the total count for each group. Or, if you have so specified, the percent of nonmissing values of the column to the total count for each group.
N Missing
The number of missing values.
N Categories
The number of distinct categories.
Sum
The sum of all values in a column.
Sum Wgt
The sum of all weight values in a column. (See “Column Properties” in the “The Column Info Window” chapter.) Or, if no column is assigned the weight role, Sum Wgt is the total number of nonmissing values.
Variance
The sample variance, computed for the nonmissing values. It is the sum of squared deviations from the mean, divided by the number of nonmissing values minus one.
Std Err
The standard error of the mean. It is the standard deviation divided by the square root of N. If a column is assigned the role of weight, then the denominator is the square root of the sum of the weights.
CV (Coefficient of Variation)
A measure of dispersion: the standard deviation divided by the mean, with the result multiplied by one hundred.
Median
The 50th percentile (50th quantile), which is the value such that half of the data are below it and half are at or above it.
Interquartile Range
The difference between the third and first quartiles.
Quantiles
The value such that the specified percentage of the data is less than or equal to it. For example, 75% of the data is less than the 75th quantile. The Summary window has an edit box for entering the quantile percentage that you want.
Histogram
Generates histograms for different groups. Images of the histograms are saved in data table Expression columns.
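The definitions above can be checked against a small worked example. The following Python sketch computes several of the listed statistics for an unweighted column, plus a weighted mean that follows the Mean and Sum Wgt definitions. It relies on Python's standard `statistics` module rather than on JMP, the function names `describe` and `weighted_mean` are hypothetical, the data are invented, and `None` stands in for a missing value. Note that Python's `median` averages the two middle values when N is even, which is an assumption about tie handling and not a statement about JMP.

```python
import math
from statistics import mean, median, stdev, variance

def describe(values):
    """Summary statistics for one unweighted column (needs at least 2 nonmissing values)."""
    data = [v for v in values if v is not None]
    n = len(data)
    sd = stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "N": n,
        "N Missing": len(values) - n,
        "Sum": sum(data),
        "Mean": mean(data),
        "Min": min(data),
        "Max": max(data),
        "Range": max(data) - min(data),
        "Variance": variance(data),     # squared deviations / (n - 1)
        "Std Dev": sd,
        "Std Err": sd / math.sqrt(n),   # unweighted case
        "CV": 100 * sd / mean(data),
        "Median": median(data),
    }

def weighted_mean(values, weights):
    """Sum of value * weight over nonmissing values, divided by the sum of those weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    sum_wgt = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / sum_wgt

stats = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None])
wmean = weighted_mean([1.0, 3.0, None], [1.0, 3.0, 2.0])
```

For the eight nonmissing values in this example, the mean is 5.0 and the squared deviations sum to 32, so the sample variance is 32/7.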
Example of Creating a Summary Table
Suppose a researcher is working with Companies.jmp, which groups companies by Type and Size. Follow along with this next example by opening Companies.jmp from the sample data folder that was installed when you installed JMP.
Suppose the researcher wants to:
• Create a table that shows the average profit per employee for small, medium, and big computer and pharmaceutical companies. In other words, create a table that contains a row for each size company and a column for the mean profit per employee of each type of company.
• Create it so the cells hold the mean for the subgroup (defined by the intersection of the row and column).
1. Open the Companies.jmp sample data table.
2. Select Tables > Summary.
3. Select Size Co and click Group.
The researcher selects Size Co as the grouping variable because he wants the values in that column to become rows in the new table.
4. Select profit/emp and click Statistics.
5. Select Mean.
6. Select Type and click Subgroup.
This tells JMP to create a column for the average profit per employee (Mean(profit/emp)) for each level (Computer, Pharmaceutical) of the subgroup variable (Type).
Figure 8.5 shows the completed Summary window and the resulting summary table.
Figure 8.5 Summary Statistics for a Subgroup
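The two-way layout in this example can be sketched in Python as follows: grouping levels become rows, subgroup levels become columns, and each cell holds the mean for that intersection. This is only an illustration of the structure. The rows and values are invented (they are not the contents of Companies.jmp), the function name `two_way_mean` is hypothetical, and the column keys here are plain subgroup level names, not the column names JMP generates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Size Co, Type, profit/emp) rows. Invented values.
rows = [
    ("small", "Computer", 2.0),
    ("small", "Computer", 4.0),
    ("small", "Pharmaceutical", 10.0),
    ("big", "Computer", 1.0),
    ("big", "Pharmaceutical", 5.0),
    ("big", "Pharmaceutical", 7.0),
]

def two_way_mean(rows):
    """Return {group level: {subgroup level: mean value}}; None marks an empty cell."""
    cells = defaultdict(list)
    for group, subgroup, value in rows:
        cells[(group, subgroup)].append(value)
    groups = sorted({g for g, _, _ in rows})
    subgroups = sorted({s for _, s, _ in rows})
    return {
        g: {s: mean(cells[(g, s)]) if (g, s) in cells else None for s in subgroups}
        for g in groups
    }

table = two_way_mean(rows)
```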