Anne Geraci
Instructor – St. John Fisher College

Electronic Portfolio

Going beyond mean and standard deviation:
using Excel pivot tables to mine for data.

### Interpretation of mean

• The “typical” age of people in this sample is ______
• The youngest person in this sample is ______
• The oldest person is ______
• What is the average deviation from the mean?

### Introduction – Pivot Tables

A pivot table report allows us to quickly summarize large amounts of data. Pivot tables are expressly designed for querying large amounts of data as well as subtotaling or summarizing the data using categorical variables.

### How do I create a Pivot table?

• Make sure each column has a field name in Row 1.
• All the data in a particular column should represent the same type of data. In other words, a column that contains numbers should not contain text, and so on
• Data must be contiguous. No empty columns or blank rows.

Step 2. Create your pivot table.

Excel 2007: Insert/Pivot Table/Pivot Table Report
Excel 2003: Data/PivotTable

Step 3. Select your row and column fields

• Click method (2007 Only): Click on the fields you want to see in your pivot table.
• Drag-and-drop Method: Drag field from field list and drop it on the row/column/data area.

If your Field List disappears, click on your pivot table to make it display.

Step 4. Format the table.

Excel doesn’t know if the data you are summarizing is in units of dollars, donuts, or gallons. You will want to choose an appropriate format

### Practice

#### Scenario 1 (Grocery Sales)

Each observation in this data file represents the 2008 quarterly sales revenue for a particular product at this grocery store.

1. What was the average quarterly revenue for a “typical” product at this store? How accurate is that number?
2. Design a pivot table to determine the grand total (for all four quarters) for each particular product? Can you break it up by quarters?
3. Were sales higher during this year for the Beverages category or the Dairy Products category?
4. During which quarter was the highest total sales revenue posted?

#### Scenario 2 (Waiting Times)

Each observation in this data file represents an observation for a customer at a fast food chain. WE have identified each observation with a unique identifier (Order ID), and recorded: Waiting time (in seconds), Time of day (Morning, Lunch, Evening), Size of the order (Small, Medium, Large), Venue (Drive-thru or Counter), Complexity of the order (Easy or Hard) as well as some customer survey data about how satisfied the customer was.

Let’s focus on Waiting time as a metric for the time being.

1. How long did the “typical” customer wait for his/her food? What’s the “average” deviation from that time?
2. Was there any difference in Waiting Time between the Time of day?
3. Was there a difference in the Venue (Drive-thru vs. Counter)?
4. How about the Size or Complexity of the order? Does that affect waiting time?
5. is there any difference in the variability (as measured by standard deviation) in any of these cross-sections?

#### Scenario3 (International Sales)

Each observation in this data file represents a single transaction (called an order) in which a sales representative sold something. Variables include the salesperson’s name, the date and amount of the order, and the division (US or UK) in which the salesperson works.

• Which salesperson sold the most over this time period?
• What was the average amount of an order (e.g. a “typical” order)?
• How do the order amounts break down between the two divisions (US and UK)?
• Which division had more variability (less consistency) in their order amounts?

Each observation in this data file represents a student. Variables include gender and grade.

• What was the average grade? What was the average deviation from that mean?
• Were the grades for the males higher than the females?

Count of Score / Column Labels
Row Labels / Female / Male / Grand Total
70-79
80-89
90-99
Grand Total
• Design a pivot table that looks like this (with the data filled in):