Name ______

Chapter 1 Learning Objectives / Section / Related Example on Page(s) / Relevant Chapter Review Exercise(s) / Can I do this?
Identify the individuals and variables in a set of data. / Intro / 3 / R1.1
Classify variables as categorical or quantitative. / Intro / 3 / R1.1
Display categorical data with a bar graph. Decide whether it would be appropriate to make a pie chart. / 1.1 / 9 / R1.2, R1.3
Identify what makes some graphs of categorical data deceptive. / 1.1 / 10 / R1.3
Calculate and display the marginal distribution of a categorical variable from a two-way table. / 1.1 / 13 / R1.4
Calculate and display the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table. / 1.1 / 15 / R1.4
Describe the association between two categorical variables by comparing appropriate conditional distributions. / 1.1 / 17 / R1.5
Make and interpret dotplots and stemplots of quantitative data. / 1.2 / Dotplots: 25; Stemplots: 31 / R1.6
Describe the overall pattern (shape, center, and spread) of a distribution and identify any major departures from the pattern (outliers). / 1.2 / Dotplots: 26 / R1.6, R1.9
Identify the shape of a distribution from a graph as roughly symmetric or skewed. / 1.2 / 28 / R1.6, R1.7, R1.8, R1.9
Make and interpret histograms of quantitative data. / 1.2 / 33 / R1.7, R1.8
Compare distributions of quantitative data using dotplots, stemplots, or histograms. / 1.2 / 30 / R1.8, R1.10
Calculate measures of center (mean, median). / 1.3 / Mean: 49; Median: 52 / R1.6
Calculate and interpret measures of spread (range, IQR, standard deviation). / 1.3 / IQR: 55; Std. dev.: 60 / R1.9
Choose the most appropriate measure of center and spread in a given setting. / 1.3 / 65 / R1.7
Identify outliers using the 1.5 × IQR rule. / 1.3 / 56 / R1.6, R1.7, R1.9
Make and interpret boxplots of quantitative data. / 1.3 / 57 / R1.7
Use appropriate graphs and numerical summaries to compare distributions of quantitative variables. / 1.3 / 65 / R1.8, R1.10

1.1 Analyzing Categorical Data

Read 2–4

Fr/Soph/Jr/Sr

GPA

Email address

Name

Bus route

Phone number

Days absent

Address

Credits earned

Allergies

Current on immunizations

Exterior color

Mileage

Total car length

Number of cylinders

Cost

Model

VIN

Type of sound system

Size of fuel tank

What do we call these two kinds of variables? What’s the difference?

Why do people sometimes confuse the two kinds of variables?

What is a distribution? It tells what values a variable takes and how often it takes each value.

Alternate Example: Willott’s music

Here is information about 12 randomly selected songs in Willott’s music library.

Song Title / Artist / Album year / Track Length / Genre / Tracks on the album / Track Number
Double Dare / Bauhaus / 1980 / 4:54 / Gothic / 9 / 1
Carpe Noctum / Tiesto / 2007 / 7:03 / Dance/Electronic / 12 / 4
She Wolf / Shakira / 2009 / 3:10 / Latin / 12 / 1
Come as You Are / Nirvana / 1991 / 3:39 / Alternative / 12 / 3
The Heinrich Maneuver / Interpol / 2007 / 3:35 / Alternative / 11 / 4
Shake It Out / Florence + The Machine / 2011 / 4:38 / Alternative / 12 / 2
My Songs Know What You Did in the Dark (Light Em Up) / Fall Out Boy / 2013 / 3:07 / Alternative / 11 / 2
Locked Out of Heaven / Bruno Mars / 2012 / 3:53 / Pop / 10 / 2
Womanizer / Britney Spears / 2008 / 3:44 / Pop / 13 / 1
Iceolate / Front Line Assembly / 1990 / 5:13 / Industrial / 10 / 7
I Bet You Look Good On The Dancefloor / Arctic Monkeys / 2006 / 2:54 / Indie / 13 / 2
Meat is Murder / The Smiths / 1985 / 6:06 / Alternative / 9 / 9

(a) Who are the individuals in this data set?

(b) What variables are measured? Identify each as categorical or quantitative. In what units were the quantitative variables measured?

(c) Describe the individual in the first row.

Read 7–11

What's the difference between a data table, a frequency table, and a relative frequency table?

Data table / Frequency table / Relative frequency table
shows the values of the variables for each individual / shows the distribution of one variable as counts / shows the distribution of one variable as percents, decimals, or fractions

Which one was the previous example?

When making pie charts and bar graphs, what do people often mess up?

Bar Graphs / Pie Charts
Pros / Quick & easy / Show part-whole relationships well
Cons / Part-whole relationships are hard to see / They’re hard to make by hand. Don't use when percents don't add up to 100%.

Let's search "misleading graph" and see some examples.

Identify some particular problems many of these graphs share.

HW #11: page 7 (1, 3, 5, 7, 8), page 22 (11, 13, 15, 17, 18)

Read 12–18

Examples of:

…two-way table (2 variables are shown with counts or frequencies)

Senior / Non-senior
Boy / 8 / 3
Girl / 15 / 4

…marginal distribution (totals for rows & columns; the distribution for each variable)

Senior / Non-senior / Totals
Boy / 8 / 3 / 11
Girl / 15 / 4 / 19
Totals / 23 / 7 / 30

…conditional distribution (the distribution of one variable among only those individuals who have a particular value of the other variable)

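Conditional distribution of gender for each class status (divide by column totals):

Senior / Non-senior
Boy / 35% / 43%
Girl / 65% / 57%
Totals / 100% / 100%

Conditional distribution of class status for each gender (divide by row totals):

Senior / Non-senior / Totals
Boy / 73% / 27% / 100%
Girl / 79% / 21% / 100%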

How do we know which variable to condition on? Divide by the explanatory variable totals.
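Both conditional distributions of the boy/girl senior/non-senior table can be reproduced with a short Python sketch. It divides each count by its row total or by its column total. The dictionary layout and the function names `row_conditional` and `column_conditional` are illustrative choices, not part of the course materials.

```python
# Counts from the two-way table: rows = gender, columns = class status.
counts = {
    "Boy": {"Senior": 8, "Non-senior": 3},
    "Girl": {"Senior": 15, "Non-senior": 4},
}

def row_conditional(table):
    """Conditional distribution of class status for each gender (divide by row totals)."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: count / total for col, count in cells.items()}
    return result

def column_conditional(table):
    """Conditional distribution of gender for each class status (divide by column totals)."""
    columns = next(iter(table.values())).keys()
    result = {}
    for col in columns:
        total = sum(cells[col] for cells in table.values())
        result[col] = {row: cells[col] / total for row, cells in table.items()}
    return result

for gender, dist in row_conditional(counts).items():
    print(gender, {k: f"{v:.0%}" for k, v in dist.items()})
for status, dist in column_conditional(counts).items():
    print(status, {k: f"{v:.0%}" for k, v in dist.items()})
```

The rounded percents should match the two tables of conditional distributions.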

Died / Survived
Hospital A
Hospital B

What is a segmented (or stacked) bar graph?

Use a segmented bar graph to compare conditional distributions, to look for differences, and to look for patterns.

When knowing the value of one variable helps predict the value of the other, we say that the variables are associated. In a segmented bar graph, association shows up as large differences in the sizes of the segments from one bar to the next. The proportions may even be “flipped” (reversed) from one bar to another.

Careful! An association does NOT automatically mean that there is a cause-and-effect relationship.

The boy/girl senior/non-senior graphs did not show much association.

Alternate Example: Horseshoe Crabs

Two members of the University of Florida at Gainesville Department of Zoology collected data on horseshoe crabs on a Delaware beach during 4 days in the late spring of 1992. Based on shell color, they classified each crab as Young, Intermediate, or Old. They also recorded whether each crab could right itself when flipped onto its back or was stranded for at least a certain period of time. Here are the results.

Young / Intermediate / Old / Total
Stranded / 214 / 384 / 295 / 893
Not Stranded / 1668 / 1204 / 216 / 3088
Total / 1882 / 1588 / 511 / 3981

(a) Explain what it would mean if there were no association between age and strandedness.

(b) Does there appear to be an association between age and strandedness in this sample? Justify.
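One way to check part (b) is sketched below in Python. It conditions on age, treated here as the explanatory variable, by dividing each stranded count by its age-group total. It then compares those percents with the overall stranded rate. The variable names are illustrative.

```python
# Counts from the horseshoe crab two-way table, by age group.
stranded = {"Young": 214, "Intermediate": 384, "Old": 295}
not_stranded = {"Young": 1668, "Intermediate": 1204, "Old": 216}

# Condition on age: proportion stranded within each age group.
percent_stranded = {
    age: stranded[age] / (stranded[age] + not_stranded[age]) for age in stranded
}
# Overall proportion stranded, ignoring age.
overall = sum(stranded.values()) / (sum(stranded.values()) + sum(not_stranded.values()))

for age, p in percent_stranded.items():
    print(f"{age:>12}: {p:.1%} stranded")
print(f"{'All crabs':>12}: {overall:.1%} stranded")
```

The stranded percents rise from about 11% for young crabs to about 58% for old crabs. With no association, all three would be close to the overall rate of about 22%.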

HW #12: page 22 (19, 21, 23, 25, 27–34)

And now, we change from categorical data to quantitative data…

1.2 Displaying Quantitative Data with Graphs

Elmer and Ethel have retired and want to move someplace warm. The couple is considering nine different cities. The dotplots below show the distribution of average daily high temperatures in December, January, and February for each of these cities. Help them pick a city by answering the questions below, based on the data shown in the graph.

  1. What is the typical high temperature for these months in Phoenix, Orlando, and San Juan? Which of those 3 cities is most similar in this respect to Palm Springs? (Look for the center: the average, median, or typical value.)
  2. Are daily high temperatures for these months more predictable in Palm Springs or in Orlando? (Look at the spread: the variation, including the range.)
  3. What might be unique to Atlanta, San Diego, and Honolulu? (Look for outliers: unusual values.)
  4. What makes San Juan and San Diego somewhat similar to one another? Likewise, Palm Springs, Phoenix, and Orlando are similar to one another in this way, but different from the first group. (Look at the shape: symmetry vs. asymmetry.)

Read 25–27

Notice that we are now looking at quantitative data!

How should we describe the distribution of a quantitative variable? Use “SOCS” (Shape, Outliers, Center, Spread).

Center: Typical value, such as the mean or the median

Spread: Range for now (later we'll also use the standard deviation and the interquartile range, IQR)

Outliers: Unusual values for now (later we'll use the 1.5 × IQR rule)

Shape: Describe the number of peaks in the graph and its symmetry

(unimodal, bimodal, multimodal, uniform, symmetric, asymmetric, skewed left, skewed right)

Read 27–29

Examples and descriptions of various shapes of distributions:

Shape / Curve / Dotplot / Histogram
Unimodal symmetric / Heights of adult women / Expected sums on 36 rolls of two 6-sided dice / Length of growing seasons in St. Louis
Bimodal / Heights of men and women / Observed sums on 35 rolls of a 4-sided die and an 8-sided die / Maximum angle of a sample of roller coasters
Unimodal skewed left / Heights of kids at a middle school dance / Time to finish a difficult test / Heights in my extended family
Unimodal skewed right / Salaries of MLB players / Selling prices of homes in a new subdivision / Scores on a multiple-choice pre-test over completely new material
Uniform / Expected outcomes of spins of a spinner with equally sized spaces numbered 1–10 / Outcomes of 36 rolls of a 6-sided die / Ages of students in a school district

Here are the numbers of calories per item for 16 convenience store sandwiches, along with a dotplot of the data.

360, 430, 440, 440, 440, 450, 450, 460

470, 480, 480, 490, 490, 490, 500, 510

Describe the shape, center, and spread of the distribution. Are there any outliers?
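To check a hand-drawn dotplot and summary, here is a Python sketch that prints a sideways text dotplot and computes the median, mean, and range. It assumes the calories are recorded to the nearest 10, as in the list, so a step of 10 gives each possible value its own row. The function name `text_dotplot` is hypothetical.

```python
from statistics import mean, median

calories = [360, 430, 440, 440, 440, 450, 450, 460,
            470, 480, 480, 490, 490, 490, 500, 510]

def text_dotplot(values, step=10):
    """Print one row per possible value from min to max (in steps of `step`), one 'o' per observation."""
    for v in range(min(values), max(values) + step, step):
        print(f"{v:>4} | " + "o" * values.count(v))

text_dotplot(calories)
print("median:", median(calories))
print("mean:", mean(calories))
print("range:", max(calories) - min(calories))
```

The empty rows between 360 and 430 make the gap below the rest of the data easy to see.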

Read 29–30

When asked to compare two distributions, be sure that you compare and don’t just describe!

Be sure that you use “less”, “more”, and “-er” words.

How does the annual energy consumption (kWh/year) compare for top-loading washing machines and front-loading washers? The data below is from the Home Depot website. There are 26 front-loaders and 32 top-loaders included.

Read 31–32

Caution! Remember to include a key when making a stemplot (stem-and-leaf plot).

If you write "19|7", is that 197, 19.7, 1970, ...?

How do gas prices in St. Charles County compare to those in Madison County, where Alton, Illinois, is located? A sample of gas prices was taken on several days in July 2015. Make a back-to-back stemplot and compare the distributions.

St. Charles Co.: 2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58, 2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61

Madison Co.: 2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71, 2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74
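A back-to-back stemplot of the gas prices can be built in Python. This sketch uses one stem per 10 cents (for example, 2.5 for $2.50 to $2.59) and the cents digit as the leaf. Splitting stems would be another reasonable choice. The function names are hypothetical.

```python
from collections import defaultdict

st_charles = [2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58,
              2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61]
madison = [2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71,
           2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74]

def stems_and_leaves(prices):
    """Group prices by stem (dollars and dimes) with the cents digit as the leaf."""
    groups = defaultdict(list)
    for p in prices:
        cents = round(p * 100)              # whole cents, avoids floating-point surprises
        groups[cents // 10].append(cents % 10)
    return {stem: sorted(leaves) for stem, leaves in groups.items()}

def back_to_back(left, right):
    """Lines of a back-to-back stemplot; leaves on the left increase away from the stem."""
    left_sl, right_sl = stems_and_leaves(left), stems_and_leaves(right)
    stems = set(left_sl) | set(right_sl)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems too
        left_leaves = "".join(str(d) for d in reversed(left_sl.get(stem, [])))
        right_leaves = "".join(str(d) for d in right_sl.get(stem, []))
        lines.append(f"{left_leaves:>15} | {stem // 10}.{stem % 10} | {right_leaves}")
    return lines

print(f"{'St. Charles':>15} |     | Madison")
for line in back_to_back(st_charles, madison):
    print(line)
print("Key: 2.5 | 6 represents $2.56 per gallon")
```

Every St. Charles price is at or below $2.61 and every Madison price is at or above $2.67, so the two distributions do not overlap.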

HW #13: page 41 (37, 39, 43, 45, 47)

1.2 Histograms

The following table presents the total number of triples (3B) for the 30 MLB teams in the 2014 regular season. Make a dotplot to display the distribution of triples for the season. Then, use your dotplot to make a histogram of the distribution.

Team / 3B / Team / 3B / Team / 3B
Arizona / 47 / Pittsburgh / 30 / Toronto / 24
San Francisco / 42 / San Diego / 30 / Tampa Bay / 24
Colorado / 41 / Kansas City / 29 / Cleveland / 23
LA Dodgers / 38 / Milwaukee / 28 / Atlanta / 22
Miami / 36 / Texas / 28 / St. Louis / 21
Oakland / 33 / Minnesota / 27 / Boston / 20
Chicago Sox / 32 / Washington / 27 / Cincinnati / 20
Seattle / 32 / Philadelphia / 27 / Houston / 19
LA Angels / 31 / Detroit / 26 / NY Mets / 19
Chicago Cubs / 31 / NY Yankees / 26 / Baltimore / 16

Read 33–36

When you make a histogram...

...you can turn a dotplot into a histogram.

...be consistent with "fence sitters" (values that fall exactly on the boundary between two bins).

...be consistent with spacing and bin width.
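Here is a Python sketch of the bin counts for the 2014 triples data. It assumes bins of width 5 starting at 15, which is one reasonable choice among several. It also assumes every fence sitter goes into the bin on its right. The function name `bin_counts` is hypothetical. Each row also shows the relative frequency.

```python
triples = [47, 42, 41, 38, 36, 33, 32, 32, 31, 31,
           30, 30, 29, 28, 28, 27, 27, 27, 26, 26,
           24, 24, 23, 22, 21, 20, 20, 19, 19, 16]

def bin_counts(values, start, width):
    """Count values in equal-width bins [start, start + width), [start + width, ...), ...
    A fence sitter (a value exactly on a boundary) always goes into the bin on its right."""
    counts = {}
    low = start
    while low <= max(values):
        high = low + width
        counts[(low, high)] = sum(low <= v < high for v in values)
        low = high
    return counts

for (low, high), n in bin_counts(triples, start=15, width=5).items():
    share = n / len(triples)                       # relative frequency
    print(f"{low:>2} to <{high}: {'#' * n:<8} {n:>2} ({share:.0%})")
```

For example, the two teams with exactly 20 triples are counted in the 20 to <25 bin, not the 15 to <20 bin.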

Read 38–41

When might we want a relative frequency histogram rather than a frequency histogram?

…to see part-whole relationships or to compare 2 groups

HW #14: page 45 (51, 53, 55, 59–62)

1.3 Describing Quantitative Data with Numbers

Read 48–50

$\bar{x}$ is a statistic; "x bar" is the sample mean. $\mu$ is a parameter; "mu" is the population mean.
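For reference, these are the usual formulas for the two means. Here $n$ is the number of values in the sample, and $N$ (the population size) is notation assumed for this sketch.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}
\qquad
\mu = \frac{\sum x_i}{N}
```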

If adding a very large or very small value to a data set (or changing a data value to something very large or very small) does not change the value of a statistic much, or at all, we say that the statistic is resistant.

The mean is not a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will change the value of the mean quite a bit. Think about what happens to the average age of people in the classroom when Mr. Willott walks in.

The mean is the balancing point.

Approximately where will the mean be located, when looking at a histogram or dotplot?

Read 51–53

The median is a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will not change the value of the median much, if at all. Think about what happens to the median age of people in the classroom when Mr. Willott walks in.
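The "someone older walks into the classroom" idea can be sketched in Python. All ages here are made up for illustration: a hypothetical class of ten students and a hypothetical 60-year-old adult. They are not real data about this class or its teacher.

```python
from statistics import mean, median

# Hypothetical ages (made-up numbers, for illustration only).
students = [16, 16, 16, 17, 17, 17, 17, 17, 18, 18]
with_adult = students + [60]   # a hypothetical 60-year-old walks in

for label, ages in [("students only", students), ("with adult", with_adult)]:
    print(f"{label:>13}: mean = {mean(ages):.2f}, median = {median(ages):g}, "
          f"range = {max(ages) - min(ages)}")
```

The mean jumps by almost 4 years and the range jumps from 2 to 44, while the median stays at 17.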

If we know the shape of a distribution, as shown below, then where are the mean and the median located in relation to one another?

Graphs: roughly symmetric / exactly symmetric / skewed

Read 53–55

The range is the highest data value minus the lowest data value. The range is a single number, and it is not a resistant measure of spread: an extreme value will affect the value of the range. Think about what happens to the range of ages of people in the classroom when Mr. Willott walks in.

The median divides an ordered list of data into two equal groups.

The quartiles divide an ordered list of data into four equal groups.

The interquartile range (IQR) is the spread of the middle 50% of the data. The IQR is a resistant measure of spread. Think about what happens to the range of the middle 50% of ages of people in the classroom when Mr. Willott walks in.

Item / Fat (g)
Crunchy Taco / 10
Nachos Supreme / 24
Cheese Quesadilla / 26
Chicken Quesadilla / 27
Mexican Pizza / 31
Taco Salad (steak) / 37
Nachos BellGrande / 39
XXL Grilled Stuft Burrito – Beef / 41
Taco Salad (original) / 42

Here are data on the amount of fat (in grams) in 9 different Taco Bell menu items. Calculate the median, quartiles, and IQR.
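Here is a Python sketch of the quartile calculation. It assumes the method of taking the median of each half and leaving the overall median out of both halves when n is odd. Calculators and software sometimes use other quartile rules and can give slightly different values. The function names are hypothetical.

```python
fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]

def median_of(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median of each half.
    When n is odd, the overall median is left out of both halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median_of(lower), median_of(data), median_of(upper)

q1, med, q3 = quartiles(fat)
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {q3 - q1}")
```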

Read 57–58

What is the 1.5 × IQR rule for identifying outliers?

Illustration by Kelly Boles

How many fat grams would qualify as an outlier for the Taco Bell items?

Are there outliers among the 9 Taco Bell items?
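The 1.5 × IQR rule for the Taco Bell items can be sketched in Python. Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are flagged. This sketch assumes the same "median of each half" quartile method as the hand calculation. The function names are hypothetical.

```python
def median_of(vals):
    """Median of an already-sorted list."""
    n, mid = len(vals), len(vals) // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2

def iqr_outliers(values):
    """Apply the 1.5 x IQR rule; return (low fence, high fence, list of outliers).
    Q1 and Q3 are medians of the lower and upper halves (overall median left out when n is odd)."""
    data = sorted(values)
    n = len(data)
    q1, q3 = median_of(data[: n // 2]), median_of(data[(n + 1) // 2 :])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [v for v in data if v < low or v > high]

fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]
low, high, outliers = iqr_outliers(fat)
print(f"fences: {low} and {high} grams; outliers: {outliers}")
```

Under these assumptions, an item would need less than 2.5 g or more than 62.5 g of fat to be an outlier.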

Here are data for the calories in 12 McDonald’s menu items. Are there any outliers?

Menu item / Calories
32 oz. Chocolate Shake / 1160
Big Breakfast® / 740
Big Mac® / 540
Sausage Biscuit with Egg / 510
McRib® / 500
10 pc. McNuggets® / 460
Double Cheeseburger / 440
Quarter Pounder® / 410
Filet-O-Fish® / 380
McChicken® / 360
Large Caramel Latte / 330
Large Vanilla Iced Coffee / 270
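Here is the same 1.5 × IQR rule applied to the McDonald’s items in a Python sketch. It again assumes the "median of each half" quartile method. The ® marks are dropped from the item names, and the function names are hypothetical.

```python
def median_of(vals):
    """Median of an already-sorted list."""
    n, mid = len(vals), len(vals) // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2

def fences(values):
    """Lower and upper fences for the 1.5 x IQR rule.
    Q1 and Q3 are medians of the lower and upper halves (overall median left out when n is odd)."""
    data = sorted(values)
    n = len(data)
    q1 = median_of(data[: n // 2])
    q3 = median_of(data[(n + 1) // 2 :])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

menu = {
    "32 oz. Chocolate Shake": 1160,
    "Big Breakfast": 740,
    "Big Mac": 540,
    "Sausage Biscuit with Egg": 510,
    "McRib": 500,
    "10 pc. McNuggets": 460,
    "Double Cheeseburger": 440,
    "Quarter Pounder": 410,
    "Filet-O-Fish": 380,
    "McChicken": 360,
    "Large Caramel Latte": 330,
    "Large Vanilla Iced Coffee": 270,
}

low, high = fences(menu.values())
outliers = [item for item, cal in menu.items() if cal < low or cal > high]
print(f"fences: {low} and {high} calories; outliers: {outliers}")
```

Here Q1 = 370 and Q3 = 525, so the fences are 137.5 and 757.5 calories. The Big Breakfast (740) falls just inside the upper fence, so only the shake is flagged.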

Read 56–58

The five-number summary: Minimum, Q1, Median, Q3, Maximum

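A boxplot is a graph of the five-number summary. In a modified boxplot, outliers are plotted as separate points, and the whiskers extend only to the smallest and largest values that are not outliers.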


Draw a boxplot for the Taco Bell data. Check yours against the one that the graphing calculator makes.
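To check a hand-drawn boxplot without a calculator, here is a Python sketch that computes the five-number summary and prints a rough one-line text boxplot on a 0 to 50 gram scale. The quartile method ("median of each half") and the drawing scale are assumptions. The whiskers run to the minimum and maximum, which is appropriate here only because the Taco Bell data have no outliers under the 1.5 × IQR rule. The function names are hypothetical.

```python
def median_of(vals):
    """Median of an already-sorted list."""
    n, mid = len(vals), len(vals) // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the two halves."""
    data = sorted(values)
    n = len(data)
    return (data[0], median_of(data[: n // 2]), median_of(data),
            median_of(data[(n + 1) // 2 :]), data[-1])

def text_boxplot(summary, lo=0, hi=50, width=50):
    """Draw a rough one-line boxplot on a scale from lo to hi that is `width` characters wide.
    Whiskers run to the min and max, so this only suits data with no outliers."""
    def pos(x):
        return round((x - lo) / (hi - lo) * width)
    mn, q1, med, q3, mx = summary
    line = [" "] * (width + 1)
    for i in range(pos(mn), pos(mx) + 1):
        line[i] = "-"                      # whiskers
    for i in range(pos(q1), pos(q3) + 1):
        line[i] = "="                      # the box (Q1 to Q3)
    for x in (mn, q1, q3, mx):
        line[pos(x)] = "|"                 # ends of the whiskers and the box
    line[pos(med)] = "M"                   # median
    return "".join(line)

fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]
summary = five_number_summary(fat)
print("five-number summary:", summary)
print(text_boxplot(summary))
print("".join(f"{t:<10}" for t in range(0, 50, 10)) + "50")   # scale in grams
```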

Here are parallel boxplots for the heights of baseball players for 5 of the 2005 MLB teams. Compare these distributions.

HW #15: page 47 (65, 69–74), page 69 (79, 81, 83, 85, 86, 87, 89, 91, 93)

1.3 Standard Deviation

Arnold ran each afternoon for 5 days. His distances (in miles) were 10, 10, 10, 10, and 10.

Find the mean (or average) number of miles that Arnold ran each day. ______

Complete the table:

Table for Arnold's distances
Distances / Difference from the mean / Square of difference from the mean
10
10
10
10
10
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Arnold ran. What are the units? ______

The number above it is the variance for the distances. What are the units? ______

Becky ran each afternoon for 5 days. Her distances (in miles) were 8, 9, 10, 11, and 12.

Find the mean (or average) number of miles that Becky ran each day. ______

Complete the table:

Table for Becky's distances
Distances / Difference from the mean / Square of difference from the mean
8
9
10
11
12
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Becky ran. What are the units? ______

The number above it is the variance for the distances. What are the units? ______

Caleb ran each afternoon for 5 days. His distances (in miles) were 7, 9, 10, 11, and 13.

Find the mean (or average) number of miles that Caleb ran each day. ______

Complete the table:

Table for Caleb's distances
Distances / Difference from the mean / Square of difference from the mean
7
9
10
11
13
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Caleb ran. What are the units? ______

The number above it is the variance for the distances. What are the units? ______

Donna ran each afternoon for 5 days. Her distances (in miles) were 3, 3, 4, 5, and 35.

Find the mean (or average) number of miles that Donna ran each day. ______

Complete the table:

Table for Donna's distances
Distances / Difference from the mean / Square of difference from the mean
3
3
4
5
35
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:

That last value is the standard deviation for the distances Donna ran. What are the units? ______

The number above it is the variance for the distances. What are the units? ______

The standard deviation measures the typical distance of the data values from the mean.
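Completed runner tables can be checked against a Python sketch that follows the same steps. It subtracts the mean, squares the differences, adds them, divides by n - 1 (4 when there are 5 distances), and takes the square root. The function name `sd_table` is hypothetical.

```python
from math import sqrt

def sd_table(distances):
    """Follow the table's steps; return (mean, differences, squared differences, variance, sd)."""
    n = len(distances)
    mean = sum(distances) / n
    deviations = [x - mean for x in distances]
    squares = [d ** 2 for d in deviations]
    variance = sum(squares) / (n - 1)   # divide by n - 1 (4 when there are 5 distances)
    return mean, deviations, squares, variance, sqrt(variance)

runners = {
    "Arnold": [10, 10, 10, 10, 10],
    "Becky": [8, 9, 10, 11, 12],
    "Caleb": [7, 9, 10, 11, 13],
    "Donna": [3, 3, 4, 5, 35],
}

for name, miles in runners.items():
    mean, devs, squares, var, sd = sd_table(miles)
    print(f"{name}: mean = {mean:g} mi, sum of squares = {sum(squares):g}, "
          f"variance = {var:g} square mi, sd = {sd:.3f} mi")
```

All four runners have a mean of 10 miles, yet the standard deviations range from 0 (Arnold) to 14 miles (Donna, whose 35-mile day dominates the sum of squares).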

The range, IQR, and standard deviation all measure variation or spread, but only the IQR is resistant.

Standard deviation / Variance
Square root of variance / Square of standard deviation
$s$ = sample standard deviation / $s^2$ = sample variance
$\sigma$ = population standard deviation / $\sigma^2$ = population variance
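The steps in the runner tables correspond to these sample formulas, where you divide by $n - 1$ (which is 4 when $n = 5$):

```latex
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
\qquad
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}
```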

Read 60–62

If $s = 4$, then $s^2 = 16$. If $s^2 = 9$, then $s = 3$. If $\sigma^2 = 25$, then $\sigma = 5$. If $\sigma = 6$, then $\sigma^2 = 36$.

Four important properties of the standard deviation:

Standard deviation $\ge 0$. (0 means no variation; a large number means lots of variation.)

Standard deviation units are the same as the units for the data.

Standard deviation is not resistant.

Standard deviation measures spread around the mean.

Graphs labeled: $s = 5$ / $s = 6.22$ / $s = 9.52$ / $s = 10.7$

A random sample of 5 students was asked how many minutes they spent listening to music outside school hours the previous day. They responded: 20, 30, 60, 90, 120. Calculate and interpret the standard deviation.
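A hand calculation can be checked with Python's standard `statistics` module. Its `stdev` function computes the sample standard deviation, dividing by n - 1 just like the runner tables.

```python
from statistics import mean, stdev

minutes = [20, 30, 60, 90, 120]

xbar = mean(minutes)
s = stdev(minutes)          # sample standard deviation: divides by n - 1
print(f"mean = {xbar} minutes, s = {s:.2f} minutes")
```

The sum of squared differences is 6920, so the variance is 6920 / 4 = 1730 and s ≈ 41.6 minutes. Interpreted in context, the number of minutes these students spent listening to music typically varies by about 41.6 minutes from the mean of 64 minutes.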

Read 63–66

Of mean, median, IQR, and standard deviation, which summary statistics will we typically use for each situation?

Symmetric / Skewed

Center
Spread

HW #16: page 71 (95, 97, 99, 101–105, 107–110)

FRAPPY! page 74

HW #17: page 76, Chapter Review Exercises

Review Chapter 1

HW #18: page 78, Chapter 1 AP Statistics Practice Test

Chapter 1 Test
