Name ______
Chapter 1 Learning Objectives / Section / Related Example on Page(s) / Relevant Chapter Review Exercise(s) / Can I do this?
Identify the individuals and variables in a set of data. / Intro / 3 / R1.1
Classify variables as categorical or quantitative. / Intro / 3 / R1.1
Display categorical data with a bar graph. Decide whether it would be appropriate to make a pie chart. / 1.1 / 9 / R1.2, R1.3
Identify what makes some graphs of categorical data deceptive. / 1.1 / 10 / R1.3
Calculate and display the marginal distribution of a categorical variable from a two-way table. / 1.1 / 13 / R1.4
Calculate and display the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table. / 1.1 / 15 / R1.4
Describe the association between two categorical variables by comparing appropriate conditional distributions. / 1.1 / 17 / R1.5
Make and interpret dotplots and stemplots of quantitative data. / 1.2 / Dotplots: 25; Stemplots: 31 / R1.6
Describe the overall pattern (shape, center, and spread) of a distribution and identify any major departures from the pattern (outliers). / 1.2 / Dotplots: 26 / R1.6, R1.9
Identify the shape of a distribution from a graph as roughly symmetric or skewed. / 1.2 / 28 / R1.6, R1.7, R1.8, R1.9
Make and interpret histograms of quantitative data. / 1.2 / 33 / R1.7, R1.8
Compare distributions of quantitative data using dotplots, stemplots, or histograms. / 1.2 / 30 / R1.8, R1.10
Calculate measures of center (mean, median). / 1.3 / Mean: 49; Median: 52 / R1.6
Calculate and interpret measures of spread (range, IQR, standard deviation). / 1.3 / IQR: 55; Std. dev: 60 / R1.9
Choose the most appropriate measure of center and spread in a given setting. / 1.3 / 65 / R1.7
Identify outliers using the 1.5 × IQR rule. / 1.3 / 56 / R1.6, R1.7, R1.9
Make and interpret boxplots of quantitative data. / 1.3 / 57 / R1.7
Use appropriate graphs and numerical summaries to compare distributions of quantitative variables. / 1.3 / 65 / R1.8, R1.10
1.1 Analyzing Categorical Data
Read 2–4
Fr/Soph/Jr/Sr
g.p.a.
Email address
Name
Bus route
Phone number
Days absent
Address
Credits earned
Allergies
Current on immunizations
Exterior color
Mileage
Total car length
Number of cylinders
Cost
Model
VIN
Type of sound system
Size of fuel tank
What do we call these two kinds of variables? What’s the difference?
Why do people sometimes confuse the two kinds of variables?
What is a distribution? It tells us what values a variable takes and how often it takes each value.
Alternate Example: Willott’s music
Here is information about 12 randomly selected songs in Willott’s music library.
Song Title / Artist / Album year / Track Length / Genre / Tracks on the album / Track Number
Double Dare / Bauhaus / 1980 / 4:54 / Gothic / 9 / 1
Carpe Noctum / Tiesto / 2007 / 7:03 / Dance/Electronic / 12 / 4
She Wolf / Shakira / 2009 / 3:10 / Latin / 12 / 1
Come as You Are / Nirvana / 1991 / 3:39 / Alternative / 12 / 3
The Heinrich Maneuver / Interpol / 2007 / 3:35 / Alternative / 11 / 4
Shake It Out / Florence + The Machine / 2011 / 4:38 / Alternative / 12 / 2
My Songs Know What You Did in the Dark (Light Em Up) / Fall Out Boy / 2013 / 3:07 / Alternative / 11 / 2
Locked Out of Heaven / Bruno Mars / 2012 / 3:53 / Pop / 10 / 2
Womanizer / Britney Spears / 2008 / 3:44 / Pop / 13 / 1
Iceolate / Front Line Assembly / 1990 / 5:13 / Industrial / 10 / 7
I Bet You Look Good On The Dancefloor / Arctic Monkeys / 2006 / 2:54 / Indie / 13 / 2
Meat is Murder / The Smiths / 1985 / 6:06 / Alternative / 9 / 9
(a) Who are the individuals in this data set?
(b) What variables are measured? Identify each as categorical or quantitative. In what units were the quantitative variables measured?
(c) Describe the individual in the first row.
Read 7–11
What's the difference between a data table, a frequency table, and a relative frequency table?
Data table / Frequency table / Relative frequency table
tells values of variables for individuals / tells distribution of 1 variable in table form / tells distribution of 1 variable as a %, decimal, or fraction
Which one was the previous example?
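To see how a data table turns into the other two kinds of table, here is a minimal Python sketch that tallies the Genre column from Willott's music. The variable names are made up for illustration; only the genres come from the table.

```python
from collections import Counter

# Genre column copied, in row order, from the 12 songs in the data table.
genres = [
    "Gothic", "Dance/Electronic", "Latin", "Alternative", "Alternative",
    "Alternative", "Alternative", "Pop", "Pop", "Industrial", "Indie",
    "Alternative",
]

# Frequency table: how many songs fall in each genre.
frequency = Counter(genres)

# Relative frequency table: each count as a fraction of all 12 songs.
total = len(genres)
relative_frequency = {genre: count / total for genre, count in frequency.items()}

for genre, count in frequency.most_common():
    print(f"{genre:<18}{count:>3}{relative_frequency[genre]:>8.1%}")
```

The counts add up to 12 and the relative frequencies add up to 100%, which is a handy check on any table made by hand.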
When making pie charts and bar graphs, what do people often mess up?
Bar Graphs / Pie Charts
Pros / Quick & easy / Show part-whole relationships well
Cons / Part-whole relationships are hard to see / They’re hard to make by hand. Don't use when percents don't add up to 100%.
Let's search "misleading graph" and see some examples.
Identify some particular problems many of these graphs share.
HW #11: page 7 (1, 3, 5, 7, 8), page 22 (11, 13, 15, 17, 18)
Read 12–18
Examples of:
…two-way table (2 variables are shown with counts or frequencies)
Senior / Non-senior
Boy / 8 / 3
Girl / 15 / 4
…marginal distribution (totals for rows & columns; the distribution for each variable)
Senior / Non-senior / Totals
Boy / 8 / 3 / 11
Girl / 15 / 4 / 19
Totals / 23 / 7 / 30
…conditional distribution (distribution of one variable among only the individuals with a particular value of the other variable, usually given as percents)
Senior / Non-senior
Boy / 35% / 43%
Girl / 65% / 57%
Totals / 100% / 100%
Senior / Non-senior / Totals
Boy / 73% / 27% / 100%
Girl / 79% / 21% / 100%
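The totals and percents in the three example tables can be checked with a short Python sketch. The dictionary layout and names are just one convenient choice, not the only way to set this up.

```python
# Counts from the two-way table: rows are Boy/Girl, columns are class standing.
counts = {
    "Boy": {"Senior": 8, "Non-senior": 3},
    "Girl": {"Senior": 15, "Non-senior": 4},
}

# Marginal distributions: row totals and column totals.
row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {
    standing: sum(counts[sex][standing] for sex in counts)
    for standing in ("Senior", "Non-senior")
}
grand_total = sum(row_totals.values())

# Conditional distribution of sex within each class standing (divide by column totals).
sex_given_standing = {
    standing: {sex: counts[sex][standing] / col_totals[standing] for sex in counts}
    for standing in col_totals
}

# Conditional distribution of class standing within each sex (divide by row totals).
standing_given_sex = {
    sex: {standing: n / row_totals[sex] for standing, n in row.items()}
    for sex, row in counts.items()
}

print(row_totals, col_totals, grand_total)
print({k: {s: f"{p:.0%}" for s, p in v.items()} for k, v in sex_given_standing.items()})
print({k: {s: f"{p:.0%}" for s, p in v.items()} for k, v in standing_given_sex.items()})
```

Notice that the two conditional tables divide the same counts by different totals, which is why it matters which variable we condition on.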
How do we know which variable to condition on? Divide by the explanatory variable totals.
Died / Survived
Hospital A
Hospital B
What is a segmented (or stacked) bar graph?
Use a segmented bar graph to compare conditional distributions, to look for differences, and to look for patterns.
When knowing the value of one variable helps predict the value of the other, we say that the variables are associated. Association appears in a segmented bar graph when we see big differences in the proportions. The proportions may be “flipped” or reversed.
Careful! An association does NOT automatically mean that there is a cause-and-effect relationship.
The boy/girl senior/non-senior graphs did not show much association.
Alternate Example: Horseshoe Crabs
Two members of the University of Florida at Gainesville Department of Zoology collected data on Horseshoe Crabs on a Delaware beach during 4 days in the late spring of 1992. Based on the color of the shells, they classified each crab as Young, Intermediate, or Old and whether the crabs could right themselves when flipped on their backs or whether they were stranded for at least a certain period of time. Here are the results.
Young / Intermediate / Old / Total
Stranded / 214 / 384 / 295 / 893
Not Stranded / 1668 / 1204 / 216 / 3088
Total / 1882 / 1588 / 511 / 3981
(a) Explain what it would mean if there were no association between age and strandedness.
(b) Does there appear to be an association between age and strandedness in this sample? Justify.
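One way to check your work on the crab data is to condition on age and compare the proportion stranded in each age group. This is a rough Python sketch; the names are made up for illustration and the counts come straight from the table.

```python
# Counts from the horseshoe crab table.
stranded = {"Young": 214, "Intermediate": 384, "Old": 295}
not_stranded = {"Young": 1668, "Intermediate": 1204, "Old": 216}

# Condition on age: within each age group, what proportion was stranded?
proportion_stranded = {
    age: stranded[age] / (stranded[age] + not_stranded[age]) for age in stranded
}

# Overall (marginal) proportion stranded, for reference.
overall = sum(stranded.values()) / (sum(stranded.values()) + sum(not_stranded.values()))

for age, p in proportion_stranded.items():
    print(f"{age:<13}{p:.1%} stranded")
print(f"{'All crabs':<13}{overall:.1%} stranded")
```

With no association, every age group would show roughly the same proportion as the overall figure. How different the three proportions are is the evidence to cite in part (b).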
HW #12: page 22 (19, 21, 23, 25, 27–34)
And now, we change from categorical data to quantitative data…
1.2 Displaying Quantitative Data with Graphs
Elmer and Ethel have retired and want to move someplace warm. The couple is considering nine different cities. The dotplots below show the distribution of average daily high temperatures in December, January, and February for each of these cities. Help them pick a city by answering the questions below, based on the data shown in the graph.
- What is the typical high temperature for these months in Phoenix, Orlando, and San Juan? Which of those 3 cities is most similar in this respect to Palm Springs? (Look for the center: the average, median, or typical value.)
- Are daily high temperatures for these months more predictable in Palm Springs or in Orlando? (Look at the spread: the variation, including the range.)
- What might be unique to Atlanta, San Diego, and Honolulu? (Look for outliers: unusual values.)
- What makes San Juan and San Diego somewhat similar to one another? Likewise, Palm Springs, Phoenix, and Orlando are similar to one another in this way, but different from the first group. (Look at the shape: symmetry vs. asymmetry.)
Read 25–27
Notice that we are now looking at quantitative data!
How should we describe the distribution of a quantitative variable? Use “SOCS”
Center - Typical value, such as the mean or the median
Spread - Range for now (we'll also use standard deviation and interquartile range, "IQR")
Outliers - Unusual values for now (we'll eventually use the "1.5 × IQR rule")
Shape - Address the graph's number of peaks and its symmetry (unimodal, bimodal, multimodal, uniform, symmetric, asymmetric, skewed left, skewed right)
Read 27–29 Examples and descriptions of various shapes of distributions:
Unimodal Symmetric
Curve / Dotplot / Histogram
Heights of adult women / Expected sums on 36 rolls of two 6-sided dice / Length of growing seasons in St. Louis
Bimodal
Curve / Dotplot / Histogram
Heights of men and women / Observed sums on 35 rolls of a 4-sided die and an 8-sided die / Maximum angle of a sample of roller coasters
Unimodal Skewed Left
Curve / Dotplot / Histogram
Heights of kids at a middle school dance / Time to finish a difficult test / Heights in my extended family
Unimodal Skewed Right
Curve / Dotplot / Histogram
Salaries of MLB players / Selling prices of homes in a new subdivision / Scores on a multiple choice pre-test over completely new material
Uniform
Curve / Dotplot / Histogram
Expected outcomes of spins of a spinner with equally-sized spaces numbered 1-10 / Outcomes of 36 rolls of a 6-sided die / Ages of students in a school district
Here are the calorie counts for 16 convenience store sandwiches, along with a dotplot of the data.
360 430 440 440 440 450 450 460
470 480 480 490 490 490 500 510
Describe the shape, center, and spread of the distribution. Are there any outliers?
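The numerical parts of a SOCS description can be checked quickly in Python. This sketch uses the standard `statistics` module and the range as the measure of spread, since that is what we have "for now"; shape and outliers still need a look at the dotplot.

```python
import statistics

# Calories for the 16 sandwiches listed in the notes.
calories = [360, 430, 440, 440, 440, 450, 450, 460,
            470, 480, 480, 490, 490, 490, 500, 510]

mean = statistics.mean(calories)            # center: balance point
median = statistics.median(calories)        # center: middle value
data_range = max(calories) - min(calories)  # spread: highest minus lowest

print(f"mean = {mean}, median = {median}, range = {data_range}")
```

The 360-calorie sandwich sits 70 calories below the next value, so it is worth mentioning as a possible outlier when you describe the graph.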
Read 29–30
When asked to compare two distributions, be sure that you compare and don’t just describe!
Be sure that you use “less”, “more”, and “-er” words.
How does the annual energy consumption (kWh/year) compare for top-loading washing machines and front-loading washers? The data below is from the Home Depot website. There are 26 front-loaders and 32 top-loaders included.
Read 31–32
Caution! Remember to include a key when making a stemplot (stem-and-leaf plot).
If you write "19|7", is that 197, 19.7, 1970, ...?
How do gas prices in St. Charles County compare to those in Madison County, where Alton, Illinois is located? A sample of gas prices was taken on several days in July 2015. Make a back-to-back stemplot and compare the distributions.
St. Charles Co.: 2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58, 2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61
Madison Co.: 2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71, 2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74
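For checking a hand-drawn plot, here is a rough Python sketch of a back-to-back stemplot for the gas prices. It assumes stems of dollars and dimes (25, 26, 27) with the cents digit as the leaf; the function name and layout are made up for illustration, and you may prefer to split stems differently.

```python
from collections import defaultdict

st_charles = [2.56, 2.56, 2.57, 2.57, 2.58, 2.58, 2.58, 2.58,
              2.59, 2.59, 2.59, 2.59, 2.60, 2.60, 2.61]
madison = [2.67, 2.68, 2.69, 2.69, 2.70, 2.70, 2.70, 2.71,
           2.71, 2.71, 2.71, 2.72, 2.72, 2.73, 2.74]


def stems_and_leaves(prices):
    """Split each price into a stem (dollars and dimes) and a leaf (cents digit)."""
    leaves = defaultdict(list)
    for price in sorted(prices):
        cents = round(price * 100)  # 2.56 -> 256; rounding avoids float error
        leaves[cents // 10].append(cents % 10)
    return leaves


left = stems_and_leaves(st_charles)
right = stems_and_leaves(madison)
stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)

lines = []
for stem in stems:
    # Left-side leaves are reversed so they grow outward from the stem.
    left_leaves = "".join(str(d) for d in reversed(left.get(stem, [])))
    right_leaves = "".join(str(d) for d in right.get(stem, []))
    lines.append(f"{left_leaves:>12} | {stem} | {right_leaves}")

print("St. Charles Co. | stem | Madison Co.")
print("\n".join(lines))
print("Key: 25 | 6 means $2.56 per gallon")
```

The last line prints the key, which is exactly the piece that is easiest to forget by hand.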
HW #13: page 41 (37, 39, 43, 45, 47)
1.2 Histograms
The following table presents the total number of triples (3B) for the 30 MLB teams in the 2014 regular season. Make a dotplot to display the distribution of triples for the season. Then, use your dotplot to make a histogram of the distribution.
Team / 3B / Team / 3B / Team / 3B
Arizona / 47 / Pittsburgh / 30 / Toronto / 24
San Francisco / 42 / San Diego / 30 / Tampa Bay / 24
Colorado / 41 / Kansas City / 29 / Cleveland / 23
LA Dodgers / 38 / Milwaukee / 28 / Atlanta / 22
Miami / 36 / Texas / 28 / St. Louis / 21
Oakland / 33 / Minnesota / 27 / Boston / 20
Chicago Sox / 32 / Washington / 27 / Cincinnati / 20
Seattle / 32 / Philadelphia / 27 / Houston / 19
LA Angels / 31 / Detroit / 26 / NY Mets / 19
Chicago Cubs / 31 / NY Yankees / 26 / Baltimore / 16
Read 33–36
When you make a histogram...
...you can turn a dotplot into a histogram.
... be consistent with "fence sitters".
... be consistent with spacing and bin width.
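Here is a minimal Python sketch of those two consistency rules applied to the triples data. The bin width of 5 and the starting value of 15 are assumptions chosen for illustration; other choices are just as valid as long as they stay the same for every bin.

```python
# Triples (3B) for the 30 MLB teams in the table.
triples = [47, 42, 41, 38, 36, 33, 32, 32, 31, 31,
           30, 30, 29, 28, 28, 27, 27, 27, 26, 26,
           24, 24, 23, 22, 21, 20, 20, 19, 19, 16]

BIN_WIDTH = 5  # assumed width; every bin uses the same one
START = 15     # assumed left edge of the first bin

# Each bin holds values with left edge <= value < right edge, so a "fence sitter"
# such as 20 always goes into the bin that starts at 20.
counts = {}
for left_edge in range(START, max(triples) + 1, BIN_WIDTH):
    right_edge = left_edge + BIN_WIDTH
    counts[(left_edge, right_edge)] = sum(left_edge <= t < right_edge for t in triples)

for (left_edge, right_edge), n in counts.items():
    print(f"{left_edge:>2} to <{right_edge:<3}{'*' * n}")
```

The rows of asterisks are a sideways histogram: each bar is just the stacked dots from the dotplot for that bin.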
Read 38–41
When might we want a relative frequency histogram rather than a frequency histogram?
…to see part-whole relationships or to compare 2 groups
HW #14: page 45 (51, 53, 55, 59–62)
1.3 Describing Quantitative Data with Numbers
Read 48–50
x̄ is a statistic; "x bar" is the sample mean. μ is a parameter; "mu" is the population mean.
A statistic is resistant if adding a very large or very small value to a data set (or changing an existing value to something very large or very small) changes the value of the statistic only a little, or not at all.
The mean is not a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will change the value of the mean quite a bit. Think about what happens to the average age of people in the classroom when Mr. Willott walks in.
The mean is the balancing point.
Approximately where will the mean be located, when looking at a histogram or dotplot?
Read 51–53
The median is a resistant measure of center. Adding an extreme value, or altering a value to make it extreme, will not change the value of the median much, if at all. Think about what happens to the median age of people in the classroom when Mr. Willott walks in.
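The classroom thought experiment for the mean and the median can be tried with numbers. The ages below are entirely hypothetical (six students and one 45-year-old teacher); they are made up for illustration and are not from the notes.

```python
import statistics

# Hypothetical ages: six students, then a much older teacher walks in.
student_ages = [16, 16, 17, 17, 17, 18]
with_teacher = student_ages + [45]

mean_before = statistics.mean(student_ages)
mean_after = statistics.mean(with_teacher)
median_before = statistics.median(student_ages)
median_after = statistics.median(with_teacher)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

With these made-up ages the mean jumps by about four years while the median stays at 17, which is what "not resistant" versus "resistant" looks like in practice.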
If we know the shape of a distribution, as shown below, then where are the mean and the median located in relation to one another?
roughly symmetric / exactly symmetric / skewed
Read 53–55
The range = highest data value minus lowest data value. The range is a single number and it is not a resistant measure of spread. An extreme value will affect the value of the range. Think about what happens to the range of ages of people in the classroom when Mr. Willott walks in.
The median divides an ordered list of data into two equal groups.
The quartiles divide an ordered list of data into four equal groups.
The interquartile range (IQR) is the spread of the middle 50% of the data. The IQR is a resistant measure of spread. Think about what happens to the range of the middle 50% of ages of people in the classroom when Mr. Willott walks in.
Item / Fat (g)
Crunchy Taco / 10
Nachos Supreme / 24
Cheese Quesadilla / 26
Chicken Quesadilla / 27
Mexican Pizza / 31
Taco Salad (steak) / 37
Nachos BellGrande / 39
XXL Grilled Stuft Burrito – Beef / 41
Taco Salad (original) / 42
Here are data on the amount of fat (in grams) in 9 different Taco Bell menu items. Calculate the median, quartiles, and IQR.
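To check your answers, here is a small Python sketch that finds the quartiles as the medians of the lower and upper halves of the ordered data. The helper name `quartiles` is made up for illustration. Calculators and software sometimes use slightly different quartile conventions, so results can differ a little on other data sets.

```python
import statistics

# Fat (g) for the 9 Taco Bell items.
fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]


def quartiles(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    When the number of values is odd, the overall median is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)


q1, median, q3 = quartiles(fat)
iqr = q3 - q1
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```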
Read 57–58
What is the 1.5 × IQR rule for identifying outliers?
Illustration by Kelly Boles
How many fat grams would qualify as an outlier for the Taco Bell items?
Are there outliers among the 9 Taco Bell items?
Here are calorie data for 12 McDonald’s menu items. Are there any outliers?
Sandwich / Calories
32 oz. Chocolate Shake / 1160
Big Breakfast® / 740
Big Mac® / 540
Sausage Biscuit with Egg / 510
McRib® / 500
10 pc. McNuggets® / 460
Double Cheeseburger / 440
Quarter Pounder® / 410
Filet-O-Fish® / 380
McChicken® / 360
Large Caramel Latte / 330
Large Vanilla Iced Coffee / 270
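The 1.5 × IQR rule can be sketched in Python as follows, with quartiles again taken as medians of the lower and upper halves. The function name `outliers` is made up for illustration. A value is flagged if it is below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

```python
import statistics


def outliers(data):
    """Return the values that fall outside the 1.5 x IQR fences."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])   # median of the lower half
    q3 = statistics.median(ordered[-half:])  # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


taco_bell_fat = [10, 24, 26, 27, 31, 37, 39, 41, 42]
mcdonalds_calories = [1160, 740, 540, 510, 500, 460, 440, 410, 380, 360, 330, 270]

print(outliers(taco_bell_fat))
print(outliers(mcdonalds_calories))
```

Running it on both data sets is a good way to confirm the fences you computed by hand.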
Read 56–58
The five-number summary: Minimum, Q1, Median, Q3, Maximum
A boxplot is a graph that is related to the five-number summary.
Item / Fat (g)
Crunchy Taco / 10
Nachos Supreme / 24
Cheese Quesadilla / 26
Chicken Quesadilla / 27
Mexican Pizza / 31
Taco Salad (steak) / 37
Nachos BellGrande / 39
XXL Grilled Stuft Burrito – Beef / 41
Taco Salad (original) / 42
Draw a boxplot for the Taco Bell data. Check yours against the one that the graphing calculator makes.
Here are parallel boxplots for the heights of baseball players for 5 of the 2005 MLB teams. Compare these distributions.
HW #15: page 47 (65, 69–74), page 69 (79, 81, 83, 85, 86, 87, 89, 91, 93)
1.3 Standard Deviation
Arnold ran each afternoon for 5 days. His distances (in miles) were 10, 10, 10, 10, and 10.
Find the mean (or average) number of miles that Arnold ran each day. ______
Complete the table:
Table for Arnold's distances
Distances / Difference from the mean / Square of difference from the mean
10
10
10
10
10
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:
That last value is the standard deviation for the distances Arnold ran. What are the units? ______
The number above it is the variance for the distances. What are the units? ______
Becky ran each afternoon for 5 days. Her distances (in miles) were 8, 9, 10, 11, and 12.
Find the mean (or average) number of miles that Becky ran each day. ______
Complete the table:
Table for Becky's distances
Distances / Difference from the mean / Square of difference from the mean
8
9
10
11
12
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:
That last value is the standard deviation for the distances Becky ran. What are the units? ______
The number above it is the variance for the distances. What are the units? ______
Caleb ran each afternoon for 5 days. His distances (in miles) were 7, 9, 10, 11, and 13.
Find the mean (or average) number of miles that Caleb ran each day. ______
Complete the table:
Table for Caleb's distances
Distances / Difference from the mean / Square of difference from the mean
7
9
10
11
13
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:
That last value is the standard deviation for the distances Caleb ran. What are the units? ______
The number above it is the variance for the distances. What are the units? ______
Donna ran each afternoon for 5 days. Her distances (in miles) were 3, 3, 4, 5, and 35.
Find the mean (or average) number of miles that Donna ran each day. ______
Complete the table:
Table for Donna's distances
Distances / Difference from the mean / Square of difference from the mean
3
3
4
5
35
Sum of squared differences:
Sum of squared differences divided by 4 (since there were 5 distances):
Square root of the sum of squared differences divided by 4:
That last value is the standard deviation for the distances Donna ran. What are the units? ______
The number above it is the variance for the distances. What are the units? ______
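The four tables all follow the same steps, so they can be checked with one short Python sketch. The function name `sd_table` is made up for illustration; it mirrors the worksheet's columns and divides by 4, one less than the 5 distances.

```python
import math

# Distances (miles) for the four runners.
runners = {
    "Arnold": [10, 10, 10, 10, 10],
    "Becky": [8, 9, 10, 11, 12],
    "Caleb": [7, 9, 10, 11, 13],
    "Donna": [3, 3, 4, 5, 35],
}


def sd_table(distances):
    """Follow the worksheet's columns and return (mean, variance, standard deviation)."""
    mean = sum(distances) / len(distances)
    differences = [d - mean for d in distances]     # difference from the mean
    squares = [diff ** 2 for diff in differences]   # square of each difference
    variance = sum(squares) / (len(distances) - 1)  # divide by n - 1 (here, 4)
    return mean, variance, math.sqrt(variance)


results = {name: sd_table(distances) for name, distances in runners.items()}
for name, (mean, variance, sd) in results.items():
    print(f"{name:<7} mean = {mean:.0f} mi, variance = {variance:g} mi^2, sd = {sd:.2f} mi")
```

All four runners have the same mean, so comparing their standard deviations shows how much the spread alone differs, and how strongly Donna's single 35-mile day affects it.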
The standard deviation measures the typical distance the data are from the mean.
The range, IQR, and standard deviation all measure variation or spread, but only the IQR is resistant.
Standard deviation / Variance
Square root of variance / Square of standard deviation
s = sample standard deviation / s² = sample variance
σ = population standard deviation / σ² = population variance
Read 60–62
If s = 4, then s² = 16. If s² = 9, then s = 3. If σ² = 25, then σ = 5. If σ = 6, then σ² = 36.
Four important properties of the standard deviation:
Standard deviation ≥ 0. (0 means no variation; a large number means lots of variation.)
Standard deviation units are the same as the units for the data.
Standard deviation is not resistant.
Standard deviation measures spread around the mean.
s = 5 / s = 6.22 / s = 9.52 / s = 10.7
A random sample of 5 students was asked how many minutes they spent listening to music outside school hours the previous day. They responded: 20, 30, 60, 90, 120. Calculate and interpret the standard deviation.
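Once the by-hand method is comfortable, the calculation can be checked with Python's built-in `statistics.stdev`, which computes the sample standard deviation (dividing by n − 1). This is only a quick check, not a replacement for showing work.

```python
import statistics

# Minutes of music reported by the 5 students.
minutes = [20, 30, 60, 90, 120]

mean = statistics.mean(minutes)
s = statistics.stdev(minutes)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean} minutes, s = {s:.1f} minutes")
```

For the interpretation, tie the number back to context: the listening times typically vary from the mean by about s minutes.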
Read 63–66
Of mean, median, IQR, and standard deviation, which summary statistics will we typically use for each situation?
Symmetric / Skewed
Center
Spread
HW #16: page 71 (95, 97, 99, 101–105, 107–110)
FRAPPY! page 74
HW #17: page 76 Chapter Review Exercises
Review Chapter 1
HW #18: page 78 Chapter 1 AP Statistics Practice Test
Chapter 1 Test