Algebra 1 Mrs. Bondi
Unit 3 Notes: Data Analysis and Probability
Lesson Topics:
Lesson 1: Types of Graphs (not in PH text)
Lesson 2: Frequency and Histograms (PH text 12.2)
Lesson 3: Measures of Central Tendency and Dispersion (PH text 12.3)
Lesson 4: Stem and Leaf Plots (PH text p.800)
Lesson 5: Box-and-Whisker Plots (PH text 12.4)
Lesson 6: Scatterplots/Scattergrams (PH text 5.7)
Lesson 7: Theoretical and Experimental Probability (PH text 12.7)
Lesson 8: Probability of Compound Events (PH text 12.8)
Lesson 9: Permutations and Combinations (PH text 12.6) - optional
Lesson 1: Types of Graphs (not in PH text)
Objective: To identify, interpret and choose between different types of graphs
Types of Graphs:
Histogram – a special type of bar graph used to show frequencies – there is one bar for each interval – there are no gaps between bars, and all bars are of equal width
– most appropriate to visually compare frequencies at which specific data items (or groups) occur
Stem and Leaf Plot – most appropriate to display the individual data items in an ordered and concise manner
Box-and-Whisker Plot – used to show the general layout of a set of data, where most of the numbers fall
– most appropriate to display the median, lower and upper quartiles, and least and greatest values, AND/OR to compare these aspects of multiple sets of data
Scatterplot/Scattergram – a display of unconnected points that show the relationship between two sets of data
– most appropriate to display the correlation (relationship) between two sets of data
Bar Graph – a graph that is created using bars to fill the space from the axis to the data point – most often, one axis has the quantity while the other axis has the categories being compared
– most appropriate to compare numbers or amounts of items
Line Graph – a graph that is created using line segments to connect data points to show the amount and direction of a change over a period of time
– most appropriate to show how a set of data changes over time
Circle (or Pie) Graph – used to represent data as parts of a whole – the entire circle represents the whole, or 100%, of the data – each wedge/sector represents a part of the data – each central angle must be proportional to the percent of the whole that its sector represents (a whole circle contains 360°)
– most appropriate to visually represent the parts of a set of data to the whole for comparison (percents)
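The 360° rule for circle graphs works out to: central angle = (part ÷ whole) × 360°. A minimal Python sketch of that arithmetic follows. The survey counts in it are made up for illustration only; they are not data from these notes.

```python
def central_angles(counts):
    """Central angle (in degrees) for each category's sector of a circle graph."""
    total = sum(counts.values())
    # Each sector gets the same fraction of 360 degrees as its share of the whole.
    return {label: count / total * 360 for label, count in counts.items()}

# Hypothetical survey results, invented for this example.
favorite_sport = {"soccer": 10, "basketball": 5, "baseball": 5}
for sport, angle in central_angles(favorite_sport).items():
    print(f"{sport}: {angle:.0f} degrees")
```

Because every sector uses the same total, the angles always add up to 360°, which is a quick way to check a hand-drawn circle graph.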
Types of Data:
Continuous Data -
Best choice for a graph of something that continues without breaks is ______
Discrete Data -
Best choice for a graph of something that has a specific number of data points, or a count of something is ______
Pie charts are good for showing a comparison of percentages using discrete data.
Lesson 2: Frequency and Histograms (PH text 12.2)
Objective: to make and interpret frequency tables and histograms
Frequency Distribution/Table - A method of organizing a set of data into intervals to show how often an item in each interval occurs
Frequency – the number of data values in that interval
Make a frequency distribution from the data below.
Hours worked: 5, 6, 6, 5, 7, 8, 6, 5, 7, 5, 5, 6, 6, 8, 5, 6, 8, 6
Hours Worked / Tally / Frequency
5 / ______ / ______
6 / ______ / ______
7 / ______ / ______
8 / ______ / ______
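To check a completed tally, a short Python sketch can count how often each value occurs. This is just one possible way to do it, using `collections.Counter` from Python's standard library on the hours-worked data above.

```python
from collections import Counter

hours_worked = [5, 6, 6, 5, 7, 8, 6, 5, 7, 5, 5, 6, 6, 8, 5, 6, 8, 6]

# Counter tallies how many times each value occurs in the list.
frequency = Counter(hours_worked)

print("Hours Worked / Tally / Frequency")
for hours in sorted(frequency):
    count = frequency[hours]
    tally = "|" * count          # one tally mark per occurrence
    print(f"{hours} / {tally} / {count}")
```

The frequencies should add up to the number of data values (18 here), which is a good self-check for any frequency table.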
Frequency Table with Intervals -
A frequency table that has the data grouped in equal intervals
Make a frequency table and histogram with intervals for the data below.
Ages of Company Presidents
45 58 60 62 56 58 55 48 39 50 65
48 50 42 60 38 55 47 39 35 44 74
Frequency Table
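A minimal Python sketch of grouping data into equal intervals follows. It assumes an interval width of 10 years (30–39, 40–49, and so on); the notes do not specify a width, so other choices are possible. The rows of `#` marks are only a rough text version of a histogram, since a real histogram has bars with no gaps between them.

```python
ages = [45, 58, 60, 62, 56, 58, 55, 48, 39, 50, 65,
        48, 50, 42, 60, 38, 55, 47, 39, 35, 44, 74]

WIDTH = 10  # assumed interval width; the notes do not specify one

def interval_table(data, width):
    """Group data into equal intervals and count the values in each one."""
    start = min(data) // width * width   # lower edge of the first interval
    table = {}
    low = start
    while low <= max(data):
        high = low + width - 1
        table[(low, high)] = sum(1 for x in data if low <= x <= high)
        low += width
    return table

for (low, high), count in interval_table(ages, WIDTH).items():
    print(f"{low}-{high}: {'#' * count} ({count})")
```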
Histogram – a graph that displays data from a frequency table – has one bar for each interval – there are no gaps between bars, and all bars are of equal width
HW: p.723 #8-10 (make BOTH a frequency table and a histogram), 14-17, 22-31, 34-35
Lesson 3: Measures of Central Tendency and Dispersion (PH text 12.3)
Objective: to find mean, median, mode and range of a set of numbers
Measures of Central Tendency:
Mean – the "average" you're used to: add up all the numbers, then divide by how many numbers there are. In other words, the mean of n numbers is their sum divided by n. The symbol for the mean is x̄ (read "x-bar").
Median – the "middle" value in the list of numbers. To find the median, the numbers must be listed in numerical order, so you may have to rewrite your list first. If there is an even number of values, the median is the mean of the two middle values.
Mode – the value that occurs most often (If no number is repeated, then there is no mode for the list.)
Range – the difference between the largest and smallest values – this is a measure of dispersion, NOT a measure of central tendency, but frequently useful information when describing a set of data
Example: Find the mean, median, mode and range for this set of data. These are the daily salaries of seven people who work for the same company.
$120 $98 $134 $458 $122 $128 $125
Mean: ______
Median: ______
Mode: ______
Range: ______
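To check answers like these, here is a minimal Python sketch using the standard `statistics` module. It follows the convention in these notes that a list with no repeated value has no mode, and it returns the mode(s) as a list because a data set can have more than one mode. The function name `summarize` is just a label chosen for this example.

```python
from collections import Counter
from statistics import mean, median

def summarize(data):
    """Mean, median, mode(s) and range of a list of numbers."""
    counts = Counter(data)
    most = max(counts.values())
    # Following these notes: if no value repeats, there is no mode.
    # A list is returned because a data set can have more than one mode.
    modes = sorted(v for v, c in counts.items() if c == most) if most > 1 else []
    return {
        "mean": mean(data),
        "median": median(data),          # median() sorts a copy of the data
        "mode": modes,
        "range": max(data) - min(data),
    }

salaries = [120, 98, 134, 458, 122, 128, 125]
stats = summarize(salaries)
print(f"Mean: ${stats['mean']:.2f}")       # prints Mean: $169.29
print(f"Median: ${stats['median']}")       # prints Median: $125
print(f"Mode: {stats['mode'] or 'none'}")  # prints Mode: none
print(f"Range: ${stats['range']}")         # prints Range: $360
```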
Outlier – a data value that is significantly higher or lower than the other values in the set of numbers
Look at the set of data above. Would you consider any of those values to be an outlier?
Choosing a measure of central tendency:
Sometimes one measure of central tendency (or more than one) describes a set of data better than the others.
Mean – good when there are no outliers, or extreme values
Median – best choice when there are outliers, or extreme values
Mode – only use for non-numeric data, or when choosing the most popular item
Range – never a good measure of central tendency, but the range does show how closely grouped the set of data is
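One way to see why the median is preferred when there are outliers is to remove the extreme value and watch how much each measure changes. The Python sketch below does this for the salary data from the example, treating $458 as the extreme value.

```python
from statistics import mean, median

salaries = [120, 98, 134, 458, 122, 128, 125]
without_outlier = [s for s in salaries if s != 458]  # drop the $458 salary

# Compare how much each measure moves when the extreme value is removed.
print(f"mean:   {mean(salaries):.2f} -> {mean(without_outlier):.2f}")
print(f"median: {median(salaries)} -> {median(without_outlier)}")
```

The mean drops by about $48 while the median changes by only $1.50, so the median stays much closer to what a "typical" salary looks like.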
Example: Use the data above. Which measure of central tendency best describes the data? Why?
Measures of Central Tendency Practice:
Directions: Find the measures of central tendency. Which best describes the data? Explain why.
1) The number of students in NP elementary schools:
597 581 555 384 463 620 407 656 426 407 547 566 307
Mean: ______
Median: ______
Mode: ______
Best measure(s) of central tendency:
Explanation:
2) The current prices per share (in $) of twelve stocks:
59 97 53 83 45 47 88 47 51 47 62 47
Mean: ______
Median: ______
Mode: ______
Best measure(s) of central tendency:
Explanation:
3) The prices (in $) of Edmund’s video games:
60 57 84 15 59 63 60 67 59 75 72
Mean: ______
Median: ______
Mode: ______
Best measure(s) of central tendency:
Explanation:
Finding a data value when you know the mean:
Sometimes you will be given the mean of a set of data along with some of the individual values, and be asked to find the remaining data value. In this case, you write an equation using a variable to represent the missing value. Then solve the equation to find the value.
Example:
Kayla has sales of $1280, $1125, $965, and $1210 the first four days of the week. How much does she need to sell on the fifth day to average $1150 for the week?
The swim team wants an average of 40 laps for the day’s training session. If the other swimmers do 35, 26, 47, 40, 45, 50, 31, and 46 laps, how many laps does Char need to do to ensure the average of 40 laps is met?
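Both problems use the same equation. If x is the missing value and there are n values in total, then (sum of the known values + x) ÷ n = target mean, so x = target mean × n − (sum of the known values). For Kayla, (1280 + 1125 + 965 + 1210 + x) ÷ 5 = 1150 gives 4580 + x = 5750. A minimal Python sketch of the same steps follows; `missing_value` is a hypothetical helper name used only here.

```python
def missing_value(known, target_mean):
    """Value the last data item needs so that the whole set averages target_mean.

    Solves (sum(known) + x) / (len(known) + 1) = target_mean for x.
    """
    n = len(known) + 1                   # total count includes the missing value
    return target_mean * n - sum(known)

kayla = missing_value([1280, 1125, 965, 1210], 1150)
char = missing_value([35, 26, 47, 40, 45, 50, 31, 46], 40)
print(kayla, char)  # prints 1170 40
```

The result is the minimum needed: selling more than $1170 (or swimming more than 40 laps) only raises the average above the target.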
Practice:
HW: p.730 #5, 6-18 even, 28-29
Lesson 4: Stem and Leaf Plots (PH text p.800)
Objective: to read, use and create stem and leaf plots
Stem and Leaf Plot -
A method of displaying data so that the frequency is easily seen, yet the value of each number is maintained
leaf – the last digit on the right of a given number
stem – the digit(s) remaining when the leaf digit is dropped
Stem-and-Leaf Plots (from PurpleMath.com)
Stem-and-leaf plots are a method for showing the frequency with which certain classes of values occur. You could make a frequency distribution table or a histogram for the values, or you can use a stem-and-leaf plot and let the numbers themselves show pretty much the same information.
For instance, suppose you have the following list of values: 12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41. You could make a frequency distribution table showing how many tens, twenties, thirties, and forties you have:
Class / Frequency
10 - 19 / 2
20 - 29 / 2
30 - 39 / 4
40 - 49 / 3
You could make a histogram, which is a bar-graph showing the number of occurrences, with the classes being numbers in the tens, twenties, thirties, and forties:
(The shading of the bars in a histogram isn't necessary, but it can be helpful by making the bars easier to see, especially if you can't use color to differentiate the bars.)
The downside of frequency distribution tables and histograms is that, while the frequency of each class is easy to see, the original data points have been lost. You can tell, for instance, that there must have been three listed values that were in the forties, but there is no way to tell from the table or from the histogram what those values might have been.
On the other hand, you could make a stem-and-leaf plot for the same data:
Stem | Leaves
1 | 2 3
2 | 1 7
3 | 3 4 5 7
4 | 0 0 1
The "stem" is the left-hand column which contains the tens digits. The "leaves" are the lists in the right-hand column, showing all the ones digits for each of the tens, twenties, thirties, and forties. As you can see, the original values can still be determined; you can tell, from that bottom leaf, that the three values in the forties were 40, 40, and 41.
Note that the horizontal leaves in the stem-and-leaf plot correspond to the vertical bars in the histogram, and the leaves have lengths that equal the numbers in the frequency table.
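The stem and leaf definitions above (leaf = last digit, stem = the digit(s) remaining) translate directly into a short Python sketch. It assumes whole, non-negative numbers, and it keeps empty stems so that gaps in the data stay visible. The function name `stem_and_leaf` is just a label for this example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Build stem-and-leaf rows: stem = all digits but the last, leaf = last digit."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # e.g. 37 -> stem 3, leaf 7
        rows[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
for stem, leaves in stem_and_leaf(values).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the data is sorted first, each row of leaves comes out in order, which makes the median and mode easy to read off the plot.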
Make a stem and leaf plot for the data below.
Ages of Company Presidents
45 58 60 62 56 58 55 48 39 50 65
48 50 42 60 38 55 47 39 35 44 74
Find the mean, median, mode and range for this set of data. Does the stem-and-leaf plot make this process any easier? Why?
Mean:
Median:
Mode:
Range:
Look at www.purplemath.com stem-and-leaf examples.
HW: Stem and Leaf Practice page; p.731 # 31; p.800 #1-5
Lesson 5: Box-and-Whisker Plots (PH text 12.4)
Objective: To make and interpret box-and-whisker plots and to find and interpret quartiles
A box-and-whisker plot is used to show the general layout of a set of data – where most of the numbers fall. It shows the median of the whole set, the median of both halves, and the highest and lowest numbers in the data set.
Quartile: values that divide a set of data into four equal parts.
The data is divided into four parts by the quartiles. The median (Q2) of the entire set of numbers is marked by a line inside the “box.” The numbers less than the median are then split into two sections by finding the median of those numbers; that new median is called the first (or lower) quartile (Q1) and marks the left side of the box. Finding the median of the values greater than the original median gives the third (or upper) quartile (Q3), which marks the right side of the box. The lowest value is the end of the left whisker, and the highest value is the end of the right whisker.
Example:
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100
First Quartile (Q1) = 52; Median (Q2) = 68; Third Quartile (Q3) = 87
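The median-of-each-half method described above can be sketched in Python as follows. This sketch assumes that when the data set has an odd number of values, the median itself is left out of both halves (only numbers less than or greater than the median are used). Other textbooks, calculators, and software (for example Python's `statistics.quantiles`) may use slightly different quartile rules and can give slightly different answers.

```python
def median_of(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Least value, Q1, median (Q2), Q3 and greatest value,
    using the median-of-each-half method."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]          # values below the median
    upper_half = values[(n + 1) // 2 :]    # values above the median
    return (values[0], median_of(lower_half), median_of(values),
            median_of(upper_half), values[-1])

data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]
print(five_number_summary(data))  # prints (18, 52, 68, 87, 100)
```

These five numbers are exactly what a box-and-whisker plot draws: the two whisker ends, the two sides of the box, and the median line inside it.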
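To graph this data, we create a number line that includes at least all of the values in the data set. The box-and-whisker plot is placed on or just above the number line: the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the least and greatest values.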
When multiple sets of data are plotted together, box-and-whisker plots are an easy way to make a quick visual comparison of the sets.
Box-and-Whisker Plot Practice
Use this box-and-whisker plot to answer #1-4.
1. What is the age of the oldest dog(s) in the show? ______
2. What is the median age of the dogs? ______
3. What number is in the lower quartile? ______
4. About what fraction of the dogs are 5 years old or older? ______
Use this box-and-whisker plot to answer #5-8.
5. What is the median of all the scores? ______
6. What number is in the lower quartile? ______
7. What part of the data is in the box (between 70 and 80)?
a. The top one-fourth of the scores
b. The middle half of the scores
c. The lower half of the scores
d. The lowest fourth of the scores
8. Which statement best describes the scores?
a. They are spread evenly from 65 to 95.
b. There are more scores in the upper fourth than in the lower fourth of the scores.
c. The scores are bunched closer together in the lowest ¼ of the scores than in the highest ¼.
d. The mean of the scores is 70.
Box-and-Whisker Plot Practice (cont.)
9. This box-and-whisker plot shows the amount of money raised by 11 volunteers in a charity walk.