MEASURES OF CENTER AND VARIABILITY WITH BOX PLOTS
INTRODUCTION
The objective for this lesson on Measures of Center and Variability with Box Plots is that the student will use box plots and data sets to determine measures of center and variability.
The skills students should have in order to help them in this lesson include dot plots, measures of center such as the median, measures of variability, and box plots.
We will have three essential questions that will be guiding our lesson. Number one, why is it important to know the median of the data when creating a box plot? Number two, what values must be plotted in a box plot? And number three, when is it best to use a box plot?
We will begin by completing the warm-up on the mean and median of a set of values to prepare for measures of center and variability with box plots in this lesson.
SOLVE PROBLEM – INTRODUCTION
We are going to begin by completing the following SOLVE problem. A local movie theater has shown the same movie for the past nineteen days. The data below represents the theater’s daily attendance for that movie. What is the median of the data set that represents the number of people who attended the movie? The data for attendance is listed below the question.
(Fifteen, twenty six, forty eight, sixty, ten, twenty two, fifty nine, sixty, thirteen, twenty four, fifty six, sixty four, fourteen, thirty five, fifty six, sixty five, thirty seven, forty one, fifty seven).
In Step S, we will Study the Problem. First we need to identify where the question is located within the problem and underline the question. The question for this problem is, what is the median of the data set that represents the number of people who attended the movie?
Now that we have identified where the question is located within the problem, we will put this question in our own words in the form of a statement. This problem is asking me to find the median of the daily attendance at the movie.
In Step O, we will Organize the Facts. First we need to identify the facts. A local movie theater has shown the same movie for the past nineteen days/fact. The data below represents the theater’s attendance for that movie/fact. What is the median of the data set that represents the number of people who attended the movie?
Now that we have identified the facts, we are ready to eliminate the unnecessary facts. These are the facts that will not help us to find the median of the daily attendance at the movies. A local movie theater has shown the same movie for the past nineteen days. In order to find the median of the data set we need to know how many days we have data for, so it’s important that we know that this is for the past nineteen days. We will keep this fact. The data below represents the theater’s daily attendance for that movie. We need to have the data in order to find the median for the data set. So we will keep this fact as well. There are not any facts that we can eliminate in this problem.
We are now ready to list the necessary facts. These facts are, that the movie theater has shown the movie for the past nineteen days. And we also need to list the data of attendance for each of these nineteen days. This information is listed below the question. (Fifteen, twenty six, forty eight, sixty, ten, twenty two, fifty nine, sixty, thirteen, twenty four, fifty six, sixty four, fourteen, thirty five, fifty six, sixty five, thirty seven, forty one, fifty seven).
Now in Step L, we will Line Up a Plan. First we need to write in words what your plan of action will be. We can list the data points in order from least to greatest. And then determine the middle value in the data set.
What operation or operations will we use in our plan? Since we are listing the data in order from least to greatest there is not a specific operation or operations that will be used in the plan.
In Step V, we Verify Your Plan with Action. First we will estimate your answer. Looking at the data we can see that the data ranges from ten to sixty five. Let’s say that the estimate is going to be about forty for this data.
Now let’s carry out your plan. In our plan we said that we would list the data points in order from least to greatest. Let’s do so now. You can see the data points listed from least to greatest here. We then need to determine the middle value in the data set. Since we know that there are nineteen pieces of data, we know that the tenth number in our set of data when listed from least to greatest will be the median for this data. This number is forty one. The median for the data set is forty one.
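The plan we just carried out can also be sketched in Python as a quick check. This is only an illustration and not part of the original lesson. The names `attendance` and `median_of_odd_count` are made up for this example, and the helper assumes an odd number of values, as in this data set.

```python
# Daily attendance for the nineteen days, as listed in the SOLVE problem.
attendance = [15, 26, 48, 60, 10, 22, 59, 60, 13, 24,
              56, 64, 14, 35, 56, 65, 37, 41, 57]


def median_of_odd_count(values):
    """Return the middle value of a list with an odd number of entries."""
    ordered = sorted(values)          # list the data from least to greatest
    middle_index = len(ordered) // 2  # index 9 is the tenth of nineteen values
    return ordered[middle_index]


ordered_attendance = sorted(attendance)
median_attendance = median_of_odd_count(attendance)

print(ordered_attendance)
print(median_attendance)
```

With nineteen values, nine fall before the middle position and nine fall after it, which is why the tenth value is the median.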
In Step E, we Examine Your Results.
First, does your answer make sense? Here compare your answer to the question. Yes, because I was looking for the median of the data set.
Is your answer reasonable? Here compare your answer to the estimate. Yes, because it is close to my estimate of about forty.
And is your answer accurate? Here check your work. Yes, the answer is accurate.
We are now ready to write your answer in a complete sentence. The median value of the data set is forty one which represents forty one people.
In this SOLVE problem we determined the median of the data set. During this lesson we will learn how to work with measures of center and variability with box plots. We will refer back to this SOLVE problem as we continue the lesson.
DISCOVERY ACTIVITY – EXTEND THE SOLVE PROBLEM
MAXIMUM VALUE, MINIMUM VALUE, MEDIAN, AND QUARTILES
In the last activity we completed a SOLVE problem finding the median of attendance at a movie over a nineteen day period. When we found the median in the SOLVE problem, we were finding a measure of center. What is a measure of center? It is a single value that describes the center of a set of numerical data.
What type of data display might we use to chart all nineteen data points from the SOLVE problem? We could use a dot plot. Explain when it would be appropriate to use a dot plot. It is appropriate to use a dot plot when you need to display each data value, identify clusters, peaks, and gaps, and determine the mean and median for a set of data.
Is it always necessary to display each piece of data on a graph? Why or why not? No, because sometimes there is too much data or you only need to identify one value such as the median or mean.
Can you think of other data questions where you do not need to display each piece of data on the graph? One example would be when asking what percentage of the data is above or below a certain value such as the median. Another would be when you only need to know the lowest or highest value.
Let’s look at the data from the SOLVE problem we just completed. The data is listed below in order from least to greatest.
(Ten, thirteen, fourteen, fifteen, twenty two, twenty four, twenty six, thirty five, thirty seven, forty one, forty eight, fifty six, fifty six, fifty seven, fifty nine, sixty, sixty, sixty four, sixty five).
What are some of the values we can identify by looking at the list? Let’s look at the graphic organizer to describe three of them.
What is the first value in the chart? It is the minimum value. What does the word minimum mean? Minimum means the lowest value. Let’s record this information in our chart for minimum value. It means the lowest value. How can we identify the minimum value? We can place the values in order from least to greatest and identify the lowest value. Record this information in the column for "How did you find it?" in the graphic organizer for minimum value. What is the minimum value for this set of data? The minimum value is ten. Record this in the graphic organizer as well.
Now complete the rest of the chart.
What is the maximum value? It means the greatest value. We find it by placing the values in order from least to greatest and identifying the greatest value. The maximum value for our set of data is sixty five.
What is the median? The median means the middle value. We find it by starting with the highest and lowest values and crossing them out in pairs until only one value remains in the middle of the data set. The median for our set of data is forty one.
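The three values in the graphic organizer can be checked with a short Python sketch. This is an illustration only. The function name `median_by_crossing_out` is made up for this example, the list is assumed to be non-empty, and the averaging step for an even count is an addition that this data set does not need.

```python
# The nineteen attendance values, already in order from least to greatest.
ordered = [10, 13, 14, 15, 22, 24, 26, 35, 37, 41,
           48, 56, 56, 57, 59, 60, 60, 64, 65]


def median_by_crossing_out(values):
    """Cross out the lowest and highest values until the middle remains."""
    remaining = sorted(values)
    while len(remaining) > 2:
        remaining = remaining[1:-1]  # drop one value from each end
    # One value is left for an odd count; two are averaged for an even count.
    return sum(remaining) / len(remaining)


minimum_value = min(ordered)
maximum_value = max(ordered)
median_value = median_by_crossing_out(ordered)

print(minimum_value, median_value, maximum_value)
```

Each pass through the loop removes one value from the low end and one from the high end, which mirrors crossing out pairs by hand.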
Now let’s chart the minimum value, maximum value, and median for our data set on the number line. The number line below shows our numbers starting at ten and going up to sixty five. This is appropriate for our set of data.
The minimum value of the set of data is ten. So we will place a dot on the number line at ten.
The median for our data set is forty one. So we will place a dot on the number line at forty one. This is going to be closer to forty than to forty five, but between those two numbers on the number line.
And our maximum value for the data set is sixty five. So we will place our third dot on the number line at sixty five.
How do we choose what numbers to use on the number line? We choose these numbers by looking at the minimum and maximum values and the range between the two.
What is our scale for the number line? Our scale is five. We are counting by fives from ten to sixty five on the number line.
What percent of the data is greater than the median value of forty one? Fifty percent of the data. What does this mean? It means that fifty percent of the time the attendance at the movies was more than forty one people.
What percent of the data is less than the median value of forty one? Also fifty percent. What does this mean? This means that fifty percent of the time the attendance at the movie was less than forty one people.
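As a rough check on these percentages, the sketch below counts the values on each side of the median. It is an illustration only, and the variable names are made up for this example. Because the median of nineteen values is itself one of the data points, the exact shares come out slightly under fifty percent on each side, which is why fifty percent is best read as an approximation for this data set.

```python
# The nineteen attendance values in order from least to greatest.
ordered = [10, 13, 14, 15, 22, 24, 26, 35, 37, 41,
           48, 56, 56, 57, 59, 60, 60, 64, 65]
median_value = 41

# Values strictly greater than and strictly less than the median.
above = [value for value in ordered if value > median_value]
below = [value for value in ordered if value < median_value]

percent_above = 100 * len(above) / len(ordered)
percent_below = 100 * len(below) / len(ordered)

print(len(above), round(percent_above, 1))
print(len(below), round(percent_below, 1))
```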
Can we determine from this graph any other information about the attendance at the movies? Why or why not? No, because there are only three values shown, the minimum, maximum and median.
Discuss how we could use the information from the data set to determine the value that marks the point of the lowest twenty five percent of the attendance. What is the median of the lower half of the data set? The median of the lower half of the data set is twenty two. Let’s explain how we determined that value. We separated the lower fifty percent of the data and found the median of those nine values. You can see the nine values that are in the lower fifty percent of the data here at the bottom of the screen.
(Ten, thirteen, fourteen, fifteen, twenty two, twenty four, twenty six, thirty five, thirty seven).
The median of these values is twenty two. What does this mean? Twenty five percent of the days the movie was shown had fewer than twenty two people in attendance. This value marks the data point of twenty five percent and we call this Quartile one or Q one for short.
Discuss how we could use the information from the data set to determine the value that marks the point of the highest twenty five percent of attendance. What is the median of the upper half of the data set? The median of the upper half of the data set is fifty nine. Explain how we determine that value. We separated the upper fifty percent of the data and found the median of those nine values. Those nine values are listed here.
(Forty eight, fifty six, fifty six, fifty seven, fifty nine, sixty, sixty, sixty four, sixty five).
The median of these values is fifty nine. What does this mean? Twenty five percent of the days the movie was shown had more than fifty nine people in attendance. This value marks the data point of seventy five percent and we call this Quartile three or Q three.
Let’s plot the quartiles on the number line. We will plot a point at twenty two for Quartile one and a point at fifty nine to represent Quartile three.
Now let’s complete the graphic organizer based on what we discovered about Quartile one and quartile three.
Quartile one means the median of the lower half of the data. And Quartile three means the median of the upper half of the data.
How did you find the value of Quartile one? You divided the lower fifty percent of the data into two equal halves.
And how did you find the value for Quartile three? You divided the upper fifty percent of the data into two equal halves.
What is the value of Quartile one? Twenty two.
And what is the value of Quartile three? Fifty nine.
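The method described above, taking the median of the lower half and the median of the upper half, can be sketched in Python as follows. This is an illustration only, and the variable names are made up for this example. It leaves the overall median out of both halves, as the lesson does. Other textbooks and software may use slightly different rules for quartiles, so their results can differ for other data sets.

```python
import statistics

# The nineteen attendance values in order from least to greatest.
ordered = [10, 13, 14, 15, 22, 24, 26, 35, 37, 41,
           48, 56, 56, 57, 59, 60, 60, 64, 65]

middle_index = len(ordered) // 2          # position of the median (41)
lower_half = ordered[:middle_index]       # nine values below the median
upper_half = ordered[middle_index + 1:]   # nine values above the median

quartile_one = statistics.median(lower_half)
quartile_three = statistics.median(upper_half)

print(lower_half, quartile_one)
print(upper_half, quartile_three)
```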
Hold on to this information as we will use it in the next activity.
CREATING A BOX PLOT AND INTERQUARTILE RANGE (IQR)
When we create a box plot, we use the number line and all the information we have obtained from the data set to build our graph. Notice the points that we have plotted on the number line include, the minimum value, the first quartile, the median value, the third quartile, and the maximum value for our data set.
Let’s identify all of the values we have found so far. The minimum value for our data set is ten. Quartile one for our data set is twenty two. The median for our data set is forty one. Quartile three for our data set is fifty nine. And our maximum value for the data set is sixty five.
Now we can use those values to create our box plot. We will draw a rectangle above the number line that begins at quartile one and ends at quartile three. The rectangle will begin at twenty two and end at fifty nine. Let’s draw this now. Next we will draw a horizontal line from quartile one to the minimum value of ten. And finally we will draw a horizontal line from quartile three to the maximum value of sixty five. You have now learned how to create a box plot!
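To picture the result without graph paper, here is a small Python sketch that draws a one-line text version of the box plot. It is an illustration only. The function name `text_box_plot` and the symbols it uses are made up for this example, it uses one character per unit rather than the scale of five on our number line, and the `M` marker for the median is an assumption added here so that all five plotted points are visible.

```python
def text_box_plot(minimum, q1, median, q3, maximum):
    """Draw a one-line box plot, one character per unit on the number line."""
    cells = []
    for value in range(minimum, maximum + 1):
        if value in (minimum, maximum):
            cells.append("|")   # ends of the horizontal lines
        elif value == median:
            cells.append("M")   # median, marked inside the box
        elif value in (q1, q3):
            cells.append("#")   # left and right edges of the box
        elif q1 < value < q3:
            cells.append("=")   # inside the box
        else:
            cells.append("-")   # line from the box out to the min or max
    return "".join(cells)


plot = text_box_plot(10, 22, 41, 59, 65)
print(plot)
```

The box runs from twenty two to fifty nine, and the lines on either side reach out to ten and sixty five, just as in the steps above.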
Now explain the data set for the box plot. It uses a data set listed from least to greatest.
How many data points does the box plot display? It displays five data points.
What are the five data points that can be identified from the box plot? They are the median, the minimum value, the maximum value and quartile one and quartile three.
We can easily identify the spread of the data. The box plot separates the data into four parts with each part containing twenty five percent of the data.