Measures of Variability Lab


Objectives: This lab has three objectives:

· Learn to calculate measures of variability in SPSS.

· Learn to use box plots in SPSS.

· Learn to use measures of variability to evaluate a research question.

Directions: Use the data from the GSS to perform the following tasks.

Demonstration: Calculating Measures of Variability and Constructing Box Plots with SPSS

I will use the variable “hrs1” to demonstrate how to calculate the range, 25th percentile, 75th percentile, variance, and standard deviation using SPSS.

a. What is this variable measuring?

b. What is the level of measurement of this variable?

To calculate measures of variability, do the following:

From the pull-down menu, choose Analyze… Descriptive Statistics… Frequencies
Choose the variable “hrs1” (NUMBER OF HOURS WORKED LAST WEEK) and use the arrow to move it to the blank space underneath the Variable(s) area.
Click on Statistics
In the area marked “Dispersion”, make sure that the boxes next to the following statistics are checked:
Std. deviation
Variance
Range
Minimum
Maximum
In the area marked “Percentile Values” put a check in the box next to Percentile(s)
You can use this to add the 25th and 75th percentiles
To do so, type in 25 and click Add
Repeat the process with 75 instead of 25
You may also want to include measures of central tendency like:
Mean
Median
Mode
Click Continue… OK
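The statistics requested in the steps above can also be computed outside SPSS as a check on your reading of the output. The following is a minimal Python sketch, not SPSS output: the values in `hours` are invented for illustration (they are not GSS data), and the percentile interpolation used by Python's `statistics.quantiles` is assumed, not guaranteed, to agree with the method SPSS uses.

```python
import statistics

# Hypothetical hours-worked values, invented for illustration (not GSS data).
hours = [40, 35, 50, 20, 45, 60, 40, 38]

# Range: distance between the largest and smallest values.
value_range = max(hours) - min(hours)

# Sample variance (divides by n - 1) and its square root, the standard deviation.
variance = statistics.variance(hours)
std_dev = statistics.stdev(hours)

# quantiles(n=4) returns the three quartile cut points:
# the 25th percentile, the median, and the 75th percentile.
q1, median, q3 = statistics.quantiles(hours, n=4)

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1

print("Range:", value_range)
print("Variance:", round(variance, 2))
print("Std. deviation:", round(std_dev, 2))
print("25th percentile:", q1)
print("75th percentile:", q3)
print("IQR:", iqr)
```

The standard deviation is in the same units as the variable (hours), while the variance is in squared units, which is why the standard deviation is usually the easier of the two to interpret.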

SPSS will calculate several statistics

a. What are the standard deviation and variance of this variable?

i. How do we interpret the variance or the standard deviation?

b. What is the range?

i. How do we interpret it?

c. What is the Interquartile Range (IQR)?

i. How do we interpret it?

Suppose we wanted to look at the distribution of this variable. We can do this using a box plot or a histogram. To construct a box plot, do the following:

From the pull-down menus select Graphs… Boxplot
Select Simple
Make sure that the bubble next to “Summaries of separate variables” is selected
Click on “Define”
Select the variable hrs1 (NUMBER OF HOURS WORKED LAST WEEK) and use the arrow to move it underneath the box marked “Boxes Represent”
Click OK

SPSS will produce a box plot of the variable

In general, how do we interpret a box plot?
What is being represented at the end of the vertical lines?
What is being represented inside the rectangular box?
What is being represented by the black band inside the rectangular box?
What do you notice about the box plot produced by SPSS?
Statistical packages rarely produce traditional box plots; instead they attempt to alert you to sample values which may be unusually distant from the bulk of the data.
These sample values are represented as circles or asterisks
They lie beyond the whiskers. This means that the whiskers do not extend to the minimum and maximum of the sample, but to the smallest and largest values within a “reasonable” distance from the ends of the box.
The extra information provided by the flagging process enables you to distinguish between a truly skewed sample and one whose apparent skewness is attributable to a single point.
SPSS has a two-stage flagging process.
Values which are more than three box lengths from either end of the box receive an asterisk.
Values which are between one and a half and three box lengths from either end of the box receive a circle (one box length is the height of the box, that is, the IQR).
All other features of the plot are the same as in a traditional box plot
The following diagram illustrates how SPSS calculates a box plot:

Given this, how do we interpret the statistics presented in the box plot?
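The two-stage flagging rule described above can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of SPSS: the function name `flag_values` and the data are hypothetical, and the ends of the box are taken from `statistics.quantiles`, which may differ slightly from the values SPSS uses to draw its box.

```python
import statistics

def flag_values(values):
    """Classify values using the 1.5 and 3 box-length rule (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    box = q3 - q1  # one box length (the IQR)

    # Fences at 1.5 and 3 box lengths from each end of the box.
    lower_inner, upper_inner = q1 - 1.5 * box, q3 + 1.5 * box
    lower_outer, upper_outer = q1 - 3 * box, q3 + 3 * box

    # More than 3 box lengths away: drawn as an asterisk.
    extremes = [v for v in values if v < lower_outer or v > upper_outer]
    # Between 1.5 and 3 box lengths away: drawn as a circle.
    outliers = [v for v in values
                if lower_outer <= v < lower_inner or upper_inner < v <= upper_outer]
    # Whiskers stop at the most distant values that are not flagged.
    inside = [v for v in values if lower_inner <= v <= upper_inner]

    return {
        "extremes": extremes,
        "outliers": outliers,
        "whiskers": (min(inside), max(inside)),
    }

# Hypothetical hours-worked values, invented for illustration (not GSS data).
hours = [10, 36, 38, 40, 40, 40, 40, 42, 44, 45, 60, 80]
result = flag_values(hours)
print(result)
```

In this made-up sample the whiskers stop at 36 and 45, well short of the minimum (10) and maximum (80), which is the difference from a traditional box plot that the questions above ask about.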

Suppose we wanted to compare the box plot output to the actual distribution of the data. We could use a histogram to achieve this. To do this, try the following:

From the pull-down menu select Graphs… Histogram
Select the variable hrs1 (NUMBER OF HOURS WORKED LAST WEEK) and use the arrow to move it underneath the box marked “Variable”
Make sure that the box next to “Display normal curve” is selected – this will fit a normal curve to the histogram

SPSS will produce a histogram. How does the distribution of the data compare to the box plot?
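To see what a histogram summarizes, it can help to count the cases in each interval directly. The sketch below is a rough text version in Python, assuming a bin width of 10 hours chosen only for illustration (SPSS picks its own intervals); the helper `histogram_counts` and the data are hypothetical.

```python
def histogram_counts(values, bin_width=10):
    """Count how many values fall in each bin, keeping empty bins (hypothetical helper)."""
    first_bin = (min(values) // bin_width) * bin_width
    last_bin = (max(values) // bin_width) * bin_width
    counts = {start: 0 for start in range(first_bin, last_bin + bin_width, bin_width)}
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    return counts

# Hypothetical hours-worked values, invented for illustration (not GSS data).
hours = [10, 36, 38, 40, 40, 40, 40, 42, 44, 45, 60, 80]
counts = histogram_counts(hours)

# Print one row per bin, with one '#' per case in that bin.
for start, count in counts.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

The tall bar for 40 to 49 hours corresponds to the narrow box in a box plot of the same values, and the isolated bars at the ends correspond to the flagged points.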

Lab Exercise

We often hear that there is an earnings gap between men and women. Let’s explore this gap in income more closely by using data from the GSS. We will combine what we learned about measures of central tendency with what we have learned about measures of variability to investigate income differentials between men and women.

We will use the following variables in our analysis: inccod98 and sex.

If you do not already know what these variables are, then use the GSS webpage to help you identify the variables (http://webapp.icpsr.umich.edu/GSS/).
What is the level of measurement of each variable?

To look at differences between the variables, first look at the distribution of the variable sex. Unfortunately, SPSS does not calculate some measures of variability such as the index of qualitative variation (IQV), so this will have to be done by hand.
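One common form of the IQV can be checked with a few lines of Python once you have the category frequencies from SPSS. This is a sketch that assumes the formula IQV = K(N² − Σf²) / (N²(K − 1)), where K is the number of categories, N is the total number of cases, and f is the frequency of each category; confirm that this matches the formula in your textbook. The counts used here are invented, not GSS results.

```python
def iqv(counts):
    """Index of qualitative variation from category frequencies (hypothetical helper)."""
    k = len(counts)                       # number of categories
    n = sum(counts)                       # total number of cases
    sum_sq = sum(c * c for c in counts)   # sum of squared frequencies
    return k * (n ** 2 - sum_sq) / (n ** 2 * (k - 1))

# Hypothetical frequencies for a two-category variable (not GSS data).
example_counts = [450, 550]
example_iqv = iqv(example_counts)
print(round(example_iqv, 2))
```

An IQV of 1 means the cases are spread evenly across the categories, and an IQV of 0 means every case falls in a single category.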

Calculate the appropriate measure of variability for the variable sex
What does this tell us about the distribution of this variable?
Is this distribution likely to be reflective of the larger population?
Construct a box plot for both variables and compare the box plots to each other
To plot two box plots on the same axis do the following:
From the pull-down menus select Graphs… Boxplot
Choose “Simple”
In the “Data in Chart Area” portion of the window, make sure the bubble next to “Summaries for groups of cases” is selected
Not “Summaries of Separate Variables” as in the last example.
Put the inccod98 variable in the blank under the “Variable” heading
Under the “Category Axis” heading, put the sex variable
Click OK
Compare the box plots to each other
How are the two box plots similar, and how do they differ?
What is the approximate range of the data?
What is the approximate IQR of the data?
Where is most of the data located in each distribution?
What is the median income of both groups?
Are there any outliers or extreme cases (as defined by SPSS)?
Construct separate histograms of income for men and women and compare them to the box plots.
To plot two histograms side-by-side, do the following:
From the pull-down menu, select Graphs… Histogram
In the space under “Variable” put the income variable
In the space under “Rows” put the sex variable
SPSS will produce two histograms of income, one for men and one for women.
Do the histograms look like you expect them to based on the box plots?
Is there anything unusual about the distributions?
Is either of these distributions skewed?
How do you know?
Now split the file by sex (Data… Split File… Compare Groups) and calculate measures of central tendency and measures of variability across gender.
How do the measures of central tendency compare across men and women?
Which is the most appropriate measure of central tendency?
Why do you think so?
How would you interpret this measure of central tendency?
How do the measures of variability compare across groups?
Are there any measures that are the same across groups?
Which is the most appropriate measure of variability?
Why do you think so?
How would you interpret this measure?
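The split-file comparison above amounts to computing the same statistics separately for each group. The following Python sketch shows that logic under stated assumptions: the rows are invented (they are not GSS data), the income values are hypothetical stand-ins for inccod98, and the quartiles from `statistics.quantiles` may differ slightly from those SPSS reports.

```python
import statistics
from collections import defaultdict

# Hypothetical (sex, income) rows, invented for illustration (not GSS data).
rows = [
    ("male", 22500), ("male", 32500), ("male", 45000),
    ("male", 55000), ("male", 82500),
    ("female", 17500), ("female", 22500), ("female", 32500),
    ("female", 37500), ("female", 45000),
]

# Group incomes by sex, which is what Split File does before each analysis.
by_sex = defaultdict(list)
for sex, income in rows:
    by_sex[sex].append(income)

# Compute central tendency and variability within each group.
summary = {}
for sex, incomes in by_sex.items():
    q1, median, q3 = statistics.quantiles(incomes, n=4)
    summary[sex] = {
        "mean": statistics.mean(incomes),
        "median": median,
        "std_dev": statistics.stdev(incomes),
        "iqr": q3 - q1,
    }

for sex, stats in summary.items():
    print(sex, stats)
```

When a distribution is skewed, the median and IQR are less affected by a few very high incomes than the mean and standard deviation are, which is worth keeping in mind when choosing the most appropriate measures.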