Okun
PSY 230
STUDY GUIDE #2
Graphs, Central Tendency, and the Shapes of Distributions
I. Graphs
1. When is it appropriate to use a histogram? How is a histogram constructed and how is it interpreted?
A histogram is appropriate for either unordered or ordered qualitative variables. The categories of the variable are placed along the X axis, and the frequencies associated with the categories are marked on the Y axis. With an unordered qualitative variable, the order of the categories on the X axis is completely arbitrary. In contrast, with an ordered qualitative variable, the categories on the X axis should be arranged in either ascending or descending order. A bar is drawn over each category, with its height corresponding to that category's frequency of occurrence.
Karate Belt F
White 40
Yellow 20
Brown 10
Black 5
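As a rough illustration of the construction described above, the following Python sketch draws the karate belt frequencies as a text histogram. The function name text_histogram and the choice of one "#" per 5 cases are assumptions made for this example, and the bars run sideways rather than upward only because that is easier to print.

```python
# Frequencies copied from the karate belt table above,
# kept in ascending rank order (an ordered qualitative variable).
belt_frequencies = {"White": 40, "Yellow": 20, "Brown": 10, "Black": 5}

def text_histogram(frequencies, unit=5):
    """Return one line per category; each '#' stands for `unit` cases."""
    lines = []
    for category, f in frequencies.items():
        bar = "#" * (f // unit)
        lines.append(f"{category:<8}{bar} ({f})")
    return lines

for line in text_histogram(belt_frequencies):
    print(line)
```

The longest bar identifies the most frequent category (White), and the bars shrink steadily as rank increases.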
2. When is it appropriate to use a frequency polygon? How is a frequency polygon constructed and how is it interpreted?
A frequency polygon is appropriate for quantitative variables. The values of the variable are placed along the X axis, and the frequencies associated with the values are marked on the Y axis. A dot is placed above each value of the variable at the height corresponding to its frequency of occurrence. Then adjacent dots are connected with straight lines.
FREQUENCY DISTRIBUTION FOR NUMBER OF NIGHTS DRINKING DURING A TYPICAL WEEK FOR 30 COLLEGE STUDENTS
X F
7 1
6 2
5 3
4 4
3 5
2 6
1 7
0 2
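The sketch below prepares the dots and connecting segments of a frequency polygon for the distribution above without drawing it. The variable names are assumptions for this example; a plotting library could take the same points and draw the lines.

```python
# (X, F) pairs copied from the frequency distribution above.
distribution = [(7, 1), (6, 2), (5, 3), (4, 4), (3, 5), (2, 6), (1, 7), (0, 2)]

# Sort by X so the dots run left to right along the X axis.
points = sorted(distribution)

# Each line segment of the polygon joins two neighboring dots.
segments = list(zip(points, points[1:]))

n = sum(f for _, f in points)                      # total number of students
peak_x, peak_f = max(points, key=lambda p: p[1])   # the tallest dot

print("N =", n)
print("Tallest dot at X =", peak_x, "with F =", peak_f)
print("Number of segments:", len(segments))
```

Checking that the frequencies sum to N (here, 30) is a quick way to catch a copying error before graphing.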
3. When is it appropriate to use a stem-and-leaf display? How is a stem-and-leaf display constructed and how is it interpreted?
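A stem-and-leaf display is appropriate for quantitative variables. We will compute stem-and-leaf displays only for ungrouped (raw) data. Each score is divided into two parts: a stem and a leaf. In the display below, the stem is the leading digit or digits (the tens place and above) and the leaf is the ones digit; the number in brackets is the count of scores on that stem.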
Score f
100 1
99 1
98 4
96 3
95 2
94 2
93 5
92 2
91 3
90 1
89 4
88 5
87 1
86 4
85 1
84 3
82 1
81 2
79 2
77 1
76 1
60 1
STEM-AND-LEAF DISPLAY OF FIRST TEST SCORES IN PSY 230 (N = 50)
10 0 [1]
9 01112233333445566688889 [23]
8 112444566667888889999 [21]
7 6799 [4]
6 0 [1]
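One way to check the display above is to rebuild it from the frequency table. The Python sketch below does this under the assumption that the stem is every digit except the last and the leaf is the last digit; the function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

# Score -> frequency, copied from the table above (N = 50).
score_frequencies = {
    100: 1, 99: 1, 98: 4, 96: 3, 95: 2, 94: 2, 93: 5, 92: 2, 91: 3, 90: 1,
    89: 4, 88: 5, 87: 1, 86: 4, 85: 1, 84: 3, 82: 1, 81: 2,
    79: 2, 77: 1, 76: 1, 60: 1,
}

def stem_and_leaf(frequencies):
    """Group scores by stem (tens and above) with sorted leaves (ones digit)."""
    leaves = defaultdict(list)
    for score, f in frequencies.items():
        stem, leaf = divmod(score, 10)
        leaves[stem].extend([leaf] * f)   # repeat the leaf once per occurrence
    # Highest stem first, to match the display above.
    return {stem: sorted(leaves[stem]) for stem in sorted(leaves, reverse=True)}

display = stem_and_leaf(score_frequencies)
for stem, stem_leaves in display.items():
    row = "".join(str(leaf) for leaf in stem_leaves)
    print(f"{stem:>2} {row} [{len(stem_leaves)}]")
```

Because every leaf is an actual score, the display keeps the raw data while still showing the shape of the distribution.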
4. What are the advantages of the stem-and-leaf display?
II. Central Tendency
6. What does central tendency refer to? What is the advantage of using an index of central tendency as compared to a frequency distribution or a graph?
7. What does the mean refer to? Which type of variable is it used with? How is the mean computed?
The mean for a distribution of scores is the arithmetic average. It is obtained by summing all of the scores and dividing by the sample size (N). It is used with quantitative variables.
For a sample, M = ΣX/N; for a population, μ_X = ΣX/N
Consider the following sample of 13 quiz scores:
5, 6, 8, 9, 10, 11, 11, 11, 12, 13, 14, 16, and 17
M = (5 + 6 + 8 + 9 + 10 + 11 + 11 + 11 + 12 + 13 + 14 + 16 + 17) / 13
M = 143/13 = 11
Boys        Girls
5           6
8           10
9           11
11          13
11          14
12          17
16
ΣX = 72     ΣX = 71
M = 10.29   M = 11.83
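The means above can be verified with a few lines of Python. This is only a sketch of the formula M = ΣX/N; the helper name mean is chosen for this example.

```python
# Scores copied from the Boys and Girls columns above.
boys = [5, 8, 9, 11, 11, 12, 16]
girls = [6, 10, 11, 13, 14, 17]

def mean(scores):
    """Arithmetic average: the sum of the scores divided by N."""
    return sum(scores) / len(scores)

print("Boys:  sum =", sum(boys), " M =", round(mean(boys), 2))
print("Girls: sum =", sum(girls), " M =", round(mean(girls), 2))

# Together the two groups make up the 13 quiz scores from the earlier example.
print("All 13 scores: M =", mean(boys + girls))
```

Note that the mean of all 13 scores (11) is not the simple average of the two group means, because the groups differ in size.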
8. What is the special property of the mean?
Given the sample mean and N-1 deviation scores in a distribution, how can the remaining raw score be determined?
A special property of the mean is that, when the mean is subtracted from each of the raw scores, the sum of these deviation scores always equals zero.
City         Temp.   Deviation Score (X - M)
Boise          5      5 - 20 = -15
Chicago       10     10 - 20 = -10
Detroit       15     15 - 20 =  -5
Pittsburgh    22     22 - 20 =  +2
Phoenix       48     48 - 20 = +28
                               ______
                               Σ = 0
Σ(X - M) = 0.
When we subtract the mean from a raw score (X - M), we get a deviation score. The sign of the deviation score tells us whether a score is above or below the mean. The value of the deviation score tells us the distance of the score from the mean.
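The zero-sum property also answers the second part of question 8: if the mean and N-1 deviation scores are known, the missing deviation must be whatever brings the total to zero. The Python sketch below illustrates this with the city temperatures; the function name missing_raw_score is an assumption for this example.

```python
# Temperatures copied from the table above; their mean is 100 / 5 = 20.
temps = {"Boise": 5, "Chicago": 10, "Detroit": 15, "Pittsburgh": 22, "Phoenix": 48}

m = sum(temps.values()) / len(temps)
deviations = {city: x - m for city, x in temps.items()}
print("Sum of deviation scores:", sum(deviations.values()))

def missing_raw_score(mean, known_deviations):
    """Recover the one raw score whose deviation is not given.

    All N deviations sum to zero, so the missing deviation is the
    negative of the sum of the N-1 known ones; adding the mean back
    turns that deviation into a raw score.
    """
    missing_deviation = -sum(known_deviations)
    return mean + missing_deviation

# Pretend Phoenix is unknown and recover it from the other four deviations.
known = [d for city, d in deviations.items() if city != "Phoenix"]
print("Recovered Phoenix temperature:", missing_raw_score(m, known))
```

Here the four known deviations sum to -28, so the missing deviation is +28 and the missing raw score is 20 + 28 = 48.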
9. What does the median refer to? Which type of variable is it typically used with? How is it computed?
The median is used with ordered qualitative variables and sometimes with quantitative variables. For quantitative variables, the median is a number that divides a distribution in half when scores are put into an array (i.e., ordered from lowest to highest). The median is equivalent to the 50th percentile.
How to Find the Median
Here are the steps involved in finding the median:
1. Order the observations from lowest to highest (i.e., form an array).
2. Count up the number of observations (N).
3. Divide [N + 1] by 2 to identify the distance (i.e., how many observations) you must “walk” into the array to locate the score that represents the median.
4A. If N is an odd number, proceed with your “walk” and identify the score in the middle of the array. This score is the median.
4B. If N is an even number, use your “walk” to identify the two scores closest to the middle of the array. Then compute the median by summing the two closest scores and dividing by 2.
Computing the Median from Arrayed Distributions of Scores
17
16 16
14 14
13 13
12 12
11 12
11 12
11 11
10 10
9 9
8 8
6 6
5 5
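The steps for finding the median can be sketched in Python as follows. This assumes the left-hand column above is one array (N = 13) and the right-hand column is a second array (N = 12); the function name median is chosen for this example.

```python
def median(scores):
    """Find the median by 'walking' (N + 1) / 2 observations into the array."""
    array = sorted(scores)        # Step 1: order from lowest to highest
    n = len(array)                # Step 2: count the observations
    position = (n + 1) / 2        # Step 3: how far to walk (1 = first score)
    if n % 2 == 1:
        # Step 4A: odd N, the walk lands exactly on the middle score.
        return array[int(position) - 1]
    # Step 4B: even N, the walk ends between two scores; average them.
    lower = array[int(position) - 1]
    upper = array[int(position)]
    return (lower + upper) / 2

left_column = [17, 16, 14, 13, 12, 11, 11, 11, 10, 9, 8, 6, 5]   # N = 13
right_column = [16, 14, 13, 12, 12, 12, 11, 10, 9, 8, 6, 5]      # N = 12

print("Left column median: ", median(left_column))
print("Right column median:", median(right_column))
```

For the left column the walk is (13 + 1) / 2 = 7 scores, landing on 11. For the right column the walk is (12 + 1) / 2 = 6.5 scores, so the median is the average of the 6th and 7th scores, (11 + 12) / 2 = 11.5.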
10. What does the mode refer to? Which type of variable is it typically used with? How is it determined?
The mode is the score (or category) in a distribution that occurs most frequently.
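As a small illustration, the Python sketch below finds the mode for three data sets that appear earlier in this guide. The function name modes is an assumption for this example; it returns a list so that ties, if any, are all reported.

```python
from collections import Counter

def modes(frequencies):
    """Return every value (or category) tied for the highest frequency."""
    highest = max(frequencies.values())
    return [value for value, f in frequencies.items() if f == highest]

# Frequency tables copied from earlier sections of this guide.
nights = {7: 1, 6: 2, 5: 3, 4: 4, 3: 5, 2: 6, 1: 7, 0: 2}
belts = {"White": 40, "Yellow": 20, "Brown": 10, "Black": 5}

# Raw scores must be tallied first; Counter builds the frequency table.
quiz_scores = [5, 6, 8, 9, 10, 11, 11, 11, 12, 13, 14, 16, 17]

print("Nights drinking:", modes(nights))
print("Karate belt:    ", modes(belts))
print("Quiz scores:    ", modes(Counter(quiz_scores)))
```

The same function works for quantitative values and for categories, because finding the mode requires only counting.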
III. Shapes of Distributions
11. What is the difference between symmetrical and skewed distributions? How are scores distributed in (1) a normal distribution; (2) a positively skewed distribution; and (3) a negatively skewed distribution?
In a symmetrical distribution, the right-hand side of the distribution will always be a mirror image of the left-hand side. An example of a symmetrical distribution is the normal curve.
*Normal: A distribution in which scores are bunched up in the middle and scores gradually become sparser at the extremes. (See p. 11.)
In contrast to a normal distribution, sometimes distributions are skewed. In a skewed distribution, most scores are bunched up at one end of the distribution and scores occur relatively infrequently at the other end of the distribution. Skewed distributions are, by definition, asymmetrical.
*positive skew: A lopsided distribution in which most scores are bunched up at the low end and only a few scores are at the high end. Positively skewed distributions are said to be skewed to the right because the tail goes to the right. (See p. 11.)
*negative skew: A lopsided distribution in which most scores are bunched up at the high end and only a few scores are at the low end. Negatively skewed distributions are said to be skewed to the left because the tail goes to the left. (See p. 11.)
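The nights-drinking distribution from Section I looks positively skewed: most students report few nights and only a few report many. The Python sketch below, which uses the standard statistics module, computes the three indices of central tendency for that one data set. It is an illustration for this sample only, not a rule derived here.

```python
from statistics import mean, median, mode

# (X, F) pairs copied from the nights-drinking frequency distribution.
distribution = [(7, 1), (6, 2), (5, 3), (4, 4), (3, 5), (2, 6), (1, 7), (0, 2)]

# Expand the table back into 30 raw scores, one entry per student.
raw_scores = [x for x, f in distribution for _ in range(f)]

print("N      =", len(raw_scores))
print("Mode   =", mode(raw_scores))
print("Median =", median(raw_scores))
print("Mean   =", mean(raw_scores))
```

In this sample the mode (1) is lowest, the median (2.5) is in between, and the mean (2.8) is highest: the few high scores in the right-hand tail pull the mean toward them.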
12. What is a bimodal distribution? What does a bimodal distribution indicate?
A bimodal distribution has two humps or peaks. Distributions tend to be bimodal when the sample consists of two distinct groups. For example, the weight of adults tends to have a bimodal distribution, reflecting the tendency for men and women to differ in their typical weights (see p. 11).
13. What is the relative standing of the mean, median, and mode in (a) normal, (b) positively skewed, and (c) negatively skewed distributions?
14. How can everyday discourse be translated into indices of central tendency?
15. What are the advantages of the mean as an index of central tendency with quantitative variables?
16. Under what circumstances is the median a better index of central tendency than the mean with quantitative variables?
Summary of the Properties of the Mean, Median, and Mode
The mean (for a sample, M; for a population, μ_X) is:
1. the balance point of the distribution, the point for which
Σ(X - M) = 0;
2. the preferred measure for relatively symmetrical distributions and quantitative variables;
3. the measure with the best sampling stability;
4. widely used in advanced statistical procedures;
5. the only measure whose value is dependent on the value of every score in the distribution;
6. more sensitive to extreme scores than the median and the mode and, hence, is not recommended for markedly skewed distributions or small distributions with an outlier; and
7. not appropriate for open-ended distributions.
The median is:
1. the point that divides the ordered scores in half;
2. second to the mean in usefulness;
3. widely used for markedly skewed distributions or for small distributions with an outlier because it is sensitive only to the number rather than to the values of scores above and below it;
4. the most stable measure that can be used with open-ended distributions;
5. more subject to sampling fluctuation than the mean;
6. less often used in advanced statistical procedures; and
7. the most appropriate measure for ordered qualitative variables.
The mode is:
1. the score (or category) that occurs most often;
2. the only measure appropriate for unordered qualitative variables;
3. the easiest measure to compute;
4. much more subject to sampling fluctuation than the mean and the median; and
5. rarely used in advanced statistical procedures.
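The summary notes that the mean is more sensitive to extreme scores than the median. The Python sketch below shows this with the five city temperatures from question 8. The value 148 is a made-up outlier introduced only for this illustration; it does not appear in the original data.

```python
from statistics import mean, median

# Temperatures copied from the deviation-score table in question 8.
temps = [5, 10, 15, 22, 48]

# HYPOTHETICAL: replace the highest score with a far more extreme one.
with_outlier = [5, 10, 15, 22, 148]

print("Original:     mean =", mean(temps), " median =", median(temps))
print("With outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

The mean doubles from 20 to 40, while the median stays at 15, because the median depends only on how many scores lie above and below it and not on their values.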