CHAPTER 1: SAMPLE PROBLEMS FOR HOMEWORK, CLASS OR EXAMS
These problems are designed to be done without access to a computer, but they may require a calculator.
1. The boxplots shown below summarize data for vocabulary scores from two samples of first-graders, some of whom attended Pre-K, and some of whom did not (No Pre-K).
a. Among Pre-K children, approximately what proportion have vocabulary scores greater than 41?
b. Among no Pre-K children, approximately what proportion have vocabulary scores between 24 and 28?
c. Which group apparently has the higher typical vocabulary score?
d. Which group shows the greater variability in their vocabulary scores?
e. Summarize the shape of the distribution for each group.
2. You jotted down some summary values for a data set of 15 observations of wave heights, in feet, at a monitoring site. Your notes state that:
Σx = 37 and Σx² = 173.
a. Give the mean and the standard deviation of the wave heights, in feet.
b. Give the mean and the standard deviation of the wave heights, in meters. (Hint: 1 meter = 3.3 feet.)
c. You also have written down that the median wave height, in feet, was 2. Is the distribution more likely negatively skewed, positively skewed, or symmetric?
3. Volunteers in a psychology experiment watch a video re-enactment of a crime. After a delay, they answer questions about the scene. The researcher records both the length of the delay (in hours) and the number of questions the volunteer answered correctly. A graphical display of the data is shown below.
Write a sentence, in simple language, that summarizes the relationship between the two variables.
4. A school is concerned that the playground area may have contaminated soil left from a time when a creosote plant was nearby. The school cannot afford to test all the soil on the playground, so a statistician divides a map of the playground into 1000 equally sized rectangles. A random number generator is used to select 30 rectangles. A soil specimen is taken from each selected rectangle.
a. Identify the population, the sampling frame, and the sample.
b. Is this study observational or a designed experiment?
c. For each variable, say whether its scale is nominal, ordinal, interval or ratio. Identify one type of graph that can be used to summarize the data.
Contamination level (none, trace, moderate, high)
Creosote concentration, in parts per billion
Usage (general play, organized sports, drainage structure, etc.)
5. Barometric pressures in tropical cyclones are very negatively (left) skewed. You have written down 1005 and 985 as the mean and median, but forgot to label which was which. Label the values properly.
6. The two histograms below summarize total cholesterol levels for large samples of elderly men and women. Write a short paragraph comparing the two groups. Be sure to contrast the typical values, variability and shape of the distributions.
7. The frequency table below summarizes the results for mercury concentrations in a sample of Florida lakes. Display the relative frequencies using an appropriate graph.
Mercury Concentration    Number of cases
Trace (very low)         11
Low                      27
Borderline               8
Dangerous                3
8. The data below show electricity costs in cents per kWh for a sample of 14 utility companies. The data have already been sorted, reading across each row.
12.4  12.8  13.1  13.6  13.8
13.9  14.0  14.2  14.2  14.4
14.6  14.8  15.0  15.3
a. Calculate the first quartile, median, and third quartile.
b. Calculate the range and the interquartile range.
c. Identify the outliers, if there are any. Show your computations.
d. Construct the boxplot for this data, and comment on the shape of the distribution.
9. The boxplots show NOx emissions from your power plant (expressed as pounds per MWh of electricity generated), using two different settings for the air intakes. Compare the emissions, being sure to address the typical values, variability, shape and/or other special features. If your goal is to lower typical NOx emissions, which setting should you select? If your goal is to avoid any extremely high values, which setting should you select?
SOLUTIONS
1. a) 25%
b) 25%
c) The Pre-K group apparently has higher typical values.
d) The No Pre-K group shows higher variability.
e) Scores in the Pre-K group are nearly symmetrically distributed, but in the No Pre-K group they are positively (or right) skewed.
2. a) mean = 37/15 = 2.47 ft, SD = sqrt((173 - 37²/15)/14) = 2.42 ft
b) This is a simple change of scale: divide each value by 3.3. Mean = 2.47/3.3 = 0.75 m, SD = 2.42/3.3 = 0.73 m
c) Since the mean (2.47) is substantially greater than the median (2), the distribution is most likely right, or positively, skewed. Alternatively, one might see that the size of the SD compared to the mean, combined with the fact that there are no negative wave heights, implies a positive skew.
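The arithmetic in Solution 2 can be checked with a short script. This is only a sketch: it assumes the two noted values, 37 and 173, are the sum of the observations and the sum of their squares, which is the reading that reproduces the stated mean and SD. The variable names are illustrative.

```python
import math

# Assumption: the notes record sum(x) = 37 and sum(x^2) = 173 for n = 15.
n = 15
sum_x = 37.0
sum_x2 = 173.0

mean_ft = sum_x / n
# Shortcut formula for the sample variance:
# (sum(x^2) - (sum(x))^2 / n) / (n - 1)
var_ft = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sd_ft = math.sqrt(var_ft)

# Change of scale, using the problem's hint of 3.3 feet per meter.
FEET_PER_METER = 3.3
mean_m = mean_ft / FEET_PER_METER
sd_m = sd_ft / FEET_PER_METER

print(f"feet:   mean = {mean_ft:.2f}, SD = {sd_ft:.2f}")
print(f"meters: mean = {mean_m:.2f}, SD = {sd_m:.2f}")
```

Dividing every observation by a constant divides both the mean and the SD by that constant, which is why the same conversion factor is applied to each.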
3. As the time delay increases, the number of questions answered correctly tends to decrease.
4. a) The population is all soil in the playground, or alternatively all 1000 rectangles of soil. The sampling frame is the list of the 1000 rectangles. The sample is the 30 rectangles selected for study.
b) Observational
c) Contamination level is ordinal, use bar chart or pie chart;
Creosote concentration is ratio, use a histogram, boxplot or stem-and-leaf plot.
Usage is nominal (the categories have no natural order), use a bar chart or pie chart.
5. Since the distribution is strongly left skewed, the low values pull the mean below the median. Mean = 985, median = 1005.
6. Typical values for the women are slightly higher than those for the men. The women show slightly higher variability, though the difference in variability is not large. Both distributions show some right skew, but the skew is stronger in the men.
7. The bar chart is shown below. Pie charts are possible, but are harder to draw and harder to compare when there is more than one sample.
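The relative frequencies behind the bar chart are each count divided by the total number of lakes. A minimal sketch follows, using the counts from the table in Problem 7; the text-based bars and their scale are an illustrative choice, not the chart from the original solution.

```python
# Counts copied from the frequency table in Problem 7.
counts = {
    "Trace (very low)": 11,
    "Low": 27,
    "Borderline": 8,
    "Dangerous": 3,
}

total = sum(counts.values())
rel_freq = {level: k / total for level, k in counts.items()}

# Text bar chart: one '#' per 2 percentage points (an arbitrary display scale).
# The categories are kept in their natural order, since the scale is ordinal.
for level, p in rel_freq.items():
    bar = "#" * round(p * 50)
    print(f"{level:<17} {p:6.1%} {bar}")
```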
8. a) median = (14.0+14.2)/2 = 14.1
Using the algorithm described in the text, Q1 = 13.35 and Q3 = 14.5.
However, a variety of algorithms exist; on the TI-83/84 calculators, Q1 = 13.6 and Q3 = 14.6.
b) Range = 15.3 - 12.4 = 2.9. IQR = 14.5 - 13.35 = 1.15 using the algorithm for quartiles given in the book, but IQR = 14.6 - 13.6 = 1.0 using the TI-83/84.
c) Using the algorithm for quartiles given in the book, lower fence = 13.35 - 1.5(1.15) = 11.625 and upper fence = 14.5 + 1.5(1.15) = 16.225. All 14 values lie between the fences, so there are no outliers.
d) The boxplot will depend slightly on the algorithm used for the quartiles. The distribution is nearly symmetric, but some students will note a slight negative skew.
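The two sets of quartiles in Solution 8 can be reproduced with the sketch below. The book's algorithm is not stated here, so `quantile_at_pn` is a hypothetical stand-in: it interpolates at the 1-based position p*n, which happens to give the quoted Q1 = 13.35 and Q3 = 14.5 for this data set. `quartiles_median_of_halves` takes the median of each half of the sorted data, which matches the TI-83/84 values quoted above. Both function names are illustrative.

```python
import math
from statistics import median

# Electricity costs (cents per kWh) from Problem 8, already sorted.
costs = [12.4, 12.8, 13.1, 13.6, 13.8, 13.9, 14.0,
         14.2, 14.2, 14.4, 14.6, 14.8, 15.0, 15.3]


def quantile_at_pn(sorted_x, p):
    """Hypothetical rule: interpolate linearly at the 1-based position p*n."""
    n = len(sorted_x)
    pos = p * n
    k = math.floor(pos)   # whole part of the position
    frac = pos - k        # fractional part of the position
    if k < 1:
        return sorted_x[0]
    if k >= n or frac == 0:
        return sorted_x[min(k, n) - 1]
    return sorted_x[k - 1] + frac * (sorted_x[k] - sorted_x[k - 1])


def quartiles_median_of_halves(sorted_x):
    """Median of the lower and upper halves (middle value skipped if n is odd)."""
    n = len(sorted_x)
    half = n // 2
    return median(sorted_x[:half]), median(sorted_x[n - half:])


def fences(q1, q3):
    """Lower and upper fences at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


q1_book, q3_book = quantile_at_pn(costs, 0.25), quantile_at_pn(costs, 0.75)
q1_ti, q3_ti = quartiles_median_of_halves(costs)

for label, q1, q3 in [("p*n rule", q1_book, q3_book),
                      ("median of halves", q1_ti, q3_ti)]:
    lo, hi = fences(q1, q3)
    outliers = [x for x in costs if x < lo or x > hi]
    print(f"{label}: Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}, "
          f"fences = ({lo:.3f}, {hi:.3f}), outliers = {outliers}")
```

Under either rule every observation falls inside the fences, so the conclusion of no outliers does not depend on the choice of quartile algorithm for this data set.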
9. The typical NOx values are lower at the high air intake setting, as shown by the lower median. However, the NOx values are more variable at the high air intake setting, and somewhat right-skewed. At the low air intake setting, the values are more symmetric and less variable. Hence, to lower the typical NOx values one should choose the high air intake setting, but to avoid extremely high values, one should use the low air intake setting.