HW Day 2. To hand in. [ ] marks additions to text question.
1.28, p. 28 Hispanic origins. Fig. 1.13 is a pie chart prepared by the Census Bureau to show the origin of the more than 43 million Hispanics in the U.S. in 2006. About what percent of Hispanics are Mexican? Puerto Rican? You see that it is hard to determine numbers from a pie chart. Bar graphs are much easier to use. (The Census Bureau did include the percents in the pie charts.)
[Also: To argue with Moore: A pie chart lets you easily answer questions like "More than half?", "About 2/3?", "Less than 1/4?", ... "Less than 1/8?" Describe these groups in those simple-fraction terms: Mexican, Puerto Rican, Other-Hispanic.]
Type of spam / Percent
Adult / 19
Financial / 20
Health / 7
Internet / 7
Leisure / 6
Products / 25
Scams / 9
1.29, p. 29 Spam ... Here is a compilation of the most common types of [email] spam. Make two bar graphs of these percents, one with bars ordered as in the table (alphabetically) and the other with bars in order from tallest to shortest. Comparisons are easier if you order the bars by height.
[Note the categories "come" in alphabetical order. Just make the bar graphs by hand on ordinary paper. In order by size, it's called a Pareto chart. Also: Calculate how much "other" spam there is, and use your pie template to make a pie chart. (The pie chart is only "legal" if every item falls into only one box. I wonder: is a Viagra ad Adult, Health, Leisure, or Products?)]
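[The arithmetic behind the "other" slice and the size ordering can be checked with a short Python sketch. It assumes the seven percents in the table above; the names `spam`, `other`, and `pareto` are made up for illustration. It is only a check on the hand-drawn graphs, not a substitute for them.]

```python
# Percents from the spam table above, in the table's alphabetical order.
spam = {
    "Adult": 19, "Financial": 20, "Health": 7, "Internet": 7,
    "Leisure": 6, "Products": 25, "Scams": 9,
}

# A pie must total 100%, so whatever is not listed is "other".
other = 100 - sum(spam.values())

# Pareto order: tallest bar first (ties keep their alphabetical order).
pareto = sorted(spam.items(), key=lambda kv: kv[1], reverse=True)

# Rough text bar graph, one "#" per percentage point.
for name, pct in pareto:
    print(f"{name:<10}{pct:>3} " + "#" * pct)
print(f"Other: {other}%")
```

[Sorting a copy of the table leaves the original alphabetical order available for the first bar graph.]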
Marijuana use per year / Never / 1-10 times / 11-50 times / 51+ times
Drivers / 452 / 229 / 70 / 156
Accidents caused / 59 / 36 / 15 / 50
1.40, p. 35 Marijuana and traffic accidents: Researchers in New Zealand interviewed 907 drivers aged 21. They had data on traffic accidents and they asked the drivers about marijuana use. Here are data on the numbers of accidents caused by these drivers at age 19, broken down by marijuana use at the same age.
a) Explain carefully why a useful graph must compare rates (accidents per driver) rather than counts of accidents in the four marijuana use classes.
b) Make a graph that displays the accident rate for each class. What do you conclude? (You can’t conclude that MJ use causes accidents, because risk takers are more likely both to drive aggressively and to use marijuana.)
[Divide accidents by # of drivers to get rates. Also: This table has been summarized from a dataset where the Individuals were the 907 drivers interviewed, and the Variables were Marijuana Use during 19th year and Accidents Caused during 19th year. (The drivers were then put into categories by Marijuana Use and the accidents totaled in each category). As presented, Marijuana Use is a Categorical variable. Is it Nominal, or Ordinal?]
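[The division described above can be sketched in Python as follows. It uses the driver and accident counts from the table; the variable names are made up for illustration.]

```python
# Counts from the marijuana-use table above, one entry per use class.
use_class = ["Never", "1-10 times", "11-50 times", "51+ times"]
drivers = [452, 229, 70, 156]
accidents = [59, 36, 15, 50]

# Rate = accidents caused per driver in the class.
rates = [a / d for a, d in zip(accidents, drivers)]

for label, rate in zip(use_class, rates):
    print(f"{label:<12}{rate:.3f} accidents per driver")
```

[The classes have very different numbers of drivers, which is why the raw accident counts cannot be compared directly.]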
2.15, p. 58 Here are the amounts of money (cents) in coins carried by 10 students in a statistics class. 50 35 0 97 76 0 0 87 23 65.
From this dataset (coins), make a Dotplot. Is there anything odd/interesting about the data set? Don't do anything else.
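[If you want to check a hand-drawn dotplot, here is a rough text version in Python. It is a sketch that assumes a number line from 0 to 100 cents with one character column per cent; the name `rows` is made up for illustration.]

```python
from collections import Counter

coins = [50, 35, 0, 97, 76, 0, 0, 87, 23, 65]  # cents, from exercise 2.15
counts = Counter(coins)

# One text column per cent value from 0 to 100; stack one "o" per student.
rows = []
for level in range(max(counts.values()), 0, -1):
    rows.append("".join("o" if counts[v] >= level else " " for v in range(101)))

print("\n".join(rows))
print("+" + "-" * 99 + "+")     # the number line
print("0" + " " * 97 + "100")   # labels at the two ends
```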
1.9, p. 18 Unmarried women. Fig. 1.9 shows the distribution of the state %s of women aged 15 and over who have never been married.
a) The main body of the dist. is slightly skewed to the right. There is one clear outlier, the District of Columbia. Why is it not surprising that the % of never-married women is higher in DC than in the 50 states?
b) The midpoint of the distribution is the 26th state in order of percent of never-married women. In what class does the midpoint fall? [Add up bars left to right till you’ve got “enough”.] About what is the spread (smallest to largest) of the dist.?
1.30 Fruit ... Many of us don’t eat enough [fruit]. Fig. 1.14 is a histogram of the number of servings of fruit per day claimed by 74 seventeen-year-old girls in a study in PA. Describe the shape, center, and spread of the dist. What percent of these girls ate fewer than 2 servings per day?
1.32, p. 29 Returns on common stocks. The return on a stock is the change in its market price plus any dividend payments made. Total return is usually expressed as a percent of the beginning price. Fig. 1.16 is a histo of the dist. of the monthly returns for all stocks listed on U.S. markets from Jan. 1985 to Sept. 2007 (273 months). The extreme low outlier is the market crash of Oct. 1987, when stocks lost 23% of their value in one month.
a) Ignoring the outliers, describe the overall shape of the distrib. of monthly returns.
b) What is the approx. center of this distribution? (for now take the center to be the value with roughly half the months having lower returns and half having higher returns.)
c) Approx. what were the smallest and largest monthly returns, leaving out the outliers? (This is one way to describe the spread of the distribution)
d) A return less than zero means that stocks lost value in that month. About what percent of months had returns less than zero?
------
1.31, p. 29 IQ [reading stemplot] Fig. 1.15 is a stemplot of the IQ test scores of 78 seventh-grade students in a rural Midwestern school.
a) Four students had low scores that might be considered outliers. Ignoring these, describe the shape, center, and spread of the distribution. (Notice that it looks roughly bell-shaped.)
b) We often read that IQ scores for large populations are centered at 100. What percent of these 78 students have scores above 100?
p. 32, 1.35 doctors. [Do a stemplot, not a histogram. Use hundreds as stems (tens as leaves), and split them as on p. 22. The tallying and the effect is the same as a histogram with classes 150-199, 200-249, 250-299, etc.]
Table 1.5 gives the number of active medical doctors per 100,000 people in each state.
a) Why is the number of doctors per 100,000 people a better measure of the availability of health care than a simple count of the number of doctors in a state?
b) Make a stemplot [not a histogram] that displays the distribution of doctors per 100,000 people. Write a brief description of the distrib. Are there any outliers? If so, can you explain them? [I’m not sure the answer book’s explanation is complete!]
Women
180 120 180 360 240
120 180 120 240 170
150 120 180 180 150
200 150 180 150 180
120 60 120 180 180
90 240 180 115 120
Men
90 120 30 90 200
90 45 30 120 75
150 120 60 240 300
240 60 120 60 30
30 230 120 95 150
0 200 120 120 180
1.38, p. 35 Study times. We asked the students in a large first-year college class how many minutes they studied on a typical weeknight. Here are the responses of random samples of 30 women and 30 men from the class:
a) Examine the data. Why are you not surprised that most responses are multiples of 10 minutes? We eliminated one student who claimed to study 30,000 minutes per night. Are there any other responses you consider suspicious?
b) Make a back-to-back stemplot to compare the two samples. That is, use one set of stems with two sets of leaves, one to the right and one to the left of the stems. (Draw a line on either side of the stems to separate stems and leaves.) Order both sets of leaves from smallest at the stem to largest away from the stem. Report the approximate midpoints of both groups. Does it appear that women study more than men (or at least claim that they do)?
[You can do back to back, or do side by side on the same scale, like fig. 2.5, p. 57. Using hundreds as stems only gives 3 stems. Splitting by 5's (p. 22) might be good enough. Notice the mental rounding of the responses, to quarter hours if not to ten minutes. This makes the data "granular".]
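[For checking your hand tally, here is a sketch in Python of a back-to-back stemplot with split stems. It assumes hundreds as stems and the tens digit as the leaf, with the units digit truncated (so 115 shows as leaf 1), and each stem split into leaves 0-4 and 5-9. The function name `split_leaves` and the column width are made up for illustration.]

```python
# Study times in minutes, from the Women and Men tables above.
women = [180, 120, 180, 360, 240, 120, 180, 120, 240, 170,
         150, 120, 180, 180, 150, 200, 150, 180, 150, 180,
         120, 60, 120, 180, 180, 90, 240, 180, 115, 120]
men = [90, 120, 30, 90, 200, 90, 45, 30, 120, 75,
       150, 120, 60, 240, 300, 240, 60, 120, 60, 30,
       30, 230, 120, 95, 150, 0, 200, 120, 120, 180]

def split_leaves(data):
    """Map (hundreds, half) -> sorted tens digits; half 0 is leaves 0-4, half 1 is 5-9."""
    rows = {}
    for x in sorted(data):
        stem, leaf = x // 100, (x % 100) // 10  # units digit is truncated
        rows.setdefault((stem, leaf // 5), []).append(leaf)
    return rows

w, m = split_leaves(women), split_leaves(men)
max_stem = max(women + men) // 100

lines = []
for stem in range(max_stem + 1):
    for half in (0, 1):
        # Women's leaves are reversed so the smallest leaf sits next to the stem.
        left = "".join(str(d) for d in reversed(w.get((stem, half), [])))
        right = "".join(str(d) for d in m.get((stem, half), []))
        lines.append(f"{left:>16} | {stem} | {right}")

print(f"{'Women':>16} |   | Men")
print("\n".join(lines))
```

[Each stem appears twice, first for leaves 0-4 and then for leaves 5-9, which gives the same tally as a histogram with classes 50 wide.]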
1.43, p.36, Housing starts Timeplot. Fig. 1.18 is a time plot of the number of single family homes started by builders each month from Jan. 1990 to Dec. 2007. The counts are in thousands of homes.
a) The most notable pattern in this time plot is yearly up-and-down cycles. At what season of the year are housing starts highest? Lowest? The cycles are explained by the weather in the northern part of the country.
b) Is there a longer-term trend visible in addition to the cycles? If so, describe it.
c) The big economic news of 2007 was a severe downturn in housing that began in mid-2006. How is this downturn visible in the time plot?