Organization and Description Of

Chapter 2

ORGANIZATION AND DESCRIPTION OF DATA

2.1(a)The percentage in other classes is

(b)

(c)The percentage of waste that is paper or paperboard is:

The percentage of waste in the top two categories is:%

The percentage in the top five categories is:

2.2 The frequency table for blood type is

Blood type / Frequency / Relative Frequency
O / 16 / 16/40 = 0.40
A / 18 / 18/40 = 0.45
B / 4 / 4/40 = 0.10
AB / 2 / 2/40 = 0.05
Total / 40 / 1.00

2.3The frequency table for number of activities is

Number of Activities / Frequency / Relative Frequency
0 / 7 / 7/40 = 0.175
1 / 10 / 10/40 = 0.25
2 / 13 / 13/40 = 0.325
3 / 5 / 5/40 = 0.125
4 / 2 / 2/40 = 0.05
5 / 1 / 1/40 = 0.025
6 / 1 / 1/40 = 0.025
7 / 1 / 1/40 = 0.025
Total / 40 / 1.00
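The relative frequency column above can be checked mechanically. The following is a minimal sketch, assuming only the frequencies printed in the table; the variable names are illustrative and not part of the original solution.

```python
# Frequencies copied from the table for Exercise 2.3
# (number of activities -> number of students).
frequencies = {0: 7, 1: 10, 2: 13, 3: 5, 4: 2, 5: 1, 6: 1, 7: 1}

n = sum(frequencies.values())  # total number of observations

# Relative frequency = class frequency / total frequency.
relative = {value: count / n for value, count in frequencies.items()}

for value, count in frequencies.items():
    print(f"{value} / {count} / {count}/{n} = {relative[value]:.3f}")
print(f"Total / {n} / {sum(relative.values()):.2f}")
```

The same few lines apply to any of the frequency tables in this chapter once the frequencies are replaced.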

This is the relative frequency histogram:

2.4The frequency table for number of crashes per month is

Number of Crashes / Frequency / Relative Frequency
0 / 5 / 5/59 = 0.085
1 / 12 / 12/59 = 0.203
2 / 11 / 11/59 = 0.186
3 / 14 / 14/59 = 0.237
4 / 8 / 8/59 = 0.136
5 / 8 / 8/59 = 0.136
6 / 1 / 1/59 = 0.017
Total / 59 / 1.00 (rounding error)

This is the relative frequency histogram:

2.5 (a) The table of relative frequencies for workers in the department is

Mode of Transportation / Frequency / Relative Frequency
Drive alone / 50 / 50/80 = 0.625
Car pool / 6 / 6/80 = 0.075
Ride bus / 14 / 14/80 = 0.175
Other / 10 / 10/80 = 0.125
Total / 80 / 1.000

(b) The pie chart for workers in the department is

2.6 The table of relative frequencies for the money raised (in million dollars) is

Source / Frequency / Relative Frequency
Individuals and bequests / 234 / 234/414 = 0.565
Industry and business / 48 / 48/414 = 0.116
Foundations and associations / 132 / 132/414 = 0.319
Total / 414 / 1.000

The pie chart for the university fund drive is

2.7There are overlapping classes in the grouping. A report of 3 stolen bicycles will fall in two classes.

2.8There is a gap. A report of 6 complaints in one week does not fall in any class. The last class should be 6 or more.

2.9There is a gap. The response 5 close friends does not fall in any class. The last class should be 5 or more.

2.10 The first class should be “less than 175 pounds”. Otherwise, a lightweight kicker cannot be assigned to a class.

2.11(a) Yes. (b) Yes. (c) Yes. (d) No. (e) No.

2.12 The frequency table of the survey response is

Response / Frequency / Relative Frequency
1 / 14 / 14/50 = 0.28
2 / 13 / 13/50 = 0.26
3 / 7 / 7/50 = 0.14
4 / 16 / 16/50 = 0.32
Total / 50 / 1.00

2.13 (a) The relative frequencies are 0.18, 0.48, 0.26, and 0.08 for 0, 1, 2, and 3 bags, respectively.

(b) Nearly one-half of the passengers check exactly one bag. The longest tail is to the right.

(c) The proportion of passengers who fail to check a bag is 0.18.

2.14The dot diagram of meter readings is

2.15The dot diagram of amounts of radiation leakage is

2.16The dot diagram of number of bad checks received is

2.17(a) The dot diagram of number of CFUs is

(b)There is a long tail to the right with one extremely large value of 1700 CFU units.

(c) There is one day, so the proportion is $1/15 = 0.067$.

2.18(a)The frequency distribution of tornado fatalities is given in the table below.

Class Interval / Frequency / Relative Frequency
[0, 25) / 2 / 2/58 = 0.034
[25, 50) / 19 / 19/58 = 0.328
[50, 75) / 18 / 18/58 = 0.310
[75, 100) / 7 / 7/58 = 0.121
[100, 150) / 5 / 5/58 = 0.086
[150, 200) / 2 / 2/58 = 0.034
[200, 250) / 1 / 1/58 = 0.017
[250, 550) / 3 / 3/58 = 0.052
Total / 58 / 0.982 (rounding error)

(b)The relative frequency histogram is given below.

(c)The proportion of years having 49 or fewer tornado fatalities is .

(d)There is a long tail to the right due to the fact that the last class interval is much wider than the others yet still exhibits a low frequency of observations.

2.19(a)In the following frequency distribution of lizard speed (in meters per second), the left endpoint is included in the class interval but not the right endpoint.

Class Interval / Frequency / Relative Frequency
0.45 to 0.90 / 2 / 0.067
0.90 to 1.35 / 6 / 0.200
1.35 to 1.80 / 11 / 0.367
1.80 to 2.25 / 5 / 0.167
2.25 to 2.70 / 6 / 0.200
Total / 30 / 1.001 (rounding error)

(b)All of the class intervals are of length 0.45 so we can graph rectangles whose heights are the relative frequency. The histogram is

2.20In the following frequency distribution of order of earthquake magnitudes (as given on the Richter scale), the left endpoint is not included in the class interval, but the right one is.

Class Interval / Frequency / Relative Frequency
(6.0, 6.3] / 12 / 12/55 = 0.218
(6.3, 6.6] / 15 / 15/55 = 0.273
(6.6, 6.9] / 10 / 10/55 = 0.182
(6.9, 7.2] / 10 / 10/55 = 0.182
(7.2, 7.5] / 5 / 5/55 = 0.091
(7.5, 7.8] / 2 / 2/55 = 0.036
(7.8, 8.1] / 1 / 1/55 = 0.018
Total / 55 / 1.0000 (rounding error)

The class intervals all have the same length so we take the option of making the height of a rectangle equal to the relative frequency. The histogram is

2.21This time, the frequency distribution is given by

Class Interval / Frequency / Relative Frequency
(6.0, 6.3] / 12 / 12/55 = 0.218
(6.3, 6.6] / 15 / 15/55 = 0.273
(6.6, 6.9] / 10 / 10/55 = 0.182
(6.9, 7.2] / 10 / 10/55 = 0.182
(7.2, 7.9] / 8 / 8/55 = 0.145
Total / 55 / 1.0000 (rounding error)

The corresponding frequency histogram is as follows:
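Because the last class here is wider than the others, rectangle heights are usually chosen so that area, not height, represents relative frequency. The sketch below assumes that convention (height = relative frequency / class width) and uses only the classes and frequencies from the table above; the names are illustrative.

```python
# Classes (left endpoint, right endpoint, frequency) from the table for Exercise 2.21.
classes = [
    (6.0, 6.3, 12),
    (6.3, 6.6, 15),
    (6.6, 6.9, 10),
    (6.9, 7.2, 10),
    (7.2, 7.9, 8),
]

n = sum(freq for _, _, freq in classes)  # 55 observations in total

heights = []
for left, right, freq in classes:
    width = right - left
    relative_frequency = freq / n
    # Height chosen so that (height x width) equals the relative frequency.
    heights.append(relative_frequency / width)

for (left, right, _), height in zip(classes, heights):
    print(f"({left}, {right}]  height = {height:.3f}")

# The rectangle areas add up to 1 under this convention.
total_area = sum(h * (right - left) for h, (left, right, _) in zip(heights, classes))
```

With this scaling the wide class (7.2, 7.9] is drawn lower than its relative frequency alone would suggest, which keeps the picture comparable across classes.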

2.22The stem-and-leaf display of the scores is

2.23The stem-and-leaf display of the amount of iron present in the oil is

2.24The corresponding measurements are

225 238 290 319 344 371 382 397 405 416 433 480 504 568 613

2.25The double-stem display of the amount of iron present in the oil is

2.26The corresponding measurements are

18 20 20 20 20 21 22 22 23 23 23 23 23 24

24 24 25 25 25 26 26 27 29 30 31 31 34

2.27The five-stem display of the Consumer Price Index in 2001 for the given cities is

2.28(a) The median is 4. The sample mean is

(b)The median is 3. The sample mean is

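2.29 (a) The sample mean is $\bar{x} = (0 + 1 + 2 + 4 + 8)/5 = 15/5 = 3$.

The ordered measurements are 0, 1, 2, 4, 8, so the median is 2.

(b) The mean is $\bar{x} = (26 + 26 + 30 + 31 + 32 + 38)/6 = 183/6 = 30.5$.

The ordered measurements are: 26, 26, 30, 31, 32, 38

The median is $(30 + 31)/2 = 30.5$.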

(c)The sample mean is

The ordered measurements are: .

The median is 2.

2.30 The sample mean is $\bar{x} = 48.0/8 = 6.0$.

The ordered measurements are: 5.3, 5.4, 5.5, 5.6, 6.2, 6.5, 6.6, 6.9.

The median is $(5.6 + 6.2)/2 = 5.9$.

2.31(a)

(b)The ordered observations are:

So, the median is 110 CFU units. The one very large observation makes the sample mean much larger. Hence, the sample median is better to use in this instance.

2.32 (a) The ordered monthly incomes are: 2300 2350 2400 2450 2575 2650 4700.

$\bar{x} = 19425/7 = 2775$, median $= 2450$.

(b) For a typical salary, the median is better. Only one person earns more than the mean.
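The effect of the single large income on the mean can be checked directly. This is a small sketch using Python's standard `statistics` module and the seven incomes listed in part (a); the variable names are illustrative.

```python
from statistics import mean, median

# Ordered monthly incomes from Exercise 2.32(a).
incomes = [2300, 2350, 2400, 2450, 2575, 2650, 4700]

x_bar = mean(incomes)   # pulled upward by the income of 4700
med = median(incomes)   # middle (4th) value of the 7 ordered incomes

# Incomes that exceed the sample mean.
above_mean = [x for x in incomes if x > x_bar]

print(x_bar, med, above_mean)
```

Only the largest income exceeds the mean, which is why the median describes a typical salary better here.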

2.33 The mean is . The claim ignores variability and is not true. It is certainly unpleasant with a daily maximum temperature of 105°F in July.

2.34The sample mean is

cases

The ordered sales times are: 65, 67, 67, 70, 72, 73, 84

2.35(a)

(b)The sample median is 8. Since the sample mean and median are about the same, either of them can be used as an indication of radiation leakage.

2.36 The mean, 10.30, is one measure of central tendency and the median, 10.00, is another. These values may be interpreted as follows. On average, there were 10.3 reports of aggravated assault at the 27 universities. Thirteen of the universities had at least 10 such reports while 13 recorded at most 10 such reports. At least one school logged exactly 10 reports.

2.37 The mean, 118.05, is one measure of central tendency and the median, 117.00, is another. The value 118.05 tells us that, on average, a baby weighed 118.05 ounces. The median tells us that about half of the babies weighed at least 117 ounces while roughly half weighed at most 117 ounces.

2.38 (a) $\bar{x} = 77/40 = 1.925$ (activities)

(b) Sample median is 2 activities

(c) The large observations of 5, 6, and 7 activities did not drastically affect the computation of the mean in this instance.

2.39(a) (returns)

(b) Sample median is 2 returns.

2.40(a)Sample median (seconds).

(b) (seconds).

2.41(a) days.

(b) Sample median . Both the sample mean and the sample median give a good indication of the amount of mineral lost.

2.42(a) Sample median for males .

(b)Sample median for females .

(c)Sample median for the combined set of males and females .

2.43 The ordered measurements are: 145, 158, 165, 176, 182, 183, 200, 205, 216, 232

Sample median $= (182 + 183)/2 = 182.5$ (minutes).

2.44 In Exercise 2.43, the sample mean is $\bar{x} = 1862/10 = 186.2$ (minutes). The total time for 10 games is $10\bar{x} = 1862$ minutes and this is meaningful. However, the median ignores the actual times of the long games, so 10 times the median is not a meaningful total.

2.45(a)The dot diagram for the diameters (in feet) of the Indian mounds in southern Wisconsin is

(b). Sample median .

(c), so we count in 4 observations. and .

2.46, an integer, so we average the 4th and 5th observations. and . The median, or days.

2.47(a)Median .

(b), so we need to count in 10 observations. The 11-th smallest observation also satisfies the definition.

2.48 calls per shift.

2.49The ordered data are

Since the number of observations is 25, the median or second quartile is the 13th ordered observation in the list. The first quartile is the 7th observation.

2.50(a)The ordered data are

Since the number of observations is 30, the median or second quartile is the average of the 15th and 16th in the list. Sample median

meters per second. Because $0.25 \times 30 = 7.5$ is not an integer, we round up and the first quartile is the 8th ordered observation.

(b) Since $0.90 \times 30 = 27$ is an integer, the 90th percentile is the average of the 27th and 28th observations in the ordered list. Sample 90th percentile

.

2.51(a) The ordered observations are

Since the sample size is 15, the median is the 8th observation, 110. To obtain $Q_1$, we find $0.25 \times 15 = 3.75$, so the first quartile is the 4th observation in the ordered list.

(b) The 90th percentile requires us to count in at least $0.90 \times 15 = 13.5$, or 14, observations, so the 90th sample percentile is the 14th ordered observation.

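2.52 (a) The mean of the original data set 4, 8, 8, 7, 9, 6 is

$\bar{x} = (4 + 8 + 8 + 7 + 9 + 6)/6 = 42/6 = 7$

Adding $c = 4$ to the original data set we get: 8, 12, 12, 11, 13, 10. The mean of the new data set is

$(8 + 12 + 12 + 11 + 13 + 10)/6 = 66/6 = 11$

which equals $\bar{x} + c = 7 + 4$. Multiplying the original data set by $d = 2$ we get: 8, 16, 16, 14, 18, 12. The mean of the new data set is

$(8 + 16 + 16 + 14 + 18 + 12)/6 = 84/6 = 14$

which equals $d\bar{x} = 2(7)$.

(b) The median of the original data set is

$(7 + 8)/2 = 7.5$

When $c = 4$ is added to the original data set, the median of the new data set is

$(11 + 12)/2 = 11.5$

which equals $7.5 + 4$. When the original data set is multiplied by $d = 2$, the median of the new data set is

$(14 + 16)/2 = 15$

which equals $2(7.5)$.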

2.53 (a) The ordered data are 62, 70, 75, 75, 80. The median is 75°F and the mean is $\bar{x} = 362/5 = 72.4$°F.

(b)The mean of is by property (i) of Exercise 2.52 with . By property (ii)

By similar properties for the median

2.54(a)Company A. The average is highest and a superior machinist would earn above the median.

(b)Company B. A medium quality machinist would be paid near the median. Company B has the higher median.

2.55(a)

(b)

Lake Apopka / Lake Woodruff

(c) From the dot diagrams, the males in Lake Apopka have lower levels of testosterone and their sample mean is only about one-third of that for males in (uncontaminated) Lake Woodruff. This finding is consistent with the environmentalists’ concern that the contamination has affected the testosterone levels and the reproductive abilities.

2.56(a)

(b)Males Females

(c) The dot diagrams of the amount of testosterone seem to be quite similar for males and females, although there is a gap in the male diagram. The two means are nearly the same, which suggests that the insecticide contamination has pushed hormone concentrations far out of balance because, ordinarily, males should have higher testosterone concentrations.

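2.57 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 12/3 = 4$.

$x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
7 / 3 / 9
2 / -2 / 4
3 / -1 / 1
Total / 12 / 0.0 / 14

(b) The variance and the standard deviation are $s^2 = 14/(3 - 1) = 7$ and $s = \sqrt{7} = 2.646$.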

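2.58 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 15/3 = 5$.

$x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
1 / -4 / 16
10 / 5 / 25
4 / -1 / 1
Total / 15 / 0.0 / 42

(b) The variance and the standard deviation are $s^2 = 42/(3 - 1) = 21$ and $s = \sqrt{21} = 4.583$.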

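2.59 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 24/4 = 6$.

$x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
6 / 0 / 0
4 / -2 / 4
12 / 6 / 36
2 / -4 / 16
Total / 24 / 0.0 / 56

(b) The variance and the standard deviation are $s^2 = 56/(4 - 1) = 18.667$ and $s = \sqrt{18.667} = 4.320$.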

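2.60 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 11.5/5 = 2.3$.

$x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
2.6 / 0.3 / 0.09
1.5 / -0.8 / 0.64
3.5 / 1.2 / 1.44
2.4 / 0.1 / 0.01
1.5 / -0.8 / 0.64
Total / 11.5 / 0.0 / 2.82

(b) The variance and the standard deviation are $s^2 = 2.82/(5 - 1) = 0.705$ and $s = \sqrt{0.705} = 0.840$.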
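The deviation tables in Exercises 2.57 to 2.60 all follow the same steps: subtract the mean, square, sum, and divide by $n - 1$. The helper below is an illustrative sketch of those steps (the function name `deviation_table` is not from the text), applied to the data values that appear in the first column of each table.

```python
from math import sqrt

def deviation_table(data):
    """Return mean, deviations, squared deviations, sample variance and sample sd."""
    n = len(data)
    x_bar = sum(data) / n
    deviations = [x - x_bar for x in data]       # these always sum to zero
    squared = [d ** 2 for d in deviations]
    variance = sum(squared) / (n - 1)            # divisor n - 1 for a sample
    return x_bar, deviations, squared, variance, sqrt(variance)

# First-column values of the tables in Exercises 2.57 to 2.60.
datasets = {
    "2.57": [7, 2, 3],
    "2.58": [1, 10, 4],
    "2.59": [6, 4, 12, 2],
    "2.60": [2.6, 1.5, 3.5, 2.4, 1.5],
}

for label, data in datasets.items():
    x_bar, deviations, squared, variance, sd = deviation_table(data)
    print(f"{label}: mean = {x_bar:.2f}, s^2 = {variance:.3f}, s = {sd:.3f}")
```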

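2.61 We carry out all necessary calculations in the following table.

$x$ / $x^2$
8 / 64
3 / 9
4 / 16
Total / 15 / 89

The variance is $s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{2}\left[89 - \frac{15^2}{3}\right] = \frac{14}{2} = 7$.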

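2.62 We carry out all necessary calculations in the following table.

$x$ / $x^2$
6 / 36
4 / 16
12 / 144
2 / 4
Total / 24 / 200

The variance is $s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{3}\left[200 - \frac{24^2}{4}\right] = \frac{56}{3} = 18.667$.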
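The tables in Exercises 2.61 and 2.62 carry only the totals $\sum x$ and $\sum x^2$, which is what the computing formula $s^2 = \left[\sum x^2 - (\sum x)^2/n\right]/(n - 1)$ needs. The sketch below assumes that formula is the one intended and compares it with the standard library's deviation-based result; the function name is illustrative.

```python
from statistics import variance

def shortcut_variance(data):
    """Sample variance from the totals sum(x) and sum(x^2) only."""
    n = len(data)
    total = sum(data)
    total_of_squares = sum(x * x for x in data)
    return (total_of_squares - total ** 2 / n) / (n - 1)

data_261 = [8, 3, 4]        # totals 15 and 89
data_262 = [6, 4, 12, 2]    # totals 24 and 200

for data in (data_261, data_262):
    # Both routes should agree up to floating-point rounding.
    print(shortcut_variance(data), float(variance(data)))
```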

2.63(a).

(b).

(c).

2.64(a)Many factors could explain the difference in apartment rents. One possible factor is simply that different landlords may charge different rents. Other factors are the size of the apartment, the proximity of the apartment to key locations such as parks or public transportation, and whether utilities such as water and electricity are included.

(b).

(c)

2.65.

2.66(a).

(b)

2.67(a).

(b).

(c). so . The single very large value greatly inflates the standard deviation.

2.68(a).

(b).

2.69(a).

(b).

(c)

2.70(a) bags.

(b),so that.

2.71(a)Median .

(b).

(c). Hence .

2.72 (a) The measure of variation displayed is 7.61, the sample standard deviation. The sample variance is $s^2 = (7.61)^2 = 57.91$.

(b) The interquartile range is $Q_3 - Q_1 = 9$. This means the center half of the data span an interval of length 9.

(c)Any value greater than 7.61 would correspond to greater variation.

2.73 (a) The measure of variation displayed is 15.47, the sample standard deviation. The sample variance is $s^2 = (15.47)^2 = 239.32$.

(b) The interquartile range is $Q_3 - Q_1 = 25$. This means the center half of the data span an interval of length 25 ounces.

(c)Any value smaller than 15.47 would correspond to smaller variation.

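2.74 (i) For the observations $x$: 5, 9, 9, 8, 10, 7, we have $\bar{x} = 48/6 = 8$ and $s^2 = 16/5 = 3.2$. Adding $c = 4$ to the observations $x$, we have 9, 13, 13, 12, 14, 11. The sample mean and variance of the new data set are

$(9 + 13 + 13 + 12 + 14 + 11)/6 = 72/6 = 12$ and $[(-3)^2 + 1^2 + 1^2 + 0^2 + 2^2 + (-1)^2]/5 = 16/5 = 3.2$

So the standard deviation of the new data set is $\sqrt{3.2} = 1.789$, which is the same as the standard deviation of $x$.

(ii) Multiply the observations $x$ by $d = 2$. We get 10, 18, 18, 16, 20, 14. The sample mean and variance of the new data set are

$(10 + 18 + 18 + 16 + 20 + 14)/6 = 96/6 = 16$ and $[(-6)^2 + 2^2 + 2^2 + 0^2 + 4^2 + (-2)^2]/5 = 64/5 = 12.8$

So the standard deviation of the new data set is $\sqrt{12.8} = 3.578 = 2(1.789)$, or $d$ times the standard deviation of $x$.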
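Both properties can be verified numerically. In the sketch below the constants $c = 4$ and $d = 2$ are inferred from the transformed data sets listed in the solution (9, 13, ... and 10, 18, ...), so treat them as an assumption; everything else uses Python's standard `statistics` module.

```python
from math import isclose
from statistics import mean, stdev

x = [5, 9, 9, 8, 10, 7]
c, d = 4, 2  # inferred from the transformed data listed in the solution

shifted = [xi + c for xi in x]   # 9, 13, 13, 12, 14, 11
scaled = [d * xi for xi in x]    # 10, 18, 18, 16, 20, 14

# Adding a constant shifts the mean but leaves the spread unchanged.
shift_mean_ok = isclose(mean(shifted), mean(x) + c)
shift_sd_ok = isclose(stdev(shifted), stdev(x))

# Multiplying by d multiplies the mean by d and the standard deviation by |d|.
scale_mean_ok = isclose(mean(scaled), d * mean(x))
scale_sd_ok = isclose(stdev(scaled), abs(d) * stdev(x))

print(shift_mean_ok, shift_sd_ok, scale_mean_ok, scale_sd_ok)
```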

2.75Using the data set in Exercise 2.22, in Exercise 2.47, we determined that and . Hence,

Interquartile range points.

2.76From the data set of Exercise 2.33, in Exercise 2.46 we determined that and . Hence,

Interquartile range days.

2.77No. Typically, the middle half of a data set is much more concentrated than the sum of the two quarters, one in each tail. As an example, for the water quality data of Exercise 2.17, the range is because of one extremely large observation. From the quartiles determined in Exercise 2.51, the interquartile range is . The range is six times larger than the interquartile range.

2.78(a) and so is the interval . This interval contains 38 observations or proportion .95 of the observations. And is the interval which contains proportion 1 of the observations.

(b) The empirical guidelines suggest proportion 0.95 in the interval $\bar{x} \pm 2s$ and we observed 0.95. They suggest proportion 0.997 for the interval $\bar{x} \pm 3s$ and we observed 1.000. The agreement is excellent.

2.79(a) and .

(b)The proportion of the observations are given in the following table:

(c)We observe a good agreement with the proportions suggested by the empirical guideline.

2.80(a) and .

(b)The proportion of the observations are given in the following table:

(c)We observe a good agreement with the proportions suggested by the empirical guideline.

2.81(a) and .

(b)The proportion of the observations are given in the following table:

(c)We observe a good agreement with the proportions suggested by the empirical guideline.

2.82(a)The z-values of 350 and 620 are

.

(b)For the z-score of 2.4, the raw score is obtained by solving the equation

so .

2.83(a) (b)

2.84(a),(b)The boxplots for salaries in City A and City B are shown below.

(c)There is a greater difference between the cities with respect to the higher salaries. For instance, any salary above the median in City B is greater than the 75th percentile in City A.

2.85For males, the minimum and the maximum horizontal velocity of a thrown ball are 25.2 and 59.9 respectively. The quartiles are:

For females, the minimum and the maximum horizontal velocity of a thrown ball are 19.4 and 53.7 respectively. The quartiles are

.

The boxplot of the male and female throwing speed are

Comparing the two boxplots, we can see that males throw the ball faster than females.

2.86(a)The differences, arranged in order, between 2007 and 1992 Consumer Price Index are

13 13 14 18 20 20 21 21 21 22 23 24 25 25 26 26

27 33 33 34 35 36 37 41

The five-number summary is: 13, 20.5, 24.5, 33, 41

Alternatively, from Minitab:

Descriptive Statistics:

Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum

C1 24 0 25.33 1.59 7.81 13.00 20.25 24.50 33.00 41.00
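The hand calculation can be reproduced with the counting rule these solutions use for percentiles: if $np$ is an integer, average the $np$-th and $(np+1)$-th ordered values; otherwise round $np$ up and take that ordered value. The function below is an illustrative sketch of that rule, not Minitab's algorithm. Minitab reports $Q_1 = 20.25$, which suggests it interpolates between ordered values under a different convention.

```python
import math

def sample_percentile(ordered, p):
    """Sample 100p-th percentile of an already sorted list, for 0 < p < 1."""
    n = len(ordered)
    k = n * p
    if math.isclose(k, round(k)):
        # n*p is an integer: average the k-th and (k+1)-th ordered values.
        k = round(k)
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round up and take that single ordered value.
    return ordered[math.ceil(k) - 1]

# Ordered differences from Exercise 2.86(a).
differences = [13, 13, 14, 18, 20, 20, 21, 21, 21, 22, 23, 24,
               25, 25, 26, 26, 27, 33, 33, 34, 35, 36, 37, 41]

five_number = (
    differences[0],
    sample_percentile(differences, 0.25),
    sample_percentile(differences, 0.50),
    sample_percentile(differences, 0.75),
    differences[-1],
)
print(five_number)
```

With 24 observations, $0.75 \times 24 = 18$ is an integer, so the third quartile is the average of the 18th and 19th ordered values, both of which are 33.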

(b)

2.87(a)

and

(b) Since $\bar{x} - 2s = 25.33 - 2(7.81) = 9.71$ and $\bar{x} + 2s = 25.33 + 2(7.81) = 40.95$, only the increase of 41 for Honolulu lies outside the interval. The proportion $23/24 = 0.958$ lies within the interval.
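The count of observations inside the interval can be confirmed from the ordered differences listed in Exercise 2.86. The sketch below assumes the interval in question is $\bar{x} \pm 2s$, which is the only standard choice for which exactly one value (41) falls outside; the variable names are illustrative.

```python
from statistics import mean, stdev

# Ordered differences in the Consumer Price Index from Exercise 2.86(a).
differences = [13, 13, 14, 18, 20, 20, 21, 21, 21, 22, 23, 24,
               25, 25, 26, 26, 27, 33, 33, 34, 35, 36, 37, 41]

x_bar = mean(differences)
s = stdev(differences)          # sample standard deviation (divisor n - 1)

# Assumed interval: mean plus or minus two standard deviations.
lower, upper = x_bar - 2 * s, x_bar + 2 * s

inside = [x for x in differences if lower < x < upper]
outside = [x for x in differences if not (lower < x < upper)]
proportion = len(inside) / len(differences)

print(f"interval = ({lower:.2f}, {upper:.2f}), outside = {outside}, "
      f"proportion inside = {proportion:.3f}")
```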

2.88(a)Using the ordered data set from Example 5, we have

(b)The box plot depicting this data set is as follows:

2.89From Exercise 2.38, we know that

and

2.90(a)

(b)

(c)The dot plot is given below

(d)All are losses except for two gains in the 1998 and 2002 elections.

2.91(a) The ordered data are

Median seats lost.

(b)The maximum number of seats lost, 55, occurred when Harry S. Truman was President. The minimum number, or a gain, occurred during G.W. Bush’s term as President.

(c)range

2.92

The process appears to be in statistical control. The pattern is nearly a horizontal band with one possible low value.

2.93

The value 215 from the second pay period looks high and 194 from the fifth period is possibly high.

2.94We calculate and so the upper limit is and the lower limit is .

Only the value 50 calls for worker 20 is out of control.

2.95We calculate and so the upper limit is and the lower limit is which we take as 0.

Only the value 215 from the second pay period is out of control.
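The control-limit calculations in Exercises 2.94 and 2.95 follow one pattern: compute $\bar{x}$ and $s$, form upper and lower limits, replace a negative lower limit by 0 when the data cannot be negative, and flag points outside the limits. The sketch below assumes limits of the form $\bar{x} \pm ks$ with $k = 2$; the multiplier and the data are hypothetical, since the source values are not shown here.

```python
from statistics import mean, stdev

def control_limits(data, k=2.0, floor_at_zero=True):
    """Return (lower, upper) limits mean -/+ k*s; k = 2 is an assumption."""
    x_bar = mean(data)
    s = stdev(data)
    lower = x_bar - k * s
    upper = x_bar + k * s
    if floor_at_zero:
        # Counts and similar measurements cannot be negative.
        lower = max(lower, 0.0)
    return lower, upper

# Hypothetical counts, made up purely to illustrate the calculation.
hypothetical = [4, 6, 5, 7, 5, 6, 4, 30]

lower, upper = control_limits(hypothetical)
out_of_control = [x for x in hypothetical if x < lower or x > upper]
print(lower, round(upper, 2), out_of_control)
```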

2.96

The process appears to be in statistical control between 1994 and 2003, but then begins to taper off from 2003 to 2007.

2.97

The process appears to be in statistical control. Only the value of 2.29 in 1993 is out of control.

2.98We re-calculate without the outlier 5326.

and

so the upper limit is and the lower limit is . All of the points are within the control limits.

2.99(a)The relative frequencies of the occupation groups are:

Occupation Group / Relative Frequency 2007 / Relative Frequency 2000
Goods Producing / 0.139 / 0.161
Service (Private) / 0.722 / 0.702
Government / 0.139 / 0.136
Total / 1.000 / 1.000

(b) The proportions of persons in private service occupations and in government have increased while the proportion in goods producing has decreased from 2000 to 2007.

2.100(a)The frequency table of “intended major” of the students is:

Intended major / Frequency / Relative Frequency
Biological Science / 18 / 0.367
Humanities / 4 / 0.082
Physical Sciences / 9 / 0.184
Social Science / 18 / 0.367
Total / 49 / 1.000

(b)The frequency table of “year in college” of the students is:

Year / Frequency / Relative Frequency
1 / 4 / 0.082
2 / 10 / 0.204
3 / 20 / 0.408
4 / 15 / 0.306
Total / 49 / 1.000

2.101The dot diagrams of heights for the male and female students are

2.102The frequency table of the causes for power outage is:

Frequency Table for Causes of Outage
Cause / Frequency
Trees and limbs / 12
Animals / 9
Lightning / 3
Wind storm / 1
Fuse / 1
Unknown / 4

The Pareto chart for the cause of outage is
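A Pareto chart orders the categories from most to least frequent, often with a cumulative percentage alongside. The sketch below computes that ordering and the cumulative percentages from the frequency table above; it produces the numbers behind the chart rather than the drawing itself, and the names are illustrative.

```python
# Frequencies from the table of causes of outage.
causes = {
    "Trees and limbs": 12,
    "Animals": 9,
    "Lightning": 3,
    "Wind storm": 1,
    "Fuse": 1,
    "Unknown": 4,
}

total = sum(causes.values())

# Pareto ordering: largest frequency first (ties keep their table order).
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

running = 0
cumulative = []  # cumulative percentage after each category
for cause, count in ordered:
    running += count
    cumulative.append(100 * running / total)

for (cause, count), cum in zip(ordered, cumulative):
    print(f"{cause:<16} {count:>3} {cum:6.1f}%")
```

The first two categories, trees and limbs and animals, together account for 70% of the outages.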

2.103(a)Yes. The exact number of lunches is the sum of the frequencies of the first four classes.

(b)Yes. The exact number of lunches is the sum of the frequencies of the last two classes.

(c)No.

2.104The sample mean and sample standard deviation are:

mm.

.

2.105 (a) The mean, 227.4, is one measure of central tendency and the median, 232.5, is another. These values may be interpreted as follows. On average, the 20 grizzly bears weigh 227.4 pounds apiece. Half of the grizzly bears sampled weighed at least 232.5 pounds while half weighed at most 232.5 pounds.

(b)The sample standard deviation is 82.7 pounds.

(c) The z score for a weight of 320 pounds is $z = (320 - 227.4)/82.7 = 1.12$.
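A z score measures how many standard deviations an observation lies from the mean, and the relation can be inverted to recover a raw value. The sketch below uses the mean 227.4 and standard deviation 82.7 stated in parts (a) and (b); the function names are illustrative.

```python
def z_score(x, mean, sd):
    """Standardized value: distance from the mean in standard deviation units."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Inverse of z_score: the raw value that has the given z score."""
    return mean + z * sd

mean_weight, sd_weight = 227.4, 82.7   # from parts (a) and (b)

z = z_score(320, mean_weight, sd_weight)
print(round(z, 2))                               # about 1.12
print(raw_score(z, mean_weight, sd_weight))      # recovers 320, up to rounding
```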

2.106(a)Median .

(b)We count in or 10 observations to find and .

(c)The proportion of students who scored below 40 is .

The proportion of students who scored 90 or over is .

2.107(a)Sample median .

(b).

(c)The sample variance is

.

2.108(a)The double stem display is

(b)Median .

2.109(a)

(b)By the properties, the new data set has sample mean

and standard deviation 2. By direct calculation, we verify

(c)By the properties, the new data set has sample mean

and standard deviation . By direct calculation, we verify

2.110(a)For the heights of males, .

(b)For the heights of females, .

(c)For the heights of males, median .

(d)For the heights of females, median .

2.111(a)The dot diagrams are

(b)From the dot diagrams we can see the number of flies (grape juice) is centered at about 11 and the number of flies (regular food) is centered near 25. The spread looks about the same.

(c)Regular food: .