
251y0452 10/20/04 ECO251 QBA1

FIRST HOUR EXAM

October 6, 2004

Name:____Key______

Student Number: I can’t count that high

Class Hour: When I’m still asleep

Remember – Neatness, or at least legibility, counts. In most non-multiple-choice questions an answer needs a calculation or short explanation to count.

Part I. (7 points)

Use the eleven numbers that you used in the second problem in the take-home exam. Add 2 to the first number. (If you don’t have them, take your student number plus the numbers 3, 9, 9, 12, 21. Example: Seymour Butz’s student number is 876509, so he gets 8, 7, 6, 5, 0, 9, 3, 9, 9, 12, 21. Of course, he has read “Things That You Should Never Do on an Exam or Anywhere Else” and knows that he can’t use them this way.)

Compute the following:

a) The Median (1)

b) The Standard Deviation (3)

c) The 2nd Quintile (2)

d) The Coefficient of variation (1)

Seymour used the eleven numbers 3, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1. The numbers in order are

1, 2, 3, 3, 5, 7, 7, 10, 12, 15, 22.

/ $x$ / $x^2$
/ 1 / 1
/ 2 / 4
/ 3 / 9
/ 3 / 9
/ 5 / 25
/ 7 / 49
/ 7 / 49
/ 10 / 100
/ 12 / 144
/ 15 / 225
/ 22 / 484
Total / 87 / 1099

a) The middle number is 7.

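b) $\sum x = 87$, $\sum x^2 = 1099$, so $\bar{x} = \frac{\sum x}{n} = \frac{87}{11} = 7.9091$ and $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1099 - 11(7.9091)^2}{10} = 41.091$. So $s = \sqrt{41.091} = 6.410$.

c) The position is $p(n+1) = 0.4(12) = 4.8$. So the 2nd quintile lies 0.8 of the way from the 4th number (3) to the 5th number (5) and is $3 + 0.8(5 - 3) = 4.6$.

d) $C = \frac{s}{\bar{x}} = \frac{6.410}{7.9091} = 0.810$.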
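
The Part I arithmetic can be checked with a short script. This is only a sketch in Python, which is not part of the original exam. The helper name `fractile` is made up for illustration, and it assumes the position rule p(n + 1) with linear interpolation between neighboring ordered values.

```python
import math

# Seymour's eleven numbers for Part I (first number already increased by 2).
data = [3, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
xs = sorted(data)
n = len(xs)

# a) Median: the middle value of an odd-sized ordered list.
median = xs[n // 2]

# b) Standard deviation from the computational formula.
sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)
mean = sum_x / n
variance = (sum_x2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)


def fractile(sorted_xs, p):
    """Hypothetical helper: value with a fraction p of the data below it.

    Assumes the position rule p * (n + 1) and linear interpolation,
    and that the position falls strictly inside the list.
    """
    position = p * (len(sorted_xs) + 1)
    a = int(position)          # whole part of the position
    b = position - a           # fractional part of the position
    lower = sorted_xs[a - 1]   # a-th ordered value (lists are 0-indexed)
    upper = sorted_xs[a]       # (a+1)-th ordered value
    return lower + b * (upper - lower)


# c) 2nd quintile: 40% of the data lie below it.
second_quintile = fractile(xs, 0.4)

# d) Coefficient of variation.
coeff_var = std_dev / mean

print(median, round(std_dev, 4), round(second_quintile, 2), round(coeff_var, 4))
```

The sums printed by this sketch (87 and 1099) are the column totals needed for the computational formula.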

251y0451 10/20/04

If you enjoy wasting time, you might want to use the definitional formula. I don’t think that there is any point in reworking the problem on the previous page, so I copied this from Version 1 of the exam.

/ $x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
/ 1 / -6.7273 / 45.256
/ 1 / -6.7273 / 45.256
/ 2 / -5.7273 / 32.802
/ 3 / -4.7273 / 22.347
/ 5 / -2.7273 / 7.438
/ 7 / -0.7273 / 0.529
/ 7 / -0.7273 / 0.529
/ 10 / 2.2727 / 5.165
/ 12 / 4.2727 / 18.256
/ 15 / 7.2727 / 52.873
/ 22 / 14.2727 / 203.711
Total / 85 / 0.0003 / 434.182

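$\bar{x} = \frac{85}{11} = 7.7273$, $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{434.182}{10} = 43.4182$. The vast majority of people who thought that they were using the definitional formula used , which, I believe, should have given them . Doing a little bit of homework should have prevented this error.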


Part II.

The problem in the textbook that gives the data used in the take-home also gives the braking distance for a sample of domestic-made cars. It is presented below. Cumulative frequency (in red) is needed to get the median and was not given.

Distance(feet) frequency Cumulative frequency

210 - 220 1 1

220 - 230 1 2

230 – 240 1 3

240 – 250 1 4

250 – 260 4 8

260 - 270 3 11

270 - 280 6 17

280 - 290 4 21

290 - 300 3 24

300 – 310 1 25

310 – 320 0 25

Sum 25

Minitab was used to calculate statistics from these data. It claims the following: (Note!!!!!) You will not be able to use any of these numbers in parts b) or c) without some manipulation. Answers below are not acceptable unless you give some evidence from the sample statistics.

a) Do American cars have a shorter braking distance? Compare all 3 measures of central tendency. (2)

b) Are American cars more consistent in braking distance than foreign cars? Use a dimension-free measurement of variability. (2)

c) Compare the direction and degree of skewness in the two distributions. Use one dimension-free measure of skewness. (2)

d) Write a 5-number summary of the results from the first take-home problem. (2) 15

Solution: a) Seymour had given us, for the foreign-made cars, $\bar{x} = 260.647$ and a median of $250 + \left[\frac{0.5(85) - 31}{22}\right]10 = 255.227$, and the mode is 255.

For the median for the domestic cars, the position is $pN = 0.5(25) = 12.5$. Since 12.5 is above 11 and below 17, the median is in 270-280, which has a frequency of 6. The median is $270 + \left[\frac{12.5 - 11}{6}\right]10 = 272.5$. The mode is the midpoint of the largest group, which is 275.

Domestic Foreign

Mean 268.6 260.647

Median 272.5 255.227

Mode 275 255

According to all measures, American cars have a longer braking distance.

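b) Seymour says for the foreign cars $s = 23.8271$. If we compute the coefficient of variation $C = \frac{s}{\bar{x}}$,

Domestic $C = \frac{22.338}{268.6} = 0.0832$; Foreign $C = \frac{23.8271}{260.647} = 0.0914$.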

American cars are more consistent.
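
As a check on parts a) and b), the grouped means, standard deviations and coefficients of variation can be recomputed from the two frequency tables. This is a sketch only; the function name `grouped_mean_sd` is invented here, and the foreign frequencies are Seymour's personalized ones from the take-home problem.

```python
import math

# Class midpoints for 210-220, 220-230, ..., 310-320.
midpoints = [215, 225, 235, 245, 255, 265, 275, 285, 295, 305, 315]
domestic = [1, 1, 1, 1, 4, 3, 6, 4, 3, 1, 0]
foreign = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]  # Seymour's frequencies


def grouped_mean_sd(xs, fs):
    """Hypothetical helper: sample mean and standard deviation of grouped data."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)  # computational formula
    return mean, math.sqrt(variance)


dom_mean, dom_sd = grouped_mean_sd(midpoints, domestic)
for_mean, for_sd = grouped_mean_sd(midpoints, foreign)

# Coefficient of variation: a dimension-free measure of variability.
dom_cv = dom_sd / dom_mean
for_cv = for_sd / for_mean

print(f"Domestic: mean={dom_mean:.3f} s={dom_sd:.3f} C={dom_cv:.4f}")
print(f"Foreign:  mean={for_mean:.3f} s={for_sd:.4f} C={for_cv:.4f}")
```

The smaller coefficient of variation marks the more consistent group.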


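c) You can use the relative skewness $g_1 = \frac{k_3}{s^3}$ or Pearson's measure of skewness $\frac{\text{mean} - \text{mode}}{s}$.

Domestic Foreign

Mean 268.6 260.647

Mode 275 255

$k_3$ -7946.70 8689.93

$s$ 22.338 23.8271

$g_1$ -0.7129 0.6424

Pearson's measure -0.2865 0.2370

Both $g_1$ and Pearson's measure make Domestic look more skewed. However, Domestic is skewed to the left and Foreign to the right.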

d) Lower Limit 210

First Quartile 243.5

Median 255.227

Third Quartile 275.357

Upper Limit 320

The following numbers refer to miles-per-gallon of a sample of vehicles (Bowerman and O’Connell).

Class (mpg) / Frequency / Relative frequency / Cumulative frequency / Cumulative relative frequency
29.8 - 30.3 / ____ / ____ / ____ / .0612
30.4 – 30.9 / ____ / ____ / ____ / .2449
31.0 – 31.5 / ____ / ____ / 24 / ____
31.6 – 32.1 / ____ / .2653 / 35 / .7551
32.2 – 32.7 / 11 / .2245 / 46 / .9388
32.8 – 33.3 / 3 / .0612 / 49 / 1.000

Fill in the missing numbers. (5) 20

Even with corrections made above, this had some errors, but I still could check easily to see if you knew what you were doing. The completely corrected results were:

Class (mpg) / Frequency / Relative frequency / Cumulative frequency / Cumulative relative frequency
29.8 - 30.3 / 3 / .0612 / 3 / .0612
30.4 – 30.9 / 9 / .1837 / 12 / .2449
31.0 – 31.5 / 12 / .2449 / 24 / .4898
31.6 – 32.1 / 13 / .2653 / 37 / .7551
32.2 – 32.7 / 9 / .1837 / 46 / .9388
32.8 – 33.3 / 3 / .0612 / 49 / 1.000
Total / 49 / 1.0000
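
The corrected table can be rebuilt from the six class frequencies alone. The following is a minimal sketch; the variable names are illustrative and not taken from the text.

```python
from itertools import accumulate

classes = ["29.8-30.3", "30.4-30.9", "31.0-31.5",
           "31.6-32.1", "32.2-32.7", "32.8-33.3"]
freqs = [3, 9, 12, 13, 9, 3]

n = sum(freqs)
rel_freqs = [f / n for f in freqs]        # frequency divided by n
cum_freqs = list(accumulate(freqs))       # running total of frequencies
cum_rel_freqs = [F / n for F in cum_freqs]

for row in zip(classes, freqs, rel_freqs, cum_freqs, cum_rel_freqs):
    label, f, rel, F, cum_rel = row
    print(f"{label} / {f} / {rel:.4f} / {F} / {cum_rel:.4f}")
```

Any one complete column determines the other three, which is why the table with blanks can be filled in.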


Part III. (At least 22 points – 2 points each unless marked)

Mark the variables below as qualitative (A) or quantitative (B)

a) Number of days a patient stays at a spa (B)

b) Per cent change in population between censuses (B)

c) Preferences for 10 beers on a 1st to 10th scale (A)

d) Method of contraception (A)

Which of the following is an example of continuous ratio data?

a) Number of days a patient stays at a spa

b) *Per cent change in population between censuses

c) Preferences for beers on a 1 to 10 scale

d) Method of contraception

e) None of the above. 4

A summary measure that is computed to describe a characteristic of a population is called

a) a census.

b) a statistic.

c) *a parameter.

d) an inference.

e) None of the above. 6

In general, what are the two types of descriptive statistic most frequently reported?

a) Measures of skewness and measures of central tendency

b) Measures of dispersion and measures of skewness

c) *Measures of dispersion and measures of central tendency

d) Measures of kurtosis and measures of dispersion

e) Measures of kurtosis and measures of skewness

f) Measures of kurtosis and measures of central tendency

g) None of the above. 8


Mark the following formulas (2 each). Circle a, b or c. Item b) must be filled in if you have circled it.

(Sample mean)

a) This cannot be negative.

b) If this is negative it means the distribution is ______

c) *This can be negative, but it has no special meaning.

Coefficient of Excess or

a) This cannot be negative.

b) *If this is negative it means the distribution is platykurtic (flat-topped).

c) This can be negative, but it has no special meaning.

(Skewness)

a) This cannot be negative.

b) *If this is negative it means the distribution is skewed to the left.

c) This can be negative, but it has no special meaning.

(Variance)

a) *This cannot be negative.

b) If this is negative it means the distribution is ______

c) This can be negative, but it has no special meaning. 12

Does it really mean anything to tell me that if one of these statistics is negative, the distribution is negative?


Exhibit 1: The following is taken from Problem 3.22 in the text. The data below represent sales tax receipts submitted to a township government by 50 businesses in one quarter.

Sales Taxes ($000)

10.3 11.1 9.6 9.0 14.5 13.0 6.7 11.0 8.4 10.3

13.0 11.2 7.3 5.3 12.5 8.0 11.8 8.7 10.6 9.5

11.1 10.2 11.1 9.9 9.8 11.6 15.1 12.5 6.5 7.5

10.0 12.9 9.2 10.0 12.8 12.5 9.3 10.4 12.7 10.5

9.3 11.5 10.7 11.6 7.8 10.5 7.6 10.1 8.9 8.6

The text solution manual offers the following results.

(a) Stem-and-leaf display of Quarterly Sales Tax Receipts

 5 | 3
 6 | 5 7
 7 | 3 5 6 8
 8 | 0 4 6 7 9
 9 | 0 2 3 3 5 6 8 9
10 | 0 0 1 2 3 3 4 5 5 6 7
11 | 0 1 1 1 2 5 6 6 8
12 | 5 5 5 7 8 9
13 | 0 0
14 | 5
15 | 1

(b) $\bar{x} = 10.28$

(d) 64% of the receipts are within ±1 standard deviation of the mean.

(e) 94% of the receipts are within ±2 standard deviations of the mean.

(f) 100% of the receipts are within ±3 standard deviations of the mean.

According to the stem and leaf display, what percent of the receipts were below $8000? (1)

7/50 = 14%

If the researcher was directed to present the data in 5 classes, what should the class interval be? Show your calculations. 15

Lowest is 5.3. Highest is 15.1. $\frac{15.1 - 5.3}{5} = 1.96$. Let’s try 2.

Show the actual intervals you might use. 17

Notice that if we start at 5 an interval of 2 doesn’t cover, so you have to either start above 5 or use a number above 2.

Class / From / to
A / 5.2 / Under 7.2
B / 7.2 / Under 9.2
C / 9.2 / Under 11.2
D / 11.2 / Under 13.2
E / 13.2 / Under 15.2

I think that using 2.5 and starting at 4 works better.

Class / From / to
A / 4.0 / Under 6.5
B / 6.5 / Under 9.0
C / 9.0 / Under 11.5
D / 11.5 / Under 14.0
E / 14.0 / Under 16.5
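
The class-interval reasoning can be sketched in code: divide the range by the number of classes, round up, then confirm that the chosen starting point covers the highest value. The helper names `covers` and `class_counts` are made up for this illustration, and the counts are tallied only for the second scheme (start 4.0, width 2.5).

```python
import math

receipts = [
    10.3, 11.1, 9.6, 9.0, 14.5, 13.0, 6.7, 11.0, 8.4, 10.3,
    13.0, 11.2, 7.3, 5.3, 12.5, 8.0, 11.8, 8.7, 10.6, 9.5,
    11.1, 10.2, 11.1, 9.9, 9.8, 11.6, 15.1, 12.5, 6.5, 7.5,
    10.0, 12.9, 9.2, 10.0, 12.8, 12.5, 9.3, 10.4, 12.7, 10.5,
    9.3, 11.5, 10.7, 11.6, 7.8, 10.5, 7.6, 10.1, 8.9, 8.6,
]
num_classes = 5
low, high = min(receipts), max(receipts)

raw_width = (high - low) / num_classes   # about 1.96
rounded_width = math.ceil(raw_width)     # round up to a convenient width


def covers(start, width, k, lowest, highest):
    """True if k classes of the given width, starting at start, hold all data."""
    return start <= lowest and highest < start + k * width


def class_counts(values, start, width, k):
    """Tally values into k classes of the form [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


print(covers(5.0, 2, num_classes, low, high))    # starting at 5 misses 15.1
print(covers(5.2, 2, num_classes, low, high))
print(covers(4.0, 2.5, num_classes, low, high))
print(class_counts(receipts, 4.0, 2.5, num_classes))
```

The boundaries 4.0, 6.5, 9.0, 11.5 and 14.0 are exact in binary floating point, which is why the tally is done for that scheme rather than the one starting at 5.2.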


Before we start, most of you seem to have no idea what ‘3 standard deviations from the mean’ signifies. Nevertheless, one student paper put it this way.

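$\bar{x} \pm s = 10.28 \pm 2.045$ or 8.235 to 12.325

$\bar{x} \pm 2s = 10.28 \pm 2(2.045)$ or 6.190 to 14.370

$\bar{x} \pm 3s = 10.28 \pm 3(2.045)$ or 4.145 to 16.415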

Two of these should appear in your answer below.

The description above says that 64% of the receipts are within ±1 standard deviation of the mean. Between what numbers does this mean? How does this compare with the empirical rule? Why might there be a discrepancy? (3)

Empirical rule (for symmetrical unimodal distributions only): 68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three. The observed 64% is lower and could be because the distribution is not quite symmetric.

The description above says that 94% of the receipts are within ±2 standard deviations of the mean. Between what numbers does this mean? How does this compare with the Chebyshev rule? Why might there be a discrepancy? (3) 23

Chebyshev’s Inequality: $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$ or $P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$. This means that at least 3/4 should be within 2 standard deviations of the mean and at least 8/9 should be within 3 standard deviations of the mean. In the real world the number is almost always larger.
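
The percentages quoted from the solution manual can be recounted directly from the 50 receipts and set beside the empirical rule and the Chebyshev lower bound. This is a sketch under the assumption that the sample standard deviation (divisor n - 1) is the one intended; `share_within` is a made-up helper name.

```python
import statistics

receipts = [
    10.3, 11.1, 9.6, 9.0, 14.5, 13.0, 6.7, 11.0, 8.4, 10.3,
    13.0, 11.2, 7.3, 5.3, 12.5, 8.0, 11.8, 8.7, 10.6, 9.5,
    11.1, 10.2, 11.1, 9.9, 9.8, 11.6, 15.1, 12.5, 6.5, 7.5,
    10.0, 12.9, 9.2, 10.0, 12.8, 12.5, 9.3, 10.4, 12.7, 10.5,
    9.3, 11.5, 10.7, 11.6, 7.8, 10.5, 7.6, 10.1, 8.9, 8.6,
]

mean = statistics.mean(receipts)
sd = statistics.stdev(receipts)   # sample standard deviation (assumed)


def share_within(values, center, spread, k):
    """Fraction of values inside center +/- k * spread."""
    inside = [v for v in values if center - k * spread <= v <= center + k * spread]
    return len(inside) / len(values)


empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}
shares = []
for k in (1, 2, 3):
    observed = share_within(receipts, mean, sd, k)
    chebyshev_bound = 1 - 1 / k ** 2   # lower bound; zero (no information) at k = 1
    shares.append(observed)
    print(f"k={k}: {mean - k * sd:.3f} to {mean + k * sd:.3f}, "
          f"observed {observed:.0%}, empirical {empirical_rule[k]:.1%}, "
          f"Chebyshev at least {chebyshev_bound:.1%}")
```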


ECO251 QBA1

FIRST EXAM

October 6, 2004

TAKE HOME SECTION

Name: ______

Student Number: ______

Throughout this exam show your work! Please indicate clearly what sections of the problem you are answering and what formulas you are using. Turn this in with your in-class exam.

Part IV. Do all the Following (11 Points) Show your work!

1. The frequency distribution below represents the braking distance for a sample of foreign-made cars. Personalize the data as follows. Write down your student number. Take the last two digits of the number. Add the larger of these two digits to the frequency for 300-310 and the smaller to the frequency for 310-320. Use the results as your frequencies. For example, Seymour Butz’s student number is 876509, so he adds 0 to the last frequency and 9 to the second-to-last frequency and uses (1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1).

Distance (feet) frequency

210 - 220 1

220 - 230 3

230 – 240 12

240 – 250 15

250 – 260 22

260 - 270 7

270 - 280 7

280 - 290 5

290 - 300 2

300 – 310 1

310 - 320 1

a. Calculate the Cumulative Frequency (0.5)

b. Calculate The Mean (0.5)

c. Calculate the Median (1)

d. Calculate the Mode (0.5)

e. Calculate the Variance (1.5)

f. Calculate the Standard Deviation (1)

g. Calculate the Interquartile Range (1.5)

h. Calculate a Statistic showing Skewness and Interpret it (1.5)

i. Make an ogive of the data showing relative or percentage cumulative frequency (Neatness Counts!) (1.5)

j. Extra credit: Put a (horizontal) box plot below the ogive using the same scale. (1)

Solution: $x$ is the midpoint of the class. Our convention is to use the midpoint of 50 to 60, not 50 to 59.999. Note also that the midpoints have been divided by 10. Most numbers should be multiplied by 10, the variance should be multiplied by 100 and the third-moment sums used for skewness by 1000. Calculations follow for both the computational and definitional formulas. (Don’t do both.) Seymour’s frequencies are used below.

If you used computational formulas, you should have the following.

Row Class $f$ $F$ $x$ $fx$ $fx^2$ $fx^3$

1 210-220 1 1 215 215 46225 9938375

2 220-230 3 4 225 675 151875 34171875

3 230-240 12 16 235 2820 662700 155734500

4 240-250 15 31 245 3675 900375 220591875

5 250-260 22 53 255 5610 1430550 364790250

6 260-270 7 60 265 1855 491575 130267375

7 270-280 7 67 275 1925 529375 145578125

8 280-290 5 72 285 1425 406125 115745625

9 290-300 2 74 295 590 174050 51344750

10 300-310 10 84 305 3050 930250 283726250

11 310-320 1 85 315 315 99225 31255875

Total 85 22155 5822325 1543144875

251y0452 10/07/04

If you used definitional formulas, you should have the following.

Row Class $f$ $F$ $x$ $fx$ $x - \bar{x}$ $f(x - \bar{x})$ $f(x - \bar{x})^2$ $f(x - \bar{x})^3$

1 210-220 1 1 215 215 -45.6471 -45.647 2083.7 -95113

2 220-230 3 4 225 675 -35.6471 -106.941 3812.1 -135892

3 230-240 12 16 235 2820 -25.6471 -307.765 7893.3 -202439

4 240-250 15 31 245 3675 -15.6471 -234.706 3672.5 -57463

5 250-260 22 53 255 5610 -5.6471 -124.235 701.6 -3962

6 260-270 7 60 265 1855 4.3529 30.471 132.6 577

7 270-280 7 67 275 1925 14.3529 100.471 1442.0 20698

8 280-290 5 72 285 1425 24.3529 121.765 2965.3 72214

9 290-300 2 74 295 590 34.3529 68.706 2360.2 81081

10 300-310 10 84 305 3050 44.3529 443.529 19671.8 872504

11 310-320 1 85 315 315 54.3529 54.353 2954.2 160572

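Total 85 22155 0.000 47689.4 712778

$\sum f(x - \bar{x}) = 0$ (except for a possible rounding error), $\sum f(x - \bar{x})^2 = 47689.4$ and $\sum f(x - \bar{x})^3 = 712778$.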

a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole column.

b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{22155}{85} = 260.647$

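c. Calculate the Median (2): The position is $pN = 0.5(85) = 42.5$. This is above $F = 31$ and below $F = 53$, so the interval is the 5th one, 250 – 260, so the median is $250 + \left[\frac{42.5 - 31}{22}\right]10 = 255.227$.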

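d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 22 is the largest frequency, the modal group is 250 to 260 and the mode is 255.

e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{5822325 - 85(22155/85)^2}{84} = 567.731$ or $s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{47689.4}{84} = 567.731$. The computer got 567.731.

f. Calculate the Standard Deviation (2): $s = \sqrt{567.731} = 23.8271$.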

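g. Calculate the Interquartile Range (3): First Quartile: The position is $pN = 0.25(85) = 21.25$. This is above $F = 16$ and below $F = 31$, so the interval is 240-250. $240 + \left[\frac{21.25 - 16}{15}\right]10$ gives us $Q_1 = 243.5$.

Third Quartile: The position is $pN = 0.75(85) = 63.75$. This is above $F = 60$ and below $F = 67$, so the interval is 270-280. $Q_3 = 270 + \left[\frac{63.75 - 60}{7}\right]10 = 275.357$. $IQR = Q_3 - Q_1 = 275.357 - 243.5 = 31.857$.

Note that an answer for the mean, median, mode, first quartile or third quartile that is not between the highest and lowest number, in this case 210 and 320, is not reasonable!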
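
The quartile interpolation for grouped data can be written as one small function. This is a sketch, not the course's official method: `grouped_fractile` is a hypothetical name, and it assumes the position is p times n and that values are spread evenly inside each class.

```python
lower_limits = [210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310]
freqs = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]  # Seymour's frequencies
width = 10


def grouped_fractile(lowers, fs, w, p):
    """Hypothetical helper: value below which a fraction p of grouped data lies.

    Finds the class whose cumulative frequency first reaches p * n,
    then interpolates linearly inside that class.
    """
    n = sum(fs)
    position = p * n
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in zip(lowers, fs):
        if f > 0 and position <= cumulative + f:
            return lower + (position - cumulative) / f * w
        cumulative += f
    raise ValueError("p must be between 0 and 1")


q1 = grouped_fractile(lower_limits, freqs, width, 0.25)
q3 = grouped_fractile(lower_limits, freqs, width, 0.75)
iqr = q3 - q1

# Reasonableness check: both quartiles must lie inside the range of the data.
reasonable = all(210 <= value <= 320 for value in (q1, q3))

print(round(q1, 3), round(q3, 3), round(iqr, 3), reasonable)
```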


h. Calculate a Statistic showing Skewness and interpret it (3):

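We had $\sum f(x - \bar{x})^3 = 712778$ and $s = 23.8271$.

$k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{85}{(84)(83)}(712778) = 8689.92$. The computer gets 8689.93.

$g_1 = \frac{k_3}{s^3} = \frac{8689.92}{(23.8271)^3} = 0.6424$ or Pearson's Measure of Skewness $= \frac{\text{mean} - \text{mode}}{s} = \frac{260.647 - 255}{23.8271} = 0.2370$.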

Because of the positive sign, the measures imply skewness to the right.
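
Both skewness measures can be recomputed from the grouped data. In this sketch the names `k3` and `g1` and the mode-based form of Pearson's measure are assumptions about the notation intended in the solution; the frequencies are Seymour's.

```python
import math

midpoints = [215, 225, 235, 245, 255, 265, 275, 285, 295, 305, 315]
freqs = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Definitional sums of squared and cubed deviations, weighted by frequency.
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
sum_cu = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs))

s = math.sqrt(sum_sq / (n - 1))

# Third-moment statistic and its dimension-free version (assumed notation).
k3 = n / ((n - 1) * (n - 2)) * sum_cu
g1 = k3 / s ** 3

# Pearson's measure, assumed here to be (mean - mode) / s.
mode = midpoints[freqs.index(max(freqs))]
pearson = (mean - mode) / s

direction = "right" if g1 > 0 else "left"
print(round(k3, 2), round(g1, 4), round(pearson, 4), direction)
```

A positive sign on either measure indicates skewness to the right, a negative sign skewness to the left.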

i. An ogive is a simple line graph with cumulative frequency between zero and one on the y-axis and the numbers 200-340 on the x-axis. The data Seymour showed is:

$x$ $F$ $F/n$

210 0 0

220 1 .012

230 4 .047

240 16 .188

250 31 .365

260 53 .624

270 60 .706

280 67 .788

290 72 .847

300 74 .871

310 84 .988

320 85 1.000

330 85 1.000

Each number in the third column is the corresponding number in the second column divided by $n = 85$. The y axis should be marked from zero to 1.00. In spite of the fact that the question tells you that an ogive shows cumulative frequency, many of you gave me a frequency polygon, most of you did not obey the convention that the curve starts at zero and most of you did not convert to per cent.
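
The points to plot on the ogive follow mechanically from the frequencies. A minimal sketch, with illustrative variable names and no plotting library assumed:

```python
from itertools import accumulate

lower_limits = [210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310]
freqs = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]  # Seymour's frequencies
width = 10

n = sum(freqs)

# The curve starts at zero at the lowest class limit, then rises to 1.00
# at the upper limit of the last class.
boundaries = [lower_limits[0]] + [low + width for low in lower_limits]
cumulative = [0] + list(accumulate(freqs))
ogive = [(x, F, F / n) for x, F in zip(boundaries, cumulative)]

for x, F, share in ogive:
    print(f"{x} {F} {share:.3f}")
```

Multiplying the last column by 100 gives the percentage version of the same curve.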

j. The box plot should show the median and the quartiles and use the same x axis as the ogive.

2. Use the frequencies you used in problem 1 in this problem as values of $x$.

For these eleven numbers, compute the a) Geometric Mean, b) Harmonic Mean, c) Root-Mean-Square (1 point each). Label each clearly. If you wish, d) compute the geometric mean using natural or base 10 logarithms (1 point extra credit each). While you’re at it, compute the sample mean and bring it and the numbers that you used on this take-home exam to the in-class exam (no credit until you get to the exam – but it won’t hurt).

Solution: Note that Seymour used the eleven numbers 1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1. He found $\sum x = 85$ or $\bar{x} = \frac{85}{11} = 7.72727$. This is not used in any of the following calculations and there is no reason why you should have computed it except to use in class! Note that an answer that is not between the highest and lowest number is not reasonable!

a) The Geometric Mean.

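The geometric mean is $\sqrt[11]{(1)(3)(12)(15)(22)(7)(7)(5)(2)(10)(1)} = \sqrt[11]{58212000} = 5.081$. At least, not many of you tried to get the answer by dividing 58212000 by 11, but a number of you seem to have convinced yourselves that you could take a square root instead of an 11th root.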

b) The Harmonic Mean.

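$\frac{1}{n}\sum \frac{1}{x} = \frac{3.61450}{11} = 0.328591$. So the harmonic mean is $\frac{1}{0.328591} = 3.043$.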

Of course many of you decided that

. This is, of course, an easier way to do the problem, but I warned you that it wouldn’t work. . It is equivalent to believing that

c) The Root-Mean-Square.

$\frac{1}{n}\sum x^2 = \frac{1091}{11} = 99.1818$. So the root-mean-square is $\sqrt{99.1818} = 9.959$.

Of course many of you decided that . This is, of course, an easier way to do the problem, but I warned you that it wouldn’t work. It is equivalent to believing that .

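d) (i) Geometric mean using natural logarithms

$\frac{1}{n}\sum \ln(x) = \frac{17.8796}{11} = 1.62542$. So the geometric mean is $e^{1.62542} = 5.081$.

(ii) Geometric mean using logarithms to the base 10

$\frac{1}{n}\sum \log_{10}(x) = \frac{7.76501}{11} = 0.705910$. So the geometric mean is $10^{0.705910} = 5.081$.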

Notice that the original numbers and all the means are between 1 and 22.

It’s probably more efficient to handle a problem this large in columns. The arithmetic mean is also computed below.

Row $x$ $1/x$ $x^2$ $\log_{10}(x)$ $\ln(x)$

1 1 1.00000 1 0.00000 0.00000

2 3 0.33333 9 0.47712 1.09861

3 12 0.08333 144 1.07918 2.48491

4 15 0.06667 225 1.17609 2.70805

5 22 0.04545 484 1.34242 3.09104

6 7 0.14286 49 0.84510 1.94591

7 7 0.14286 49 0.84510 1.94591

8 5 0.20000 25 0.69897 1.60944

9 2 0.50000 4 0.30103 0.69315

10 10 0.10000 100 1.00000 2.30259

11 1 1.00000 1 0.00000 0.00000

Total 85 3.61450 1091 7.76501 17.8796

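Mean 7.72727 0.328591 99.1818 0.705910 1.62542

So, as before, the geometric mean is $e^{1.62542} = 10^{0.705910} = 5.081$, the harmonic mean is $\frac{1}{0.328591} = 3.043$ and the root-mean-square is $\sqrt{99.1818} = 9.959$.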
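
All of the means in this problem can be verified in a few lines. The following is a sketch using only Python's standard `math` module (`math.prod` needs Python 3.8 or later); the variable names are illustrative.

```python
import math

xs = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
n = len(xs)

arithmetic = sum(xs) / n

# a) Geometric mean: the n-th root of the product, not the product over n.
product = math.prod(xs)
geometric = product ** (1 / n)

# b) Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic = n / sum(1 / x for x in xs)

# c) Root-mean-square: square root of the mean of the squares.
rms = math.sqrt(sum(x * x for x in xs) / n)

# d) Geometric mean again, through natural and base 10 logarithms.
geometric_ln = math.exp(sum(math.log(x) for x in xs) / n)
geometric_log10 = 10 ** (sum(math.log10(x) for x in xs) / n)

print(product, round(geometric, 3), round(harmonic, 3), round(rms, 3))
print(round(geometric_ln, 3), round(geometric_log10, 3), round(arithmetic, 5))
```

Every one of these values falls between the smallest and largest of the eleven numbers, which is the reasonableness check the solution asks for.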