Final Exam, STAT2301, Practice

Please answer all questions.

1) A 95 percent confidence interval for the mean time taken for a painkiller to provide relief is (15, 20) minutes. This interval can be interpreted to mean that:

only 2.5 percent of all people taking the painkiller are still in pain after 20 minutes.
only 5 percent of all people get relief between 15 and 20 minutes
the probability is .95 that the mean time taken for relief is between 15 and 20 minutes
the probability is .95 that all people get relief in between 15 and 20 minutes
We will claim that the mean time for relief is between 15 and 20 minutes, using a method which gives correct claims 95% of the time.

2) A certain school is concerned about its freshman retention rate. In 2002 the retention rate was 85%. The school implements a new program, and in 2009 the retention rate is 87%. Each year’s class has about 1300 students, so these values are subject to a margin of error. In fact, a 95% confidence interval for the difference in proportions (2009 proportion minus 2002 proportion) is (-1%, +5%). Which of the following conclusions is valid?

As the rate this year is higher than the rate in 2002, we can conclude the new program is a success at raising the first-year retention rate.
We are 95% sure we have raised the retention rate, although that might be due to something other than the new program.
Since the confidence interval contains zero, the new program has made retention rates lower.
We can’t tell if the new program has been successful in raising retention rates.
There is a 95% chance the new program has been successful in raising retention rates.

3) A pain reliever that has been in use for a long time takes 20 minutes on average to be effective. A new pain reliever is tried out on a random sample of 15 patients with post-operative pain. The mean time to relief for these patients is 17.3 minutes, with a sample standard deviation of 2.5 minutes. A significance test at the 5% level is to be performed to determine whether the mean time to relief for the new painkiller differs from 20 minutes. The absolute value of the calculated test statistic (|t|) and the critical value from the table (t*) are, respectively:

(2.36, 2.262)
(4.18, 2.145)
(16.17, 2.145)
(16.17, 1.761)
(4.18, 1.960)
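One way to check the arithmetic for Question 3 is a short Python sketch using only the standard library. It computes the one-sample t statistic from the summary values given in the question; the standard library has no t distribution, so the critical value t* for 14 degrees of freedom still has to be read from a t table.

```python
import math

# Values given in Question 3
mu0 = 20.0    # mean time to relief for the old painkiller (null hypothesis), minutes
xbar = 17.3   # sample mean for the new painkiller, minutes
s = 2.5       # sample standard deviation, minutes
n = 15        # number of patients

se = s / math.sqrt(n)      # estimated standard error of the sample mean
t = (xbar - mu0) / se      # one-sample t statistic
df = n - 1                 # degrees of freedom for the t distribution

print(f"SE = {se:.4f}, t = {t:.2f}, df = {df}")
```

The statistic comes out negative only because the sample mean is below 20; for a two-sided test it is |t| that gets compared with t*.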

4) Suppose that two variables X and Y are known to have a correlation of 0.5. Which of the following statements must always be true?

X is normally distributed
There is a 95% chance that Y values will be found within 2 standard deviations of their mean
A regression of Y on X will produce a line with a negative slope
The X variable will have a larger standard deviation than the Y variable
25% of Y’s variation is explained by Y’s linear relationship with X.

5) Which of the following is a false statement about Simpson’s paradox?

a) Simpson’s paradox can occur if you have three categorical variables.

b) Simpson’s paradox is a reversal of association when data are combined.

c) In Simpson’s paradox you have a collapsing variable.

d) Simpson’s paradox concerns outliers in the data.

The following two multiple-choice questions refer to newborn human babies. The weights of the babies in Questions 6 and 7 are normally distributed with a mean of 7.5 pounds and a standard deviation of 1 pound.

6) How much do the lightest 2.5% of these babies weigh?

6.5 pounds or more
5.5 pounds or more
6.5 pounds or less
5.5 pounds or less
8.5 pounds or less

7) What percentage of these babies weigh less than 6.5 pounds?
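For Questions 6 and 7, the normal calculations can be checked with `statistics.NormalDist` (Python 3.8 or later). This is a sketch using the stated mean of 7.5 pounds and standard deviation of 1 pound; the exact answers are close to what the 68-95-99.7 rule gives (about 5.5 pounds and about 16%).

```python
from statistics import NormalDist

# Birth weights in pounds: Normal(mean=7.5, sd=1), as stated for Questions 6 and 7
weights = NormalDist(mu=7.5, sigma=1.0)

# Question 6: weight below which the lightest 2.5% of babies fall
cutoff = weights.inv_cdf(0.025)

# Question 7: proportion of babies weighing less than 6.5 pounds (one SD below the mean)
p_below = weights.cdf(6.5)

print(f"2.5th percentile: {cutoff:.2f} lb")
print(f"P(weight < 6.5 lb) = {p_below:.3f}")
```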

8) A children’s soccer team has 20 players. Suppose that the average number of goals scored by the children is 4.6 goals/child. However, there are two “star” players who have scored 33 and 32 goals. What is the average number of goals scored by the other players?

2.67 goals/child
1.5 goals/child
2 goals/child
5 goals/child
3.2 goals/child
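The reasoning for Question 8 is that the mean times the number of players gives the team total; removing the stars' goals and dividing by the remaining players gives their mean. A minimal sketch of that arithmetic:

```python
n_players = 20
team_mean = 4.6          # goals per child, whole team
stars = [33, 32]         # goals scored by the two star players

total_goals = n_players * team_mean               # total goals for the whole team
others_total = total_goals - sum(stars)           # goals scored by the remaining players
others_mean = others_total / (n_players - len(stars))

print(round(others_mean, 2))
```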

The following four multiple-choice questions refer to the information below on frogs.

We are interested in predicting how far a frog can jump (measured in cm) based on its leg length (also in cm). We collect data on 50 frogs, allowing each frog one leap. The correlation r is 0.7; the mean leg length is 16 cm, with a standard deviation of 2 cm; and the mean jumping distance is 20 cm, with a standard deviation of 5 cm.

9) The equation for the least squares regression line of jumping distance on leg length is

Jumping distance = 10.4 + 0.28 (leg length)
Leg length = -8 + 1.75 (jumping distance)
Jumping distance = -8 + 1.75 (leg length)
Jumping distance = -8 + 0.28 (leg length)
Leg length = 10.4 + 0.28 (jumping distance)

10) A frog with legs 20 cm long would be predicted to jump

31.4 cm
25.8 cm
29.3 cm
15.7 cm
27 cm

11) What proportion of the variation in jumping distance is explained by the linear relationship with leg length?

49%
90%
0.9%
0.49%
81%

12) The predicted distance jumped by a frog with a leg length of 4 cm is -1 cm. The best explanation of this is

Frogs this small can only jump backwards
The researchers made a mistake in measuring distances
The straight-line relationship should not be extrapolated to such small frogs
Frogs with legs this small can’t jump at all
We should have done a regression of leg length on distance jumped
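The frog questions all follow from the summary statistics, using the standard least squares formulas slope = r · s_y / s_x and intercept = ȳ − slope · x̄, with r² as the fraction of variation explained. A short sketch of those calculations:

```python
# Summary statistics for the frog data
r = 0.7
mean_leg, sd_leg = 16.0, 2.0      # leg length, cm
mean_jump, sd_jump = 20.0, 5.0    # jumping distance, cm

# Least squares line of jumping distance (y) on leg length (x)
slope = r * sd_jump / sd_leg
intercept = mean_jump - slope * mean_leg


def predict(leg_cm):
    """Predicted jumping distance (cm) for a given leg length (cm)."""
    return intercept + slope * leg_cm


r_squared = r ** 2    # fraction of variation in distance explained by the line

print(f"distance = {intercept:.2f} + {slope:.2f} * leg")
print(f"leg 20 cm -> {predict(20):.1f} cm")
print(f"leg 4 cm  -> {predict(4):.1f} cm (far outside the data: extrapolation)")
print(f"r^2 = {r_squared:.2f}")
```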

13) Suppose we have two events A and B such that P(A) = 0.6, P(B) = 0.4 and P(A and B) = 0.3. Which of the following statements could be true?

(i) P(not A) = 0.4.
(ii) A and B are independent events.
(iii) A and B are disjoint.

(i), (ii) and (iii)
(iii) only
(i) and (iii) only
(i) only
(i) and (ii) only
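Each statement in Question 13 reduces to one rule: the complement rule, independence (P(A and B) = P(A)P(B)), or disjointness (P(A and B) = 0). A sketch that checks each against the given numbers:

```python
import math

p_a, p_b, p_ab = 0.6, 0.4, 0.3    # P(A), P(B), P(A and B) from Question 13

p_not_a = 1 - p_a                              # complement rule
independent = math.isclose(p_ab, p_a * p_b)    # independence needs P(A and B) = P(A)P(B)
disjoint = p_ab == 0                           # disjoint events cannot occur together

print(f"P(not A) = {p_not_a:.1f}")
print(f"P(A)P(B) = {p_a * p_b:.2f} vs P(A and B) = {p_ab}")
print("independent:", independent, "| disjoint:", disjoint)
```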

The next six questions relate to the following problem. A couple has three children. Assume that boys and girls are equally likely and that gender is independent from child to child.

14) Let A be the event that the couple’s first child is a girl. Then P(A) is

(a) 1/8

(b) 1/3

(d) 3/8

(e) 1/2

15) Let B be the event that the couple have no girls. Then P(B) is

(a) 3/8

(b) 1/4

(d) 1/8

(e) 1

16) Let C be the event that the couple have at least 1 girl and 1 boy. Then P(C) is

(a) 1/2

(b) 2/3

(d) 1/8

(e) 1/4

17) The probability of “A and B” is

(a) 1/4

(b) 1/2

(d) 2/3

(e) 0

18) The probability of “A and C” is

(a) 1/4

(b) 1/2

(d) 2/3

(e) 0

19) Which of the following statements is true?

(a) A is independent of B and independent of C.

(b) A is dependent on B but independent of C.

(c) A is independent of B, but dependent on C.

(d) A is dependent on both B and C.

(e) We do not have enough information to judge dependence of A in relation to B or C.
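Because the eight birth orders of three children are equally likely under the stated assumptions, every probability in this group of questions can be found by listing outcomes and counting. A sketch using exact fractions, with the events A, B and C defined as in the questions:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely birth sequences for three children, e.g. ('G', 'B', 'B')
outcomes = list(product("BG", repeat=3))


def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def first_is_girl(o):        # event A
    return o[0] == "G"


def no_girls(o):             # event B
    return "G" not in o


def girl_and_boy(o):         # event C: at least one girl and at least one boy
    return "G" in o and "B" in o


p_A, p_B, p_C = prob(first_is_girl), prob(no_girls), prob(girl_and_boy)
p_AB = prob(lambda o: first_is_girl(o) and no_girls(o))
p_AC = prob(lambda o: first_is_girl(o) and girl_and_boy(o))

print(p_A, p_B, p_C, p_AB, p_AC)
# Independence check: P(X and Y) == P(X) * P(Y)
print("A, B independent:", p_AB == p_A * p_B)
print("A, C independent:", p_AC == p_A * p_C)
```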

20) Which of the following variables would definitely not be properly modeled by a Poisson distribution?

(a) The number of goals scored by a child on a soccer team in one game.

(b) The number of cats owned by a randomly chosen household.

(c) The number of radioactive particles emitted in one minute by a lump of uranium.

(d) The number of heads in 10 tosses of a fair coin.

21) If we know we have data that are right-skewed, then what can we say about the relationship between the mean and median of such data?

(a) Mean > median

(b) Mean < median

(d) We can’t be sure which is bigger.

(e) The mean won't exist, but the median will.

22) What can we say about the relationship between the standard deviation (SD) and IQR of a data set?

(a) SD < IQR

(b) SD > IQR

(c) SD = IQR

(d) We can't be sure which is bigger.

(e) SD won't exist, but the IQR will.

23) Suppose we have two events A and B, where P(A)=0.3, P(B)=0.8. Which of the following is NOT possible?

P(A|B) = 3/8
P(A|B) = 0
P(A and B) = 0.3
P(A and B) = 0.24
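A useful check for Question 23: since P(A or B) cannot exceed 1, P(A and B) must be at least P(A) + P(B) − 1, and it can never exceed the smaller of P(A) and P(B). Conditional statements convert to joint ones via P(A and B) = P(A|B)P(B). A sketch applying these bounds to each option with exact fractions:

```python
from fractions import Fraction

p_a, p_b = Fraction(3, 10), Fraction(8, 10)    # P(A) = 0.3, P(B) = 0.8

# Any joint probability must satisfy max(0, P(A) + P(B) - 1) <= P(A and B) <= min(P(A), P(B))
low = max(Fraction(0), p_a + p_b - 1)
high = min(p_a, p_b)


def joint_possible(p_ab):
    return low <= p_ab <= high


candidates = {
    "P(A|B) = 3/8": Fraction(3, 8) * p_b,      # P(A and B) = P(A|B) P(B)
    "P(A|B) = 0": Fraction(0) * p_b,
    "P(A and B) = 0.3": Fraction(3, 10),
    "P(A and B) = 0.24": Fraction(24, 100),
}
for label, p_ab in candidates.items():
    print(f"{label}: P(A and B) = {p_ab}, possible = {joint_possible(p_ab)}")
```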

24) Suppose a 95% confidence interval for the true proportion (p) of people who say they approve of Barack Obama as President is (63%, 71%). We may conclude that

The margin of error is plus or minus 3%.
If we took many, many random samples of the same size and from each computed a 95% confidence interval for p, about 95% of these intervals would contain 66%.
There is 95% chance that p is bigger than 63%.
There is 95% chance that p is between 63% and 71%.
We will claim that p is between 63% and 71%.

25) A company claims that its vacuum cleaners last an average of 800 hours. A random sample of vacuum cleaners from the company revealed that their lifetime averaged 740 hours and a 95% confidence interval for the true average was found to be 710 to 770 hours. This interval is interpreted to mean that:

(a) Because our specific confidence interval does not contain the value 800, there is a 95% probability that the true average lifetime is not 800 hours.

(b) 95% of all vacuum cleaners last between 710 and 770 hours.

(c) If we repeat our survey many times, then about 95% of our confidence intervals will contain the true value of the average lifetime of a vacuum cleaner from this company.

(d) If we were to repeat our survey many times, then about 95% of all the confidence intervals will contain the value 800 hours.

(e) If the study were to be repeated many times, there is a 95% probability that the true average lifetime is 800 hours, as the company claims.
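The repeated-sampling interpretation of a confidence interval can be seen in a small simulation. In this sketch the population is hypothetical (a normal population with an invented true mean of 750 hours, SD of 100 hours, and samples of 40), and it uses a known-sigma z interval for simplicity; the point is only that roughly 95% of the intervals produced this way capture the true mean.

```python
import random
from statistics import NormalDist, fmean

random.seed(2301)    # fixed seed so the run is reproducible

# Hypothetical population values, known to the simulation but not to the "surveyor"
TRUE_MEAN, SIGMA, N = 750.0, 100.0, 40
Z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
REPS = 4000

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    half_width = Z * SIGMA / N ** 0.5    # known-sigma z interval
    if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
        hits += 1

coverage = hits / REPS
print(f"{coverage:.3f} of {REPS} intervals contain the true mean")
```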

26) A demographer, using a random sample of n = 500 people, obtained a 95 percent confidence interval for the mean age at marriage (μ), in years, of US adults. The CI was (26.4, 27.3). If the analyst had used a 98 percent confidence coefficient instead, the confidence interval would be:

narrower and would be less likely to contain μ.
wider and would be more likely to contain μ.
narrower and would be more likely to contain μ.
It may be wider or narrower, but we know it would be more likely to contain μ.
narrower, but we can’t be sure if it would be more or less likely to contain μ.
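With the same data, changing the confidence level changes only the critical value, so the margin of error scales by z(0.99)/z(0.975). This sketch assumes a large-sample z interval (reasonable with n = 500) and recovers the sample mean and margin from the reported 95% interval.

```python
from statistics import NormalDist

lower95, upper95 = 26.4, 27.3           # reported 95% CI for mu (years)
center = (lower95 + upper95) / 2        # sample mean
half95 = (upper95 - lower95) / 2        # margin of error at 95%

z95 = NormalDist().inv_cdf(0.975)       # about 1.960
z98 = NormalDist().inv_cdf(0.99)        # about 2.326

# Same data and standard error; only the critical value changes
half98 = half95 * z98 / z95
print(f"95% CI: ({center - half95:.2f}, {center + half95:.2f})")
print(f"98% CI: ({center - half98:.2f}, {center + half98:.2f})")
```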

27) The stem-and-leaf plot below summarizes the final-year averages of a graduating honors class in the Dedman College. Select the correct statement.

688

724567

82346

9014

10

The distribution is bimodal
The mean is 95
The IQR is 50
The maximum score is 94.
The median score is 68.
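The summary statistics in Question 27 can be checked by expanding the plot back into data. This sketch assumes the plot reads with the tens digit as the stem and the units digit as the leaf, and that the stem 10 has no leaves. Quartile conventions vary between textbooks and software, so the IQR printed here may differ slightly from a hand calculation, though not by anything close to 50.

```python
from statistics import mean, median, quantiles

# Stem-and-leaf plot read as stem = tens digit, leaf = units digit (assumed; stem 10 is empty)
plot = {6: [8, 8], 7: [2, 4, 5, 6, 7], 8: [2, 3, 4, 6], 9: [0, 1, 4], 10: []}
scores = sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)

q1, _, q3 = quantiles(scores, n=4)    # Python's default 'exclusive' quartile method
print("n =", len(scores))
print("mean =", mean(scores), "median =", median(scores))
print("max =", max(scores), "IQR =", q3 - q1)
```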

28) Suppose that two variables X and Y are known to have a correlation of 0.5. Which of the following statements must always be true?

a) X is normally distributed

b) There is a 95% chance that Y values will be found within 2 standard deviations of their mean

c) A regression of Y on X will produce a line with a positive slope

d) The X variable will have a larger standard deviation than the Y variable

e) Extrapolation of the regression line will produce nonsense results

29) Which of the following is a correct statement about scatterplots?

a) A scatterplot would be a good tool to use in examining the differences in height of statistics students by sex.

b) Scatterplots tell us if one variable is causing another.

c) Scatterplots can be used to help find an interquartile range.

d) A scatterplot enables us to guess the value of the correlation r.

e) Scatterplots can tell us if we have Simpson’s paradox.

30) A statistics professor used X = “number of class days attended” (out of 45) as an independent variable to predict Y = “score received on the final exam” for a class of his students. The resulting regression equation was Y = 33.4 + 1.4X. Which of the following statements is true?

If attendance increases by 1.4 days, the expected exam score will increase by 1 point.
If attendance increases by 1 day, the expected exam score will increase by 33.4 points.
If attendance increases by 1 day, the expected exam score will increase by 1.4 points.
If the student does not attend at all, the expected exam score is 1.4 points.
The expected exam score does not increase if you attend class more.

31) A college at a university has three female professors. Their mean and median salaries are $108,700 and $113,733, but it is not known which is which. The highest paid makes $132,000 a year. Which of the following are possible salaries for the other two professors?

a. $100,000 and $110,000

b. $100,000 and $108,700

c. $108,700 and $113,733

d. $80,499 and $108,700

e. $80,367 and $113,733
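With three salaries, the median is simply the middle value, and the mean is the total divided by 3. A sketch that tests each option by adding the $132,000 salary, sorting, and comparing the resulting mean and median (in either order) with the two given figures:

```python
MAX_SALARY = 132_000
TARGETS = {108_700, 113_733}    # mean and median, in unknown order

options = {
    "a": (100_000, 110_000),
    "b": (100_000, 108_700),
    "c": (108_700, 113_733),
    "d": (80_499, 108_700),
    "e": (80_367, 113_733),
}


def consistent(others):
    """True if the three salaries have mean and median equal to TARGETS, in either order."""
    salaries = sorted([*others, MAX_SALARY])
    total, median = sum(salaries), salaries[1]
    if total % 3:    # a whole-dollar mean is needed to match either target
        return False
    return {total // 3, median} == TARGETS


for label, others in options.items():
    print(label, consistent(others))
```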

The following questions concern a study on the treatment of kidney stones. There are two types of treatments, labeled A and B, and in addition patients can be classified by the severity of their condition through the size of the stone (large or small). The following tables categorize cases by stone size, treatment, and whether the treatment was a success.

Small stones / success / not a success
Treatment A / 81 / 6
Treatment B / 234 / 36
Large stones / success / not a success
Treatment A / 192 / 71
Treatment B / 55 / 25

32) The success rates for treatments A and B for small stone patients are, respectively:

a) 93% and 87%

b) 7% and 13%

c) 73% and 69%

d) 93% and 69%

e) 9% and 35%

33) The success rates for treatments A and B for large stone patients are, respectively:

93% and 87%
27% and 31%
7% and 13%
73% and 69%
9% and 35%

34) What percentage of small stone patients get treatment A?

a) 50%

b) 75%

c) 76%

d) 44%

e) 24%

35) What percentage of large stone patients get treatment A?

a) 24%

b) 77%

c) 23%

d) 44%

e) 33%

Now suppose we collapse the tables and form a new table as follows:

All patients / success / not a success
Treatment A / 273 / 77
Treatment B / 289 / 61

36) The success rates for treatments A and B for all patients are, respectively:

a) 50% and 60%

b) 44% and 66%

c) 90% and 50%

d) 78% and 83%

e) 22% and 17%

37) A correct statement about the variables is

a) The collapsing variable is treatment type

b) The collapsing variable is success/no success

c) The collapsing variable is condition of patient (kidney stone size)

d) The explanatory variable is success/no success

e) The response variable is condition of patient (kidney stone size).

38) The correct conclusion to draw here is

a) Condition of patient is not a factor in the success of treatment.

b) Yes, treatment B was more successful overall, but it is because it was mostly given to patients in good condition, and treatment A was mostly given to those in poor condition.

c) Yes, treatment A was more successful overall, but it is because it was mostly given to patients in good condition, and treatment B was mostly given to those in poor condition.

d) We should give people treatment B, as it had a higher success rate.

e) Kidney stone size has no effect on the success of treatment.
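The kidney stone questions illustrate Simpson's paradox: treatment A does better within each stone size, yet B looks better once stone size is collapsed, because A was given mostly to the harder large-stone cases. A sketch that reproduces the rates from the two tables:

```python
# Counts from the kidney stone tables: (successes, failures)
data = {
    "small": {"A": (81, 6), "B": (234, 36)},
    "large": {"A": (192, 71), "B": (55, 25)},
}


def rate(successes, failures):
    return successes / (successes + failures)


# Success rate for each treatment within each stone size
for size, arms in data.items():
    print(size, {t: f"{rate(*c):.0%}" for t, c in arms.items()})

# Share of each stone-size group that received treatment A
for size, arms in data.items():
    n_a, n_b = sum(arms["A"]), sum(arms["B"])
    print(size, f"got A: {n_a / (n_a + n_b):.0%}")

# Collapse over stone size by adding successes and failures for each treatment
collapsed = {
    t: tuple(sum(data[size][t][k] for size in data) for k in (0, 1)) for t in ("A", "B")
}
print("all", {t: f"{rate(*c):.0%}" for t, c in collapsed.items()})
```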

Each of the next several statements has just two options, true and false. Circle the one you believe to be correct in each case. Here we say a statement is true if it is ALWAYS true; otherwise, we designate it false.

39) T F A median is always lower than a mean

40) T F The standard deviation is sensitive to outliers

41) T F If the data are normally distributed, 95% of the data are within one standard deviation of the mean

42) T F If the P-value of a test is 0.03 and we use a significance level of 0.05, we will reject the null hypothesis

43) T F If A and B are disjoint, then P(A|B) is 1

44) T F If A and B are independent, then P(A|B) = P(A)

45) T F If P(A)=0.6 and B is some other event then a possible value for P(A and B) is 0.7.

46) T F If P(A|B)=P(B|A) then P(A)=P(B)

47) T F 95% of the time, observations are within two standard deviations of their mean

48) T F Failing to reject the null hypothesis implies the null hypothesis must be true

49) T F If you reject a null hypothesis, you have made a Type I error

50) T F When we fail to reject a null hypothesis we could have made a Type II error