Name:

Math 17 Section 02 / Enst 24 – Introduction to Statistics

Third Midterm Exam PRACTICE 1

Instructions:

  1. Show all work. You may receive partial credit for partially completed problems.
  2. You may use calculators and a one-sided sheet of reference notes, as well as the provided tables (t, chi-square). You may not use any other references or any texts.
  3. You may not discuss the exam with anyone but me.
  4. Suggestion: Read all questions before beginning and complete the ones you know best first. Point values per problem are displayed below if that helps you allocate your time among problems.
  5. Use 4 decimal places for calculations involving proportions.
  6. You MAY NOT use a calculator to do more than the standard arithmetic functions, exponents, and square roots. That is, you may not use t-test functions, regression functions, and the like.
  7. Good luck!

Problem / 1 / 2 / 3 / 4 / Total
Points Earned
Possible Points / 50

1. A student who attends college in Atlanta, Georgia, and flies home for the holidays decides to investigate an airline's claim that "as distance to the destination increases, our fare increases". The student collects distance and fare data from one-way flights from Atlanta to many other cities, and then analyzes the data, generating the graph and output shown. Use the student's work to answer the questions below.

a. In order to investigate the airline's claim, which variable should be the response variable?

Partial Rcmdr Output:

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  177.21452    19.99315    8.864  1.43e-07
distance       0.07862     0.02037    3.859   0.00139

Residual standard error: 41.82 on 16 degrees of freedom
Multiple R-squared: 0.482, Adjusted R-squared: 0.4496

b. What is the numerical value of the correlation between distance and fare? Interpret this value.

c. What is the equation of the least squares line generated by the student?

d. If reasonable, predict the fare for a flight from Atlanta to a city that is 1000 miles away. If not reasonable, explain why not in one sentence.

e. Does the regression line fit well? Explain briefly.

f. Is there evidence to support the airline's claim that "as distance to the destination increases, our fare increases"? Perform an appropriate test at a .01 significance level, reporting your hypotheses, test statistic, p-value, and conclusion in context. (Assumptions will be checked below.)

Null:                    Alternative:

Test statistic:          p-value:

Conclusion:

g. The student generates basic diagnostic plots to help with checking the regression assumptions. For each graph, state what assumption(s) it can be used to check, then comment on whether that assumption checks out.

Used to check:

Comment:

Used to check:

Comment:
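
For checking hand calculations on Problem 1 after attempting it, the relationships among the numbers in the Rcmdr output can be sketched in Python. This is a minimal sketch, not the student's actual analysis: the variable and function names are illustrative assumptions, and only values printed in the output above are used. Whether a prediction at a given distance is reasonable still depends on the range of distances in the student's graph, which the code cannot check.

```python
import math

# Values copied from the partial Rcmdr output in Problem 1.
intercept = 177.21452
slope = 0.07862
slope_se = 0.02037
r_squared = 0.482
p_two_sided = 0.00139  # Pr(>|t|) reported for the distance coefficient

# In simple linear regression, r squared equals R-squared and
# r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# The t statistic for the slope is the estimate divided by its standard error.
t_slope = slope / slope_se

# For the one-sided alternative (slope > 0) with a positive t statistic,
# the p-value is half of the reported two-sided p-value.
p_one_sided = p_two_sided / 2


def predicted_fare(distance_miles):
    """Fitted fare from the least squares line (hypothetical helper)."""
    return intercept + slope * distance_miles


print("r =", round(r, 4))
print("t =", round(t_slope, 3))
print("one-sided p-value =", p_one_sided)
print("fitted fare at 1000 miles =", round(predicted_fare(1000), 2))
```

The t statistic computed from the rounded estimate and standard error differs slightly in the third decimal from the 3.859 printed by Rcmdr, which works from unrounded values.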

2. A study was conducted to assess the effectiveness of a new antibiotic treatment for strep throat in children. Children aged 6 to 14 who met the entry criteria were randomized to one of three treatment groups. Group 1 was given standard treatment 1, group 2 was given standard treatment 2, and the last group (group 3) was given the new antibiotic treatment. (The 2 standard treatments are different.) The response measured on each child was the number of days to cure the strep infection. Use the partial ANOVA table provided to answer the questions below.

Source / SS / Df / MS / F / p-value
Treatment / 38.364 / ___ / 19.182 / ___ / .027
Residuals / 141.273 / 30 / 4.709 / - / -
Total / 179.636 / 32 / - / - / -

a. This output would be generated in order to test what set of hypotheses?

Null:                    Alternative:

Assume for the parts below that the ANOVA conditions are satisfied.

b. Provide the missing value of the treatment df and the F test statistic.

c. What is your best estimate of the common population variance assumed by the ANOVA? (Provide a numerical value.)

d. Using a .05 significance level, what is your conclusion (in context) for this ANOVA?

e. The following pairwise confidence intervals were generated using Tukey's multiple comparisons method. If appropriate, use the intervals to summarize the differences. If not appropriate, explain why not.

Comparison / Estimate / Lwr / Upper
2-1 / 1.45 / -.83 / 3.74
3-1 / -1.18 / -3.46 / 1.10
3-2 / -2.64 / -4.92 / -.36
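
For checking hand calculations on Problem 2 after attempting it, the arithmetic that ties the ANOVA table entries together (df add up, MS = SS / df, F = MS treatment / MS residual) can be sketched in Python. This is a minimal sketch using only the numbers printed in the partial table; the variable names are illustrative assumptions.

```python
# Values copied from the partial ANOVA table in Problem 2.
ss_treatment = 38.364
ss_residual = 141.273
ss_total = 179.636
df_residual = 30
df_total = 32

# Degrees of freedom add up: total = treatment + residual.
df_treatment = df_total - df_residual

# Each mean square is its sum of squares divided by its degrees of freedom.
ms_treatment = ss_treatment / df_treatment
ms_residual = ss_residual / df_residual

# The F statistic is the ratio of the two mean squares.
f_stat = ms_treatment / ms_residual

# The residual mean square estimates the common population variance.
pooled_variance = ms_residual

print("treatment df =", df_treatment)
print("MS treatment =", round(ms_treatment, 3))
print("MS residual =", round(ms_residual, 3))
print("F =", round(f_stat, 3))
```

The sums of squares in the table add to the total only up to rounding (38.364 + 141.273 = 179.637 versus the printed 179.636).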

3. A random sample of 337 college students was asked whether or not they were registered to vote. We wonder if there is an association between a student's sex and whether the student is registered to vote. The data collected are provided in the table below. Use the table to address the following questions.

Men / Women / Total
Registered / 104 / 147 / 251
Not Registered / 33 / 53 / 86
Total / 137 / 200 / 337

a. If a student is selected at random from this sample, what is the probability that the student is male?

b. If a female student is selected at random from this sample, what is the probability that she is not registered to vote?

c. What test should you perform to determine if there is an association between sex and whether or not a student is registered to vote? (Be specific.)

d. Determine the expected counts for your chosen test and write them in the table in parentheses after the observed counts.

e. State and check the conditions necessary for your chosen test.

f. The chi-square test statistic is .249. What distribution does the test statistic have assuming the null hypothesis is true?

g. What can you say about the p-value for your test?

h. State the conclusion for your test (in context).
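
For checking hand calculations on Problem 3 after attempting it, the expected counts (row total times column total, divided by the grand total) and the chi-square statistic can be sketched in Python. This is a minimal sketch using only the observed counts from the table; the variable names are illustrative assumptions.

```python
# Observed counts from the table in Problem 3.
# Rows: Registered, Not Registered. Columns: Men, Women.
observed = [
    [104, 147],
    [33, 53],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

for exp_row in expected:
    print([round(e, 4) for e in exp_row])
print("chi-square =", round(chi_sq, 3), "on", df, "df")
```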

4. A student wants to investigate the "famous" Fisher iris data set and determine whether or not petal lengths of three different iris species differ on average. The data set contains 50 observations from each of 3 species of iris. A boxplot of the data is shown at right.

a. If the student performed an ANOVA, would it be balanced or unbalanced?

b. Should the student perform an ANOVA? Explain why or why not.

General true/false or fill-in-the-blank questions.

c. F distributions are skewed right. True / False

d. Correlation implies causation. True / False

e. Assume that you will perform better on an exam if you get more sleep. Then, the random variables – exam score and sleep time – are independent.

True / False

f. One of the ANOVA assumptions is that the population of all the responses is normally distributed.

True / False

g. Correlation detects all forms of association between 2 quantitative variables. True / False

h. If the distribution-related assumptions for ANOVA are not met, you can use a Kruskal-Wallis test, which is an example of a ______ test.