Final Exam with Answers

MS & IS 200.21

A casino manager is interested in determining if a roulette wheel is balanced. For those unfamiliar with roulette, there are 38 spaces (18 red spaces, 18 black spaces, and two green spaces) on the wheel, and players have the opportunity to bet on a number of outcomes, one being the individual number. A truly balanced wheel should produce outcomes that resemble the theoretically equal frequencies (i.e. each number should have a frequency of 1/38, which is about 2.6% and is rounded to 2.5% in the questions below). We are interested in the number of times the number 15 appears (vs. "non-15" occurrences) (questions 1-4).

1. What type of data do we have?

Numerical

Categorical

Continuous

Statistical

The type of data is categorical. The outcome is defined as either 15 or not-15, which means the outcomes fall into only two groups.

2. After 100 spins, the number 15 appeared 5 times. Is there a cause for alarm?

Yes, the wheel is unbalanced

No, there is the possibility of sampling error.

Yes, 15 always happens at most 3 times per 100

No, theoretical probability does not apply to gaming. We should use empirical probabilities.

The “2.5%” is a theoretical probability: the simple event (15) over all possible simple events (38 in all). That 2.5% should be observed in the population of spins, and 100 spins is just a sample, so an observed 5% may reflect nothing more than sampling error.

3. What would be the appropriate null hypothesis to determine if the wheel is balanced?

HO: 1 =2.5

HO: Π = 15

HO: μ = 15

HO: Π = 2.5

We can test to see if the observed proportion differs (statistically) from the population proportion. The “status quo” in this case is 2.5%.

4. What would be the appropriate test statistic to determine if the wheel is balanced?

One sample t test of the mean

One sample t test for proportion

One sample z test of the mean

One sample z test for proportion

One variable, with a proportion (always use z with proportion)
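Questions 2-4 can be checked numerically. The following is a minimal sketch, assuming the rounded null value of 2.5% used in the exam and the usual normal approximation; the function name `one_sample_proportion_z` is made up for illustration. With only 100 spins the expected count of 15s is small, so the approximation is rough.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(successes, n, pi0):
    """Return (z, two-sided p-value) for H0: population proportion = pi0."""
    p_hat = successes / n                    # observed sample proportion
    se = math.sqrt(pi0 * (1 - pi0) / n)      # standard error under H0
    z = (p_hat - pi0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 5 occurrences of the number 15 in 100 spins, hypothesized proportion 2.5%
z, p_value = one_sample_proportion_z(successes=5, n=100, pi0=0.025)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

The statistic stays inside ±1.96, which agrees with the answer to question 2: five hits in 100 spins is within the range of sampling error.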

5. Compared to a t-distribution, a Z-distribution has:

A larger mean, depending on the df of the t-distribution

An equal mean, regardless of the df for the t-distribution

A larger standard deviation

The same standard deviation

Both have a mean of zero; however, the t-distribution has a greater standard deviation than the z-distribution, and how much greater depends on the df.

6. Parameter is to statistic as:

Sample is to Population

Observed is to Unobserved

Population is to Sample

Estimate is to Actual

Remember, statistics estimate parameters. If the statistic is unbiased, then the expected value of the statistic is the parameter.

7. If the variance of a distribution is 0, the range is 0

True

False

The variance is defined as the average of the squared deviations from the mean. If the values do not vary from the mean, then the values are all the same, and thus the range is zero.

8. A distribution with mode < median < mean is negatively skewed?

True

False

This question describes a positively skewed distribution.

Last year's final produced a distribution with the following:

Mean of 80, σ² of 25 (assumed to be normal) (questions 9-14).

9. What would be your standard score if you received a 75?

-2

–1

z = (75 − 80) / 5 = −1 (the value 25 represents the variance, not the standard deviation).

10. What is the approximate proportion of people who scored over 90?

p(Z > 2), since z = (90 − 80) / 5 = 2; this is roughly 2.5% by the empirical rule.

11. What is the probability that we can select someone who scored between 75 and 85?

100%

p(-1 < Z < 1), which is about 68% from the empirical rule.
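The standard score and the two probabilities in questions 9-11 can be reproduced with the standard library. This is a sketch under the exam's stated assumption of a normal distribution with mean 80 and variance 25; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mean, variance = 80, 25
sd = math.sqrt(variance)            # 5: the variance must be square-rooted first
exam = NormalDist(mu=mean, sigma=sd)

z_75 = (75 - mean) / sd                     # question 9: standard score for 75
p_over_90 = 1 - exam.cdf(90)                # question 10: P(Z > 2)
p_75_to_85 = exam.cdf(85) - exam.cdf(75)    # question 11: P(-1 < Z < 1)

print(f"z = {z_75:.0f}, P(X > 90) = {p_over_90:.4f}, P(75 < X < 85) = {p_75_to_85:.4f}")
```

The exact normal areas differ slightly from the empirical-rule figures (about 2.3% rather than 2.5% in the upper tail), which is why the question asks for an approximate proportion.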

12. What is the probability that we would select someone below 75 AND above 85?

31.8%

2.5%

68.13%

Trying to remember my thought processes in the spring, I believe I tried to make this a trick question, and the answer is 0%, for you cannot have someone who exists in both of the described events. I will not put a question of this type on the final.

13. If we took a sample of 100, what is the probability that we would observe a mean less than 79.3?

z = (79.3 − 80) / (5/√100) = −0.7 / 0.5 = −1.4, so p(Z < −1.4) ≈ .081
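As a check on question 13, here is a short sketch that uses the standard error of the mean, σ/√n, as the denominator of the z-score. It assumes the population values given above (mean 80, σ = 5); the names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 80, 5, 100
standard_error = sigma / math.sqrt(n)       # 5 / 10 = 0.5
z = (79.3 - mu) / standard_error            # standard score of the sample mean
p_below = NormalDist().cdf(z)               # P(sample mean < 79.3)

print(f"SE = {standard_error}, z = {z:.2f}, P = {p_below:.4f}")
```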

14. Suppose we sampled 25 students from this year. The mean was measured at 77. What is a 90% confidence interval for the true population mean?

75.3, 78.7

68.5, 85.5

76.6, 77.3

75.6, 78.3

77 ± 1.645(5/√25) = 77 ± 1.645; we can use the z-critical for we have the population standard deviation.
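The interval in question 14 can be sketched as follows, assuming the population standard deviation of 5 carries over to this year's students (which is what justifies the z-critical value). The helper name `z_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=77, sigma=5, n=25, confidence=0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

The result agrees with the first answer choice up to rounding.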

15. The correlation between "Amount spent on shoes within the last year" and "Annual salary" is the multiplicative inverse of the correlation between "Annual salary" and "Amount spent on shoes within the last year."

True

False

Correlation is a “symmetric” measure, due to the standard scale.

16. If the range of a distribution is 0, then the standard deviation of the same distribution is also 0.

True

False

Refer to question 7

17. A sampling distribution created with "n = 10" has a larger variance than a sampling distribution created with the same original variable with "n = 30."

True

False

The standard deviation of a sampling distribution is σ/√n, so a smaller n gives a larger variance.

18. If Θ is equal to a value less than X̄, then Σ (Xi - Θ) < Σ (Xi - X̄)

True

False

We would get more positive deviation than negative deviation, so the sum would be positive if Θ is less than the mean: Σ (Xi − Θ) = n(mean − Θ) > 0. The right side of the comparison is zero (definition of the mean).
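A small numeric illustration of this reasoning follows. The five scores are hypothetical (they are not exam data) and were chosen only so that the mean is a round number.

```python
from statistics import mean

scores = [72, 75, 80, 84, 89]   # hypothetical values, mean = 80
x_bar = mean(scores)
theta = 78                      # any value below the mean

sum_dev_mean = sum(x - x_bar for x in scores)    # deviations from the mean
sum_dev_theta = sum(x - theta for x in scores)   # deviations from theta

print(sum_dev_mean, sum_dev_theta, len(scores) * (x_bar - theta))
```

The deviations from the mean sum to zero, while the deviations from the smaller value Θ sum to n times the gap between the mean and Θ, which is positive.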

19. If we multiply the variance of a variable (call it "Y") by one less than the number in the sample, we would have:

Multiplying the variance by the degrees of freedom gives you the numerator of the variance formula, which is the sum of squares total.

20. A sampling distribution created with "n = 10" has a mean smaller than a sampling distribution created from the same original variable with "n = 30."

False

True

Both have the same mean, which is equal to the population mean.

A researcher read the literature pertaining to the number of movies watched in a movie theatre during a certain year in college. Through her investigation, she believed that neither upperclassmen nor underclassmen watched more movies. She decided to test this belief by sampling 23 upperclassmen and 19 underclassmen. The statistics appear below:

Upperclassmen, mean of 4.3 movies (stdev of 1.1)

Underclassmen, mean of 6.6 movies (stdev of 1.7)

(use for questions 21-27)

21. The researcher's belief before she collected data constitutes:

Research hypothesis

Null hypothesis

Alternative hypothesis

Conclusion

22. These are the variables and classifications for this study.

Class Standing (numerical); number of movies (categorical)

Upperclassmen (categorical); underclassmen (categorical)

Upperclassmen (categorical); underclassmen (categorical); movies (numerical)

Class standing (categorical); number of movies (numerical)

There are two variables (answer C gives three variables). The two variables are not “upperclassmen” and “underclassmen”; those are levels of the categorical variable. The independent variable is class standing (which is categorical, dichotomous) and the dependent is number of movies.

23. If the researcher wanted to use hypothesis testing, what would be the appropriate null hypothesis?

HO: μ1 = μ2

HO: Π1 = Π2

HO: 1 = 2

HO: Π = 6.6

Two groups, and we have to assume the groups are the same. We have a numerical dependent, thus we are comparing population means.

24. Which is the appropriate test statistic?

Paired sample Z test

Regression analysis

Independent samples t test for mean difference

Independent samples Z test for mean differences

We have two independent groups (the same person is not measured twice). It is a t-test for we do not know the population standard deviations.

25. If α = .05, what would be the appropriate critical value (two tail alternative)?

2.021

1.68

1.96

2.009

The df = 23+19-2 = 40.

26. Instead of a test statistic, the researcher wanted to use a confidence interval (and be as "precise" as she was when she used a test statistic). What is the level of confidence?

Assuming that alpha = .05, level of confidence = 1 – alpha = .95, or 95%.

27. What is the value of the point estimate?

-2.3

The difference of the observed means.
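Questions 23-27 stop at the point estimate, but the test statistic can be sketched from the summary statistics. The code below assumes a pooled-variance (equal variances) t test, which is consistent with the df of 40 used in question 25; the function name `pooled_t` is made up for illustration.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of the mean difference
    return (mean1 - mean2) / se, df

# Group 1 = upperclassmen, group 2 = underclassmen
t_stat, df = pooled_t(4.3, 1.1, 23, 6.6, 1.7, 19)
print(f"point estimate = {4.3 - 6.6:.1f}, t = {t_stat:.2f}, df = {df}")
```

Under these assumptions the statistic falls far outside the ±2.021 critical value from question 25.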

28. In every hypothesis testing situation, to find a critical value we need to know all of the following except:

alpha

sample size

alternative hypothesis

test statistic

Alpha lets you know “how much”; the Ha tells you where; the sample size gives the df (for the t-critical); the test statistic is only used for comparison.

29. A Z-distribution is a probability distribution created under the assumption of the:

Null hypothesis

Alternative hypothesis

Research hypothesis

Conclusion

Under the assumption of the Ho, repeated samplings from the same population produce test statistics with an expected mean of zero.

30. If 76% of cows produce high protein milk, what is the standard deviation of this population?

Answer not provided. The correct answer is √[(.76)(1 − .76)] ≈ .427; the product (.76)(1 − .76) = .1824 is the variance, not the standard deviation.

The marketing department at Pepsi is very interested in knowing if there is a difference between males and females who prefer the soft drink. The following is a contingency table created after a survey was undertaken (Questions 31-38).

Preference / Males / Females
Prefer Pepsi / 147 / 186
Do not prefer Pepsi / 134 / 156
Total / 281 / 342

31. How many categorical variables are in this study?

The two variables are Gender and Drink preference (not “males” and “females”; they are levels of the Gender variable).

32. In creating a test statistic, the denominator includes a term called "pie-hat." Why is "pie-hat" necessary?

There is only one population, hence only one proportion

The assumption of the null hypothesis states the proportions are equal, and hence have the same variance

The assumption of the alternative states that the proportions are the same; hence this is the hypothesized proportion.

Pie-hat is the variance of the hypothesized difference of the two proportions.

For the two categorical variables to have the same variance (an assumption of the two-sample test), Π would need to be the same for both groups.

33. What is the value of this "pie-hat"?

.523

.544

.533

.535

The correct answer is .5345 = (147 + 186) / (281 + 342).

34. What is the appropriate null hypothesis?

HO: μ1 = μ2

HO: Π1 = Π2

HO: 1 = 2

HO: p1 = p2

The difference between two population proportions.

35. If α = .02, what is the value of the critical value (two tail test)?

1.64

1.96

2.326

2.575

z-value due to proportion

36. What is the appropriate conclusion based on the data provided (and α = .02)?

Men like Pepsi more

Women like Pepsi more

Both like Pepsi equally as well

More information is needed to answer this question.

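The test statistic is z = (.5231 − .5439) / .0402 ≈ −.52 (with males as group 1), which is well inside the critical values of ±2.326.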

You would fail to reject the null hypothesis.
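The pooled proportion and the two-sample test for the Pepsi table can be recomputed directly from the four counts. This is a sketch of the usual pooled z test for two proportions, with males as group 1; the function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # the pooled estimate from question 33
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled

# Males: 147 of 281 prefer Pepsi; females: 186 of 342 prefer Pepsi
z, pooled = two_proportion_z(147, 147 + 134, 186, 186 + 156)
print(f"pooled = {pooled:.4f}, z = {z:.3f}")
```

Whichever group is labelled first, the magnitude of z is far below 2.326, so the decision is to fail to reject the null hypothesis.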

37. What is the value of the point estimate for a 98% confidence interval?

.021

.544

It can either be positive or negative, depending on who you define as group 1 and group 2 (in this case, females were defined as group 1)

38. What is the proper confidence interval for the Pepsi example?

-.043 , .043

0 , .021

-.057 , .099

.021 , .099

You are not responsible for a confidence interval for the difference in population proportions.

39. The variance of a distribution can never be smaller than the standard deviation.

True

False

For a dichotomous categorical variable, the variance Π(1 − Π) is at most .25, and any variance between 0 and 1 is smaller than its square root, the standard deviation.

40. A type II error is when we decide the null hypothesis is ____ when it is actually _____.

True; True

False; True

False; False

True; False

The “opposite” of a type I error: we fail to reject a null hypothesis that is actually false.

A pizza maker believed that he invented a way of making better tasting pizza. He made 40 pizzas the original way and 40 pizzas the new way. Respondents were to rate the pizza on a continuous scale from 1 to 50 (Questions 41-42).

41. What is the null hypothesis for this study?

HO: μ1 > μ2

HO: μ1 = μ2

HO: 1 = 2

HO: 12

Independent variable is type of pizza (old is group 1 and new is group 2). The dependent variable is the rating of the pizza.

42. What is the most appropriate alternative hypothesis for this study (meaning, the alternative hypothesis that conforms to the research hypothesis)?

HA: μ1 > μ2

HA: μ1 ≠ μ2

HA: 1 ≠ 2

HA: 12

The “higher” ratings (being closer to 1) should go to the new pizza (group 2), which should therefore have a lower mean rating than group 1.

The power company will estimate the electricity bill for certain months based on the size of the home. The correlation between monthly use (as indicated on the bill) and size of the home is .68 (Questions 43-47).

43. What type of relationship exists between usage and size of the home?

Positive

Negative

Inverse

Nonexistent

Look at the sign of the correlation coefficient.

44. What is the response variable for this problem?

Electricity Bill

House Size

Houses

Usage per square foot

Response is the dependent variable, the one you are trying to predict.

A sample of 51 houses was used for computing the correlation. Here are the other statistics from the survey:

Mean electricity bill = $67.34, variance of $100

Mean house size = 2300 square feet, variance of 900 square feet

45. What is the regression coefficient (slope) to predict the electricity bill?

.227

.075

6.12

2.04

b1 = r(Sy/Sx) = .68(10/30) ≈ .227, where Sy = √100 = 10 and Sx = √900 = 30.
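The slope can be computed from the summary statistics alone. The sketch below also derives the intercept with the standard formula b0 = ȳ − b1·x̄, which the exam refers to but does not compute; the function `predict_bill` is a hypothetical helper.

```python
import math

r = 0.68
mean_bill, var_bill = 67.34, 100      # dollars
mean_size, var_size = 2300, 900       # square feet

# Slope: correlation times the ratio of standard deviations (not variances)
b1 = r * math.sqrt(var_bill) / math.sqrt(var_size)
# Intercept: forces the line through the point of means
b0 = mean_bill - b1 * mean_size

def predict_bill(size_sqft):
    """Predicted monthly bill for a home of the given size."""
    return b0 + b1 * size_sqft

print(f"b1 = {b1:.3f}, b0 = {b0:.2f}, prediction at mean size = {predict_bill(2300):.2f}")
```

A home of exactly average size is predicted to have exactly the average bill, which is a quick sanity check on the intercept.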

46. If ei is the difference between the actual bill (yi) and the predicted bill (ŷi), what is the sum of all 51 ei's?

67.34

2300

The regression equation is a “substitution” for the mean, and by definition, the sum of the deviation scores for a mean equals 0.
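The zero-sum property of the residuals can be seen on a small data set. The six (size, bill) pairs below are hypothetical and are not the survey's 51 houses; the sketch fits a least-squares line and then compares it with another line that has half the slope.

```python
from statistics import mean

# Hypothetical data, NOT from the exam: house sizes (sq ft) and monthly bills ($)
sizes = [1500, 1800, 2100, 2300, 2600, 3000]
bills = [48.0, 55.5, 61.0, 70.2, 72.8, 85.1]

x_bar, y_bar = mean(sizes), mean(bills)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, bills))
sxx = sum((x - x_bar) ** 2 for x in sizes)
b1 = sxy / sxx                  # least-squares slope
b0 = y_bar - b1 * x_bar         # least-squares intercept

residuals = [y - (b0 + b1 * x) for x, y in zip(sizes, bills)]
residual_sum = sum(residuals)
sse = sum(e ** 2 for e in residuals)

# A different line (half the slope, still through the point of means)
b1_alt = 0.5 * b1
b0_alt = y_bar - b1_alt * x_bar
sse_alt = sum((y - (b0_alt + b1_alt * x)) ** 2 for x, y in zip(sizes, bills))

print(f"sum of residuals = {residual_sum:.10f}, SSE = {sse:.2f}, SSE (other line) = {sse_alt:.2f}")
```

The least-squares residuals sum to zero up to floating-point error, and the alternative line has a larger sum of squared errors.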

47. In question 46, the predicted bill was found using the regression slope (b1) in question 45 and the subsequent intercept (b0). Say we pick two other values to represent b1 and b0 to create a "new" regression equation. The two values that we picked are smaller (closer to zero) than the originals. How do the "new" ei's compare to the original ei's?

New are smaller

New are larger

They are both the same

Need more information to answer this question.

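Disregard this question. Know that any other values of b0 and b1 will give a larger sum of squared errors, and the error scores will generally no longer sum to zero (they do so only when the line passes through the point of means).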

48. If SST is 1548 and SSE is 483, what is R²?

31.2%

68.8%

14.5%

45.3%

R² = SSR / SST. SSR (not given) = SST – SSE

SSR = 1548 – 483 = 1065, so R² = 1065 / 1548 ≈ 68.8%
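The same arithmetic as a short sketch, using only the two sums of squares given in the question:

```python
sst, sse = 1548, 483          # total and error sums of squares from the question
ssr = sst - sse               # regression sum of squares
r_squared = ssr / sst

print(f"SSR = {ssr}, R^2 = {r_squared:.1%}")
```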

49. Covariance, correlation and the regression coefficient are measures of association between two variables. What two attributes do these measures have in common?

Effect size and strength of relationship

Strength and direction

Effect size and direction

Strength of relationship and prediction

While the regression coefficient gives you “effect size” (the change in y per unit change in x), covariance and correlation do not.

50. For process data, ____ is the key statistic for determining "randomness" and ___ is the key statistic for determining if the process is "in control."

Mean; mean

Standard deviation; mean

Standard deviation; Standard Deviation

Mean; Standard deviation