Parametric Statistical Inference Modeling
1. Create box plots for both X and Y. Are there any outliers?
Answer - No outliers identified. See box plot below.
2. Make a scatter plot with the regression line. Does there appear to be a linear relationship? What two points appear to be potential outliers?
Answer - Yes, there does appear to be a linear relationship, with two points, row 18 (X=9, Y=6) and row 19 (X=5, Y=9), representing possible outliers.
3. Check for outliers using the semi-studentized method. Are there any outliers? What are the absolute values of the semi-studentized residuals for the two points you identified in question 2?
Answer - No, as all semi-studentized residuals have an absolute value less than four. For the two points from question 2, the absolute semi-studentized residuals are 1.97819 and 2.06186.
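The two values in question 3 can be checked by hand, since a semi-studentized residual is just the residual divided by the square root of MSE. The sketch below assumes the rounded coefficients (1.377 and 0.8652) and S = 1.59914 reported in the Minitab output later in this solution, so it agrees with Minitab only to about three decimals. The function name is illustrative.

```python
# Semi-studentized residual: e* = (Y - fitted) / sqrt(MSE).
# B0, B1 and S are the rounded values quoted elsewhere in this solution.
B0 = 1.377      # estimated intercept
B1 = 0.8652     # estimated slope
S = 1.59914     # sqrt(MSE) from the regression output


def semi_studentized(x, y):
    """Return the residual at (x, y) divided by sqrt(MSE)."""
    fitted = B0 + B1 * x
    return (y - fitted) / S


points = {"row 18": (9, 6), "row 19": (5, 9)}
results = {row: semi_studentized(x, y) for row, (x, y) in points.items()}

for row, value in results.items():
    # Rule used in the answer: flag a point only if |e*| exceeds 4.
    flag = "possible outlier" if abs(value) > 4 else "not flagged"
    print(f"{row}: |e*| = {abs(value):.3f} ({flag})")
```

Both absolute values fall well below 4, which is consistent with the answer above.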
4. Do a check of normality by using a probability plot of the residuals. Include:
a. The null and alternative hypotheses,
b. The p-value of the test,
c. Your decision based on a 0.05 level of significance, and
d. Minitab copy of your plot.
Answer- a. Ho: The residuals come from a normal distribution
Ha: The residuals do not come from a normal distribution
b. p-value is 0.942
c. Since p-value is greater than 0.05 we fail to reject Ho and will conclude the assumption of normality is plausible.
d.
5. Do a check of equal variances by performing a Modified Levene Test. Include:
a. The null and alternative hypotheses,
b. The p-value of the test,
c. Your decision based on a 0.05 level of significance, and
d. Minitab copy of your plot.
Answer a. Ho: The variances are equal
Ha: The variances are not equal
b. The p-value is 0.533
NOTE: Remember that Levene's test is more robust to violations of normality than the F-test, making Levene's test a better overall test of equal variances. The only condition for Levene's test is that the variable being tested is continuous.
c. Since the p-value is greater than 0.05, we fail to reject Ho and conclude that the assumption of equal variances is plausible.
d.
6. Perform a Lack of Fit Test to check if linear regression function is appropriate. Include:
a. The null and alternative hypotheses,
b. The correct F-statistic and p-value of the test,
c. Your decision based on a 0.05 level of significance, and
d. Minitab copy of your ANOVA output.
Answer - a. Ho: The linear regression function is appropriate
Ha: The linear regression function is not appropriate
b. F-statistic is 0.95 and p-value is 0.507
c. Since the p-value is greater than 0.05, we fail to reject Ho and conclude it is plausible that the linear regression function is appropriate.
d. Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   70.769  70.769  27.67  0.000
Residual Error  18   46.031   2.557
  Lack of Fit    7   17.364   2.481   0.95  0.507
  Pure Error    11   28.667   2.606
Total           19  116.800
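As a cross-check on part b, the lack-of-fit F-statistic can be recomputed from the SS and DF columns of the ANOVA table. This is a minimal sketch using only the table values; the variable names are illustrative, and the p-value is not recomputed because that would require an F-distribution routine.

```python
# Lack-of-fit F statistic: F = MSLF / MSPE, where each MS = SS / DF.
ss_lack_of_fit, df_lack_of_fit = 17.364, 7
ss_pure_error, df_pure_error = 28.667, 11

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error
f_lack_of_fit = ms_lack_of_fit / ms_pure_error

# Lack of Fit and Pure Error should add up to the Residual Error row.
ss_error = ss_lack_of_fit + ss_pure_error
df_error = df_lack_of_fit + df_pure_error

print(f"MSLF = {ms_lack_of_fit:.3f}, MSPE = {ms_pure_error:.3f}")
print(f"F = {f_lack_of_fit:.2f}")
print(f"Residual Error: SS = {ss_error:.3f}, DF = {df_error}")
```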
7. Even though you may not have found any assumption violations, perform a Box-Cox analysis on Y to see if any transformation is suggested. Include:
a. Estimated and rounded lambda values,
b. The interpretation of this value, and
c. The Box-Cox plot.
NOTE: This can only be done using Minitab Version 15 or higher – i.e. student version 14 does not contain Box-Cox program.
Answer - a. The estimated lambda is 1.07 and the rounded lambda is 1.00.
b. The rounded value implies raising Y to the power of 1.00, which means no transformation is necessary.
c.
8. Find Bonferroni joint confidence intervals for Bo and B1 with a 90% family confidence level and include your interpretation of these intervals. You can use the Minitab output to find s{bo} and s{b1}.
Answer - With a sample size of n = 20, the degrees of freedom are n-2 = 18. Since we are interested in two joint intervals, one for Bo and one for B1, g = 2 for the Bonferroni correction. The intervals are bo +/- B*s{bo} and b1 +/- B*s{b1}, where B = t(1-α/(2g); n-2). With α = 0.10 and g = 2, 1-α/4 = 0.975, and the t-table value for 18 DF gives a Bonferroni multiplier of B = 2.101. Plugging into the equations:
For Bo:
1.377 +/- 2.101*0.8442
= 1.377 +/- 1.774
= -0.397 <= Bo <= 3.151
For B1:
0.8652 +/- 2.101*0.1645
= 0.8652 +/- 0.3456
= 0.5196 <= B1 <= 1.2108
Interpretation: We are 90% confident that both intervals simultaneously contain their true parameters: the first contains the true intercept and the second contains the true slope.
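The arithmetic above can be reproduced with a short script. This sketch takes the multiplier 2.101 from the t-table as given in the answer rather than computing the quantile, and uses the estimates and standard errors quoted above. The helper name is illustrative.

```python
def bonferroni_interval(estimate, std_error, multiplier):
    """Return (lower, upper) for estimate +/- multiplier * std_error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


alpha, g = 0.10, 2
per_interval_level = 1 - alpha / (2 * g)   # 0.975, the t-table column used
B = 2.101                                  # t(0.975; 18), read from the t-table

b0_low, b0_high = bonferroni_interval(1.377, 0.8442, B)
b1_low, b1_high = bonferroni_interval(0.8652, 0.1645, B)

print(f"Bo: {b0_low:.3f} <= Bo <= {b0_high:.3f}")
print(f"B1: {b1_low:.4f} <= B1 <= {b1_high:.4f}")
```

The interval for Bo contains zero while the interval for B1 does not, which can be read directly from the printed bounds.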
9. Use Minitab to find Bonferroni simultaneous confidence intervals for new X observations of 0 and 10 using a 95% family confidence level. Include the output and your interpretation of these intervals.
9.1 Follow-up question 1: What is the interpretation of the level of confidence for the confidence intervals in the output?
9.2 Follow-up question 2: Can you think of a reason why these new X values might not be reliable?
9.3 Follow-up question 3: Show mathematically how one would use the Minitab output to get the simultaneous level of confidence for new observations.
Answer - Interpretation: We are 95 percent confident that both of the following intervals are correct: the reading achievement stanine for a reading readiness stanine of 0 is between -0.687 and 3.441, and the reading achievement stanine for a reading readiness stanine of 10 is between 7.706 and 12.351.
Predicted Values for New Observations
Obs     Fit  SE Fit       97.5% CI            97.5% PI
  1   1.377   0.844  (-0.687, 3.441)   (-3.044, 5.798)
  2  10.029   0.950  (7.706, 12.351)   (5.481, 14.576) X
X denotes a point that is an outlier in the predictors.
Values of Predictors for New Observations
Obs X
1 0.0
2 10.0
9.1 Follow-up 1: The 97.5% level of confidence is how confident we are in any ONE of the intervals being correct.
9.2 Follow-up 2: The x-values used in this analysis ranged from 1 to 9, so applying the regression equation at X = 0 and X = 10 is extrapolation outside the observed range and may not be reliable.
9.3 Follow-up 3: The 97.5% level of confidence comes from 1 – α/g = 0.975. For this particular problem we are interested in two simultaneous intervals, so g = 2. Solving for alpha, α/g = 0.025, so α = 0.05, giving a 95% simultaneous level of confidence.
NOTE: Software by default uses α/2 when constructing a two-sided confidence interval, which is why we solve this equation with α/g rather than α/2g. If one were to use α/2g with the level of confidence shown in the output, one would “double divide” by 2.
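The algebra in follow-up 3 amounts to one line of arithmetic. The sketch below, with an illustrative function name, converts the per-interval level printed by the software into the Bonferroni family level, which is a lower bound on the joint confidence.

```python
def family_confidence(per_interval_level, g):
    """Bonferroni family confidence level for g simultaneous intervals."""
    alpha_over_g = 1 - per_interval_level   # 1 - 0.975 = 0.025
    alpha = g * alpha_over_g                # 2 * 0.025 = 0.05
    return 1 - alpha


family = family_confidence(0.975, 2)
print(f"Family confidence level: at least {family:.0%}")
```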
10. What is the value and interpretation of the coefficient of determination? Using the output and correct values show two ways this value can be calculated.
Answer - From the output, the coefficient of determination, or R-squared, is 60.6%, meaning that 60.6 percent of the variation in reading achievement stanines can be explained by reading readiness stanines.
S = 1.59914 R-Sq = 60.6% R-Sq(adj) = 58.4%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   70.769  70.769  27.67  0.000
Residual Error  18   46.031   2.557
Total           19  116.800
Two possible methods for calculating R-squared are:
1. (SSR/SST)*100% = (70.769/116.8)*100% = 60.6%
2. [1 – (SSE/SST)]*100% = [1 – (46.031/116.8)]*100% = 60.6%
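Both calculations can be verified directly from the sums of squares in the ANOVA table. This is a minimal sketch with illustrative variable names.

```python
# Sums of squares from the ANOVA table above.
ssr = 70.769    # Regression
sse = 46.031    # Residual Error
sst = 116.800   # Total

r2_from_ssr = ssr / sst        # method 1: SSR / SST
r2_from_sse = 1 - sse / sst    # method 2: 1 - SSE / SST

print(f"Method 1: {r2_from_ssr:.1%}")
print(f"Method 2: {r2_from_sse:.1%}")
```

The two methods agree because SSR + SSE = SST.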
11. From our in-class Sales-Advertising example, the test results were as follows: the intercept test had T = -0.16 and a p-value of 0.885; the slope test had T = 3.66 and a p-value of 0.035; and the ANOVA test had F = 13.66 and a p-value of 0.035. Use Minitab to find these p-values by going to Calc > Probability Distributions and selecting either T or F as appropriate. Then select the radio button for “Cumulative Probability”, enter the appropriate degrees of freedom for the test, click the radio button for “Input Constant”, and enter the value of the test statistic in the text box. Click OK. Show how one gets from the output to the p-value. Include a copy of the Minitab output for each test.
Answer - Test of Intercept: From the output we would take 0.441524 and multiply by two to get 0.883, which is approximately 0.885 due to rounding.
Cumulative Distribution Function
Student's t distribution with 3 DF
x P(X<=x)
-0.16 0.441524
Test of Slope: From the output we would subtract 0.982377 from 1 and then double the result, getting 0.017623*2 = 0.035246, which is approximately 0.035.
Cumulative Distribution Function
Student's t distribution with 3 DF
x P(X<=x)
3.66 0.982377
F-Test: From this output we would simply subtract 0.965626 from 1 to get 0.034374, which is close to the reported 0.035; the small difference is due to rounding.
Cumulative Distribution Function
F distribution with 1 DF in numerator and 3 DF in denominator
x P(X<=x)
13.66 0.965626
NOTE: When using T, we need to double the result since the hypothesis test is 2-sided and the T distribution is symmetric. When the t-statistic is negative, the cumulative probability is already the lower-tail area, so we do not need to subtract from 1. The F-test is an upper-tail test that already corresponds to the 2-sided t-test, so there is no need to double the result, but since the output is the cumulative probability for a positive test statistic we must still subtract from 1.
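The conversions from cumulative probability to p-value can also be reproduced outside Minitab. The sketch below is not the Minitab procedure: it relies on the closed-form CDF of the t distribution with exactly 3 DF, and on the fact that an F variable with 1 and 3 DF is the square of a t variable with 3 DF. The function names are illustrative.

```python
import math


def t3_cdf(t):
    """Closed-form CDF of Student's t distribution with exactly 3 DF."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def two_sided_t_p(t):
    """Two-sided p-value: double the tail area on the side where t falls."""
    cdf = t3_cdf(t)
    tail = cdf if t < 0 else 1 - cdf   # negative t: no subtraction from 1
    return 2 * tail


def f_1_3_cdf(x):
    """CDF of F(1, 3), using F = T**2 with T distributed as t with 3 DF."""
    return 2 * t3_cdf(math.sqrt(x)) - 1


p_intercept = two_sided_t_p(-0.16)
p_slope = two_sided_t_p(3.66)
p_anova = 1 - f_1_3_cdf(13.66)         # upper tail only, no doubling

print(f"Intercept: CDF = {t3_cdf(-0.16):.6f}, p = {p_intercept:.3f}")
print(f"Slope:     CDF = {t3_cdf(3.66):.6f}, p = {p_slope:.3f}")
print(f"ANOVA F:   CDF = {f_1_3_cdf(13.66):.6f}, p = {p_anova:.3f}")
```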