Exam 2 answers

For each of the following short scenarios, describe briefly and specifically how you would respond to the issue raised:

1. What are the purposes of looking at each of leverage, studentized residuals and Cook’s D?

Use leverage to identify outliers in the independent variable(s). Use studentized residuals to identify outlying residuals, that is, observations the equation fits poorly. Use Cook’s D to identify observations that may have a substantial influence on the equation. The equation might look very different because of that observation.
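As an illustration only, the three diagnostics can be computed by hand for a simple one-variable regression. The data below are hypothetical, and the residuals are internally studentized (statistical packages may report the deleted version instead), so treat this as a sketch rather than the course's exact procedure.

```python
import math

# Hypothetical data: the last point has an extreme x value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]

n = len(x)
p = 2  # number of estimated coefficients (intercept and slope)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
# Standard error of the regression.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

# Leverage: how unusual each x value is.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
# Internally studentized residual: residual divided by its own standard error.
studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
# Cook's D: combines residual size and leverage into one influence measure.
cooks_d = [r ** 2 * h / (p * (1 - h)) for r, h in zip(studentized, leverage)]

for i in range(n):
    print(f"obs {i + 1}: leverage={leverage[i]:.3f}, "
          f"studentized={studentized[i]:.2f}, Cook's D={cooks_d[i]:.3f}")
```

The leverages always sum to the number of coefficients, so a single value far above the average p/n stands out quickly.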

2. You expected from theory that a particular X variable, one of the 5 you have in the equation, should have a direct relationship with Revenues, your dependent variable. However, when you look at your regression results you see a negative slope with a significant P-value of .002. What problem does this suggest and what would you do about it?

Multicollinearity; drop one of the correlated independent variables from the equation.

3. You are at the final stages of a revenues forecasting project and now need to forecast revenues using values for your independent variables for the next 4 quarters. Specify the two very different ways for obtaining values for these independent variables for the next 4 quarters.

Forecast the independent variables yourself from their existing historical data.

Use external forecasts from published sources.

4. One of your models involves transforming sales by taking a natural log. Why can you no longer use the standard error as a means of choosing a ‘best’ model? What should you use in this case?

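The standard error has the same units as the dependent variable. For the log model it is in log units rather than dollars, so it is not comparable with the standard errors of the untransformed models.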

Adjusted R-squared will always work

5. The normal probability plot in Tools-Data Analysis- Regression gives you a normal probability plot of the dependent variable and it shows the dependent variable to be normal. Is that sufficient to satisfy the normality assumption in regression? If not, what should you do?

It is the residuals that need to be normal; check the residuals for normality.

Data is available in Appendix A on household income for households sampled from 4 regions in a large city in the Western United States.

6. We wish to determine whether or not a significant relationship exists between income and region. (a) Set up specific hypotheses that reflect the type of relationship for situations like this, (b) use Appendix A to help you write up a statement of how confident you are, and (c) check the assumptions that are needed to do this test.

H0: μ1 = μ2 = μ3 = μ4 vs H1: Not all population means are equal

We can be 99.94% confident that the means differ in some way

Standard deviations are all within a factor of 2 of each other and so we accept constant variability.

Normality may be a problem in all of the regions except the SW.
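A minimal sketch of the two calculations behind this answer: the factor-of-2 check on the standard deviations and the one-way ANOVA F statistic. The incomes are hypothetical and are not the Appendix A data. The P-value (and hence the confidence) would still come from the F distribution, which is not computed here.

```python
import statistics

# Hypothetical incomes ($ thousands) for four regions; not the Appendix A data.
samples = {
    "NE": [64.0, 71.0, 68.0, 75.0, 78.0],
    "NW": [55.0, 60.0, 58.0, 66.0, 61.0],
    "SE": [48.0, 52.0, 50.0, 57.0, 53.0],
    "SW": [45.0, 51.0, 49.0, 55.0, 50.0],
}

# Constant-variability rule of thumb: largest SD no more than twice the smallest.
sds = {region: statistics.stdev(values) for region, values in samples.items()}
sd_ratio = max(sds.values()) / min(sds.values())
equal_variance_ok = sd_ratio <= 2

# One-way ANOVA: F = (between-group mean square) / (within-group mean square).
all_values = [v for values in samples.values() for v in values]
grand_mean = statistics.mean(all_values)
k = len(samples)
n_total = len(all_values)
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                 for v in samples.values())
ss_within = sum((x - statistics.mean(v)) ** 2
                for v in samples.values() for x in v)
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(f"SD ratio = {sd_ratio:.2f}, within a factor of 2: {equal_variance_ok}")
print(f"F = {f_stat:.2f} on {k - 1} and {n_total - k} degrees of freedom")
```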

7. Use the post hoc analysis to help you write up a series of statements detailing how confident you are about which areas have significantly higher incomes than which other areas.

We can be at least 99% confident that the NE population mean is larger than the population means of the SW and SE.

Appendix B details a seasonal and trend analysis of 20 quarters of revenues from Lowe’s Home Improvement ($Millions)

8a. Describe the type and strength of the trend in Lowe’s revenues.

98.73% of the variability of adjusted revenues can be explained by the strong direct linear trend

b. Describe the seasonal nature of Lowe’s revenues.

From quarter 1, which is about the same as an average quarter, revenues increase sharply to 14% higher than average in the second quarter. Third quarter sales dip to slightly below average, and then 4th quarter sales drop way off to about 11% less than an average quarter.

c. Calculate a seasonal forecast for the 21st quarter (a first quarter) using the results of a and b above.

Using the adjusted revenue trend line the 21st quarter forecast would be 6975.15 ($M).

Since this is a first quarter multiply by 1.014 to get a seasonal forecast of 7072.80 ($M).
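The arithmetic can be checked with a short calculation. Both inputs are the figures quoted in the answer; the variable names are only illustrative.

```python
# Values taken from the answer above.
trend_forecast = 6975.15    # adjusted-revenue trend forecast for quarter 21 ($M)
q1_seasonal_index = 1.014   # first-quarter seasonal index

# Seasonal forecast = trend forecast x seasonal index for that quarter.
seasonal_forecast = trend_forecast * q1_seasonal_index
print(f"Seasonal forecast for quarter 21: {seasonal_forecast:.2f} ($M)")
```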


Appendix C and following Appendices deal with forecasting revenues for Lowe’s. Definitions for all the variables are found at the top of page 7.

9. Look at Appendix C. Explain why the VIF process is used and what you would do as a next step in this process.

VIF is used to help reduce multicollinearity among the independent variables by identifying variables that have a high VIF and deleting them one at a time (rerunning to get new VIFs) until all are under 10. In this case I would remove DPI or CPI and get new VIFs.
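To show where a VIF comes from, here is a sketch for the simplest case of only two independent variables, where both share the same VIF of 1 / (1 - r^2). The DPI and CPI values are hypothetical, and with more than two variables each VIF would instead come from regressing that variable on all the others.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = correlation(a, b)
    return 1 / (1 - r ** 2)

# Hypothetical values chosen to move almost in lockstep.
dpi = [7000.0, 7150.0, 7290.0, 7460.0, 7600.0, 7780.0]
cpi = [172.0, 175.5, 178.0, 182.5, 185.0, 189.5]

vif = vif_two_predictors(dpi, cpi)
print(f"VIF = {vif:.1f}; exceeds 10: {vif > 10}")
```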

10. Look at Appendix D. The dependent variable is seasonally adjusted revenues for Lowe’s Home Improvement.

a. Interpret the numbers 0.5556 and 1.2902 across from GDP. Make sure you use the proper units and interpret the numbers in terms of revenues and GDP.

We can be 95% confident that Adjusted Revenues increase by between $0.5556M and $1.2902M for each $1 billion increase in GDP, holding CPI, unemployment rate and new housing starts constant.
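The interval can be rebuilt from the GDP row of Appendix D as coefficient plus or minus t times the standard error. The critical value 2.131 (two-sided 95%, 15 degrees of freedom) is an assumption taken from a standard t table, so the result matches the printed limits only up to rounding.

```python
# Values from the GDP row of Appendix D.
coefficient = 0.9229
std_error = 0.1723
# Assumed: two-sided 95% critical value of t with 15 degrees of freedom.
t_critical = 2.131

margin = t_critical * std_error
lower = coefficient - margin
upper = coefficient + margin
print(f"95% CI for the GDP slope: {lower:.4f} to {upper:.4f} "
      "($M per $billion of GDP)")
```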

b. Does a relationship exist between revenues and housing starts (HSN1F)? Set up appropriate hypotheses and write a conclusion stating your confidence.

H0: β4 = 0 (no relationship) vs H1: β4 ≠ 0 (a relationship exists)

We can be 98.87% confident that the relationship exists.
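A quick check of where the 98.87% comes from, using the HSN1F row of Appendix D: the t statistic is the coefficient divided by its standard error, and the confidence is one minus the two-sided P-value.

```python
# Values from the HSN1F row of Appendix D.
coefficient = 1.3617
std_error = 0.4717
p_value = 0.0113  # two-sided P-value reported in Appendix D

t_stat = coefficient / std_error
confidence = (1 - p_value) * 100
print(f"t = {t_stat:.3f}; confidence that the slope is nonzero = {confidence:.2f}%")
```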

c. Interpret the number 1.3617 for HSN1F. Make sure you use the proper units.

An increase of one thousand new housing starts is associated with an increase of $1.3617M in adjusted revenues, holding GDP, CPI, and UNRATE constant.

11. Look at Appendix E. (a) What assumption does this plot address? (b) Does the assumption hold? (c) This plot comes from an equation that has a negative coefficient for MPRIME. If you were to use the ladder approach and wanted a better fitting model what would your next step be? Be very specific.

Linearity

No, it does not hold; there is a definite U shape.

The slope of MPRIME is negative in the equation and combined with the U shape would suggest going down the ladder of re-expression in either X or Y. I would start with a square root of MPRIME.

12. Look at Appendix F. The dependent variable in the left equation is adjusted revenues and the right is the natural log of adjusted revenues. Being careful of units, interpret the slope for GDP in each equation.

As GDP increases by one percent, adjusted revenues increase by $148.42M, holding prime rate constant.

As GDP increases by one billion dollars, adjusted revenues increase by .036%, holding prime rate constant.
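For the log equation, the percent interpretation is the usual approximation 100 × slope. The slope used below (0.00036) is an assumption implied by the .036% figure in the answer, since the Appendix F output itself is the authoritative source. The sketch compares the approximation with the exact percent change.

```python
import math

# Assumed slope of GDP in the ln(adjusted revenues) equation,
# implied by the .036% figure in the answer above.
slope_log_model = 0.00036

# Usual approximation: percent change = 100 * slope.
approx_pct_change = 100 * slope_log_model
# Exact percent change for a one-unit ($1 billion) increase in GDP.
exact_pct_change = 100 * (math.exp(slope_log_model) - 1)

print(f"approximate: {approx_pct_change:.4f}%  exact: {exact_pct_change:.4f}%")
```

For a slope this small the two agree to several decimal places, which is why the simple approximation is used in the answer.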

13. Look at Appendix G. This plot comes from Model D. What assumption does it address? Does this assumption hold?

Constant variability; the plot does not show a wedge shape, which would indicate increasing variability, so the assumption is OK.

14. Look at Appendix H. This plot comes from Model D. What assumption does it address? Does this assumption hold? If you could ask for other statistical evidence, what would you ask for?

Normality of residuals; looks OK but you could use skewness and kurtosis to confirm.
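A sketch of the skewness and kurtosis check on a set of residuals. The residuals are hypothetical (not from Model D), and these are the simple moment-based formulas; spreadsheet functions may apply small-sample adjustments, so their values can differ slightly.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both near 0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical residuals, not taken from Model D.
residuals = [-210.0, -140.0, -95.0, -40.0, -10.0, 15.0, 55.0, 90.0, 130.0, 205.0]
skew, kurt = skewness_and_excess_kurtosis(residuals)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```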

15. Since this data is time series, the D-W statistic is appropriate. In Model D, the D-W statistic was 1.1. What does this say about the independence assumption? How might you make this a better model?

It indicates a lack of independence (positive autocorrelation in the residuals); you could try adding some other variables to see if the D-W rises to above 1.3.
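For reference, the D-W statistic is the sum of squared changes in successive residuals divided by the sum of squared residuals. The sketch below uses hypothetical residuals that drift slowly, which is the pattern that pushes the statistic well below 2.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly, as positively autocorrelated errors do.
residuals = [3.0, 2.5, 2.0, 1.0, -0.5, -1.5, -2.5, -2.0, -1.0, 0.5, 1.5, 2.0]
dw = durbin_watson(residuals)
print(f"D-W = {dw:.2f}; below the 1.3 cutoff used above: {dw < 1.3}")
```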

16. For the following charts: State whether or not the process is in control or out of control, and describe the reason(s) why using Rules 1, 2 and 3.

Out of control (OOC): 8 points in a row below the center line.

Rule 3
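A run like this can be detected mechanically. The sketch below finds the longest run of consecutive points on one side of the center line. The data and center line are hypothetical, and reading "8 in a row below" as "below the center line" is an assumption, since the chart itself is not reproduced here.

```python
def longest_run_one_side(points, center_line):
    """Longest run of consecutive points strictly on one side of the center line."""
    longest = current = 0
    previous_side = 0
    for value in points:
        # +1 if above the center line, -1 if below, 0 if exactly on it.
        side = (value > center_line) - (value < center_line)
        if side != 0 and side == previous_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0
        previous_side = side
        longest = max(longest, current)
    return longest

# Hypothetical chart values with a center line of 10.
points = [11.0, 9.0, 9.5, 9.0, 8.5, 9.2, 9.1, 9.4, 9.3, 10.5]
run = longest_run_one_side(points, center_line=10.0)
print(f"longest run = {run}; out of control by the run rule: {run >= 8}")
```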

The following situations call for the use of a control chart. Look at each situation and decide which type of control chart is appropriate for the situation.

17. Customers are surveyed leaving the teller window to find out if their transaction was satisfactory. They sample 50 customers each day and record the number of unsatisfactory transactions. What chart is called for?

P chart
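A sketch of how the P chart limits would be set for this scenario, using the usual three-sigma limits for a proportion. The sample size of 50 comes from the question; the daily counts are hypothetical.

```python
import math

n = 50  # customers sampled each day, from the scenario above
# Hypothetical daily counts of unsatisfactory transactions.
unsatisfactory = [4, 6, 3, 5, 7, 4, 5, 6, 2, 8]

# Center line: overall proportion unsatisfactory.
p_bar = sum(unsatisfactory) / (n * len(unsatisfactory))
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma_p
lcl = max(0.0, p_bar - 3 * sigma_p)  # a proportion cannot be negative

print(f"center = {p_bar:.3f}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```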

18. A chart is constructed by looking at a sample of 25 applications for a consumer loan. Each application is checked thoroughly to see how many questions on the application are not filled out properly. You are to analyze the data but you have only the number of questions not filled out properly. What chart is called for?

C chart
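A sketch of the corresponding C chart limits, using the usual three-sigma limits for a count (center plus or minus 3 times the square root of the center). The counts are hypothetical.

```python
import math

# Hypothetical counts of improperly completed questions, one count per sample.
defects = [12, 9, 15, 11, 8, 14, 10, 13, 9, 9]

# Center line: average count per sample.
c_bar = sum(defects) / len(defects)
ucl = c_bar + 3 * math.sqrt(c_bar)
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))  # a count cannot be negative

print(f"center = {c_bar:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```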

19. A mail-order clothing retailer is interested in improving the time of its customer service agents who enter customer orders into the computerized record system. A manager records how long it takes to enter an order in a daily sample of 15 customer orders. He does this over the course of one month’s time. What type of chart should they use?

XBAR and R

Appendix A

Appendix B: 20 quarters of revenues from Lowe’s Home Improvement ($Millions)

Appendix C and the following appendices deal with forecasting revenues for Lowe’s and use the following variables:

Adjusted – Seasonally adjusted revenues $Millions

GDP – Gross Domestic Product $Billions

DPI – Disposable Personal Income $Billions

CCONF – Consumer Confidence Index

CPI – Consumer price index

MORTG – Mortgage interest rate in percent

MPRIME – Prime interest rate in percent

UNRATE – Unemployment rate in percent

HSN1F – Single Family new housing starts (thousands)

PERMITS – Building permits (thousands)

Appendix C:

Appendix D

Regression output / confidence interval
variables / coefficients / std. error / t (df=15) / p-value / 95% lower / 95% upper
Intercept / -17,697.1195
GDP / 0.9229 / 0.1723 / 5.356 / .0001 / 0.5556 / 1.2902
CPI / 60.2911 / 16.4430 / 3.667 / .0023 / 25.2436 / 95.3386
UNRATE / 399.8557 / 35.7735 / 11.177 / 1.13E-08 / 323.6063 / 476.1051
HSN1F / 1.3617 / 0.4717 / 2.887 / .0113 / 0.3563 / 2.3671


Appendix E

Appendix F

Appendix G

Appendix H
