PREDICTION, $R^2$, CORRELATION
Prediction:
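The fitted regression equation is
$\hat{Y} = b_0 + b_1 X = \bar{Y} + b_1 (X - \bar{X})$.
Based on the fitted equation, the predicted value at a specified value $X_0$ is
$\hat{Y}_0 = b_0 + b_1 X_0 = \bar{Y} + b_1 (X_0 - \bar{X})$,
with
$E(\hat{Y}_0) = \beta_0 + \beta_1 X_0$ = the mean of $Y$ at $X_0$
and
$\mathrm{Var}(\hat{Y}_0) = \mathrm{Var}(\bar{Y}) + (X_0 - \bar{X})^2 \, \mathrm{Var}(b_1) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}} \right]$, where $S_{XX} = \sum (X_i - \bar{X})^2$
(since $\mathrm{Cov}(\bar{Y}, b_1) = 0$).
Therefore, the estimated standard error is
$\mathrm{s.e.}(\hat{Y}_0) = s \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}} \right]^{1/2}$, where $s^2$ is the residual mean square.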
Note: $\mathrm{Var}(\hat{Y}_0)$ or $\mathrm{s.e.}(\hat{Y}_0)$ achieves its minimum at $X_0 = \bar{X}$. That is, we might expect our best “prediction” in the “middle” of the observed range of $X$. As $X_0$ moves farther away from $\bar{X}$, the prediction becomes less accurate (since the standard error gets larger).
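Thus, a $100(1-\alpha)\%$ confidence interval for the mean of $Y$ at $X_0$ is
$\hat{Y}_0 \pm t(n-2,\, 1-\alpha/2) \; s \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}$,
where $s$ is the residual standard deviation on $n-2$ degrees of freedom.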
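The standard error and interval above can be checked numerically. The following is only a minimal sketch in plain Python: the function names (`fit_line`, `prediction_se`, `mean_response_interval`) and the toy data are illustrative assumptions, not part of the notes, and the t critical value is passed in by the caller (for example, read from a t-table) rather than computed.

```python
import math

def fit_line(x, y):
    """Least-squares fit of Y = b0 + b1*X; returns (b0, b1, x_bar, s_xx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1, x_bar, s_xx

def prediction_se(x, y, x0):
    """Predicted value at x0 and its estimated standard error."""
    n = len(x)
    b0, b1, x_bar, s_xx = fit_line(x, y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))          # residual standard deviation
    y0_hat = b0 + b1 * x0
    se = s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)
    return y0_hat, se

def mean_response_interval(x, y, x0, t_crit):
    """Interval for the mean of Y at x0; t_crit is supplied by the caller."""
    y0_hat, se = prediction_se(x, y, x0)
    return y0_hat - t_crit * se, y0_hat + t_crit * se

if __name__ == "__main__":
    # Made-up illustrative data (assumption, not from the notes).
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    for x0 in (3.0, 5.0, 8.0):
        print(x0, prediction_se(x, y, x0))
    # 3.182 is the tabulated t value for 3 degrees of freedom, 95% two-sided.
    print(mean_response_interval(x, y, 3.0, 3.182))
```

Running the loop for several values of `x0` shows the standard error growing as `x0` moves away from the mean of the X values, which is the behaviour described in the note.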
$R^2$, Correlation, and Regression:
(a) $R^2$:
$R^2 = \frac{\text{SS(Reg)}}{\text{SS(Total, corrected)}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$,
$R^2$ is the ratio of the regression sum of squares to the total sum of squares (corrected). $R^2$ is also the ratio of $\sum (\hat{Y}_i - \bar{Y})^2$ (the distance between model 2 and model 1) to $\sum (Y_i - \bar{Y})^2$ (the distance between the data (model 0) and model 1). Since the total sum of squares is the sum of the regression sum of squares and the residual sum of squares,
$0 \le R^2 \le 1$. A large $R^2$ implies that the proportion of the total sum of squares contributed by the regression sum of squares is large. For example, if $R^2 = 0.9$, then 90% of the total sum of squares comes from the regression sum of squares. Heuristically, that indicates 90% of the variation in $Y$ can be explained by $X$. That is, model 2 can fit the data well. In addition, a large $R^2$ also implies that the regression sum of squares is large relative to the residual sum of squares. In the above example, the regression sum of squares is 9 times as large as the residual sum of squares, since the residual sum of squares contributes 10% of the total sum of squares (corrected). That is, the distance between model 2 and model 1 is large relative to the variation of the data. As we explained in the previous section, this might imply that the slope in the regression is significant. Thus, model 2 might be sensible. $R^2$ is usually recommended as a “useful first thing to look at” in a regression printout.
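The decomposition of the total sum of squares can be verified on a small example. This is a hedged sketch in plain Python: the helper name `anova_decomposition` and the data are assumptions made for illustration only.

```python
def anova_decomposition(x, y):
    """Return (ss_total, ss_reg, ss_res) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)             # corrected total SS
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)          # regression SS
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
    return ss_total, ss_reg, ss_res

if __name__ == "__main__":
    # Made-up illustrative data (assumption, not from the notes).
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    ss_total, ss_reg, ss_res = anova_decomposition(x, y)
    r2 = ss_reg / ss_total
    print("R^2 =", r2)
    print("SS(Reg) / SS(Res) =", ss_reg / ss_res)  # equals R^2 / (1 - R^2)
```

The ratio printed on the last line is the same quantity as the "9 times" figure in the $R^2 = 0.9$ example, since $0.9 / 0.1 = 9$.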
(b) Correlation:
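The correlation coefficient between the covariate X and the response Y is
$r_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\left[ \sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2 \right]^{1/2}} = \frac{S_{XY}}{(S_{XX} S_{YY})^{1/2}}$, $-1 \le r_{XY} \le 1$,
where $S_{XY}$, $S_{XX}$ and $S_{YY}$ denote the corrected sums of products and squares in the first expression.
As $r_{XY}^2 \approx 1$, then $r_{XY} \approx 1$ or $r_{XY} \approx -1$. That is, $r_{XY}^2 \approx 1$ implies a strong linear relationship between X and Y. The correlation coefficient is also associated with the regression coefficient $b_1$:
$b_1 = \frac{S_{XY}}{S_{XX}} = r_{XY} \left( \frac{S_{YY}}{S_{XX}} \right)^{1/2}$.
As $r_{XY} > 0$, $b_1 > 0$: a positive linear relation.
As $r_{XY} < 0$, $b_1 < 0$: a negative linear relation.
As $r_{XY} \approx 0$, $b_1 \approx 0$: there is no significant linear relation between X and Y.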
Note: $r_{XY}$ measures the linear association between X and Y, while $b_1$ measures the size of the change in Y due to a unit change in X. $r_{XY}$ is unit-free and scale-free. A scale change in the data will affect $b_1$ but not $r_{XY}$.
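The scale remark can be illustrated by rescaling X and recomputing both quantities. This is a minimal sketch in plain Python; `slope_and_correlation` and the data are hypothetical names and values chosen for illustration.

```python
import math

def slope_and_correlation(x, y):
    """Return (b1, r_xy) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    r_xy = s_xy / math.sqrt(s_xx * s_yy)
    return b1, r_xy

if __name__ == "__main__":
    # Made-up illustrative data (assumption, not from the notes).
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    x_rescaled = [100.0 * xi for xi in x]   # e.g. metres converted to centimetres
    print(slope_and_correlation(x, y))
    print(slope_and_correlation(x_rescaled, y))
```

In the second call the slope shrinks by the factor 100 while the correlation is unchanged, up to floating-point rounding.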
Note: the value of a correlation shows only the extent to which X and Y are linearly associated. It does not by itself imply that any sort of causal relationship exists between X and Y. Such a false assumption has led to erroneous conclusions on many occasions.
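Note that $r_{XY}$ is also associated with $R^2$ since
$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{b_1^2 S_{XX}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX} S_{YY}} = r_{XY}^2$, with $S_{XX} = \sum (X_i - \bar{X})^2$, $S_{YY} = \sum (Y_i - \bar{Y})^2$, $S_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$,
and
$r_{XY} = (\text{sign of } b_1)\, R$,
where
$R = +(R^2)^{1/2}$
and $r_{XY}$ has the same sign as $b_1$.
The above equation indicates that a large $R^2$ implies a strong correlation between the response and the covariate.
Note: $r_{XY} = (\text{sign of } b_1)\, R$ only holds for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$.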
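A quick numerical check of the relation between $r_{XY}$, the sign of $b_1$ and $R$ is sketched below in plain Python. The function name `r_and_R` and the decreasing toy data are assumptions chosen so that the slope is negative.

```python
import math

def r_and_R(x, y):
    """Return (b1, r_xy, R) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_reg = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    r_xy = s_xy / math.sqrt(s_xx * s_yy)
    big_r = math.sqrt(ss_reg / s_yy)       # R = +sqrt(R^2)
    return b1, r_xy, big_r

if __name__ == "__main__":
    # Made-up decreasing data (assumption, not from the notes).
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.2, 8.1, 5.9, 4.2, 1.8]
    b1, r_xy, big_r = r_and_R(x, y)
    sign_b1 = math.copysign(1.0, b1)
    print("b1 =", b1)
    print("r_XY =", r_xy, " sign(b1) * R =", sign_b1 * big_r)
```

With a negative slope, `r_xy` is negative while `big_r` stays positive, so the two agree only after multiplying by the sign of `b1`.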
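The correlation between the response $Y$ and the fitted value $\hat{Y}$ is
$r_{Y\hat{Y}} = \frac{\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})}{\left[ \sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{Y})^2 \right]^{1/2}} = R$, where $R = +(R^2)^{1/2}$.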
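The derivation of $r_{Y\hat{Y}} = R$:
Since
$\hat{Y}_i - \bar{Y} = b_1 (X_i - \bar{X})$,
$\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) = b_1 \sum (X_i - \bar{X})(Y_i - \bar{Y}) = b_1^2 \sum (X_i - \bar{X})^2$,
and
$\sum (\hat{Y}_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2$,
thus
$r_{Y\hat{Y}} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{\left[ \sum (Y_i - \bar{Y})^2 \cdot b_1^2 \sum (X_i - \bar{X})^2 \right]^{1/2}} = \left[ \frac{b_1^2 \sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2} \right]^{1/2} = (R^2)^{1/2} = R$.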
The equation $r_{Y\hat{Y}} = R$ implies that a large value of $R^2$ also indicates a strong positive linear relation between the observations $Y_i$ and the predicted values $\hat{Y}_i$. In other words, the prediction of $Y$ is not irrelevant to $Y$.
Note: $r_{Y\hat{Y}} = R$ holds not only for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$, but also for multiple linear regression.
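The identity between the correlation of observed and fitted values and $R$ can be checked numerically for the simple linear case. The sketch below is plain Python; `corr` and `fitted_values` are illustrative helper names, the data are made up, and the multiple regression case mentioned in the note is not covered here.

```python
import math

def corr(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / math.sqrt(s_uu * s_vv)

def fitted_values(x, y):
    """Fitted values from the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return [y_bar + b1 * (xi - x_bar) for xi in x]

if __name__ == "__main__":
    # Made-up decreasing data (assumption, not from the notes).
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.2, 8.1, 5.9, 4.2, 1.8]
    y_hat = fitted_values(x, y)
    y_bar = sum(y) / len(y)
    r2 = sum((f - y_bar) ** 2 for f in y_hat) / sum((yi - y_bar) ** 2 for yi in y)
    print("corr(Y, Y_hat) =", corr(y, y_hat))
    print("R              =", math.sqrt(r2))
    print("corr(X, Y)     =", corr(x, y))
```

Even though the slope is negative here, the correlation between `y` and `y_hat` comes out positive and matches `R`, whereas the correlation between `x` and `y` is negative.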