IT233: Applied Statistics TIHE 2005 Lecture 07

Simple Linear Regression Analysis

In this section we do an in-depth analysis of the linear association between two variables: $X$ (called the independent variable or regressor) and $Y$ (called the dependent variable or response).

Simple Linear Regression makes 3 basic assumptions:

  1. Given a value ($x$) of $X$, the corresponding value of $Y$ is a random variable whose mean $\mu_{Y|x}$ (the mean of $Y$ given the value $x$) is a linear function of $x$,

i.e. $\mu_{Y|x} = \alpha + \beta x$ or, equivalently, $E(Y \mid X = x) = \alpha + \beta x$.

  2. The variation of $Y$ around this mean $\mu_{Y|x}$ is Normal.
  3. The variance of $Y$ is the same for all given values of $X$,

i.e. $\sigma^2_{Y|x} = \sigma^2$, for any value $x$.

Example: In simple linear regression, suppose the variance of $Y$ when $X = 4$ is 16; what is the variance of $Y$ when $X = 5$?

Answer: The same, 16, since the variance of $Y$ is assumed to be the same for all values of $X$.

Simple Linear Regression Model:

Using the above assumptions, we can write the model as:

$$Y = \alpha + \beta x + \varepsilon$$

where $\varepsilon$ is a random variable (the error term) that follows a normal distribution with mean $0$ and variance $\sigma^2$,

i.e. $\varepsilon \sim N(0, \sigma^2)$.
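
To see what the model says in practice, the short sketch below (Python, not part of the original notes) simulates observations from $Y = \alpha + \beta x + \varepsilon$; the values chosen for $\alpha$, $\beta$ and $\sigma$ are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 2.0, 0.5, 3.0   # hypothetical true parameters
x = np.linspace(0, 10, 50)           # fixed values of the regressor X

# For each x, Y = alpha + beta*x + eps, with eps ~ N(0, sigma^2)
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)
y = alpha + beta * x + eps

# The mean of Y given x is the linear function alpha + beta*x,
# and the spread (sigma) is the same at every x.
print(y[:5])
```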

Illustration of Linear Regression:

Let $X$ = the height of the father

$Y$ = the height of the son

For fathers whose height is $x$, the heights of the sons will vary randomly. Linear regression states that the mean height of the sons is a linear function of the height of the father, i.e. $\mu_{Y|x} = \alpha + \beta x$.

Scatter Diagrams:

A scatter diagram will suggest whether a simple linear regression model would be a good fit.

Figure (A) suggests that a simple linear regression model seems OK, though the fit is not very good (wide scatter around the fitted line).

Figure (B) suggests that a simple linear regression model fits well (points are close to the fitted line).

In (C) a straight line could be fitted, but the relationship is not linear.

In (D) there is no relationship between $X$ and $Y$.
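
A quick way to produce such a scatter diagram is sketched below (Python with matplotlib; the data are simulated only to illustrate the idea and are not from the lecture).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 40)   # roughly linear relationship

# Plot Y against X; a roughly straight band of points suggests
# that a simple linear regression model is reasonable.
plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Scatter diagram of Y against X")
plt.show()
```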

Fitting a Linear Regression Equation:

Keep in mind that there are two lines:

$\mu_{Y|x} = \alpha + \beta x$  (true line)

$\hat{y} = a + bx$  (estimated line)

Notation:

$a$ = estimate of $\alpha$

$b$ = estimate of $\beta$

$y_i$ = the observed value of $Y$ corresponding to $x_i$

$\hat{y}_i = a + bx_i$ = the fitted value of $Y$ corresponding to $x_i$

$e_i = y_i - \hat{y}_i$ = the residual.

Residual: The Error in Fit:

The residual, denoted by $e_i = y_i - \hat{y}_i$, is the difference between the observed and fitted values of $Y$. It estimates the error term $\varepsilon_i$.
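
As an illustration (not from the notes), suppose the fitted line were $\hat{y} = 1.5 + 0.8x$; the residuals would then be computed as in this sketch, where the data values are made up.

```python
import numpy as np

# Hypothetical fitted coefficients (a and b would normally come from least squares)
a, b = 1.5, 0.8

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.3, 3.8, 4.9])   # observed values (made up)

y_hat = a + b * x          # fitted values
e = y - y_hat              # residuals: observed minus fitted
print(e)
```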

The Method of Least Squares:

We shall find $a$ and $b$, the estimates of $\alpha$ and $\beta$, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find $a$ and $b$ so as to minimize

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2.$$

Differentiating SSE with respect to $a$ and $b$ and setting the partial derivatives equal to zero, we obtain the following equations (called the normal equations):

$$na + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,$$

$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i,$$

which may be solved simultaneously for $a$ and $b$.
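
The sketch below (not part of the original notes) sets up and solves these normal equations numerically for a small made-up data set, and checks the result against numpy's built-in degree-1 least-squares fit.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.8, 3.6, 4.5, 5.1])   # made-up observations

n = x.size

# Normal equations:
#   n*a      + b*sum(x)   = sum(y)
#   a*sum(x) + b*sum(x^2) = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(A, rhs)
print("a =", a, "b =", b)

# Cross-check with numpy's least-squares polynomial fit (degree 1)
b_check, a_check = np.polyfit(x, y, 1)
print("polyfit:", a_check, b_check)

# Residual sum of squares, SSE = sum of squared residuals
sse = ((y - (a + b * x)) ** 2).sum()
print("SSE =", sse)
```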