CLASSICAL LINEAR REGRESSION MODEL

The classical linear regression model is,

Yt = 1Xt1 + 2Xt2 + … + kXtk + t for t = 1,…, T

This indicates that we have T equations, one for each worker in the sample. All workers have the same parameters but different values of Y and the X's. We can write the equations for the workers as a system of T equations.

Observation 1: Y1 = β1X11 + β2X12 + … + βkX1k + μ1

Observation 2: Y2 = β1X21 + β2X22 + … + βkX2k + μ2

………………………………………

Observation T: YT = β1XT1 + β2XT2 + … + βkXTk + μT

This system of T equations can be written equivalently in matrix format as follows.

y = X + 

y is a Tx1 column vector of observations on the dependent variable. X is a TxK matrix of observations on the K-1 explanatory variables X2, X3, …, Xk. The first column of the matrix X is a column of 1's representing the constant (intercept) term. The matrix X is called the data matrix or the design matrix. β is a Kx1 column vector of parameters β1, β2, …, βk. μ is a Tx1 column vector of disturbances (errors).

Assumptions

1. The functional form is linear in parameters.

y = X + 

2. The mean vector of disturbances is a Tx1 null vector.

E() = 0

3. The disturbances are spherical. (The variance-covariance matrix of disturbances is a TxT scalar diagonal matrix: constant variance σ2 on the principal diagonal and zero covariances off the diagonal.)

Cov(μ) = E(μμT) = σ2I

where superscript T denotes transpose and I is a TxT identity matrix.

4. The disturbance vector has a multivariate normal distribution.

μ ~ N

5. The disturbance vector is uncorrelated with the data matrix.

Cov (,X) = 0

6. The data matrix is a nonstochastic matrix.

Classical Linear Regression Model Concisely Stated

The sample of T multivariate observations (Yt, Xt1, Xt2, …, Xtk) is generated by a process described as follows.

y = X + ,  ~ N(0, 2I)

or alternatively

y ~ N(X, 2I)

ESTIMATION

OLS Estimator

For the classical linear regression model, the residual sum of squares function is

RSS(1^, 2^ … k^) = (Yt - 1^ - 2^X12 - … - k^X1k)2

or in matrix format,

RSS(^) = (y - X^)T(y - X^)

The first-order necessary conditions for a minimum are

XTX^ = XTy

These are called the normal equations. If the inverse of the KxK matrix XTX exists, then I can solve for the vector β^. The solution is given by

β^ = (XTX)-1XTy

where ^ is a Kx1 column vector of estimates for the K-parameters of the model. This formula is the OLS estimator. It is a rule that tells you how to use the sample of data to obtain estimates of the population parameters.

Properties of the OLS Estimator

The OLS estimator has a multivariate normal sampling distribution. This follows directly from the assumption that the error term has a normal distribution.

Mean of the OLS Estimator

The mean vector of the OLS estimator gives the mean of the sampling distribution of the estimator for each of the K parameters. To derive the mean vector of the OLS estimator, I need to make two assumptions:

1. The error term has mean zero.

2. The error term is uncorrelated with each explanatory variable.

If these two assumptions are satisfied, then it can be shown that the mean vector of the OLS estimator is

E(^) = 

The mean vector of the OLS estimator is equal to the true values of the population parameters being estimated. The OLS estimator is unbiased.

Variance-Covariance Matrix of Estimates

The variance-covariance matrix of estimates gives the variances and covariances of the sampling distributions of the estimators of the K parameters. To derive the variance-covariance matrix of estimates, I need to make four assumptions:

1. The error term has mean zero.

2. The error term is uncorrelated with each explanatory variable

3. The error term has constant variance.

4. The errors are uncorrelated.

If these four assumptions are satisfied, then it can be shown that the variance-covariance matrix of estimates is

Cov(^) = 2(XTX)-1

For the classical linear regression model, it can be shown that each element on the principal diagonal of the variance-covariance matrix of the OLS estimates is less than or equal to the corresponding element in the variance-covariance matrix of any alternative linear unbiased estimator; therefore, for the classical linear regression model the OLS estimator is efficient (the Gauss-Markov theorem).

Sampling Distribution of the OLS Estimator Written Concisely

^ ~ N(, 2(XTX)-1)

The OLS estimator has a multivariate normal distribution with mean vector β and variance-covariance matrix σ2(XTX)-1.

Choosing an Estimator for 2

To obtain an estimate of the error variance, the following estimator is the preferred estimator,

RSS T

2^ =  = 

T – k T – k

Estimating the Variance-Covariance Matrix of Estimates

The true variance-covariance matrix of estimates, σ2(XTX)-1, is unknown. This is because the true error variance σ2 is unknown. Therefore, the variance-covariance matrix of estimates must be estimated using the sample of data. To obtain an estimate, you replace σ2 with its estimate σ2^ = RSS / (T – K). This yields the estimated variance-covariance matrix of estimates

Cov^(^) = 2^(XTX)-1

HYPOTHESIS TESTING

F-TEST: UNRESTRICTED MODEL APPROACH

One or more linear restrictions can be written in matrix format as Rβ = r, where R is a JxK matrix of zero and/or nonzero numbers; β is a Kx1 vector of parameters; r is a Jx1 vector of zero and/or nonzero numbers; J is the number of linear restrictions being tested; and K is the number of parameters in the regression model. The matrix R selects the appropriate parameter(s) from the vector β. Each element in the Jx1 vector Rβ defines a linear combination of parameters. The vector r is a vector of hypothesized values for the J linear combinations. The F-statistic can be written in matrix format as

(R^- r)T[RCov^(^)RT]-1(R^- r)

F =  ~ F(J, T-K)

J

Cov^(^) is the estimated variance-covariance matrix for the unrestricted model, and all other symbols have been defined above.

WALD TEST

One or more linear and/or nonlinear restrictions can be written in matrix format as R(β) = r, where R(β), read "R function beta," is a Jx1 column vector of linear and/or nonlinear combinations of parameters; β is a Kx1 vector of parameters; r is a Jx1 vector of zero and/or nonzero numbers; J is the number of restrictions being tested; and K is the number of parameters in the regression model. The Wald statistic has an approximate chi-square distribution and is given by,

W = (R(^) - r)T [GCov^(^)GT]-1 (R(^) - r) ~ 2(J)

All symbols except G have been defined previously. G is a JxK matrix of partial derivatives of the vector R(β^); that is,

 R1/1, R1/2, …, R1/k

G = | R2/1, R2/2, …, R2/k |

| ……………………………... |

 RJ/1, RJ/2, …, RJ/k JxK

GENERAL LINEAR REGRESSION MODEL

SPECIFICATION

Assumptions

1. The functional form is linear in parameters.

y = X + 

2. The error term has mean zero.

E() = 0

3. The errors are nonspherical.

Cov() = E(T) = W

where W is any nonsingular TxT variance-covariance matrix of disturbances.

4. The error term has a normal distribution.

μ ~ N

5. The error term is uncorrelated with each independent variable.

Cov (,X) = 0

Sources of Nonspherical Errors

There are two major sources of nonspherical errors.

1. The error term does not have constant variance.

2. The errors are correlated.

Classical Linear Regression Model as a Special Case of the General Linear Regression Model

If the error term has constant variance and the errors are uncorrelated, then W = 2I and the general linear regression model reduces to the classical linear regression model.

General Linear Regression Model Concisely Stated in Matrix Format

The sample of T multivariate observations (Yt, Xt1, Xt2, …, Xtk) is generated by a process described as follows.

y = X + ,  ~ N(0, W) or alternatively y ~ N(X,W)

ESTIMATION

Ordinary Least Squares (OLS) Estimator

The OLS estimator is given by the rule:

β^ = (XTX)-1XTy

The variance-covariance matrix of estimates for the OLS estimator is

Cov(β^) = (XTX)-1XTWX(XTX)-1

which reduces to the familiar σ2(XTX)-1 only when W = σ2I. When the errors are nonspherical, the conventional formula σ2(XTX)-1 is incorrect, so the usual OLS standard errors are unreliable.

Generalized Least Squares (GLS) Estimator

The GLS estimator is given by the rule:

β^GLS = (XTW-1X)-1XTW-1y

The variance-covariance matrix of estimates for the GLS estimator is

Cov(^) = (XTW-1X)-1

To actually use the GLS estimator, you must know the elements of the variance-covariance matrix of disturbances, W; that is, you must know the true values of the variances and covariances of the disturbances. However, since you never know the true elements of W, you cannot actually use the GLS estimator; the GLS estimator is not a feasible estimator.

Feasible Generalized Least Squares (FGLS) Estimator

To make the GLS estimator a feasible estimator, you can use the sample of data to obtain an estimate of W. When you replace true W with its estimate W^ you get the FGLS estimator. The FGLS estimator is given by the rule:

β^FGLS = (XTW^-1X)-1XTW^-1y

The variance-covariance matrix of estimates for the FGLS estimator is

Cov(β^FGLS) = (XTW^-1X)-1

FGLS Estimator as a Weighted Least Squares Estimator

The FGLS estimator is also a weighted least squares estimator. The weighted least squares estimator is derived as follows. Find a TxT transformation matrix P such that μ* = Pμ has variance-covariance matrix Cov(μ*) = E(μ* μ*T) = σ2I. This transforms the original error term μ, which is nonspherical, into a new error term μ* that is spherical. Use the matrix P to derive a transformed model.

Py = PXβ + Pμ or y* = X*β + μ*

where y* = Py, X* = PX, μ* = Pμ. The transformed model satisfies all of the assumptions of the classical linear regression model. The FGLS estimator is the OLS estimator applied to the transformed model. Note that the transformed model is a computational device only. We use it to obtain efficient estimates of the parameters and standard errors of the original model of interest.
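
As a sketch, the matrix P can be constructed from a Cholesky factorization: if W = LLT, then P = L-1 gives Cov(Pμ) = I (that is, σ2I with σ2 normalized to 1). The NumPy example below assumes, purely for illustration, that W is a known diagonal matrix of heteroskedastic variances; in practice W would be replaced by the estimate W^.

# GLS via the transformation P, under the assumption that W is known.
import numpy as np

rng = np.random.default_rng(1)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=T)])
W = np.diag(rng.uniform(0.5, 2.0, size=T))     # hypothetical nonspherical W
y = X @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(T), W)

L = np.linalg.cholesky(W)                      # W = L L'
P = np.linalg.inv(L)                           # then P W P' = I
y_star, X_star = P @ y, P @ X                  # transformed (spherical) model

# OLS on the transformed model reproduces GLS on the original model:
b_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
W_inv = np.linalg.inv(W)
b_gls_direct = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)  # same vector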

SEEMINGLY UNRELATED REGRESSIONS MODEL

SPECIFICATION

Assume that you have M equations that are related because their error terms are correlated. This system of M seemingly unrelated regression equations can be written in matrix format as follows.

y1 = X11 + 1

y2 = X22 + 2

y3 = X33 + 3

.

.

.

yM = XMM + M

Using more concise notation, this system of M equations can be written as

yi = Xii + ifor i = 1, 2, …, M

where yi is a Tx1 column vector of observations on the ith dependent variable; Xi is a TxK matrix of observations on the K-1 explanatory variables and a column of 1's for the ith equation (i.e., the data matrix for the ith equation); βi is the Kx1 column vector of parameters for the ith equation; and μi is the Tx1 column vector of disturbances for the ith equation.

You can view this system of M equations as one single large equation to be estimated. To combine the M equations into one single large equation, you stack the vectors and matrices as follows.

y1X1  11

| y2 || X2 0 || 2 || 2 |

| y3 || X3 || 3 | | 3 |

| . |==| . | | . | + | . |

| . || . || . || . |

| . || . || . || . |

| yM|| 0 . || M | | M |

  XM   

(MT)x1 (MT)x(MK) (MK)x1 (MT)x1

This single large equation (henceforth called the “big equation”) can be written more concisely as

y = X + 

where y is a (MT)x1 column vector of observations on the dependent variables for the M equations; X is a (MT)x(MK) block-diagonal matrix of observations on the explanatory variables, with the columns of 1's, for the M equations; β is a (MK)x1 column vector of parameters for the M equations; and μ is a (MT)x1 column vector of disturbances for the M equations. The specification of the SUR model is defined by the following set of assumptions.

Assumptions

1. y = X + 

2. E() = 0

3. Cov() = E(T) = W = I

4.  ~ N

5. Cov (,X) = 0

The Variance-Covariance Matrix of Errors

The SUR model assumes that the variance-covariance matrix of disturbances for the big equation has the following structure.

W = I

The sigma matrix, Σ, is an MxM matrix of variances and covariances for the M individual equations,

11 12 …….. 1M 

| 21 22 …….. 2M |

 ==| . . . |

| . . . |

| . . . |

 M1 M2 …….. MM MXM

where 11 is the variance of the errors in equation 1, 22 is the variance of the errors in equation 2, etc; 12 is the covariance of the errors in equation 1 and equation 2, etc. The identity matrix, I, is a TxT matrix with ones on the principal diagonal and zeros off the principal diagonal,

 1 0 …….. 0 

| 0 1 …….. 0 |

I ==| . . . |

| . . . |

| . . . |

 0 0 ……… 1  TxT

The symbol ⊗ is an operator called the Kronecker product. It tells you to multiply each element in the matrix Σ by the matrix I. The result of the Kronecker product is the (MT)x(MT) variance-covariance matrix of disturbances for the big equation

11I 12I …….. 1MI 

| 21I 22I …….. 2MI |

| 31I 32I …….. 3MI |

W == I ==| . . |

| . . |

| . . |

 M1I M2I …….. 2MI  (MT)x(MT)

Seemingly Unrelated Regression Model Concisely Stated in Matrix Format

The sample of MT multivariate observations are generated by a process described as follows.

y = X + ,  ~ N(0, I ) or alternatively y ~ N(X,I )

ESTIMATION

Feasible Generalized Least Squares (FGLS) Estimator

The FGLS estimator is given by the rule:

β^FGLS = (XTW^-1X)-1XTW^-1y or equivalently β^FGLS = [XT(Σ^-1⊗I)X]-1XT(Σ^-1⊗I)y

Estimating W

The most often used method for estimating W is Zellner's method. When Zellner's method is used to estimate W, the FGLS estimator is called Zellner's SUR estimator. To obtain an estimate of W using Zellner's method, you proceed as follows.

  1. Estimate each of the M equations separately using OLS.
  2. Use the residuals from the OLS regressions to obtain estimates of the variances and covariances of the disturbances for the M equations. The estimators are:

i^Ti^ i^Tj^

ii^ =  and ij^ = 

T T

Where ii^ is the estimate of the error variance for the ith equation; ij^ is the estimate of the covariance of errors for the ith and jth equations; i^ is the vector of residuals for the ith equation; j^ is the vector of residuals for the jth equation; and T is the sample size.

  3. Use the estimates of the variances and covariances from step 2 to form an estimate of the MxM matrix Σ.
  4. Construct the TxT identity matrix I.
  5. Apply the formula W^ = Σ^⊗I to obtain an estimate of the variance-covariance matrix of disturbances for the big equation.

Once you have the estimate of W, you can use the sample data and the rule β^FGLS = (XTW^-1X)-1XTW^-1y

to obtain estimates of the parameters.
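
As an illustration, here is a compact NumPy sketch of Zellner's procedure; y_list and X_list (one Tx1 vector of observations and one data matrix per equation) are hypothetical inputs:

# Zellner's SUR (FGLS) estimator: equation-by-equation OLS, then
# Sigma^ from the residuals, then FGLS on the stacked big equation.
import numpy as np

def zellner_sur(y_list, X_list):
    M, T = len(y_list), y_list[0].shape[0]

    # Steps 1-2: OLS per equation; residual variances and covariances.
    resid = []
    for y_i, X_i in zip(y_list, X_list):
        b_i = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)
        resid.append(y_i - X_i @ b_i)
    E = np.column_stack(resid)             # TxM matrix of residuals
    Sigma_hat = (E.T @ E) / T              # sigma_ij^ = mu_i^' mu_j^ / T

    # Stack y; build the block-diagonal big data matrix X.
    y = np.concatenate(y_list)
    X = np.zeros((M * T, sum(X_i.shape[1] for X_i in X_list)))
    c = 0
    for i, X_i in enumerate(X_list):
        X[i * T:(i + 1) * T, c:c + X_i.shape[1]] = X_i
        c += X_i.shape[1]

    # Steps 3-5: W^{-1} = Sigma^{-1} (kron) I, then the FGLS rule.
    W_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    return np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)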

Iterated Feasible Generalized Least Squares (IFGLS) Estimator

The steps involved in using the iterated SUR (ISUR) estimator are as follows.

  1. Estimate the parameters of the big equation using Zellner's SUR estimator described above.
  2. Use the parameter estimates from this regression to compute the residuals for each of the M equations.
  3. Use the residuals to obtain new estimates of the variances and covariances of the disturbances for the M equations, and therefore a new estimate of Σ and W.
  4. Use the new estimate of W to repeat step 1 and obtain new parameter estimates.
  5. Repeat steps 2, 3, and 4 until the parameter estimates converge. (Each time you obtain new parameter estimates, this completes an iteration.)

SIMULTANEOUS EQUATIONS REGRESSION MODEL

IV ESTIMATOR

The IV estimator formula is

β^IV = (ZTX)-1ZTy

where here X is the TxK data matrix for the original right-hand-side variables; Z is the TxK data matrix for the instrumental variables; and y is the Tx1 column vector of observations on the dependent variable in the equation to be estimated.
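
In NumPy the rule is a single linear solve; this fragment assumes hypothetical arrays y, X, and Z with the dimensions just stated:

# IV estimator beta^ = (Z'X)^{-1} Z'y; Z and X are both TxK here.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)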

2SLS ESTIMATOR

The 2SLS estimator is given by the rule,

β^2SLS = (XTPX)-1XTPy where P = Z(ZTZ)-1ZT is called the projection matrix

Note that Z is now a TxI matrix, where I is the number of instruments (identifying and other). If the equation is exactly identified, then I = K. If the equation is overidentified, then I > K. If the error term has constant variance and the errors are uncorrelated, then the variance-covariance matrix of estimates is,

cov(^2sls) = σ2(XTPX)-1

The estimated variance-covariance matrix replaces the unknown σ2 with the estimate σ2^ = RSS/T.
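
A NumPy sketch of the 2SLS rule and its estimated variance-covariance matrix, assuming hypothetical arrays y (Tx1), X (TxK), and Z (TxI):

# 2SLS via the projection matrix P = Z(Z'Z)^{-1}Z'.
import numpy as np

T = y.shape[0]
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # projection onto the instrument space
XPX = X.T @ P @ X
beta_2sls = np.linalg.solve(XPX, X.T @ P @ y)

u = y - X @ beta_2sls                       # 2SLS residuals
sigma2_hat = (u @ u) / T                    # RSS / T, as in the text
cov_2sls = sigma2_hat * np.linalg.inv(XPX)  # estimated Cov(beta^2SLS)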

GENERALIZED METHOD OF MOMENTS ESTIMATOR

The structural equation to be estimated is y = Xβ + u. Assume the errors are uncorrelated but may have non-constant variance. There are I instruments. The data matrix X is TxK. The matrix of instrumental variables Z is TxI. Assume the instruments are exogenous (not correlated with the error term). In the population,

Cov(Z,u) = E[ZTu] = E[ZT(y – Xβ)] = 0

Require this to be true in the sample. Replace the expectations operator with the average operator,

(1/T)[ZT(y – Xβ^)] = 0

This is a system of I equations with K unknown parameters, β^. If I = K the equation is exactly identified, there are as many equations as unknown parameters, and a unique solution exists for β^. If I > K the equation is overidentified, there are more equations than unknown parameters, and a unique solution does not exist for β^. To find a unique solution, apply weights to the instrumental variables: let M be an IxI matrix of weights. Find the weighting matrix M that produces asymptotically efficient estimates of β. This is,

M = [(1/T) (ZTWZ)]-1

W is the TxT variance-covariance matrix of errors. The elements on the principal diagonal are the unknown variances for the T observations. The elements off the principal diagonal are zero, by the assumption that the errors are uncorrelated. The GMM estimator is,

β^GMM = (XTZMZTX)-1XTZMZTy

The two-step GMM estimator:

Step #1: Estimate the equation using 2SLS and save the residuals. Square the residuals and use them to obtain an estimate of W: the squared residuals are estimates of the unknown variances on the principal diagonal of W. Then use the estimate of W to obtain an estimate of M.

Step #2: Apply the GMM estimator rule: β^GMM = (XTZMZTX)-1XTZMZTy.
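
A sketch of the two-step procedure in NumPy, again for hypothetical arrays y (Tx1), X (TxK), and Z (TxI):

# Two-step GMM with a heteroskedasticity-robust weighting matrix.
import numpy as np

def gmm_two_step(y, X, Z):
    T = y.shape[0]

    # Step 1: 2SLS; the squared residuals estimate the diagonal of W.
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    b1 = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
    u = y - X @ b1

    # M^ = [(1/T) Z'W^Z]^{-1} with W^ = diag(u_t^2).
    M = np.linalg.inv((Z * (u ** 2)[:, None]).T @ Z / T)

    # Step 2: beta^GMM = (X'Z M Z'X)^{-1} X'Z M Z'y.
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ M @ XZ.T, XZ @ M @ (Z.T @ y))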