JPL 30/10/2018

M2R International Economics, Development and Transition, academic year 2010-2011

APPLIED ECONOMETRICS EXAM

Question 1

. use "C:\Documents and Settings\Administrador\Mis documentos\International Economics I\Econometrics\Laffargue\exam\mus06data.dta", clear

. describe ldrugexp hi_empunion totchr age female blhisp linc

variable name | storage type | display format | variable label

------+------

ldrugexp | float | %9.0g | log(drugexp)

hi_empunion | byte | %8.0g | Insured thro emp/union

totchr | byte | %8.0g | Total chronic cond

age | byte | %8.0g | Age

female | byte | %8.0g | Female

blhisp | float | %9.0g | Black or Hispanic

linc | float | %9.0g | log(income)

. sum ldrugexp hi_empunion totchr age female blhisp linc

Variable | Obs Mean Std. Dev. Min Max

------+------

ldrugexp | 10391 6.479668 1.363395 0 10.18017

hi_empunion | 10391 .3796555 .4853245 0 1

totchr | 10391 1.860745 1.290131 0 9

age | 10391 75.04639 6.69368 65 91

female | 10391 .5797325 .4936256 0 1

------+------

blhisp | 10391 .1703397 .3759491 0 1

linc | 10089 2.743275 .9131433 -6.907755 5.744476

We can see that linc has missing observations (10,089 observations instead of 10,391). The average age of individuals in the sample is 75 years, and about 38% of them have supplementary insurance through an employer or union (hi_empunion), so less than half are covered. About 58% of the sample is female, and the proportion of Black or Hispanic individuals is fairly low, at about 17%.

Question 2

To check for missing observations in linc we use the codebook command, and then we drop the missing ones:

. codebook linc

------

linc log(income)

------

type: numeric (float)

range: [-6.9077554,5.7444763] units: 1.000e-09

unique values: 6914 missing .: 302/10391

mean: 2.74328

std. dev: .913143

percentiles: 10% 25% 50% 75% 90%

1.79176 2.2327 2.74316 3.31506 3.79928

. drop if linc==.

(302 observations deleted)

. des ssiratio lowincome firmsz multlc

variable name | storage type | display format | variable label

------+------

ssiratio | float | %9.0g | SSI/Income ratio

lowincome | byte | %8.0g | Low income

firmsz | float | %9.0g | Firm size

multlc | byte | %8.0g | Multiple locations

. sum ssiratio lowincome firmsz multlc

Variable | Obs Mean Std. Dev. Min Max

------+------

ssiratio | 10089 .5365438 .3678175 0 9.25062

lowincome | 10089 .1874319 .3902771 0 1

firmsz | 10089 .1405293 2.170389 0 50

multlc | 10089 .0620478 .2412543 0 1

The mean of lowincome is 0.19, so only about 19% of the observations have low-income status. On average ssiratio is 0.54 (maximum 9.25), meaning that Social Security income represents about half of total income for the average individual; we read this as indicating that the income constraint is not very strong on average. Finally, the firms where individuals are employed are small on average (mean firmsz of 0.14, although the standard deviation is 2.17 and the maximum is 50), and only about 6% of individuals work for a firm operating in multiple locations (multlc).

Question 3


. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio ), first robust

First-stage regressions

------

First-stage regression of hi_empunion:

OLS estimation

------

Estimates efficient for homoskedasticity only

Statistics robust to heteroskedasticity

Number of obs = 10089

F( 6, 10082) = 119.18

Prob > F = 0.0000

Total (centered) SS = 2382.242839 Centered R2 = 0.0761

Total (uncentered) SS = 3856 Uncentered R2 = 0.4292

Residual SS = 2201.062524 Root MSE = .4672

------

| Robust

hi_empunion | Coef. Std. Err. t P>|t| [95% Conf. Interval]

------+------

totchr | .0127865 .0036655 3.49 0.000 .0056015 .0199716

age | -.0086323 .0007087 -12.18 0.000 -.0100216 -.0072431

female | -.07345 .0096392 -7.62 0.000 -.0923448 -.0545552

blhisp | -.06268 .0122742 -5.11 0.000 -.08674 -.0386201

linc | .0483937 .0066075 7.32 0.000 .0354417 .0613456

ssiratio | -.1916432 .0236326 -8.11 0.000 -.2379678 -.1453186

_cons | 1.028981 .0581387 17.70 0.000 .9150172 1.142944

------

Included instruments: totchr age female blhisp linc ssiratio

------

F test of excluded instruments:

F( 1, 10082) = 65.76

Prob > F = 0.0000

Angrist-Pischke multivariate F test of excluded instruments:

F( 1, 10082) = 65.76

Prob > F = 0.0000

Summary results for first-stage regressions

------

(Underid) (Weak id)

Variable | F( 1, 10082) P-val | AP Chi-sq( 1) P-val | AP F( 1, 10082)

hi_empunion | 65.76 0.0000 | 65.81 0.0000 | 65.76

NB: first-stage test statistics heteroskedasticity-robust

Stock-Yogo weak ID test critical values for single endogenous regressor:

10% maximal IV size 16.38

15% maximal IV size 8.96

20% maximal IV size 6.66

25% maximal IV size 5.53

Source: Stock-Yogo (2005). Reproduced by permission.

NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

Underidentification test

Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)

Ha: matrix has rank=K1 (identified)

Kleibergen-Paap rk LM statistic Chi-sq(1)=138.02 P-val=0.0000

Weak identification test

Ho: equation is weakly identified

Cragg-Donald Wald F statistic 183.98

Kleibergen-Paap Wald rk F statistic 65.76

Stock-Yogo weak ID test critical values for K1=1 and L1=1:

10% maximal IV size 16.38

15% maximal IV size 8.96

20% maximal IV size 6.66

25% maximal IV size 5.53

Source: Stock-Yogo (2005). Reproduced by permission.

NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

Weak-instrument-robust inference

Tests of joint significance of endogenous regressors B1 in main equation

Ho: B1=0 and orthogonality conditions are valid

Anderson-Rubin Wald test F(1,10082)= 22.12 P-val=0.0000

Anderson-Rubin Wald test Chi-sq(1)= 22.13 P-val=0.0000

Stock-Wright LM S statistic Chi-sq(1)= 20.71 P-val=0.0000

NB: Underidentification, weak identification and weak-identification-robust

test statistics heteroskedasticity-robust

Number of observations N = 10089

Number of regressors K = 7

Number of endogenous regressors K1 = 1

Number of instruments L = 7

Number of excluded instruments L1 = 1

IV (2SLS) estimation

------

Estimates efficient for homoskedasticity only

Statistics robust to heteroskedasticity

Number of obs = 10089

F( 6, 10082) = 333.25

Prob > F = 0.0000

Total (centered) SS = 18715.11622 Centered R2 = 0.0640

Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9604

Residual SS = 17518.21658 Root MSE = 1.318

------

| Robust

ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

hi_empunion | -.8975913 .2211268 -4.06 0.000 -1.330992 -.4641908

totchr | .4502655 .0101969 44.16 0.000 .43028 .470251

age | -.0132176 .0029977 -4.41 0.000 -.0190931 -.0073421

female | -.020406 .0326114 -0.63 0.531 -.0843232 .0435113

blhisp | -.2174244 .0394944 -5.51 0.000 -.294832 -.1400167

linc | .0870018 .0226356 3.84 0.000 .0426368 .1313668

_cons | 6.78717 .2688453 25.25 0.000 6.260243 7.314097

------

Underidentification test (Kleibergen-Paap rk LM statistic): 138.015

Chi-sq(1) P-val = 0.0000

------

Weak identification test (Cragg-Donald Wald F statistic): 183.980

(Kleibergen-Paap rk Wald F statistic): 65.760

Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38

15% maximal IV size 8.96

20% maximal IV size 6.66

25% maximal IV size 5.53

Source: Stock-Yogo (2005). Reproduced by permission.

NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

------

Hansen J statistic (overidentification test of all instruments): 0.000

(equation exactly identified)

------

Instrumented: hi_empunion

Included instruments: totchr age female blhisp linc

Excluded instruments: ssiratio

------

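First-stage results: the effect of the instrument ssiratio on hi_empunion is negative, as expected, and statistically significant at the 1% level (t = -8.11). The heteroskedasticity-robust F statistic for the excluded instrument (65.76) is well above the Stock-Yogo critical value for 10% maximal IV size (16.38), so ssiratio does not appear to be a weak instrument, keeping in mind that these critical values are derived for the Cragg-Donald statistic under i.i.d. errors.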
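The first-stage diagnostics can be checked by hand from the numbers printed in the log. The sketch below assumes nothing beyond those printed values: with a single excluded instrument, the F test of excluded instruments equals the square of that instrument's t statistic, and the two identification F statistics can then be compared with the Stock-Yogo critical values (a rule of thumb when applied to the robust Kleibergen-Paap statistic, not a formal test).

```python
# Hand check of the Question 3 first stage, using numbers copied from the ivreg2 log.

coef_ssiratio = -0.1916432  # first-stage coefficient on the excluded instrument
se_ssiratio = 0.0236326     # its heteroskedasticity-robust standard error

# With one excluded instrument, the F test of excluded instruments is the
# square of that instrument's t statistic.
t_stat = coef_ssiratio / se_ssiratio
f_from_t = t_stat ** 2

# Stock-Yogo critical values (maximal IV size) for K1 = 1, L1 = 1, as printed in the log.
STOCK_YOGO = {"10%": 16.38, "15%": 8.96, "20%": 6.66, "25%": 5.53}

KP_F = 65.76   # Kleibergen-Paap rk Wald F (robust)
CD_F = 183.98  # Cragg-Donald Wald F (i.i.d. errors)


def exceeds(f_stat: float, size: str) -> bool:
    """True if f_stat is above the Stock-Yogo critical value for the given maximal size."""
    return f_stat > STOCK_YOGO[size]


if __name__ == "__main__":
    print(f"t = {t_stat:.2f}, t^2 = {f_from_t:.2f}")
    for name, f in (("Kleibergen-Paap", KP_F), ("Cragg-Donald", CD_F)):
        print(f"{name} F = {f}: above 10% critical value? {exceeds(f, '10%')}")
```

The squared t statistic reproduces the reported F(1, 10082) = 65.76, which confirms that the first-stage F here is simply the robust Wald test on ssiratio.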

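Second-stage results: having supplementary insurance through an employer or union lowers log drug expenditure by 0.898. Because the dependent variable is in logs and the coefficient is large, this is not a 89.76% decrease: the implied change in prescribed-drug expenditure is exp(-0.898) - 1 ≈ -59%, which is still a very large effect.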
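Since ldrugexp is a logarithm, the coefficient on a 0/1 regressor such as hi_empunion is in log points, and the exact percentage effect is 100·(exp(b) - 1). A minimal sketch using the coefficient and 95% confidence bounds printed in the Question 3 log; because exp is monotone, transforming the interval endpoints gives a valid interval for the exact effect.

```python
import math

# Coefficient on hi_empunion and its 95% confidence bounds from the
# just-identified robust IV regression in Question 3 (dependent variable ldrugexp).
b = -0.8975913
ci_low, ci_high = -1.330992, -0.4641908


def pct_effect(coef: float) -> float:
    """Exact percentage change in drugexp when a 0/1 regressor switches on,
    in a model whose dependent variable is log(drugexp)."""
    return 100 * (math.exp(coef) - 1)


effect = pct_effect(b)
bounds = (pct_effect(ci_low), pct_effect(ci_high))

if __name__ == "__main__":
    print(f"log-point approximation: {100 * b:.1f}%")
    print(f"exact effect: {effect:.1f}%")
    print(f"95% CI transformed: [{bounds[0]:.1f}%, {bounds[1]:.1f}%]")
```

The log-point approximation 100·b is only accurate for small coefficients; here it overstates the drop by about 30 percentage points.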

Question 4

. quietly ivreg ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc )

. estimates store iv

. quietly ivreg ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), robust

. estimates store ivrobust

. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), gmm

. estimates store GMM

. estimates table iv ivrobust GMM, stat (se r2_a rmse) star

------

Variable | iv ivrobust GMM

------+------

hi_empunion | -.98992691*** -.98992691*** -.99327949***

totchr | .45120505*** .45120505*** .45095079***

age | -.01413842*** -.01413842*** -.01415093***

female | -.02783978 -.02783978 -.02817157

blhisp | -.22370865*** -.22370865*** -.22310484***

linc | .09427483*** .09427483*** .09446321***

_cons | 6.8751877*** 6.8751877*** 6.8778206***

------+------

se |

r2_a | .04087781 .04087781 .04002154

rmse | 1.3339228 1.3339228 1.3340551

------

legend: * p<0.05; ** p<0.01; *** p<0.001

The results are significant for all regressors except female, suggesting that gender has no significant effect on drug expenditure.

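The number of chronic conditions and log income both have, as expected, a positive effect on drug expenditure, but the first effect is much larger: each additional chronic condition raises log drug expenditure by 0.45, whereas the coefficient on linc is an elasticity of about 0.09, so a 10% increase in income raises drug expenditure by less than 1%. (The sicker you are, the more you have to spend on medication; the higher your income, the more you can afford it.)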
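Because totchr is a count and linc is a logarithm, their coefficients are on different scales (a semi-elasticity and an elasticity). A small sketch, using the iv column of the Question 4 table, that puts both on a percentage scale; the 10% income change is an illustrative choice, not something taken from the exam.

```python
import math

# Coefficients from the iv column of the Question 4 table (dependent variable ldrugexp).
b_totchr = 0.45120505  # totchr is a count, so this is a semi-elasticity
b_linc = 0.09427483    # linc is log(income), so this is an elasticity

# One more chronic condition: exact % change in drug expenditure.
effect_chronic = 100 * (math.exp(b_totchr) - 1)

# A 10% rise in income (illustrative): drugexp is multiplied by 1.10 ** b_linc.
effect_income_10pct = 100 * (1.10 ** b_linc - 1)

if __name__ == "__main__":
    print(f"+1 chronic condition: {effect_chronic:.1f}% more drug expenditure")
    print(f"+10% income: {effect_income_10pct:.2f}% more drug expenditure")
```

One additional chronic condition raises expected drug expenditure by roughly 57%, while a 10% income increase raises it by under 1%.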

Drug expenditure is lower for Black or Hispanic individuals (perhaps they have lower incomes and can afford less medical care) and for those with supplementary insurance (which is plausible if the insurance covers part of the expenses). It also apparently decreases with age, which is somewhat surprising, since older people are more likely to be sick and therefore to spend more on medication. Finally, the iv and ivrobust point estimates are identical because the robust option only changes how the standard errors are computed, not the coefficients; the GMM estimates differ slightly because, with two excluded instruments, two-step GMM weights the moment conditions with a heteroskedasticity-robust weighting matrix instead of the 2SLS weights.
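In the Question 4 table the iv and ivrobust columns coincide. The sketch below illustrates why on simulated data: the IV point estimate is computed from the data alone, and only the variance formula changes between the classical and the robust case. The data-generating process, the names `simulate` and `iv_slope`, and the HC0 form of the robust variance (without Stata's small-sample adjustments) are all assumptions made for illustration; this is a just-identified, single-regressor case, not the exam's full model.

```python
import math
import random


def simulate(n: int = 5000, seed: int = 1):
    """Hypothetical data: x is endogenous (it shares the shock e with the error),
    z is a valid instrument, and the error variance grows with |z|."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        e = rng.gauss(0, 1)
        xi = 0.8 * zi + rng.gauss(0, 1) + 0.5 * e
        ui = e * (0.5 + abs(zi))  # heteroskedastic error, uncorrelated with z
        z.append(zi)
        x.append(xi)
        y.append(1.0 + 2.0 * xi + ui)  # true slope is 2
    return z, x, y


def iv_slope(z, x, y):
    """Just-identified IV slope with classical and robust (HC0) standard errors."""
    n = len(z)
    zb, xb, yb = sum(z) / n, sum(x) / n, sum(y) / n
    szx = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    szy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    beta = szy / szx  # the point estimate involves no variance assumption
    alpha = yb - beta * xb
    u = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    szz = sum((zi - zb) ** 2 for zi in z)
    sigma2 = sum(ui ** 2 for ui in u) / n
    se_classical = math.sqrt(sigma2 * szz / szx ** 2)
    se_robust = math.sqrt(
        sum((zi - zb) ** 2 * ui ** 2 for zi, ui in zip(z, u)) / szx ** 2
    )
    return beta, se_classical, se_robust


if __name__ == "__main__":
    beta, se_c, se_r = iv_slope(*simulate())
    print(f"beta = {beta:.3f}, classical se = {se_c:.4f}, robust se = {se_r:.4f}")
```

The same `beta` is reported under both variance formulas; only the standard error moves, and here the robust one is larger because the error variance rises with |z|.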

Question 5

. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), gmm

-gmm- is no longer a supported option; use -gmm2s- with the appropriate option

gmm = gmm2s robust

gmm robust = gmm2s robust

gmm bw() = gmm2s bw()

gmm robust bw() = gmm2s robust bw()

gmm cluster() = gmm2s cluster()

2-Step GMM estimation

------

Estimates efficient for arbitrary heteroskedasticity

Statistics robust to heteroskedasticity

Number of obs = 10089

F( 6, 10082) = 325.50

Prob > F = 0.0000

Total (centered) SS = 18715.11622 Centered R2 = 0.0406

Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9594

Residual SS = 17955.42285 Root MSE = 1.334

------

| Robust

ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

hi_empunion | -.9932795 .2045645 -4.86 0.000 -1.394219 -.5923405

totchr | .4509508 .0103058 43.76 0.000 .4307517 .4711498

age | -.0141509 .0029 -4.88 0.000 -.0198347 -.0084671

female | -.0281716 .0321727 -0.88 0.381 -.0912288 .0348857

blhisp | -.2231048 .0395804 -5.64 0.000 -.300681 -.1455287

linc | .0944632 .0218833 4.32 0.000 .0515727 .1373537

_cons | 6.877821 .2578727 26.67 0.000 6.372399 7.383242

------

Underidentification test (Kleibergen-Paap rk LM statistic): 170.738

Chi-sq(2) P-val = 0.0000

------

Weak identification test (Cragg-Donald Wald F statistic): 110.613

(Kleibergen-Paap rk Wald F statistic): 58.612

Stock-Yogo weak ID test critical values: 10% maximal IV size 19.93

15% maximal IV size 11.59

20% maximal IV size 8.75

25% maximal IV size 7.25

Source: Stock-Yogo (2005). Reproduced by permission.

NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

------

Hansen J statistic (overidentification test of all instruments): 1.048

Chi-sq(1) P-val = 0.3061

------

Instrumented: hi_empunion

Included instruments: totchr age female blhisp linc

Excluded instruments: ssiratio multlc

------

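The Hansen J statistic is 1.048 with a p-value of 0.31, so we cannot reject the null hypothesis that the overidentifying restrictions are valid, i.e., that the instruments are uncorrelated with the error term. We therefore keep ssiratio and multlc as instruments.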

Including all four instruments (adding lowincome and firmsz):

. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc lowincome firmsz ), gmm

-gmm- is no longer a supported option; use -gmm2s- with the appropriate option

gmm = gmm2s robust

gmm robust = gmm2s robust

gmm bw() = gmm2s bw()

gmm robust bw() = gmm2s robust bw()

gmm cluster() = gmm2s cluster()

2-Step GMM estimation

------

Estimates efficient for arbitrary heteroskedasticity

Statistics robust to heteroskedasticity

Number of obs = 10089

F( 6, 10082) = 335.98

Prob > F = 0.0000

Total (centered) SS = 18715.11622 Centered R2 = 0.0829

Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9612

Residual SS = 17163.61371 Root MSE = 1.304

------

| Robust

ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

hi_empunion | -.8124043 .1861018 -4.37 0.000 -1.177157 -.4476515

totchr | .449488 .01011 44.46 0.000 .4296728 .4693033

age | -.0124598 .0027643 -4.51 0.000 -.0178777 -.007042

female | -.0104528 .0308857 -0.34 0.735 -.0709876 .050082

blhisp | -.2061018 .0385144 -5.35 0.000 -.2815886 -.130615

linc | .0796532 .0205381 3.88 0.000 .0393992 .1199073

_cons | 6.7126 .2441439 27.49 0.000 6.234086 7.191113

------

Underidentification test (Kleibergen-Paap rk LM statistic): 200.657

Chi-sq(4) P-val = 0.0000

------

Weak identification test (Cragg-Donald Wald F statistic): 62.749

(Kleibergen-Paap rk Wald F statistic): 44.823

Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85

10% maximal IV relative bias 10.27

20% maximal IV relative bias 6.71

30% maximal IV relative bias 5.34

10% maximal IV size 24.58

15% maximal IV size 13.96

20% maximal IV size 10.26

25% maximal IV size 8.31

Source: Stock-Yogo (2005). Reproduced by permission.

NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

------

Hansen J statistic (overidentification test of all instruments): 11.590

Chi-sq(3) P-val = 0.0089

------

Instrumented: hi_empunion

Included instruments: totchr age female blhisp linc

Excluded instruments: ssiratio multlc lowincome firmsz

------

With lowincome and firmsz added, the Hansen J statistic rises to 11.59 and its p-value falls to 0.0089, so we reject the null hypothesis of valid overidentifying restrictions at the 1% level. Since ssiratio and multlc passed the test on their own, this suggests that the two new instruments are correlated with the error term, and we should not keep them.
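Both Hansen J p-values can be reproduced from the statistics alone, since J is asymptotically chi-squared with degrees of freedom equal to the number of excluded instruments minus the number of endogenous regressors. A standard-library sketch; the helper `chi2_sf` is a hypothetical name, and it uses the closed-form chi-squared tail recurrence that holds for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1,
    via Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Hansen J statistics from Question 5; df = excluded instruments - endogenous regressors.
tests = {
    "ssiratio, multlc": (1.048, 2 - 1),
    "ssiratio, multlc, lowincome, firmsz": (11.590, 4 - 1),
}

if __name__ == "__main__":
    for label, (j, df) in tests.items():
        p = chi2_sf(j, df)
        verdict = "reject" if p < 0.05 else "do not reject"
        print(f"{label}: J = {j}, df = {df}, p = {p:.4f} -> {verdict} at 5%")
```

The two p-values match the log (0.3061 and 0.0089). Note that the J test only says that at least one overidentifying restriction fails; it does not by itself identify which instrument is invalid.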

Question 6

. quietly reg ldrugexp totchr age female blhisp linc hi_empunion, robust

. estimates store OLSrobust

. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio ), first robust

. estimates store IVrobust

. estimates table OLSrobust IVrobust , stat (se r2_a rmse) star

------

Variable | OLSrobust IVrobust

------+------

totchr | .44038073*** .45026553***

age | -.00352947 -.01321759***

female | .0578055* -.02040599

blhisp | -.15130678*** -.21742435***

linc | .01048155 .08700179***

hi_empunion | .0738788** -.89759128***

_cons | 5.8611305*** 6.7871701***

------+------

se |

r2_a | .17648308 .06339657

rmse | 1.2360328 1.3177132

------

legend: * p<0.05; ** p<0.01; *** p<0.001

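Compared with IV, OLS makes age lose its significance (p = 0.068) and makes female significant at the 5% level (p = 0.023). The coefficient on hi_empunion itself changes sign, from -0.90 with IV to +0.07 with OLS. The most interesting (and not surprising) result is that linc loses all its significance with OLS (p = 0.444). This is not because linc is endogenous (it is treated as exogenous in both regressions) but because the endogeneity of hi_empunion biases OLS, and the bias spreads to the coefficients of regressors that are correlated with hi_empunion, such as linc.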
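The spillover of endogeneity bias onto an exogenous regressor can be seen in a small simulation. Everything here is hypothetical: x1 plays the role of hi_empunion (made continuous for simplicity) and depends on the structural error, x2 plays the role of linc and is exogenous but correlated with x1. With this design the probability limits of the OLS slopes are about 1.488 and 0.256, instead of the true 1.0 and 0.5.

```python
import random


def ols_two(y, x1, x2):
    """Slopes from OLS of y on a constant, x1 and x2 (2x2 normal equations on demeaned data)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    s11 = sum(v * v for v in a)
    s22 = sum(v * v for v in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * q for p, q in zip(a, c))
    s2y = sum(p * q for p, q in zip(b, c))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n: int = 20000, seed: int = 7):
    """Hypothetical DGP: x2 is exogenous (role of linc); x1 is endogenous
    (role of hi_empunion) because it depends on the error u."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        x2i = rng.gauss(0, 1)
        u = rng.gauss(0, 1)
        x1i = 0.5 * x2i + rng.gauss(0, 1) + 0.8 * u
        x1.append(x1i)
        x2.append(x2i)
        y.append(1.0 + 1.0 * x1i + 0.5 * x2i + u)  # true slopes: 1.0 and 0.5
    return y, x1, x2


if __name__ == "__main__":
    b1, b2 = ols_two(*simulate())
    print(f"OLS slope on x1 = {b1:.3f} (true 1.0), on x2 = {b2:.3f} (true 0.5)")
```

Even though x2 is exogenous, its OLS coefficient is roughly halved, because OLS attributes part of the x1 bias to the correlated regressor.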

. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio )

. hausman iv ., constant sigmamore

Note: the rank of the differenced variance matrix (1) does not equal the number of coefficients being tested (7); be sure

this is what you expect, or there may be problems computing the test. Examine the output of your estimators for

anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.

---- Coefficients ----

| (b) (B) (b-B) sqrt(diag(V_b-V_B))

| iv . Difference S.E.

------+------

hi_empunion | -.9899269 -.8975913 -.0923356 .

totchr | .4512051 .4502655 .0009395 .

age | -.0141384 -.0132176 -.0009208 .

female | -.0278398 -.020406 -.0074338 .

blhisp | -.2237087 -.2174244 -.0062843 .

linc | .0942748 .0870018 .007273 .

_cons | 6.875188 6.78717 .0880176 .

------

b = consistent under Ho and Ha; obtained from ivreg

B = inconsistent under Ha, efficient under Ho; obtained from ivreg2

Test: Ho: difference in coefficients not systematic

chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)

= -1.19 chi2<0 ==> model fitted on these

data fails to meet the asymptotic

assumptions of the Hausman test;

see suest for a generalized test

. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio )

. estimates store iv

. quietly reg ldrugexp totchr age female blhisp linc hi_empunion

. hausman iv ., constant sigmamore

Note: the rank of the differenced variance matrix (1) does not equal the number of coefficients being tested (7); be sure

this is what you expect, or there may be problems computing the test. Examine the output of your estimators for

anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.

---- Coefficients ----

| (b) (B) (b-B) sqrt(diag(V_b-V_B))

| iv . Difference S.E.

------+------

hi_empunion | -.8975913 .0738788 -.9714701 .1932748

totchr | .4502655 .4403807 .0098848 .0019666

age | -.0132176 -.0035295 -.0096881 .0019275

female | -.020406 .0578055 -.0782115 .0155602

blhisp | -.2174244 -.1513068 -.0661176 .0131541

linc | .0870018 .0104815 .0765202 .0152238

_cons | 6.78717 5.861131 .9260396 .1842364

------

b = consistent under Ho and Ha; obtained from ivreg2

B = inconsistent under Ha, efficient under Ho; obtained from regress

Test: Ho: difference in coefficients not systematic

chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)

= 25.26

Prob>chi2 = 0.0000

(V_b-V_B is not positive definite)

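The Hausman statistic is 25.26 with a p-value below 0.0001, so we reject the exogeneity of hi_empunion. The first attempt above gave a negative statistic because it compared the wrong pair of estimators: iv still held the overidentified ivreg estimates stored in Question 4, and the current (.) estimates came from the just-identified ivreg2, so neither was the OLS estimator that is efficient under the null. The second attempt correctly compares IV with OLS.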
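With one endogenous regressor, the differenced variance matrix has rank 1 (as the Stata note reports), so the chi2(1) Hausman statistic is just the squared ratio of a coefficient difference to its standard error. A sketch using the rows of the second hausman table; it also checks that every row gives the same ratio, which is what the rank-1 structure implies.

```python
import math

# Rows of the second hausman table in Question 6: (b - B, sqrt(diag(V_b - V_B))).
rows = {
    "hi_empunion": (-0.9714701, 0.1932748),
    "totchr": (0.0098848, 0.0019666),
    "age": (-0.0096881, 0.0019275),
    "female": (-0.0782115, 0.0155602),
    "blhisp": (-0.0661176, 0.0131541),
    "linc": (0.0765202, 0.0152238),
    "_cons": (0.9260396, 0.1842364),
}

# Rank-1 differenced variance: |difference| / S.E. is the same in every row.
ratios = {name: abs(d) / s for name, (d, s) in rows.items()}

d, s = rows["hi_empunion"]
h = (d / s) ** 2                 # chi2(1) Hausman statistic
p = math.erfc(math.sqrt(h / 2))  # chi2(1) upper-tail probability

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name:12s} |diff|/se = {r:.4f}")
    print(f"H = {h:.2f}, p = {p:.2e}")
```

Every row yields a ratio of about 5.03, whose square reproduces chi2(1) = 25.26.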

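Alternatively, we can test the exogeneity of hi_empunion with the C statistic (difference-in-Hansen test) from the orthog() option of ivreg2: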

. ivreg2 ldrugexp totchr age female blhisp linc hi_empunion (= ssiratio ), robust orthog(hi_empunion)

OLS estimation

Estimates efficient for homoskedasticity only

Statistics robust to heteroskedasticity

Number of obs = 10089

F( 6, 10082) = 376.85

Prob > F = 0.0000

Total (centered) SS = 18715.11622 Centered R2 = 0.1770

Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9652

Residual SS = 15403.0482 Root MSE = 1.236

| Robust

ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

totchr | .4403807 .00936 47.05 0.000 .4220354 .4587261

age | -.0035295 .0019363 -1.82 0.068 -.0073246 .0002657

female | .0578055 .0253563 2.28 0.023 .008108 .107503

blhisp | -.1513068 .0341146 -4.44 0.000 -.2181701 -.0844435

linc | .0104815 .0137079 0.76 0.444 -.0163854 .0373485

hi_empunion | .0738788 .0259757 2.84 0.004 .0229673 .1247903

_cons | 5.861131 .1570491 37.32 0.000 5.55332 6.168941

------

Hansen J statistic (Lagrange multiplier test of excluded instruments): 24.935

Chi-sq(1) P-val = 0.0000

-orthog- option:

Hansen J statistic (eqn. excluding suspect orthog. conditions): 0.000

Chi-sq(0) P-val = .

C statistic (exogeneity/orthogonality of suspect instruments): 24.935

Chi-sq(1) P-val = 0.0000

Instruments tested: hi_empunion

Included instruments: totchr age female blhisp linc hi_empunion

Excluded instruments: ssiratio

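The C statistic of 24.935 (p = 0.0000) also rejects the exogeneity of hi_empunion, confirming the Hausman test; unlike the Hausman test above, this version is robust to heteroskedasticity.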