JPL Page 1 30/10/2018
M2R International Economics, Development, Transition, Academic Year 2010-2011
APPLIED ECONOMETRICS EXAM
Question 1
. use "C:\Documents and Settings\Administrador\Mis documentos\International Economics I\Econometrics\Laffargue\exam\mus06data.dta", clear
. describe ldrugexp hi_empunion totchr age female blhisp linc
              storage   display    value
variable name   type    format     label      variable label
------
ldrugexp        float   %9.0g                 log(drugexp)
hi_empunion     byte    %8.0g                 Insured thro emp/union
totchr          byte    %8.0g                 Total chronic cond
age             byte    %8.0g                 Age
female          byte    %8.0g                 Female
blhisp          float   %9.0g                 Black or Hispanic
linc            float   %9.0g                 log(income)
. sum ldrugexp hi_empunion totchr age female blhisp linc
    Variable |       Obs        Mean   Std. Dev.        Min        Max
------+------
    ldrugexp |     10391    6.479668    1.363395          0   10.18017
 hi_empunion |     10391    .3796555    .4853245          0          1
      totchr |     10391    1.860745    1.290131          0          9
         age |     10391    75.04639     6.69368         65         91
      female |     10391    .5797325    .4936256          0          1
------+------
      blhisp |     10391    .1703397    .3759491          0          1
        linc |     10089    2.743275    .9131433  -6.907755   5.744476
We can see that the variable linc contains missing observations (10,089 observations instead of 10,391). The average age of individuals in the sample is 75 years (range 65 to 91), and only 38% of them have supplementary insurance through an employer or union. More than half of them (58%) are women, and the proportion of Black or Hispanic individuals is low (17%).
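As a quick check of the shares quoted above, the mean of a 0/1 dummy is the proportion of individuals with that characteristic, and the number of missing values of linc is the gap between the two Obs counts. The Python sketch below (not Stata; the dictionary and variable names are ours) redoes that arithmetic on numbers copied from the summarize output.

```python
# Means of 0/1 dummies copied from the -summarize- output above
dummy_means = {
    "hi_empunion": 0.3796555,
    "female": 0.5797325,
    "blhisp": 0.1703397,
}

# The mean of a 0/1 variable is the sample share (here in %) with value 1
shares = {name: round(100 * m, 1) for name, m in dummy_means.items()}
print(shares)  # {'hi_empunion': 38.0, 'female': 58.0, 'blhisp': 17.0}

# Missing values of linc: full sample size minus its non-missing Obs count
n_total, n_linc = 10391, 10089
n_missing = n_total - n_linc
print(n_missing, round(100 * n_missing / n_total, 1))  # 302 2.9
```

The 302 missing values agree with the codebook output in Question 2.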
Question 2
To check for missing observations we use the codebook command, and then we drop the missing ones:
. codebook linc
------
linc log(income)
------
type: numeric (float)
range: [-6.9077554,5.7444763] units: 1.000e-09
unique values: 6914 missing .: 302/10391
mean: 2.74328
std. dev: .913143
percentiles: 10% 25% 50% 75% 90%
1.79176 2.2327 2.74316 3.31506 3.79928
. drop if linc==.
(302 observations deleted)
. des ssiratio lowincome firmsz multlc
              storage   display    value
variable name   type    format     label      variable label
------
ssiratio        float   %9.0g                 SSI/Income ratio
lowincome       byte    %8.0g                 Low income
firmsz          float   %9.0g                 Firm size
multlc          byte    %8.0g                 Multiple locations
. sum ssiratio lowincome firmsz multlc
    Variable |       Obs        Mean   Std. Dev.        Min        Max
------+------
    ssiratio |     10089    .5365438    .3678175          0    9.25062
   lowincome |     10089    .1874319    .3902771          0          1
      firmsz |     10089    .1405293    2.170389          0         50
      multlc |     10089    .0620478    .2412543          0          1
The mean of lowincome (0.19) shows that low-income status concerns a rather small proportion of the observations, about 19%. The average ssiratio (0.54) is not very high either, suggesting that the income constraint is not very tight on average. Finally, the firms where individuals are employed are small on average (mean firmsz of 0.14) and rarely operate in multiple locations (only 6% for multlc).
Question 3
. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio ), first robust
First-stage regressions
------
First-stage regression of hi_empunion:
OLS estimation
------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 10089
F( 6, 10082) = 119.18
Prob > F = 0.0000
Total (centered) SS = 2382.242839 Centered R2 = 0.0761
Total (uncentered) SS = 3856 Uncentered R2 = 0.4292
Residual SS = 2201.062524 Root MSE = .4672
------
| Robust
hi_empunion | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------+------
totchr | .0127865 .0036655 3.49 0.000 .0056015 .0199716
age | -.0086323 .0007087 -12.18 0.000 -.0100216 -.0072431
female | -.07345 .0096392 -7.62 0.000 -.0923448 -.0545552
blhisp | -.06268 .0122742 -5.11 0.000 -.08674 -.0386201
linc | .0483937 .0066075 7.32 0.000 .0354417 .0613456
ssiratio | -.1916432 .0236326 -8.11 0.000 -.2379678 -.1453186
_cons | 1.028981 .0581387 17.70 0.000 .9150172 1.142944
------
Included instruments: totchr age female blhisp linc ssiratio
------
F test of excluded instruments:
F( 1, 10082) = 65.76
Prob > F = 0.0000
Angrist-Pischke multivariate F test of excluded instruments:
F( 1, 10082) = 65.76
Prob > F = 0.0000
Summary results for first-stage regressions
------
(Underid) (Weak id)
Variable | F( 1, 10082) P-val | AP Chi-sq( 1) P-val | AP F( 1, 10082)
hi_empunion | 65.76 0.0000 | 65.81 0.0000 | 65.76
NB: first-stage test statistics heteroskedasticity-robust
Stock-Yogo weak ID test critical values for single endogenous regressor:
10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
Underidentification test
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic Chi-sq(1)=138.02 P-val=0.0000
Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic 183.98
Kleibergen-Paap Wald rk F statistic 65.76
Stock-Yogo weak ID test critical values for K1=1 and L1=1:
10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and orthogonality conditions are valid
Anderson-Rubin Wald test F(1,10082)= 22.12 P-val=0.0000
Anderson-Rubin Wald test Chi-sq(1)= 22.13 P-val=0.0000
Stock-Wright LM S statistic Chi-sq(1)= 20.71 P-val=0.0000
NB: Underidentification, weak identification and weak-identification-robust
test statistics heteroskedasticity-robust
Number of observations N = 10089
Number of regressors K = 7
Number of endogenous regressors K1 = 1
Number of instruments L = 7
Number of excluded instruments L1 = 1
IV (2SLS) estimation
------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 10089
F( 6, 10082) = 333.25
Prob > F = 0.0000
Total (centered) SS = 18715.11622 Centered R2 = 0.0640
Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9604
Residual SS = 17518.21658 Root MSE = 1.318
------
| Robust
ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
hi_empunion | -.8975913 .2211268 -4.06 0.000 -1.330992 -.4641908
totchr | .4502655 .0101969 44.16 0.000 .43028 .470251
age | -.0132176 .0029977 -4.41 0.000 -.0190931 -.0073421
female | -.020406 .0326114 -0.63 0.531 -.0843232 .0435113
blhisp | -.2174244 .0394944 -5.51 0.000 -.294832 -.1400167
linc | .0870018 .0226356 3.84 0.000 .0426368 .1313668
_cons | 6.78717 .2688453 25.25 0.000 6.260243 7.314097
------
Underidentification test (Kleibergen-Paap rk LM statistic): 138.015
Chi-sq(1) P-val = 0.0000
------
Weak identification test (Cragg-Donald Wald F statistic): 183.980
(Kleibergen-Paap rk Wald F statistic): 65.760
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------
Instrumented: hi_empunion
Included instruments: totchr age female blhisp linc
Excluded instruments: ssiratio
------
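First-stage results: the effect of the instrument ssiratio on hi_empunion is negative, as expected, and statistically significant at the 1% level (t = -8.11). The instrument is also not weak: the Cragg-Donald F statistic (183.98) and the Kleibergen-Paap F statistic (65.76) are both well above the Stock-Yogo critical value for a 10% maximal IV size (16.38).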
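The weak-instrument comparison can be written out as a small Python sketch using the statistics printed by ivreg2. The helper strictest_passed is hypothetical. Comparing the robust Kleibergen-Paap F with the Stock-Yogo values is only an informal check, because, as the ivreg2 output itself notes, those critical values are for the Cragg-Donald statistic under i.i.d. errors.

```python
# First-stage statistics reported by ivreg2 (Question 3, one excluded instrument)
kp_wald_f = 65.76   # Kleibergen-Paap rk Wald F, heteroskedasticity-robust
cd_wald_f = 183.98  # Cragg-Donald Wald F, valid under i.i.d. errors

# Stock-Yogo critical values for K1=1, L1=1 (maximal IV size)
stock_yogo = {"10%": 16.38, "15%": 8.96, "20%": 6.66, "25%": 5.53}

def strictest_passed(f_stat, critical_values):
    """Smallest tolerated size distortion whose critical value f_stat exceeds (None if none)."""
    passed = [k for k, cv in critical_values.items() if f_stat > cv]
    if not passed:
        return None
    return min(passed, key=lambda k: float(k.rstrip("%")))

print(strictest_passed(cd_wald_f, stock_yogo))  # 10%
print(strictest_passed(kp_wald_f, stock_yogo))  # 10%
```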
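Having supplementary insurance (hi_empunion) lowers log drug expenditure by about 0.90. Because the dependent variable is in logs, this corresponds to a decrease in drug expenditure of about 59% (exp(-0.898) - 1 ≈ -0.59), which is a large effect. Reading the coefficient directly as an 89.76% decrease is only a first-order approximation, and it is inaccurate for a coefficient this large.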
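In a log-linear model, a dummy with coefficient b multiplies the outcome by exp(b), so the exact percentage change is 100 × (exp(b) - 1). A minimal Python sketch using the 2SLS coefficient from the output above:

```python
import math

b_ins = -0.8975913  # 2SLS coefficient on hi_empunion; dependent variable is ldrugexp

# Exact proportional change implied by a shift of b in log(y)
exact_pct = 100 * (math.exp(b_ins) - 1)
# First-order approximation 100*b, only reliable when |b| is small
approx_pct = 100 * b_ins

print(round(exact_pct, 1), round(approx_pct, 1))  # -59.2 -89.8
```

The gap between the two numbers shows why the approximation should not be used for a coefficient close to -0.9.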
Question 4
. quietly ivreg ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc )
. estimates store iv
. quietly ivreg ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), robust
. estimates store ivrobust
. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), gmm
. estimates store GMM
. estimates table iv ivrobust GMM, stat (se r2_a rmse) star
------
Variable | iv ivrobust GMM
------+------
hi_empunion | -.98992691*** -.98992691*** -.99327949***
totchr | .45120505*** .45120505*** .45095079***
age | -.01413842*** -.01413842*** -.01415093***
female | -.02783978 -.02783978 -.02817157
blhisp | -.22370865*** -.22370865*** -.22310484***
linc | .09427483*** .09427483*** .09446321***
_cons | 6.8751877*** 6.8751877*** 6.8778206***
------+------
se |
r2_a | .04087781 .04087781 .04002154
rmse | 1.3339228 1.3339228 1.3340551
------
legend: * p<0.05; ** p<0.01; *** p<0.001
The results are significant for all regressors except the female indicator, suggesting that gender has no effect on drug expenditure.
The number of chronic conditions and the log of income both have a positive effect on drug expenditure, the first being much larger than the second (about 0.45 against 0.09): the more ill you are, the more you have to spend on medication, and the higher your income, the more you can afford it.
Drug expenditure is lower for Black or Hispanic individuals (possibly because they have lower incomes and can afford less medical care) and for individuals with supplementary insurance (which is plausible if the insurance covers part of the expenses). Expenditure also decreases with age, which is somewhat surprising, since older people are more likely to be ill and therefore to spend more on medication; note, however, that totchr already controls for the number of chronic conditions. Finally, the iv and ivrobust coefficients are identical and differ only slightly from the GMM ones. This is expected: the robust option changes only the standard errors, not the point estimates, whereas two-step GMM with a heteroskedasticity-robust weighting matrix yields different point estimates when the model is overidentified, as it is here with two excluded instruments.
Question 5
. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc ), gmm
-gmm- is no longer a supported option; use -gmm2s- with the appropriate option
gmm = gmm2s robust
gmm robust = gmm2s robust
gmm bw() = gmm2s bw()
gmm robust bw() = gmm2s robust bw()
gmm cluster() = gmm2s cluster()
2-Step GMM estimation
------
Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity
Number of obs = 10089
F( 6, 10082) = 325.50
Prob > F = 0.0000
Total (centered) SS = 18715.11622 Centered R2 = 0.0406
Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9594
Residual SS = 17955.42285 Root MSE = 1.334
------
| Robust
ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
hi_empunion | -.9932795 .2045645 -4.86 0.000 -1.394219 -.5923405
totchr | .4509508 .0103058 43.76 0.000 .4307517 .4711498
age | -.0141509 .0029 -4.88 0.000 -.0198347 -.0084671
female | -.0281716 .0321727 -0.88 0.381 -.0912288 .0348857
blhisp | -.2231048 .0395804 -5.64 0.000 -.300681 -.1455287
linc | .0944632 .0218833 4.32 0.000 .0515727 .1373537
_cons | 6.877821 .2578727 26.67 0.000 6.372399 7.383242
------
Underidentification test (Kleibergen-Paap rk LM statistic): 170.738
Chi-sq(2) P-val = 0.0000
------
Weak identification test (Cragg-Donald Wald F statistic): 110.613
(Kleibergen-Paap rk Wald F statistic): 58.612
Stock-Yogo weak ID test critical values: 10% maximal IV size 19.93
15% maximal IV size 11.59
20% maximal IV size 8.75
25% maximal IV size 7.25
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------
Hansen J statistic (overidentification test of all instruments): 1.048
Chi-sq(1) P-val = 0.3061
------
Instrumented: hi_empunion
Included instruments: totchr age female blhisp linc
Excluded instruments: ssiratio multlc
------
The p-value of the Hansen J test (0.306) is large, so there is little evidence against the null hypothesis that the overidentifying restriction is valid. We therefore do not reject the instruments.
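The p-value reported next to the J statistic is the upper tail of a chi-square distribution whose degrees of freedom equal the number of overidentifying restrictions (here 2 excluded instruments minus 1 endogenous regressor = 1). The Python sketch below computes that tail probability for integer degrees of freedom, using the standard closed forms for odd and even df. The function name chi2_sf is ours.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*(1 - Phi(sqrt(x))) plus a finite correction series
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p

# Hansen J with ssiratio and multlc: one overidentifying restriction
print(round(chi2_sf(1.048, 1), 3))  # 0.306
```

The last digit can differ from Stata's 0.3061 because the printed J statistic is rounded.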
Including the four instruments:
. ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio multlc lowincome firmsz ), gmm
-gmm- is no longer a supported option; use -gmm2s- with the appropriate option
gmm = gmm2s robust
gmm robust = gmm2s robust
gmm bw() = gmm2s bw()
gmm robust bw() = gmm2s robust bw()
gmm cluster() = gmm2s cluster()
2-Step GMM estimation
------
Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity
Number of obs = 10089
F( 6, 10082) = 335.98
Prob > F = 0.0000
Total (centered) SS = 18715.11622 Centered R2 = 0.0829
Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9612
Residual SS = 17163.61371 Root MSE = 1.304
------
| Robust
ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
hi_empunion | -.8124043 .1861018 -4.37 0.000 -1.177157 -.4476515
totchr | .449488 .01011 44.46 0.000 .4296728 .4693033
age | -.0124598 .0027643 -4.51 0.000 -.0178777 -.007042
female | -.0104528 .0308857 -0.34 0.735 -.0709876 .050082
blhisp | -.2061018 .0385144 -5.35 0.000 -.2815886 -.130615
linc | .0796532 .0205381 3.88 0.000 .0393992 .1199073
_cons | 6.7126 .2441439 27.49 0.000 6.234086 7.191113
------
Underidentification test (Kleibergen-Paap rk LM statistic): 200.657
Chi-sq(4) P-val = 0.0000
------
Weak identification test (Cragg-Donald Wald F statistic): 62.749
(Kleibergen-Paap rk Wald F statistic): 44.823
Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85
10% maximal IV relative bias 10.27
20% maximal IV relative bias 6.71
30% maximal IV relative bias 5.34
10% maximal IV size 24.58
15% maximal IV size 13.96
20% maximal IV size 10.26
25% maximal IV size 8.31
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------
Hansen J statistic (overidentification test of all instruments): 11.590
Chi-sq(3) P-val = 0.0089
------
Instrumented: hi_empunion
Included instruments: totchr age female blhisp linc
Excluded instruments: ssiratio multlc lowincome firmsz
------
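When the other two instruments are included, the p-value of the Hansen J test becomes very small (0.0089), so we reject the null hypothesis that all the overidentifying restrictions are valid. The J test cannot tell which instrument is invalid. However, the test did not reject with ssiratio and multlc alone, so the new instruments lowincome and firmsz are the likely source of correlation with the error, and we should not keep them.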
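One way to target the two added instruments is to difference the two J statistics, which gives a chi-square with 3 - 1 = 2 degrees of freedom. The Python sketch below is only a rough, naive approximation built from two separately estimated models. The orthog() option of ivreg2, used in Question 6, computes this kind of C statistic properly and is what one would use in practice, for example with orthog(lowincome firmsz).

```python
import math

# Hansen J statistics reported by ivreg2 -gmm- in Question 5
j_two, df_two = 1.048, 1     # instruments: ssiratio multlc
j_four, df_four = 11.590, 3  # instruments: ssiratio multlc lowincome firmsz

# Naive difference-in-J for the two added instruments (approximation only)
c_naive = j_four - j_two
df_c = df_four - df_two

# With 2 degrees of freedom the chi-square upper tail is exp(-x/2)
p_c = math.exp(-c_naive / 2)
print(round(c_naive, 3), df_c, round(p_c, 4))  # 10.542 2 0.0051
```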
Question 6
. quietly reg ldrugexp totchr age female blhisp linc hi_empunion, robust
. estimates store OLSrobust
. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio ), first robust
. estimates store IVrobust
. estimates table OLSrobust IVrobust , stat (se r2_a rmse) star
------
Variable | OLSrobust IVrobust
------+------
totchr | .44038073*** .45026553***
age | -.00352947 -.01321759***
female | .0578055* -.02040599
blhisp | -.15130678*** -.21742435***
linc | .01048155 .08700179***
hi_empunion | .0738788** -.89759128***
_cons | 5.8611305*** 6.7871701***
------+------
se |
r2_a | .17648308 .06339657
rmse | 1.2360328 1.3177132
------
legend: * p<0.05; ** p<0.01; *** p<0.001
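With OLS, age loses its significance (p = 0.068) and the female indicator becomes significant at the 5% level (p = 0.023). The most interesting (though not surprising) result is that linc loses all its significance with OLS (p = 0.444). This is what we expect if hi_empunion is endogenous: linc is correlated with hi_empunion (its first-stage coefficient is positive and significant), so the bias in the OLS coefficient of hi_empunion, which is even positive (0.074) instead of negative (-0.898), spills over into the coefficient of linc.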
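A small simulation illustrates the mechanism behind the OLS/IV contrast. When a regressor is correlated with the error, OLS is biased, while the just-identified IV estimator remains consistent; that estimator is the covariance of the instrument with y divided by its covariance with x (reduced form over first stage). This Python sketch uses hypothetical parameters and a single regressor and does not replicate the exam's model.

```python
import random

random.seed(12345)
n = 20_000
beta = 2.0  # hypothetical true effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument: shifts x, independent of u
u = [random.gauss(0, 1) for _ in range(n)]  # structural error
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous: cov(x, u) = 1
y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

b_ols = cov(x, y) / cov(x, x)  # plim = beta + cov(x, u) / var(x) = 2 + 1/3
b_iv = cov(z, y) / cov(z, x)   # plim = beta, because cov(z, u) = 0

print(round(b_ols, 2), round(b_iv, 2))
```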
. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio )
. hausman iv ., constant sigmamore
Note: the rank of the differenced variance matrix (1) does not equal the number of coefficients being tested (7); be sure
this is what you expect, or there may be problems computing the test. Examine the output of your estimators for
anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| iv . Difference S.E.
------+------
hi_empunion | -.9899269 -.8975913 -.0923356 .
totchr | .4512051 .4502655 .0009395 .
age | -.0141384 -.0132176 -.0009208 .
female | -.0278398 -.020406 -.0074338 .
blhisp | -.2237087 -.2174244 -.0062843 .
linc | .0942748 .0870018 .007273 .
_cons | 6.875188 6.78717 .0880176 .
------
b = consistent under Ho and Ha; obtained from ivreg
B = inconsistent under Ha, efficient under Ho; obtained from ivreg2
Test: Ho: difference in coefficients not systematic
chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -1.19 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;
see suest for a generalized test
. quietly ivreg2 ldrugexp totchr age female blhisp linc ( hi_empunion= ssiratio )
. estimates store iv
. quietly reg ldrugexp totchr age female blhisp linc hi_empunion
. hausman iv ., constant sigmamore
Note: the rank of the differenced variance matrix (1) does not equal the number of coefficients being tested (7); be sure
this is what you expect, or there may be problems computing the test. Examine the output of your estimators for
anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| iv . Difference S.E.
------+------
hi_empunion | -.8975913 .0738788 -.9714701 .1932748
totchr | .4502655 .4403807 .0098848 .0019666
age | -.0132176 -.0035295 -.0096881 .0019275
female | -.020406 .0578055 -.0782115 .0155602
blhisp | -.2174244 -.1513068 -.0661176 .0131541
linc | .0870018 .0104815 .0765202 .0152238
_cons | 6.78717 5.861131 .9260396 .1842364
------
b = consistent under Ho and Ha; obtained from ivreg2
B = inconsistent under Ha, efficient under Ho; obtained from regress
Test: Ho: difference in coefficients not systematic
chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 25.26
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)
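The first attempt above is not a valid test. The stored estimates named iv come from Question 4 (ivreg with ssiratio and multlc), so hausman compared two IV estimators, and the negative statistic signals that this comparison does not meet the assumptions of the test. In the second attempt, the just-identified IV estimates are stored as iv and compared with OLS. The Hausman statistic is 25.26 with a p-value of 0.0000, so we reject the exogeneity of hi_empunion.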
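Stata reports that the differenced variance matrix has rank 1 (one endogenous regressor), so the statistic of the second test can be reproduced from a single row of the table. It is the squared ratio of the coefficient difference to its standard error. A short Python check with numbers copied from the output:

```python
# (b - B, sqrt(V_b - V_B)) for some rows of the second -hausman- table
rows = {
    "hi_empunion": (-0.9714701, 0.1932748),
    "totchr": (0.0098848, 0.0019666),
    "age": (-0.0096881, 0.0019275),
}

# With a rank-1 differenced variance matrix, every row gives the same squared t-ratio
stats = {name: (d / se) ** 2 for name, (d, se) in rows.items()}
for name, h in stats.items():
    print(name, round(h, 2))  # each about 25.26, Stata's chi2(1)
```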
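Alternatively, exogeneity can be tested directly with the orthog() option of ivreg2, treating hi_empunion as an instrument whose orthogonality condition is under test:
. ivreg2 ldrugexp totchr age female blhisp linc hi_empunion (= ssiratio), robust orthog(hi_empunion)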
OLS estimation
------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 10089
F( 6, 10082) = 376.85
Prob > F = 0.0000
Total (centered) SS = 18715.11622 Centered R2 = 0.1770
Total (uncentered) SS = 442534.2012 Uncentered R2 = 0.9652
Residual SS = 15403.0482 Root MSE = 1.236
------
| Robust
ldrugexp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
totchr | .4403807 .00936 47.05 0.000 .4220354 .4587261
age | -.0035295 .0019363 -1.82 0.068 -.0073246 .0002657
female | .0578055 .0253563 2.28 0.023 .008108 .107503
blhisp | -.1513068 .0341146 -4.44 0.000 -.2181701 -.0844435
linc | .0104815 .0137079 0.76 0.444 -.0163854 .0373485
hi_empunion | .0738788 .0259757 2.84 0.004 .0229673 .1247903
_cons | 5.861131 .1570491 37.32 0.000 5.55332 6.168941
------
Hansen J statistic (Lagrange multiplier test of excluded instruments): 24.935
Chi-sq(1) P-val = 0.0000
-orthog- option:
Hansen J statistic (eqn. excluding suspect orthog. conditions): 0.000
Chi-sq(0) P-val = .
C statistic (exogeneity/orthogonality of suspect instruments): 24.935
Chi-sq(1) P-val = 0.0000
Instruments tested: hi_empunion
------
Included instruments: totchr age female blhisp linc hi_empunion
Excluded instruments: ssiratio
------