General
Although this exercise does not require it, we can save a lot of time by combining the two data sets before starting on the problems. To do that, we first save the two data files separately, for example as:
"M:\pc\Desktop\2007 host 120907 120907 earningsdata_males.dta" and
"M:\pc\Desktop\2007 host 120907 earningsdata_females.dta"
We can then combine the two data sets with the append command: we open one of the two files and append the other one to it. For example:
. use "M:\pc\Desktop\2007 host 120907 120907 earningsdata_males.dta", clear
. append using "M:\pc\Desktop\2007 host 120907 earningsdata_females.dta"
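Next, we need an identifier variable that records which data set each observation came from. We call it gender. Because append adds the female observations after the male ones, the 5,859 male observations occupy rows 1 to 5859 and the 3,247 female observations occupy rows 5860 to 9106: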
. generate gender = "male"
. replace gender = "female" in 5860/9106
gender was str4 now str6
(3247 real changes made)
Problem 1
. sort gender
. by gender: sum ln_y_
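-------------------------------------------------------------------------------
-> gender = female
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ln_y_ |      3247    7.688559    .2854469    6.43294   9.841133
-------------------------------------------------------------------------------
-> gender = male
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ln_y_ |      5859     7.94615    .3150719   6.398595   9.806811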
. by gender: ci ln_y_
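-------------------------------------------------------------------------------
-> gender = female
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       ln_y_ |       3247    7.688559    .0050094        7.678737     7.69838
-------------------------------------------------------------------------------
-> gender = male
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       ln_y_ |       5859     7.94615    .0041162        7.938081    7.954219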
. by gender: ci ln_y_, level(90)
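-------------------------------------------------------------------------------
-> gender = female
    Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
       ln_y_ |       3247    7.688559    .0050094        7.680317    7.696801
-------------------------------------------------------------------------------
-> gender = male
    Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
       ln_y_ |       5859     7.94615    .0041162        7.939378    7.952921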
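As a check on the ci output, the small Python sketch below (standard library only) recomputes each standard error as Std. Dev./sqrt(Obs) and forms the interval mean ± critical value × SE. It assumes a normal critical value from statistics.NormalDist, whereas Stata's ci uses the t distribution with Obs - 1 degrees of freedom, so the bounds can differ from the output in the fifth or sixth decimal place. The helper name mean_ci is illustrative, not part of Stata.

```python
import math
from statistics import NormalDist

# (Obs, Mean, Std. Dev.) copied from the summarize output above
groups = {
    "female": (3247, 7.688559, 0.2854469),
    "male": (5859, 7.94615, 0.3150719),
}

def mean_ci(n, mean, sd, level=0.95):
    """Return (SE, lower, upper) for a normal-approximation CI of a mean."""
    se = sd / math.sqrt(n)                       # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # e.g. 1.96 for a 95% interval
    return se, mean - z * se, mean + z * se

for name, (n, mean, sd) in groups.items():
    for level in (0.95, 0.90):
        se, lo, hi = mean_ci(n, mean, sd, level)
        print(f"{name:6s} {level:.0%}: SE = {se:.7f}, CI = [{lo:.6f}, {hi:.6f}]")
```

With several thousand observations per group the normal and t critical values are almost identical, which is why the sketch lands so close to Stata's numbers.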
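. ttest ln_y_, by(gender) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  female |    3247    7.688559    .0050094    .2854469    7.678737     7.69838
    male |    5859     7.94615    .0041162    .3150719    7.938081    7.954219
---------+--------------------------------------------------------------------
combined |    9106    7.854298    .0034461    .3288497    7.847543    7.861054
---------+--------------------------------------------------------------------
    diff |           -.2575912    .0064836                -.270301   -.2448815
------------------------------------------------------------------------------
    diff = mean(female) - mean(male)                              t = -39.7296
Ho: diff = 0                     Satterthwaite's degrees of freedom =  7272.12
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000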
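We can also compute the test statistic by hand with the display command. The result equals the t statistic above in absolute value; it is positive here because we subtract the female mean from the male mean.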
. display (7.94615 - 7.688559)/((.3150719^2/5859) + (.2854469^2/3247))^(1/2)
39.729598
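The display calculation above is the unequal-variance (Welch) t statistic. The Python sketch below (standard library only) repeats it from the group sizes, means and standard deviations of Problem 1, and also reproduces the Satterthwaite degrees of freedom that ttest reports. The function name welch_t is illustrative.

```python
import math

# (Obs, Mean, Std. Dev.) for each group, copied from the Problem 1 output
n_f, mean_f, sd_f = 3247, 7.688559, 0.2854469
n_m, mean_m, sd_m = 5859, 7.94615, 0.3150719

def welch_t(n1, m1, s1, n2, m2, s2):
    """Unequal-variance t statistic for mean1 - mean2 and Satterthwaite's df."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Male minus female, as in the display command
t, df = welch_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
print(f"t = {t:.4f}, Satterthwaite df = {df:.2f}")
```

Swapping the argument order (female first) flips the sign of t to match the ttest output, while the degrees of freedom stay the same.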
Problem 2
. generate y = exp( ln_y_)
. sort gender
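. by gender: sum y
-------------------------------------------------------------------------------
-> gender = female
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      3247    2278.268    760.6658   621.9999      18791
-------------------------------------------------------------------------------
-> gender = male
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      5859    2980.621    1123.488   600.9999      18157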
. sum y
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      9106    2730.177     1063.75   600.9999      18791
. sum ln_y_
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ln_y_ |      9106    7.854298    .3288497   6.398595   9.841133
. display exp(7.854298)
2576.7856
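The two figures differ (2576.79 versus a mean of y of 2730.18) because exp() of the mean of ln_y_ is the geometric mean of y, not its arithmetic mean. Since the logarithm is concave, the mean of the logs is never larger than the log of the mean (Jensen's inequality), so exp(mean of ln y) can never exceed the mean of y. A short Python sketch with the Stata figures copied in by hand, plus a hypothetical three-value example added only for illustration:

```python
import math

# Figures copied from the Stata output above (all 9106 observations)
mean_log = 7.854298    # mean of ln_y_ (sum ln_y_)
mean_level = 2730.177  # mean of y = exp(ln_y_) (sum y)

geometric_mean = math.exp(mean_log)  # same calculation as display exp(7.854298)
print(f"exp(mean of ln y) = {geometric_mean:.4f}")
print(f"mean of y         = {mean_level:.3f}")

# Hypothetical values, only to show the geometric/arithmetic gap on a tiny example
values = [1000.0, 2000.0, 4000.0]
gm = math.exp(sum(math.log(v) for v in values) / len(values))
am = sum(values) / len(values)
print(f"toy example: geometric mean = {gm:.2f}, arithmetic mean = {am:.2f}")
```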
Problem 3
. sort gender
. by gender: regress ln_y_ s
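-------------------------------------------------------------------------------
-> gender = female
      Source |       SS       df       MS              Number of obs =    3247
-------------+------------------------------           F(  1,  3245) =  534.04
       Model |  37.3757573     1  37.3757573           Prob > F      =  0.0000
    Residual |  227.108137  3245  .069987099           R-squared     =  0.1413
-------------+------------------------------           Adj R-squared =  0.1411
       Total |  264.483894  3246  .081479943           Root MSE      =  .26455
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .0438482   .0018974    23.11   0.000     .0401279    .0475684
       _cons |   7.449845   .0113251   657.81   0.000      7.42764     7.47205
------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-> gender = male
      Source |       SS       df       MS              Number of obs =    5859
-------------+------------------------------           F(  1,  5857) =  810.37
       Model |  70.6800466     1  70.6800466           Prob > F      =  0.0000
    Residual |  510.845549  5857   .08721966           R-squared     =  0.1215
-------------+------------------------------           Adj R-squared =  0.1214
       Total |  581.525595  5858   .09927033           Root MSE      =  .29533
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .0470292   .0016521    28.47   0.000     .0437906    .0502679
       _cons |   7.703346   .0093614   822.88   0.000     7.684994    7.721698
------------------------------------------------------------------------------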
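Each t value in these tables is Coef. divided by Std. Err., and the confidence interval is Coef. ± critical value × Std. Err. The Python sketch below recomputes them for the slope on s. As in Problem 1, it assumes a normal critical value instead of the t value with the residual degrees of freedom that Stata uses, so the bounds agree only to about five decimal places. coef_summary is an illustrative name.

```python
from statistics import NormalDist

def coef_summary(b, se, level=0.95):
    """Return (t, lower, upper): t = b/se and a normal-approximation CI b +/- z*se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b / se, b - z * se, b + z * se

# Slope on s and its standard error, copied from the tables above
for group, b, se in [("female", 0.0438482, 0.0018974), ("male", 0.0470292, 0.0016521)]:
    t, lo, hi = coef_summary(b, se)
    print(f"{group}: t = {t:.2f}, 95% CI = [{lo:.7f}, {hi:.7f}]")
```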
Problem 4
. display 7.449845 + .0438482 * 5
7.669086
. display 7.703346 +.0470292 * 5
7.938492
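The two display lines evaluate the fitted value _cons + (coefficient on s) × 5 from each Problem 3 regression, i.e. predicted ln_y_ at s = 5. The same arithmetic in a short Python sketch, which also prints the male minus female gap in predicted ln_y_ (about 0.27). The helper name predicted_ln_y is illustrative.

```python
# Intercept and slope on s from the Problem 3 regressions of ln_y_ on s
coefs = {
    "female": (7.449845, 0.0438482),
    "male": (7.703346, 0.0470292),
}

def predicted_ln_y(cons, slope, s):
    """Fitted value of ln_y_ from a simple regression: _cons + b_s * s."""
    return cons + slope * s

s_value = 5
pred = {g: predicted_ln_y(c, b, s_value) for g, (c, b) in coefs.items()}
for g, v in pred.items():
    print(f"{g}: predicted ln_y_ at s = {s_value} is {v:.6f}")
print(f"male - female gap: {pred['male'] - pred['female']:.6f}")
```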
Problem 6
. display (.0470292 - .0438482)/((.0016521^2 + .0018974^2)^(1/2))
1.2643778
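This is the statistic for the difference between the male and female slopes on s. Because the two regressions use separate, non-overlapping samples, we treat the two estimates as independent (assuming independent observations), so the variance of the difference is the sum of the two squared standard errors. The value of about 1.26 is below 1.96, so at the 5% level (two-sided, normal approximation) we would not reject equal slopes. A minimal Python version of the calculation; diff_z is an illustrative name:

```python
import math

# Slope on s and its standard error from the Problem 3 regressions
b_f, se_f = 0.0438482, 0.0018974   # female
b_m, se_m = 0.0470292, 0.0016521   # male

def diff_z(b1, se1, b2, se2):
    """Statistic for b1 - b2 when the two estimates are independent (zero covariance)."""
    return (b1 - b2) / math.sqrt(se1**2 + se2**2)

z = diff_z(b_m, se_m, b_f, se_f)
print(f"statistic = {z:.4f}, exceeds 1.96 in absolute value: {abs(z) > 1.96}")
```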
Problem 7
When s is the only regressor, it is quite likely that we are committing omitted variable bias. One possible omitted variable is e, which biases the coefficient on s if e affects ln_y_ and is also correlated with s. Let us see what happens to the coefficient on s when we include e as an additional regressor.
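. by gender: regress ln_y_ s e
-------------------------------------------------------------------------------
-> gender = female
      Source |       SS       df       MS              Number of obs =    3247
-------------+------------------------------           F(  2,  3244) =  301.43
       Model |  41.4488911     2  20.7244456           Prob > F      =  0.0000
    Residual |  223.035003  3244  .068753084           R-squared     =  0.1567
-------------+------------------------------           Adj R-squared =  0.1562
       Total |  264.483894  3246  .081479943           Root MSE      =  .26221
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .0526224   .0021992    23.93   0.000     .0483105    .0569342
           e |   .0065945   .0008568     7.70   0.000     .0049147    .0082744
       _cons |    7.30401   .0220225   331.66   0.000      7.26083    7.347189
------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-> gender = male
      Source |       SS       df       MS              Number of obs =    5859
-------------+------------------------------           F(  2,  5856) =  585.91
       Model |  96.9632034     2  48.4816017           Prob > F      =  0.0000
    Residual |  484.562392  5856   .08274631           R-squared     =  0.1667
-------------+------------------------------           Adj R-squared =  0.1665
       Total |  581.525595  5858   .09927033           Root MSE      =  .28766
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .0603534   .0017743    34.01   0.000      .056875    .0638317
           e |   .0124004   .0006958    17.82   0.000     .0110364    .0137644
       _cons |   7.447337   .0170141   437.72   0.000     7.413983    7.480691
------------------------------------------------------------------------------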
. correlate s e
(obs=9106)
             |        s        e
-------------+------------------
           s |   1.0000
           e |  -0.4584   1.0000
. correlate s e ln_y_
(obs=9106)
             |        s        e    ln_y_
-------------+---------------------------
           s |   1.0000
           e |  -0.4584   1.0000
       ln_y_ |   0.3093   0.0061   1.0000
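Adding e raises the coefficient on s from .0438482 to .0526224 for women and from .0470292 to .0603534 for men. This is what the omitted-variable formula predicts: within the same sample, the OLS coefficient on s from the short regression equals the long-regression coefficient on s plus the coefficient on e times the slope from regressing e on s. Since e enters the long regressions with a positive coefficient and s and e are negatively correlated (-0.4584 in the pooled sample), leaving e out pulls the coefficient on s down. The Python sketch below backs out the slope of e on s implied by that identity for each gender. We did not run that auxiliary regression here (by gender: regress e s would estimate it directly), so the printed slopes are derived numbers, not Stata output; implied_aux_slope is an illustrative name.

```python
# Coefficients copied from the regressions above.
# short: ln_y_ on s only (Problem 3); long: ln_y_ on s and e (Problem 7)
results = {
    #          b_s short  b_s long   b_e long
    "female": (0.0438482, 0.0526224, 0.0065945),
    "male":   (0.0470292, 0.0603534, 0.0124004),
}

def implied_aux_slope(b_short, b_long, b_omitted):
    """Solve b_short = b_long + b_omitted * delta for delta, the slope from
    regressing the omitted variable (e) on s in the same sample."""
    return (b_short - b_long) / b_omitted

for group, (b_short, b_long, b_e) in results.items():
    delta = implied_aux_slope(b_short, b_long, b_e)
    print(f"{group}: b_s short - long = {b_short - b_long:+.7f}, "
          f"implied slope of e on s = {delta:.4f}")
```

Both implied slopes come out negative, consistent with the negative correlation between s and e.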
Problem 8
We can run other simple (one-regressor) regressions and compare their R-squared values with those of the regressions on s in Problem 3. Let us try one, with e as the regressor:
. sort gender
. by gender: regress ln_y_ e
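-------------------------------------------------------------------------------
-> gender = female
      Source |       SS       df       MS              Number of obs =    3247
-------------+------------------------------           F(  1,  3245) =   25.76
       Model |  2.08275579     1  2.08275579           Prob > F      =  0.0000
    Residual |  262.401138  3245  .080863217           R-squared     =  0.0079
-------------+------------------------------           Adj R-squared =  0.0076
       Total |  264.483894  3246  .081479943           Root MSE      =  .28436
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           e |  -.0040326   .0007946    -5.08   0.000    -.0055906   -.0024747
       _cons |   7.748528   .0128269   604.08   0.000     7.723378    7.773677
------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-> gender = male
      Source |       SS       df       MS              Number of obs =    5859
-------------+------------------------------           F(  1,  5857) =   12.37
       Model |  1.22562115     1  1.22562115           Prob > F      =  0.0004
    Residual |  580.299974  5857  .099078022           R-squared     =  0.0021
-------------+------------------------------           Adj R-squared =  0.0019
       Total |  581.525595  5858   .09927033           Root MSE      =  .31477
------------------------------------------------------------------------------
       ln_y_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           e |   .0024285   .0006905     3.52   0.000     .0010749     .003782
       _cons |   7.909485   .0112063   705.81   0.000     7.887517    7.931454
------------------------------------------------------------------------------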
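For a regression with a constant, R-squared is Model SS divided by Total SS, so the comparison can be read straight off the tables. The Python sketch below recomputes it for the regressions on s (Problem 3) and on e alone: e explains under 1% of the variance of ln_y_ for either gender, against roughly 12% to 14% for s. The helper name r_squared is illustrative.

```python
# (Model SS, Total SS) copied from the regression tables
ss = {
    ("female", "s"): (37.3757573, 264.483894),
    ("male", "s"): (70.6800466, 581.525595),
    ("female", "e"): (2.08275579, 264.483894),
    ("male", "e"): (1.22562115, 581.525595),
}

def r_squared(model_ss, total_ss):
    """R-squared of an OLS regression with a constant: Model SS / Total SS."""
    return model_ss / total_ss

for (group, regressor), (m, t) in ss.items():
    print(f"{group:6s} ln_y_ on {regressor}: R-squared = {r_squared(m, t):.4f}")
```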