dr Katarzyna Kopczewska
Class 01 / Outline:
- Course introduction
- Requirements
- Intro to STATA
- Linear model example
Course introduction and requirements
Lab in Microeconometrics follows the lecture in Microeconometrics. Students are expected to attend at least 80% of class time. Lab exercises are based on Case Studies solved in STATA. Activity during the Lab is documented by filling in a “strategic agenda” for each Case Study presented. To pass this Lab, students have to present their model and fill in the “strategic agendas”.
Class 01 issues
- Load the data
- Summarize data in tables / on graph
- Estimate linear model using options for regress
- Add a dummy variable
- Run appropriate tests
- Check VIF
- Draw conclusions: fill in the “strategic agenda”
Dataset description:
Swiss Fertility and Socioeconomic Indicators (1888) Data - Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0,100].
[,1] Fertility Ig, "common standardized fertility measure"
[,2] Agriculture % of males involved in agriculture as occupation
[,3] Examination % "draftees" receiving highest mark on army examination
[,4] Education % education beyond primary school for "draftees".
[,5] Catholic % catholic (as opposed to "protestant").
[,6] Infant.Mortality live births who live less than 1 year.
All variables but 'Fertility' give proportions of the population.
Details (paraphrasing Mosteller and Tukey): Switzerland, in 1888, was entering a period known as the "demographic transition"; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.
Additional info: Files for all 182 districts in 1888 and other years have been available at <URL:
or
Source:
Mosteller, F. and Tukey, J. W. (1977) _Data Analysis and Regression: A Second Course in Statistics_. Addison-Wesley, Reading, Mass.
Kosuke Imai, Gary King and Olivia Lau (2007). Zelig: Everyone's Statistical Software. R package version 3.0-1.
Commands used for modelling:
File/Import to read the data
Graphics / Two way graph - to make scatterplot in new window
. plot Y X to make scatterplot in results window
. avplots (after regress) to check individual relationships
. correlate to check correlation among variables
. sum variable, d to display detailed summary of variable
Statistics / Linear models and related / Linear regression to estimate OLS model
. regress to estimate OLS model
. vif (after regress) Variance Inflation Factor to detect multicollinearity (estat vif in current STATA versions)
. estat hettest (after regress) to check heteroskedasticity
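The commands above can be strung together into one do-file. The following is only a sketch under stated assumptions: the file name swiss.csv and the lower-case variable names are hypothetical (they depend on how the dataset was exported), so run describe first and adjust the names to what was actually imported.

```stata
* Sketch only: "swiss.csv" and the variable names below are assumptions.
import delimited "swiss.csv", clear

* Inspect what was imported before using any variable names
describe

* Summarize the data in tables and on a graph
summarize fertility, detail
correlate fertility agriculture examination education catholic
scatter fertility education

* Add a dummy variable: 1 if Catholics are the majority (illustrative cut-off)
generate byte catholic_major = (catholic > 50) if !missing(catholic)

* Estimate the linear model by OLS
regress fertility agriculture examination education catholic_major

* Post-estimation checks (must follow regress)
estat vif
estat hettest
avplots
```

The 50% cut-off for catholic_major is chosen purely for illustration; any threshold that suits the case study can be used instead.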
To create dummy variables:
Without missing values:
. gen young = 0
. replace young = 1 if age<25
With missing values:
. gen young = 0
. replace young = 1 if age<25
. replace young = . if missing(age)
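The missing-value step matters because STATA treats a numeric missing value as larger than any number, so age<25 is false for a missing age and the dummy would otherwise be 0 instead of missing. A compact alternative is sketched below on a small made-up dataset (the four age values are invented for illustration only).

```stata
* Toy data, invented for illustration: one age is missing
clear
input age
20
30
.
24
end

* One-line version: 1 if age<25, 0 otherwise, missing if age is missing
generate byte young = (age < 25) if !missing(age)

list age young
```

In the listing, young should be missing in the row where age is missing, which is the same result as the three-step version for data with missing values.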
Compare on:
. predict resi, resi to generate residuals
. egen ze=std(resi) to generate standardized residuals
. swilk variable to test normality
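A minimal sketch of these residual diagnostics follows. It uses auto, the example dataset shipped with STATA, only as a stand-in so that the block runs without any external file; the model price on mpg and weight is an assumption made for illustration, not the case-study model.

```stata
* Stand-in data: the auto dataset shipped with STATA
sysuse auto, clear
regress price mpg weight

* Fitted values (option xb) versus residuals (option residuals)
predict yhat, xb
predict resi, residuals

* Standardize the residuals to mean 0 and standard deviation 1
egen ze = std(resi)
summarize ze

* Shapiro-Wilk test; the null hypothesis is that residuals are normal
swilk resi
```

A small p-value from swilk indicates that normality of the residuals is rejected.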
VIF – Variance Inflation Factor
Assuming y = a + b_2 x_2 + b_3 x_3 + … + b_K x_K + e
Regress x_K on the other regressors x_i (i ≠ K) and check the R^2 of that auxiliary regression (called R^2_K)
VIF_K = 1/(1 - R^2_K); variable K is problematic when VIF_K > 10 (such variables should usually be dropped)
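The formula can be checked by hand against STATA's own output. The sketch below again uses the shipped auto dataset as a stand-in; the choice of regressors (mpg, weight, length) is an assumption for illustration only.

```stata
* Stand-in data and model, for illustration only
sysuse auto, clear
regress price mpg weight length

* VIF as reported by STATA for every regressor
estat vif

* Auxiliary regression: weight on the other regressors
regress weight mpg length

* e(r2) holds the R-squared of the last regression
display "VIF for weight = " 1/(1 - e(r2))
```

The value displayed in the last line should match the VIF reported for weight by estat vif.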