Part 1: Multiple Choice Questions (40 points)

Circle the correct answer. Choose only one answer per question. No credit is given for multiple answers or additional explanations. Each correct answer is worth two points.

1) Consider the regression model . Suppose it is reasonable to assume that . This assumption implies that we can causally interpret OLS estimates of

  1. .
  2. .
  3. .
  4. and .

2) In multiple regression, the R² increases whenever an explanatory variable is

a. added unless the coefficient on the added variable is exactly zero.

b. added unless the adjusted R² falls.

c. added unless there is heteroskedasticity.

d. added unless the added variable is not statistically significant at the 5%-level.

3) The estimate on an explanatory variable is not statistically significant at the 5%-level if

  1. the 95% confidence interval does not include zero.
  2. the t-statistic is greater than 2.5.
  3. the p-value is less than 0.05.
  4. the p-value is greater than 0.95.

4) Consider testing the hypothesis: . Your chosen level of significance is 5%. One of the following statements is not correct:

  1. This hypothesis can only be tested via an F-test.
  2. This hypothesis can be tested using a t-test or an F-test.
  3. In large samples, you reject the hypothesis if the computed F-statistic > 3.84.
  4. In large samples, you reject the hypothesis if the computed t-statistic > 1.96.

5) In the regression model , where Y denotes earnings, C a dummy variable for having a college degree, and F a gender dummy variable,

  1. is the gender difference in earnings for someone with a college degree.
  2. is the gender difference in earnings for someone without a college degree.
  3. is the difference in earnings between those with and without a college degree when .
  4. cannot be estimated since and are perfectly collinear when .

6) The following are all sensible specifications of a non-linear model with the exception of

  1. .
  2. .
  3. .
  4. .

7) External validity

  1. is guaranteed in an ideal randomized experiment.
  2. is threatened if the regression error terms are heteroskedastic.
  3. is threatened if there is omitted variables bias.
  4. is threatened if there is measurement error in the dependent variable.

8) You want to estimate the price elasticity of cigarette demand. To do so, you collect time series data on prices and quantities sold in the Stockholm area. The major concern for such a study is:

  1. simultaneous causality.
  2. errors in variables bias.
  3. wrong functional form.
  4. sample selection.

9) You are interested in the effects of participating in a training program (which may be of varying length). You have data on wages after program completion for those who participated in the program and for a potential comparison group. A major concern for this study is:

  1. misspecification of the functional form.
  2. sample selection bias.
  3. bias caused by a so-called Hawthorne effect.
  4. that you lack information on program length.

10) Heteroskedasticity-robust standard errors are invalid in large samples if

  1. the errors are homoskedastic.
  2. the error variance differs across observations.
  3. the errors are correlated across observations.
  4. the dependent variable is binary.

11) Consider the probit model , where is a female dummy variable. The marginal effect () of being female (as opposed to male) on is given by

  1. .
  2. .
  3. (where denotes the mean of ).
  4. (where denotes the derivative of ).

12) One of the following statements is not true. In Probit and Logit models

  1. the t-statistic should still be used for testing a single restriction.
  2. you can include binary variables as explanatory variables.
  3. you use Maximum Likelihood estimation.
  4. F-statistics should not be used, since the models are nonlinear.

13) Consider the panel data model: . You can estimate by first eliminating and then estimating the transformed model. Two transformations are “entity-demeaning” and “first-differencing”. These two approaches

  1. yield identical estimates of if .
  2. yield identical estimates of if .
  3. always yield identical estimates of .
  4. never yield identical estimates of .

14) Indicate for which of the following examples you cannot use entity and time fixed effects: a regression of

  1. OECD unemployment rates on unemployment insurance generosity for the years 1980-2006.
  2. the (log of) earnings on years of education, using the Swedish Level of Living Survey in 2000.
  3. the per capita income level in Swedish municipalities on local tax rates using data for 1980, 1990, 2000, and 2010.
  4. the market values for 100 firms listed on the Swedish stock exchange on R&D expenditures for the years 1998-2010.

15) The panel data model with entity and time fixed effects

  1. handles any kind of omitted variables bias.
  2. reduces bias caused by measurement error.
  3. deals with simultaneous causality bias.
  4. requires that the variable of interest varies over entities and time.

16) When there is a single instrument and a single (endogenous) regressor, the TSLS estimator for the slope can be calculated as follows ( () denotes estimated covariance (variance))

  1. .
  2. .
  3. .
  4. .

17) You want to estimate the model: , where is a potential instrument for and a control variable. The exogeneity assumption required for TSLS is fulfilled if

  1. you have information on , and has a direct effect on holding and constant.
  2. you have information on , and
  3. you lack information on , and and are uncorrelated.
  4. you lack information on , and and are correlated.

18) With one exception, the following scenarios lend themselves to a Regression-Discontinuity design:

  1. A test score result determines eligibility for a college grant.
  2. Distance to an administrative border determines eligibility for a tax break.
  3. A random subset of Swedish municipalities gets additional funding for schools.
  4. Vote shares in a two-party system determine which party gets into office.

19) In the ideal randomized experiment

  1. you can estimate the individual causal effects for all individuals participating in the experiment.
  2. you must control for variables that are correlated with the dependent variable.
  3. self-selection bias is a serious issue.
  4. you can estimate the average causal effect for individuals participating in the experiment.

20) A Differences-in-Differences (DiD) approach

  1. always requires data from a randomized controlled experiment.
  2. can always be implemented if you have data covering at least two time points.
  3. can be used with a single cross-section of data.
  4. can be implemented if you have data covering at least two time points, given that the treatment affected a subset of the population.

Part 2: Discussion Questions (60 points)

Answer the following questions on separate sheets of paper. Answer clearly and concisely. Only legible answers will be considered; others will be disregarded. If you think that a question is vaguely formulated, specify the conditions used for answering it. Each question is worth 30 points.

Discussion Question 1

A long-standing question in labor economics is whether the generosity of unemployment benefits increases unemployment. To study this question, researchers have used panels of countries with observations spanning several years.

In a well-known book (Layard et al. (1991) Unemployment, p. 55), the authors present the regression results reported in the table below. The results come from a cross-section of 20 countries using data from the mid-1980s. The generosity of unemployment benefits is measured by two variables: (i) benefit duration (i.e. the maximum length of benefit receipt); and (ii) the benefit replacement ratio (i.e. unemployment benefits in relation to the average wage).

Table: The relationship between unemployment and unemployment benefits

(Dependent variable: Average unemployment rate (%), 1983–88)

Independent variables             Estimate    (t-statistic)
Benefit duration (years)          0.92        (2.9)
Benefit replacement ratio (%)     0.17        (7.1)

Notes: The regression also includes a constant plus 5 other control variables (spending on active labor market policies, coverage of collective wage bargaining, employer coordination, union coordination, and change in inflation). Number of observations: 20. Adjusted R-squared: 0.91. The critical t-value at the 5%-level (with 12 degrees of freedom) is 2.179.
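The degrees of freedom quoted in the notes can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and the implied standard errors assume that each reported t-statistic tests the null hypothesis that the coefficient equals zero, which the table does not state explicitly.

```python
# Figures copied from the table and its notes.
n_obs = 20
n_benefit_vars = 2   # benefit duration, benefit replacement ratio
n_controls = 5       # the five control variables listed in the notes
n_params = 1 + n_benefit_vars + n_controls  # constant plus all regressors
dof = n_obs - n_params                      # residual degrees of freedom
critical_t = 2.179                          # 5%-level critical value from the notes

# (estimate, t-statistic) pairs as reported in the table.
results = {
    "Benefit duration (years)": (0.92, 2.9),
    "Benefit replacement ratio (%)": (0.17, 7.1),
}

summary = {}
for name, (estimate, t_stat) in results.items():
    summary[name] = {
        # Assumption: t = estimate / standard error (null of a zero coefficient).
        "implied_se": estimate / t_stat,
        "exceeds_critical_value": abs(t_stat) > critical_t,
    }

print("degrees of freedom:", dof)
for name, row in summary.items():
    print(name, row)
```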

a) Interpret the two coefficient estimates.

b) Explain why the OLS estimator of the effect of unemployment benefit generosity on unemployment may be biased in this case.

c) An alternative to using OLS is to exploit the panel structure of the data. Discuss the fixed effects regression model in the current application. Does the fixed effects model alleviate the problem(s) of OLS?

d) An alternative to OLS and fixed effects regression is instrumental variables. Consider using a left-wing political majority as an instrumental variable. Discuss whether this would be a valid instrument.

Discussion Question 2

Suppose you are interested in estimating the causal effect of class size on pupils’ test scores. You want to estimate the relationship:

where i indexes individuals, Y denotes an individual’s test score, CS class size, and X a set of control variables.

a) Consider estimating the above equation by OLS. Why is OLS likely to be biased? What is the likely sign of the bias?

b) A number of researchers have noted that maximum class size rules can be useful for identifying the causal effect of class size. The solid line in the figure below (labeled “expected class size”) shows an example of such a maximum class size rule. The rule stipulates that new classes are formed when total enrollment in a grade surpasses multiples of 30. Thus, one class is formed when total 4th-grade enrollment in a school district is 30 or less. When total enrollment is between 31 and 60, two classes are formed; when total enrollment is between 61 and 90, three classes are formed, and so on. The dashed line shows actual class sizes. Actual class sizes do not follow the rule completely.
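The rule described above can be written as a small function. This is a minimal sketch under two assumptions that the text does not spell out: an enrollment of exactly 30 still forms a single class, and pupils are split evenly across classes, so that expected class size equals enrollment divided by the number of classes. The function names and the `cap` parameter are hypothetical.

```python
def number_of_classes(enrollment: int, cap: int = 30) -> int:
    """Classes formed under the rule: one more class each time
    enrollment surpasses a multiple of `cap`."""
    if enrollment <= 0:
        raise ValueError("enrollment must be a positive integer")
    # Integer arithmetic avoids floating-point rounding at exact multiples.
    return (enrollment - 1) // cap + 1


def expected_class_size(enrollment: int, cap: int = 30) -> float:
    """Average class size if pupils are divided evenly (an assumption)."""
    return enrollment / number_of_classes(enrollment, cap)


if __name__ == "__main__":
    # Expected class size drops sharply just past each multiple of 30.
    for enrollment in (29, 30, 31, 60, 61, 90, 91):
        print(enrollment, number_of_classes(enrollment),
              round(expected_class_size(enrollment), 2))
```

The discontinuities at 31, 61, 91, and so on are what make the rule interesting for identification: enrollment changes by one pupil while expected class size changes by a large amount.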

Explain how the maximum class size rule may help you in estimating the causal effect of class size. What is the key “identifying assumption”? How would you test this identifying assumption? How would you specify the regression(s) that you would use to estimate the causal effect of interest?

c) A regression of actual class size on expected class size (and control variables) yields an estimate on expected class size of 0.335 (with a standard error of 0.051). What does this information tell you about the validity of the research design?

d) Separate regressions of parental education and parental income (measured before the children are age 10) on expected class size (and control variables) produce estimates that have t-ratios of: –0.15 (parental education) and 0.08 (parental income). What does this information tell you about the validity of the research design?