Data Analysis Project
Joseph Kearns
February 20, 2017
Data Description:
My dataset contains the variables needed to estimate the relationship between travel cost and visitation rate for anglers at Lake Davis. The contents of the dataset fall into two categories: data collected in a survey conducted by the Center for Economic Development, along with variables derived from it, and demographic variables that I collected from US Census datasets. The former category contains the following variables:
-"VIS_Pop" (visitation rate in a given county)
-"tc" (the cost associated with driving to the lake)
The demographic variables I collected are the following:
-"MHI" Median Household Income of the respondent's county
-"MHI2" The previous variable squared
-"Age" The average age in the respondent's county
-"Education" Percentage of the respondent's county possessing a bachelor's degree or higher
-"Gender" Percentage of the respondent's county that is male
-"Ethnicity" Percentage of the respondent's county that is white
-"Urban_Rural" Percentage of the respondent's county living in a rural area
library(readxl)   # provides read_excel()
library(ggplot2)  # provides ggplot() for the plots below
Lake_Davis_Data <- read_excel("C:/Users/leftc/Documents/Spring 2017/Econ Honors/LakeDavisDemographics.xlsx")
Univariate Analysis
Travel Cost (tc)
sumstats_tc <- summary(Lake_Davis_Data$tc)
sumstats_tc
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.876  35.220  59.140  95.610 144.800 559.200
ggplot(Lake_Davis_Data, aes(x=tc)) + geom_histogram(binwidth = 20)
The travel cost variable is clearly not normally distributed; it is right-skewed, with a mean (95.61) well above the median (59.14). This is what we would expect and is consistent with economic theory: most visitors to the lake are individuals who incur a relatively small cost in getting there.
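The skew described above can also be checked numerically rather than by eye. The following is a minimal sketch in base R; the helper name sample_skewness and the example_costs vector are hypothetical and purely illustrative (they are not the survey data). A positive value indicates right skew.

```r
# Hypothetical helper: moment-based skewness (third central moment / variance^1.5).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  centered <- x - mean(x)      # deviations from the mean
  mean(centered^3) / mean(centered^2)^1.5
}

# Illustrative values only (NOT the Lake Davis data): mostly small costs, one large.
example_costs <- c(5, 20, 35, 60, 150, 560)

sample_skewness(example_costs) > 0            # positive skewness means right skew
mean(example_costs) > median(example_costs)   # mean pulled above the median
```

Assuming the data frame loaded earlier, the same check could be run as sample_skewness(Lake_Davis_Data$tc).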
Visitation rate (VIS_Pop)
sumstats_visitation <- summary(Lake_Davis_Data$VIS_Pop)
sumstats_visitation
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 5.934e-07 2.495e-06 5.112e-06 1.718e-05 1.247e-05 2.341e-04
ggplot(Lake_Davis_Data, aes(x=VIS_Pop)) + geom_histogram(binwidth = .000005)
This variable is right-skewed as well, which I attribute to the fact that Lake Davis is a relatively obscure destination. During the survey's time period, approximately 15,000 people visited the various campgrounds at the lake. Because Lake Davis does not draw many visitors, it makes sense that visitation rates are clustered close to zero.
Bivariate Analysis
regression<-lm(VIS_Pop~tc+MHI+Age+Education+Gender+Ethnicity+Urban_Rural, data = Lake_Davis_Data)
summary(regression)
##
## Call:
## lm(formula = VIS_Pop ~ tc + MHI + Age + Education + Gender +
## Ethnicity + Urban_Rural, data = Lake_Davis_Data)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -5.856e-05 -1.086e-05 -3.889e-06  7.339e-06  1.673e-04
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.433e-06  7.596e-05   0.085   0.9326
## tc          -7.159e-08  3.039e-08  -2.355   0.0199 *
## MHI         -2.214e-10  2.110e-10  -1.049   0.2957
## Age         -5.599e-10  5.757e-07  -0.001   0.9992
## Education    1.769e-07  2.830e-07   0.625   0.5330
## Gender       3.049e-07  1.328e-06   0.230   0.8187
## Ethnicity   -3.044e-08  2.171e-07  -0.140   0.8887
## Urban_Rural  5.591e-05  1.024e-05   5.462 2.05e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.56e-05 on 142 degrees of freedom
## Multiple R-squared: 0.3726, Adjusted R-squared: 0.3417
## F-statistic: 12.05 on 7 and 142 DF, p-value: 5.129e-12
ggplot(Lake_Davis_Data, aes(x=tc, y=VIS_Pop))+geom_point()+geom_smooth(method = "lm")
Controlling for the other demographic characteristics, the linear model shows a negative relationship between travel cost and visitation rate: the coefficient on tc is -7.159e-08. The scatter plot shows the same downward slope, although its fitted line comes from a simple regression of VIS_Pop on tc alone. The regression also shows that this relationship is statistically significant at the 5% level (t = -2.355, p = 0.0199).
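To make the significance claim concrete, the t-statistic and p-value for tc can be recomputed by hand from the estimate, standard error, and residual degrees of freedom printed in the regression output. This is a small sketch using only base R; the variable names are my own, and because the printed inputs are rounded, the results should match the summary table only approximately.

```r
# Values copied from the regression output for the tc coefficient.
estimate  <- -7.159e-08
std_error <-  3.039e-08
resid_df  <- 142          # residual degrees of freedom

# t-statistic: estimate divided by its standard error.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = resid_df)

# 95% confidence interval for the coefficient.
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = resid_df) * std_error

t_stat
p_value
ci_95
```

If the whole interval lies below zero, that agrees with rejecting a zero slope at the 5% level.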
Implications
The relationship between travel cost and visitation rate is negative and significant, as the law of demand would suggest. Given these results, we can estimate demand for recreation at Lake Davis. From this, we would go on to estimate consumer surplus (an economic concept describing the value above the price that a consumer receives from a good, in this case the lake). This allows us to value the lake so that we can justify, in this case, poisoning the lake with rotenone to treat an invasive fish population. In my initial travel cost models, I've estimated the value of the lake to be between 2.5 and 5 million dollars.
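As a rough illustration of the consumer surplus idea, the sketch below computes the surplus triangle under a linear demand curve. It is an assumption-laden example, not the travel cost model behind the 2.5 to 5 million dollar estimate: the function name linear_consumer_surplus is hypothetical, demand is assumed to be linear in travel cost, and the inputs are simply the tc coefficient and the mean of VIS_Pop reported earlier.

```r
# Hypothetical helper: consumer surplus under a LINEAR demand curve q = a + b * price.
# The surplus is the triangle between the demand curve and the current price:
# 0.5 * (choke price - price) * q, which simplifies to q^2 / (-2 * b) when b < 0.
linear_consumer_surplus <- function(quantity, slope) {
  stopifnot(slope < 0, all(quantity >= 0))
  quantity^2 / (-2 * slope)
}

# Inputs taken from the output above: the tc coefficient and the mean of VIS_Pop.
tc_slope      <- -7.159e-08
mean_vis_rate <- 1.718e-05

# Surplus per unit of county population, in the same dollar units as tc
# (assuming VIS_Pop is visits per capita).
cs_per_capita <- linear_consumer_surplus(mean_vis_rate, tc_slope)
cs_per_capita
```

Scaling a per-capita figure like this up to a total value would require the relevant population, which this sketch does not attempt.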