Final Project

Amber Williams

2/18/2017

Intro

Industrial organization is a common area of study in economics. Using numerical and graphical data analysis, we can examine how a firm’s research and development (R&D) spending relates to its sales and profits. We would expect R&D to increase with firm size. Although this analysis cannot fully determine which variable causes which, we can still peek at what the data look like and whether a correlation exists. We will look at the elasticity of R&D with respect to sales, and at how profits fit in, using logarithmic variables for R&D and sales so that differences can be read as percentage changes.
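To make the elasticity idea concrete, here is a minimal sketch of a log-log regression. It uses synthetic stand-in data rather than the rdchem data set, and the names `sales`, `rd`, and `true_elasticity` are assumptions made for illustration only. When both variables are logged, the slope approximates the percentage change in R&D associated with a one percent change in sales.

```r
# Synthetic stand-in data (hypothetical values, NOT the rdchem data set).
set.seed(42)
n <- 200
sales <- exp(rnorm(n, mean = 7, sd = 1.2))   # firm sales in levels
true_elasticity <- 1.1                       # assumed value for illustration
rd <- exp(-4 + true_elasticity * log(sales) + rnorm(n, sd = 0.3))

# Regress log R&D on log sales; the slope is the estimated elasticity.
fit <- lm(log(rd) ~ log(sales))
elasticity_hat <- unname(coef(fit)[2])
print(round(elasticity_hat, 2))
```

With the real data, the same call would presumably take the form `lm(lrd ~ lsales, data = rdchem)`, since those columns are already in logs.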

A Univariate Look at The Variables

library(foreign)
rdchem<-read.dta("
library(tibble)
rchem<-as_tibble(rdchem)
library(dplyr)

## Warning: package 'dplyr' was built under R version 3.2.5

head(select(rdchem, lrd, lsales, profits))

## lrd lsales profits
## 1 6.0651798 8.427312 186.9
## 2 4.0775380 7.948032 467.0
## 3 3.1570001 6.391582 107.4
## 4 1.2527630 4.894850 -4.3
## 5 0.5306283 3.737670 8.0
## 6 2.1282320 5.966147 47.3

summary(rdchem$lrd)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5306 2.3840 3.7500 3.6030 4.3630 7.2640

summary(rdchem$lsales)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.738 6.230 7.191 7.165 7.957 10.590

Looking at where the data lie by quartile, we can check for any apparent discrepancies. Both variables are in logs. Log R&D runs from a minimum of 0.5306 to a maximum of 7.2640, a wide range that we can inspect graphically for outliers. Log sales runs from 3.738 to 10.590. The two ranges are nearly the same width (about 6.7 and 6.9 log points), but the middle half of the sales data is somewhat more tightly packed (an interquartile range of about 1.73, versus 1.98 for R&D). The differences could depend on whether a company specializes in production or in R&D.
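The spread of each variable can be checked directly from the summaries printed above. The sketch below copies those five-number summaries by hand and computes the range and interquartile range; the helper `spread()` is a hypothetical name introduced here, not part of any package.

```r
# Five-number summaries copied from the summary() output above.
lrd    <- c(min = 0.5306, q1 = 2.3840, median = 3.7500, q3 = 4.3630, max = 7.2640)
lsales <- c(min = 3.738,  q1 = 6.230,  median = 7.191,  q3 = 7.957,  max = 10.590)

# Hypothetical helper: range and interquartile range of a summary vector.
spread <- function(s) {
  c(range = unname(s["max"] - s["min"]),
    iqr   = unname(s["q3"] - s["q1"]))
}

spread_lrd    <- spread(lrd)
spread_lsales <- spread(lsales)
print(rbind(lrd = spread_lrd, lsales = spread_lsales))
```

Because the variables are logged, a difference between two values is a difference in log points, so `exp()` of a range gives the ratio of the largest firm to the smallest in levels.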

library(ggplot2)
ggplot(rdchem, aes(x=lsales)) + geom_histogram(colour="pink", fill="blue") + ggtitle("Distribution of Log Sales")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(rdchem, aes(x=lrd)) + geom_histogram(colour="blue", fill="pink") + ggtitle("Distribution of Log R&D")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Both logged variables look roughly symmetric and bell-shaped, though histograms alone cannot confirm normality or show how the two variables relate to each other. To examine the relationship, we will plot them together on bivariate graphs.
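A histogram gives only a visual impression of shape. One common way to check normality more formally is the Shapiro-Wilk test from base R's `stats` package. The sketch below runs it on a synthetic stand-in vector `x` (hypothetical values, not the rdchem data), so the result says nothing about the real variables.

```r
# Synthetic stand-in for a logged variable (hypothetical, NOT rdchem data).
set.seed(1)
x <- rnorm(30, mean = 3.6, sd = 1.5)

# Shapiro-Wilk test: the null hypothesis is that x is normally distributed.
# A small p-value is evidence against normality.
sw <- shapiro.test(x)
print(sw$statistic)
print(sw$p.value)
```

On the real data the call would presumably be `shapiro.test(rdchem$lrd)`, and likewise for `lsales`. With a small sample the test has limited power, so a large p-value does not prove normality.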

A Bivariate Comparison

library(ggplot2)
ggplot(rdchem, aes(lsales,lrd)) + geom_boxplot(color="pink") + geom_jitter(color="blue", width = .4) + ggtitle("Box Plot of Log R&D with Scatter Against Log Sales")

## Warning: Continuous x aesthetic -- did you forget aes(group=...)?

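This plot shows the scatter of log R&D against log sales and gives some insight into the positive relationship between the two variables. Because lsales is continuous and no grouping is given, ggplot2 draws a single box for all of lrd (hence the warning above), so the box summarizes the overall distribution of log R&D while the jittered points show how it varies with sales.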

ggplot(data=rdchem, aes(x=factor(lsales), y=lrd, fill=lsales)) + geom_bar(stat="identity", position=position_dodge(), color="pink") + ggtitle("Log R&D by Log Sales")

As the chart above shows, log R&D tends to rise with log sales: firms with higher sales generally spend more on R&D.
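A plot suggests the direction of a relationship but not its strength. A correlation coefficient with a significance test puts a number on it. The sketch below uses base R's `cor.test()` on synthetic stand-in vectors; the simulated `lsales` and `lrd` here are hypothetical values that merely reuse the column names, not the rdchem data.

```r
# Synthetic stand-in data (hypothetical, NOT the rdchem data set).
set.seed(7)
lsales <- rnorm(40, mean = 7, sd = 1.3)
lrd    <- -4 + 1.1 * lsales + rnorm(40, sd = 0.5)

# Pearson correlation with a test of the null hypothesis of zero correlation.
ct <- cor.test(lsales, lrd)
r  <- unname(ct$estimate)
print(r)
print(ct$p.value)
```

On the real data the equivalent call would presumably be `cor.test(rdchem$lsales, rdchem$lrd)`.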

ggplot(data=rdchem, aes(x=lsales, y=lrd, colour=lsales)) +
geom_line(color="pink") +
geom_point() + ggtitle("Log R&D Against Log Sales")

As predicted, R&D increases as sales increase. The data alone cannot tell us which variable drives which; for that we can rely only on general knowledge. If we bring profits into the picture, we can see that firms with higher sales tend to have both higher R&D and larger profits.

ggplot(rdchem, aes(x=lrd, y=lsales, col=profits)) +
geom_point() +
geom_smooth(se=FALSE) + ggtitle("Log Sales Against Log R&D, Colored by Profits")

## `geom_smooth()` using method = 'loess'

Conclusion

From the above data, we can tentatively conclude that there is a positive association between the sales and profits of an industrial firm and the amount it invests in R&D. The plots alone do not measure how strong that association is, nor do they establish its direction. Still, the pattern makes sense: for an industry to grow, it must produce innovative technologies that keep up with current market demands.