Stat 359 Assignment 2 Solutions

Thanks to Dr. Laura Cowen for preparing these solutions.

1.  Central Limit Theorem

# First initialize the size of the samples and the number of samples you want to take.

# Also initialize your matrices that will hold your sample means to zero.

size<-c(10,20,50,100)

samp<-c(10,100,1000)

u10<- matrix(0, nrow=length(size), ncol=10)
u100<- matrix(0, nrow=length(size), ncol=100)
u1000<- matrix(0, nrow=length(size), ncol=1000)

b10<- matrix(0, nrow=length(size), ncol=10)
b100<- matrix(0, nrow=length(size), ncol=100)
b1000<- matrix(0, nrow=length(size), ncol=1000)

p10<- matrix(0, nrow=length(size), ncol=10)
p100<- matrix(0, nrow=length(size), ncol=100)
p1000<- matrix(0, nrow=length(size), ncol=1000)

# This part of the code could have been done in various ways. Here I use three nested
# loops that produce 9 matrices of means. For instance, u10 is a 4x10 matrix where the
# rows correspond to the sample sizes (10, 20, 50, 100) and the columns are the 10
# sample means.

for (k in samp){ # k indicates the number of samples: 10, 100, 1000
  index<-0 # counter for the rows in the matrices
  for (i in size){ # i is the size of each sample: 10, 20, 50, 100
    index<-index+1 # increase the row number by 1
    for (j in 1:k){ # j indicates the sample number, 1 to k
      if (k==10){ # when the number of samples is 10, fill the 4x10 matrices
        u10[index,j]<- mean(runif(i, min=0, max=1))
        p10[index,j]<- mean(rpois(i, 5))
        b10[index,j]<- mean(rbinom(i, 1, 0.20))
      }
      if (k==100){ # when the number of samples is 100, fill the 4x100 matrices
        u100[index,j]<- mean(runif(i, min=0, max=1))
        p100[index,j]<- mean(rpois(i, 5))
        b100[index,j]<- mean(rbinom(i, 1, 0.20))
      }
      if (k==1000){ # when the number of samples is 1000, fill the 4x1000 matrices
        u1000[index,j]<- mean(runif(i, min=0, max=1))
        p1000[index,j]<- mean(rpois(i, 5))
        b1000[index,j]<- mean(rbinom(i, 1, 0.20))
      }
    }
  }
}
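The nested loops above could also be written more compactly with replicate(). This is just a sketch; the helper name sample_means is ours, not part of the assignment solution:

```r
# A compact alternative to the nested loops (sample_means is a hypothetical
# helper name, not part of the original solution).
# Draw k samples of size n from rdist and return the k sample means.
sample_means <- function(n, k, rdist) replicate(k, mean(rdist(n)))

set.seed(1)
# e.g. 100 means of Uniform(0,1) samples of size 50:
u <- sample_means(50, 100, function(n) runif(n, min = 0, max = 1))
length(u)  # 100
```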

To conserve space, put 12 plots on a page, one page for each distribution. Make sure that, for comparison purposes, the x axes are the same so that we can see how the distributions change. Note that we could also keep the y axes the same; however, the counts may differ too much from plot to plot.

par(mfrow=c(4,3))

# Uniform:

hist(u10[1,], main="10 means of size 10", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u100[1,], main="100 means of size 10", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u1000[1,], main="1000 means of size 10", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u10[2,], main="10 means of size 20", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u100[2,], main="100 means of size 20", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u1000[2,], main="1000 means of size 20", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u10[3,], main="10 means of size 50", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u100[3,], main="100 means of size 50", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u1000[3,], main="1000 means of size 50", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u10[4,], main="10 means of size 100", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u100[4,], main="100 means of size 100", xlab="Uniform(0,1)", xlim=c(0,0.8))
hist(u1000[4,], main="1000 means of size 100", xlab="Uniform(0,1)", xlim=c(0,0.8))

# Poisson:

hist(p10[1,], main="10 means of size 10", xlab="Poisson(5)", xlim=c(3,7))
hist(p100[1,], main="100 means of size 10", xlab="Poisson(5)", xlim=c(3,7))
hist(p1000[1,], main="1000 means of size 10", xlab="Poisson(5)", xlim=c(3,7))
hist(p10[2,], main="10 means of size 20", xlab="Poisson(5)", xlim=c(3,7))
hist(p100[2,], main="100 means of size 20", xlab="Poisson(5)", xlim=c(3,7))
hist(p1000[2,], main="1000 means of size 20", xlab="Poisson(5)", xlim=c(3,7))
hist(p10[3,], main="10 means of size 50", xlab="Poisson(5)", xlim=c(3,7))
hist(p100[3,], main="100 means of size 50", xlab="Poisson(5)", xlim=c(3,7))
hist(p1000[3,], main="1000 means of size 50", xlab="Poisson(5)", xlim=c(3,7))
hist(p10[4,], main="10 means of size 100", xlab="Poisson(5)", xlim=c(3,7))
hist(p100[4,], main="100 means of size 100", xlab="Poisson(5)", xlim=c(3,7))
hist(p1000[4,], main="1000 means of size 100", xlab="Poisson(5)", xlim=c(3,7))

# Binomial:

hist(b10[1,], main="10 means of size 10", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b100[1,], main="100 means of size 10", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b1000[1,], main="1000 means of size 10", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b10[2,], main="10 means of size 20", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b100[2,], main="100 means of size 20", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b1000[2,], main="1000 means of size 20", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b10[3,], main="10 means of size 50", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b100[3,], main="100 means of size 50", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b1000[3,], main="1000 means of size 50", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b10[4,], main="10 means of size 100", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b100[4,], main="100 means of size 100", xlab="Bernoulli(0.2)", xlim=c(0,0.60))
hist(b1000[4,], main="1000 means of size 100", xlab="Bernoulli(0.2)", xlim=c(0,0.60))


a) Histograms of means of samples taken from the uniform(0,1) where rows vary by size of the sample taken and columns vary by the number of samples taken.


Histograms of means of samples taken from the Poisson(5) where rows vary by size of the sample taken and columns vary by the number of samples taken.


Histograms of means of samples taken from the Bernoulli(0.2) where rows vary by size of the sample taken and columns vary by the number of samples taken.

b) As the sample size increases, the distributions become narrower and start to develop a central peak.

c) As the number of samples increases, the histogram becomes smoother and better resolved, looking more and more normal.

d) The uniform distribution is symmetric, and even many samples of size 10 tend to look normal. The Bernoulli(0.2), by contrast, is clearly not symmetric, so it takes "longer" (more samples and a larger sample size) before it starts to look normal. But, in any case, they all eventually look normal.
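A quick numeric check of part (b), as a sketch beyond the original solution: the standard deviation of the sample means should shrink like σ/√n, which for the Uniform(0,1) is roughly 0.091 at n = 10 and 0.029 at n = 100:

```r
# Empirical spread of sample means vs. the CLT prediction sigma/sqrt(n).
# For Uniform(0,1), sigma = sqrt(1/12) ~ 0.289.
set.seed(3)
sd10  <- sd(replicate(1000, mean(runif(10))))   # ~ 0.289/sqrt(10)  ~ 0.091
sd100 <- sd(replicate(1000, mean(runif(100))))  # ~ 0.289/sqrt(100) ~ 0.029
c(sd10, sd100)
```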

2.  Exploring distributions

salt<-read.table("D:\\Stat359\\Assignment 2\\salt.txt")

salt<-salt[,1]

a)  Is the data symmetric? First check if the mean and median are equal.

summary(salt)

Min. 1st Qu. Median Mean 3rd Qu. Max.

13.53 48.11 51.40 55.62 67.98 103.50

We can also look at a boxplot and/or histogram of the data

par(mfrow=c(1,2))

boxplot(salt)

hist(salt)

We notice that the mean and median differ, but not by much. There are also outliers on either side of the distribution, and the histogram appears nearly symmetric. Remember we are discussing only 10 data points.

b)  Skew and Kurtosis

Using the functions found in the text:

skew<-function(x){
  m3<-sum((x-mean(x))^3)/length(x) # third central moment
  s3<-sqrt(var(x))^3
  m3/s3
}

kurtosis<-function(x){
  m4<-sum((x-mean(x))^4)/length(x) # fourth central moment
  s4<-var(x)^2
  m4/s4-3 # excess kurtosis (0 for a normal distribution)
}
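As a sanity check (ours, not part of the original solution), both statistics should be close to zero for a large normal sample. The functions are repeated here so the snippet runs on its own:

```r
# Sanity check: for a large N(0,1) sample, skewness and excess kurtosis
# should both be near 0 (functions repeated from above so this runs alone).
skew <- function(x){ m3 <- sum((x-mean(x))^3)/length(x); m3/sqrt(var(x))^3 }
kurtosis <- function(x){ m4 <- sum((x-mean(x))^4)/length(x); m4/var(x)^2 - 3 }

set.seed(42)
z <- rnorm(10000)
skew(z)      # near 0
kurtosis(z)  # near 0
```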

skew(salt)

[1] 0.1723753

kurtosis(salt)

[1] -0.9342198

c) > salt.skew<-skew(salt)

> abs(salt.skew)>2*sqrt(6/length(salt))

[1] FALSE

No, it does not seem skewed based on this rule.

d) > # rough rule of thumb for kurtosis

> salt.kurtosis<-kurtosis(salt)

> abs(salt.kurtosis)>2*sqrt(24/length(salt))

[1] FALSE

No excess kurtosis based on this rule; the tails are consistent with a normal distribution.

e) Based on examining the data, the distribution generating these data seems symmetric, with normal tails.

3.  Fecundity of fruit flies

We start by reading in the data, then look at some plots to determine whether our samples are approximately normally distributed, symmetric, and have equal variances.

fecund<-read.table("D:\\Stat359\\Assignment 2\\fecundity.txt", header=T)

attach(fecund)

par(mfrow=c(1,3))

boxplot(RS, NS)

qqnorm(RS)

qqnorm(NS)

a)  H0: σ²_RS = σ²_NS (population variances are equal)

var.test(NS, RS)

F test to compare two variances

data: NS and RS

F = 1.3236, num df = 24, denom df = 24, p-value = 0.4974

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.5832755 3.0036468

sample estimates:

ratio of variances

1.323614

As the p-value is large, there is no strong evidence against H0 that the population variances are equal.
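For reference, the F statistic reported by var.test() is simply the ratio of the two sample variances. A minimal sketch with made-up vectors (x and y are illustrative, not the fecundity data):

```r
# Illustrative only: x and y are simulated, not the fecundity data.
set.seed(7)
x <- rnorm(25, mean = 33, sd = 8)
y <- rnorm(25, mean = 25, sd = 8)
F.stat <- var(x)/var(y)  # var.test's F statistic
all.equal(F.stat, unname(var.test(x, y)$statistic))  # TRUE
```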

b)  H0: μ_RS = μ_NS (population means are equal, assuming equal variances)

t.test(NS,RS, var.equal=TRUE)

Two Sample t-test

data: NS and RS

t = 3.4251, df = 48, p-value = 0.001268

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

3.351692 12.880308

sample estimates:

mean of x mean of y

33.372  25.256

As the p-value is small, we have strong evidence against H0 that the population means are equal. It appears that the non-selected line has a larger mean number of eggs laid per female per day than the resistant line.
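The equal-variance t statistic can also be computed by hand from the pooled variance, which is a useful check on t.test(). A sketch with made-up samples (a and b are illustrative, not the fecundity data):

```r
# Illustrative only: a and b are simulated, not the fecundity data.
set.seed(7)
a <- rnorm(25); b <- rnorm(25)
n1 <- length(a); n2 <- length(b)
sp2 <- ((n1-1)*var(a) + (n2-1)*var(b)) / (n1 + n2 - 2)  # pooled variance
t.stat <- (mean(a) - mean(b)) / sqrt(sp2 * (1/n1 + 1/n2))
all.equal(t.stat, unname(t.test(a, b, var.equal = TRUE)$statistic))  # TRUE
```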

4.  Fusible interlinings

a)  QQ Plot

fuseH<-read.table("D:\\Stat359\\Assignment 2\\fabric.H.txt")

fuseP<-read.table("D:\\Stat359\\Assignment 2\\fabric.P.txt")

H<-t(fuseH[1,])

P<-t(fuseP[1,])

par(mfrow=c(1,2))

qqnorm(H, main="QQplot for H")

qqnorm(P, main="QQplot for P")

Both samples look as though they have been drawn from a normal distribution.

b)  Boxplot

par(mfrow=c(1,1))

boxplot(H, P, names=c("H","P"))

As there is a lot of overlap in the boxplots, it does not suggest that the population means differ.

c)  H0: μ_H = μ_P (population means are equal)

First we check whether we can assume the variances are equal. The boxplots suggest similar spreads; we confirm with an F test.

var.test(H,P)

F test to compare two variances

data: H and P

F = 0.7016, num df = 23, denom df = 7, p-value = 0.4862

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.1585015 2.0362234

sample estimates:

ratio of variances

0.7015781

As we do not reject the hypothesis of equal variances, we now perform the appropriate two-sample t test.

t.test(H,P, var.equal=T)

Two Sample t-test

data: H and P

t = -0.4164, df = 30, p-value = 0.6801

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-0.4674695 0.3091362

sample estimates:

mean of x mean of y

1.508333 1.587500

If we had concluded from our QQ plots that the data were not normal, we could instead use the nonparametric Wilcoxon rank sum test.

wilcox.test(H,P)

Wilcoxon rank sum test with continuity correction

data: H and P

W = 96, p-value = 1

alternative hypothesis: true mu is not equal to 0

Based on the above analyses, we conclude that the true average extensibilities do not differ between the two types of fabric.
