LAB 4 – T-distributions, Confidence Intervals and Hypothesis Tests for Two Matched Population Means and Two Independent Population Means

To download R onto your own personal computer, go to:

Click on the link for R-2.6.1-win32.exe. Save the file to your computer. Then click on the file to start the installation to your computer.

Your submission to LAB 4 should consist of answering the numbered questions as you work through the Lab.

***AS YOU ARE WORKING THROUGH THE LAB, copy and paste each output into a blank Word file***

You can either print the completed Word file and turn it in, or e-mail the Word file to me for your LAB 4 grade.

Everything MUST be done in R. Both CODE AND OUTPUT should be included in your Word file.

**********************************************************************************************************************************************

Access R

On the desktop or through the Programs Menu, find the R icon and click on it. You should be brought to a screen with a command prompt.

T-distributions

A t-curve, or Studentized curve, is a density curve like a normal distribution. You can graph t-curves or calculate areas under a t-curve similar to how we did in Lab 2.

First, let’s create a data set of x-values between –4 and 4 with increments of 0.01.

> data<-seq(-4,4,0.01)

Then, plot a normal curve (z-curve) as a dashed line (hence lty=2):

> plot(data, dnorm(data), type="l",lty=2)

Now, overlay a Student’s t curve with df = 5 as a solid line to see the difference.

> lines(data, dt(data, df=5))
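As an optional extra, the short script below repeats the two plotting commands and adds a legend so the curves can be told apart once the graph is pasted into your Word file. It is only a sketch: the name x (used in place of data), the axis label, and the legend text are my own choices and are not required by the lab.

```r
# Grid of x-values from -4 to 4 in steps of 0.01 (called "data" in the lab)
x <- seq(-4, 4, 0.01)

# Standard normal (z) curve drawn with line type 2
plot(x, dnorm(x), type = "l", lty = 2, ylab = "density")

# Student's t curve with df = 5 drawn as a solid line (line type 1)
lines(x, dt(x, df = 5), lty = 1)

# Legend matching each line type to its curve
legend("topright", legend = c("normal (z)", "t, df = 5"), lty = c(2, 1))
```

Any further curve added with lines() can be given its own lty value and a matching entry in the legend.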

1. Compare the t-curve and the normal curve above in your own words.

2. Overlay a second Student’s t curve, with df=10. Include your graph and compare this third curve to the previous two.

We can also calculate areas under a Student’s t curve. In Lab 2, to calculate the area to the left of 3 in a normal curve with mean 4 and standard deviation 0.5, the following was the command:

>pnorm(3,mean=4,sd=0.5)

[1] 0.02275013

The command is similar in a t-curve. However, t-curves are always centered at zero. If you wanted to find the area to the left of 3 in a Student’s t-curve with df = 2, the following is the command.

>pt (3,df=2)

[1] 0.952267
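Both pnorm and pt return the area to the LEFT of the value you give them. As a sketch using the two examples above (not the lab questions), there are two equivalent ways to get the area to the right, and standardizing the normal example by hand gives the same answer as supplying mean and sd. The variable names are illustrative only.

```r
# Area to the LEFT of 3 under a t-curve with df = 2 (same as above)
left_area <- pt(3, df = 2)

# Area to the RIGHT of 3: subtract the left area from 1 ...
right_area <- 1 - left_area

# ... or ask R for the upper tail directly
right_area_direct <- pt(3, df = 2, lower.tail = FALSE)

# Normal example: standardize first, z = (3 - 4) / 0.5 = -2,
# then use the standard normal curve
z_area <- pnorm((3 - 4) / 0.5)

print(c(left_area, right_area, right_area_direct, z_area))
```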

3. Find the area to the right of z = 2.5. For all questions, include code and output in lab.

4. Find the area to the right of t = 2.5, with df = 5.

5. Find the area to the right of t = 2.5, with df = 25.

6. What would you have estimated if you used the t-table in your book to answer question 5?

7. Rank the areas of questions 3-5 from smallest to largest. Explain why the areas are ranked in that order, in your own words.

To download data from my website:

The dataset for this example can be found on my website and is saved as streams.txt.

Create a name for your data set. For example, since this data set looks at biodiversity scores for two sections of a stream, I will call it streams. The command to load in your data is:

> site=”

> streams<-read.table(file=site, header=T)

Then, so that we can call the variables in the future by name, enter the following lines of code:

>attach(streams)

>names(streams)

output: [1] "down" "up"
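If you would rather not use attach, the columns can also be reached with the $ operator. The sketch below assumes a tiny made-up table with the same two column names. The numbers are placeholders, not the streams.txt data, and the text argument of read.table stands in for the file on the website.

```r
# Made-up stand-in for streams.txt (values are placeholders only)
example_text <- "down up
5 6
7 9
4 4
6 8"

# header = TRUE tells R that the first row holds the variable names
streams_example <- read.table(text = example_text, header = TRUE)

names(streams_example)                      # the two column names
streams_example$up                          # the "up" column, no attach() needed
streams_example$up - streams_example$down   # paired differences, row by row
```

Writing header = TRUE in full is a little safer than header = T, because T is an ordinary variable that could be overwritten.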

Two Matched Population Means

As we saw in class, the one population t-test can be used in a two population matched design. You first have to transform the data into a sample of differences and then do the hypothesis test or confidence interval on the differences.

This example looks at composite biodiversity scores based on samples of aquatic invertebrates. The two samples are taken from the same river, one upstream and one downstream from the same sewage outfall. Since the samples are taken from the same river, the data set design is matched, or dependent.

The question of interest is whether or not there is a difference between the average scores of upstream and downstream.

First, you need to create the differences between upstream and downstream:

>diff<- up-down

Now, conduct a t-test on the matched pair differences:

> t.test(diff)

output:

One Sample t-test

Data: diff

t= 3.0502, df = 15, p-value = 0.0081

alternative hypothesis: true mean is not equal to 0

95 percent confidence interval:

0.2635612 1.4864388

sample estimates:

mean of x

0.875

To interpret the hypothesis test: the test statistic is 3.0502, the degrees of freedom are 15, the P-value is 0.0081, and the sentence underneath, “alternative hypothesis: true mean is not equal to 0”, tells you that R used a two-tailed alternative.

At the bottom of the output, R reports that x-bar, the sample average difference, is 0.875. My interpretation of the non-directional alternative is: At 5% significance, the data provide evidence that there is a difference between the average biodiversity scores of upstream and downstream.
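R can also form the differences for you. As a sketch with made-up numbers (not the streams data), calling t.test on the two columns with paired = TRUE should reproduce the one-sample test on the differences. The names up_example and down_example are hypothetical.

```r
# Made-up paired scores (placeholders, not the streams.txt values)
up_example   <- c(6, 9, 4, 8, 7, 5)
down_example <- c(5, 7, 4, 6, 7, 3)

# Approach used in this lab: one-sample t-test on the differences
by_differences <- t.test(up_example - down_example)

# Equivalent shortcut: let t.test pair the observations itself
by_paired <- t.test(up_example, down_example, paired = TRUE)

# Both approaches give the same test statistic, df, and P-value
c(by_differences$statistic, by_paired$statistic)
c(by_differences$parameter, by_paired$parameter)
c(by_differences$p.value, by_paired$p.value)
```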

If you wanted to do a one-tailed test, change the coding to:

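> t.test(diff, alt="less")

for a left-tailed test, or:

> t.test(diff, alt="greater")

for a right-tailed test. Type the quotation marks in R itself; curly quotes pasted from Word will cause an error.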

To interpret the confidence interval, R reports limits of 0.2635612 and 1.4864388. Since the interval consists of all positive numbers, it is telling you that “up” is greater than “down”. My interpretation of the interval would be: At 95% confidence, the average biodiversity score upstream is between 0.2636 and 1.4864 greater than the average biodiversity score downstream.
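To see where these numbers come from, the sketch below rebuilds the P-value and the interval from the values printed in the output above (mean 0.875, t = 3.0502, df = 15). The standard error is backed out of t = mean / SE, so the results should agree with the output only up to rounding. The variable names are my own.

```r
xbar     <- 0.875    # sample mean difference from the output
t_stat   <- 3.0502   # test statistic from the output
deg_free <- 15       # degrees of freedom from the output

# Standard error of the mean difference, since t = xbar / se
se <- xbar / t_stat

# Two-tailed P-value: area in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), deg_free)

# 95% confidence interval: xbar plus or minus t* times se,
# where t* cuts off 2.5% in the upper tail
t_star <- qt(0.975, deg_free)
ci <- c(xbar - t_star * se, xbar + t_star * se)

print(round(p_value, 4))
print(round(ci, 4))
```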

********************************************************************

The two-sided confidence interval comes from a two-tailed test; with a one-tailed alternative, R reports only a one-sided bound. If you want a one-tailed test AND a two-sided confidence interval, you will have to run the t.test code twice: first for the one-tailed hypothesis test, and second for the two-tailed confidence interval.

**********************************************************************
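The two-run approach described in the box can be sketched as follows, using made-up differences rather than any lab data set. The object names are hypothetical; $p.value and $conf.int are components of the object that t.test returns.

```r
# Made-up differences (placeholders, not from any lab data set)
d_example <- c(1.2, 0.4, -0.3, 2.1, 0.9, 1.5, 0.2, 1.1)

# Run 1: one-tailed hypothesis test (right-tailed here)
one_tailed <- t.test(d_example, alternative = "greater")
one_tailed$p.value

# Run 2: two-tailed run, used only for its confidence interval
two_tailed <- t.test(d_example)
two_tailed$conf.int

# For comparison, the one-tailed run reports a one-sided bound:
# its upper limit is Inf
one_tailed$conf.int
```

Report the P-value from the first run and the interval from the second.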

8. Do a hypothesis test and confidence interval for the dataset tire.txt

The data set consists of eleven tires that were measured for treadwear by two methods, one based on weight and one based on groove wear. You are interested in whether there is a difference in the two methods. The data is in thousands of miles. Include your code, output, and interpretations.

9. Do a hypothesis test for the dataset eye.txt

The data set consists of 8 people who have glaucoma in one eye but not in the other eye. The variable is corneal thickness, in microns, compared between the two eyes. Do a one-tailed test to see if the normal eye is, on average, thicker than the glaucoma eye. Include your code, output, and interpretations.

Two Independent Population Means

The dataset for this example can be found on my website and is saved as wings.txt.

This example looks at two subspecies of dark-eyed Juncos. One of the subspecies migrates each year and the other does not migrate. One of the variables measured was wing length, in millimeters. We are interested in whether there is a difference between the average wing length of migratory and nonmigratory Juncos.

Now, conduct a t-test on the independent population means, first downloading the data set.

>site=”

>wings<-read.table(file=site,header=T)

>attach(wings)

>names(wings)

OUTPUT: [1] "MIGRA" "NONMIGRA"

> t.test(MIGRA, NONMIGRA)

OUTPUT:

Welch Two Sample t-test

Data: MIGRA and NONMIGRA

t= -4.6217, df = 25.614, p-value = 9.422e-05

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-4.046220 -1.553780

sample estimates:

mean of x mean of y

82.1 84.9

To interpret the hypothesis test: the test statistic is –4.6217, the Satterthwaite degrees of freedom are 25.614, the P-value is 0.00009422, and the sentence underneath, “alternative hypothesis: true difference in means is not equal to 0”, tells you that R used a two-tailed alternative.

My interpretation of the non-directional alternative is: At 5% significance, the data provide evidence that there is a difference between the average wing length of migratory and nonmigratory Juncos.

To interpret the confidence interval, R reports limits of –4.046220 and –1.553780. Since the interval consists of all negative numbers, it is telling you that “NONMIGRA” is greater than “MIGRA”. My interpretation of the interval would be: At 95% confidence, the average wing length of nonmigratory Juncos is between 1.5538 and 4.0462 millimeters greater than the average wing length of migratory Juncos.
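The fractional degrees of freedom come from the Welch-Satterthwaite approximation, which R applies by default when t.test is given two samples. As a sketch with made-up wing lengths (not the wings.txt data) and a hypothetical helper called welch_df, the test statistic and degrees of freedom can be computed by hand and compared with what t.test reports.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

# Made-up samples (placeholders only)
a_example <- c(82.1, 81.5, 83.0, 80.9, 82.6, 81.8, 83.4)
b_example <- c(84.9, 85.6, 83.8, 86.1, 84.2, 85.0)

result <- t.test(a_example, b_example)   # Welch test is the default

# Test statistic by hand: difference in means over its standard error
t_by_hand <- (mean(a_example) - mean(b_example)) /
  sqrt(var(a_example) / length(a_example) + var(b_example) / length(b_example))

# Hand calculations next to the values stored in the t.test result
c(welch_df(a_example, b_example), result$parameter)
c(t_by_hand, result$statistic)
```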

If you wanted to do a one-tailed test, change the coding to:

> t.test(MIGRA, NONMIGRA, alt="less")

for a left-tailed test, or:

> t.test(MIGRA, NONMIGRA, alt="greater")

for a right-tailed test.

If you wanted to change the confidence level to anything other than 95% (that is, a significance level other than 5%), change the coding to:

> t.test(MIGRA, NONMIGRA, conf.level=0.99)

for alpha = 1%.
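Note that conf.level only changes the confidence interval. The P-value is the same whatever level you choose, and you compare it to your alpha yourself. A sketch with made-up samples (placeholders, not the wings.txt data):

```r
# Made-up samples (placeholders only)
g1_example <- c(82.1, 81.5, 83.0, 80.9, 82.6, 81.8, 83.4)
g2_example <- c(84.9, 85.6, 83.8, 86.1, 84.2, 85.0)

ci_95 <- t.test(g1_example, g2_example)                      # default 95%
ci_99 <- t.test(g1_example, g2_example, conf.level = 0.99)   # 99%

# The P-value does not depend on conf.level ...
c(ci_95$p.value, ci_99$p.value)

# ... but the 99% interval is wider than the 95% interval
width_95 <- as.numeric(diff(ci_95$conf.int))
width_99 <- as.numeric(diff(ci_99$conf.int))
c(width_95, width_99)
```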

10. Do a hypothesis test for the dataset cloud.txt

The data set consists of results of a study on cloud seeding with silver nitrate. The variable collected is rainfall amounts, in acre-feet, for unseeded and seeded clouds. At 5% significance, is there evidence that average rainfall for seeded clouds is greater than average rainfall for unseeded clouds? Include your code, output, and interpretation.

11. Do a hypothesis test and confidence interval for the dataset homes.txt

The data set consists of random samples of homes from New York and Los Angeles. The variable collected is home price, in thousands of dollars. At 10% significance, is there evidence that the average home price in New York is different from that in Los Angeles? Include your code, output, and interpretations of both the significance test and the confidence interval.