
1. For each of the following, i) name the variables measured and ii) state the type of each variable explored in the experiment. Where more than one variable has been studied, indicate which, if any, is the response and which is the explanatory variable. Provide only short answers; full sentences are not necessary. (9 marks)

(Example answers: Variable: femur length – numerical-continuous, response variable

Variable: eye colour – categorical-nominal, explanatory variable).

a) You randomly sample 10 earwigs and measure their weight in grams.

Weight – numerical-continuous

b) You count the number of blades of grass in each of eight 3 cm x 3 cm square quadrats randomly located on the lawn of the York University campus.

Number of grass blades – numerical-discrete

c) You wish to determine whether consumption of potato chips causes increased lipid concentrations in blood. You randomly sample 20 rats, 10 of which consume a regular diet, while 10 others have a diet supplemented with potato chips. You measure the concentration of lipids in blood (in micrograms per mL) after 3 months on this diet.

Diet – categorical-nominal – explanatory

Lipid level – numerical-continuous – response

d) Some plant species (e.g. clover) show variation for the ability to produce hydrogen cyanide (HCN) gas as a chemical defence. You wish to test whether HCN is a chemical defence against being eaten by insects. You randomly sample 100 plants from a population and determine for each plant whether or not it produces HCN, and whether or not it has been damaged by insects.

HCN production – categorical-nominal – explanatory

Damage – categorical-nominal – response

e) You wish to explore various factors that might increase the growth rate of coffee plants. You randomly assign 7 coffee plants to each of the following combinations of treatments and measure their growth (in cm) after 4 months.

7 plants are shaded, and receive nitrogen fertilizer

7 plants are in full sun, and receive nitrogen fertilizer

7 plants are shaded, and don’t receive nitrogen fertilizer

7 plants are in full sun, and don’t receive nitrogen fertilizer

Growth – numerical-continuous – response

Light – categorical-nominal – explanatory

Fertilizer – categorical-nominal – explanatory

2. Estimate the mean, median, 1st and 3rd quartiles and standard error of the mean for the following data sets. Identify all extreme values, if there are any:

(8 marks)

a) Data: 20, 4, 7, 1, 5

RANKED data: 1 4 5 7 20

Mean = 7.4

Median = 5.0 because n = 5 is odd, so the median is the middle (3rd) number

1st Quartile =4.0

3rd Quartile =7.0

Standard error of mean =3.3

List extreme values: 20 is the only extreme value

For the 1st quartile, j = 1/4 x 5 = 1.25; rounding up gives j = 2, so the 2nd number (4) is the 1st quartile

For the 3rd quartile, k = 3/4 x 5 = 3.75; rounding up gives k = 4, so the 4th number (7) is the 3rd quartile

$s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n-1}$, giving $s^2 = \dfrac{491 - 37^2/5}{4}$

so $s^2 = 54.3$, $s = 7.3689$ and the standard error of the mean is $SE_{\bar{x}} = s/\sqrt{n} = 7.3689/\sqrt{5} = 3.3$

interquartile range is IQR = 7-4 = 3, 1.5 x IQR = 4.5

so extreme values would be above 4.5 + 7 = 11.5

or below 4-4.5 = -0.5.
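As a cross-check, the Question 2a statistics can be reproduced in SAS. This is a sketch rather than part of the required answer: the data set names Q2A, STATS2A and EXTREME2A and the variable names are hypothetical. It relies on PROC UNIVARIATE's default quartile definition (PCTLDEF=5), which takes the next order statistic when n x p is not an integer and averages the two neighbouring values when it is, i.e. the same rule used in the hand calculation.

```sas
/* Question 2a data; @@ lets INPUT read several values from one line */
DATA Q2A;
   INPUT X @@;
   DATALINES;
20 4 7 1 5
;
RUN;

/* Mean, median, quartiles and standard error of the mean (STDMEAN) */
PROC UNIVARIATE DATA=Q2A NOPRINT;
   VAR X;
   OUTPUT OUT=STATS2A MEAN=XBAR MEDIAN=XMED Q1=Q1 Q3=Q3 STDMEAN=SE;
RUN;

/* Keep only values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR */
DATA EXTREME2A;
   IF _N_ = 1 THEN SET STATS2A;   /* attach the one-row summary to every value */
   SET Q2A;
   IQR   = Q3 - Q1;
   LOWER = Q1 - 1.5*IQR;
   UPPER = Q3 + 1.5*IQR;
   IF X < LOWER OR X > UPPER;     /* subsetting IF: keep extreme values only */
RUN;

PROC PRINT DATA=STATS2A; RUN;
PROC PRINT DATA=EXTREME2A; VAR X LOWER UPPER; RUN;
```

STATS2A should show a mean of 7.4, a median of 5, quartiles of 4 and 7 and a standard error of about 3.3, and EXTREME2A should contain only the value 20.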

b) Data: -3.2 -4.4 -1.3 -3.6 -7.6 -2.1 -0.5 -12.1

RANKED data: -12.1 -7.6 -4.4 -3.6 -3.2 -2.1 -1.3 -0.5

Mean = -4.35

Median = -3.40; since n = 8 is even, the median is the mean of the middle two numbers: (-3.6 - 3.2)/2 = -3.4

1st Quartile = -6.00

3rd Quartile = -1.70

Standard error of mean = 1.35

List extreme values: none

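For the 1st quartile, j = 1/4 x 8 = 2; since j is an integer, the 1st quartile is the mean of the 2nd and 3rd numbers: (-7.6 - 4.4)/2 = -6.0

For the 3rd quartile, k = 3/4 x 8 = 6; since k is an integer, the 3rd quartile is the mean of the 6th and 7th numbers: (-2.1 - 1.3)/2 = -1.7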

$s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n-1}$, giving $s^2 = \dfrac{253.08 - (-34.8)^2/8}{7}$

so $s^2 = 14.5286$, $s = 3.811636$ and the standard error of the mean is $SE_{\bar{x}} = s/\sqrt{n} = 3.811636/\sqrt{8} = 1.347617$

interquartile range is IQR = -1.7 - (-6.0) = 4.3, 1.5 x IQR = 6.45

so extreme values would be below -6.0 - 6.45 = -12.45

or above -1.7 + 6.45 = 4.75. No observation lies outside these limits (the smallest, -12.1, is above -12.45).
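The Question 2b results can be checked the same way with PROC MEANS, whose default quantile definition (QNTLDEF=5) averages the two neighbouring order statistics when n x p is an integer, which is the rule applied above. This is a sketch; the data set names Q2B, STATS2B and FENCES2B and the variable names are hypothetical.

```sas
/* Question 2b data; @@ lets INPUT read several values from one line */
DATA Q2B;
   INPUT X @@;
   DATALINES;
-3.2 -4.4 -1.3 -3.6 -7.6 -2.1 -0.5 -12.1
;
RUN;

/* STDERR = standard error of the mean; QRANGE = interquartile range */
PROC MEANS DATA=Q2B N MEAN MEDIAN Q1 Q3 STDERR QRANGE MIN MAX;
   VAR X;
   OUTPUT OUT=STATS2B MEAN=XBAR MEDIAN=XMED Q1=Q1 Q3=Q3 STDERR=SE
          QRANGE=IQR MIN=XMIN MAX=XMAX;
RUN;

/* Compare the 1.5 x IQR fences with the smallest and largest observations */
DATA FENCES2B;
   SET STATS2B;
   LOWER = Q1 - 1.5*IQR;
   UPPER = Q3 + 1.5*IQR;
   ANY_EXTREME = (XMIN < LOWER) OR (XMAX > UPPER);   /* 1 if any value is extreme */
RUN;

PROC PRINT DATA=FENCES2B;
   VAR XBAR XMED Q1 Q3 SE LOWER UPPER ANY_EXTREME;
RUN;
```

The fences come out at -12.45 and 4.75, so ANY_EXTREME should be 0, in agreement with the hand calculation.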

3. For the distribution shown below:

a) What is the statistical term that best describes its shape? Skewed left (or negatively skewed)

b) If you were to construct the sampling distribution of means based upon a sample size of n = 10 from the distribution below, what would be the expected shape of the resulting distribution? Approximately normal (bell-shaped, or at least much more symmetrical than the original distribution), as expected from the central limit theorem

(2 marks).

4. A biologist tells you that they know the standard deviation of wing length for a population of adult red-tailed hawks is 3.0 cm, and the mean wing length is 60 cm. They conduct a study and obtain a standard error of the mean equal to 0.5 cm. What is the sample size, n, on which their mean is based? (2 marks)

Standard deviation, $s = 3.0$; standard error of the mean, $SE_{\bar{x}} = 0.5$

Since $SE_{\bar{x}} = s/\sqrt{n}$, rearranging this gives:

$n = (s / SE_{\bar{x}})^2$

therefore $n = (3/0.5)^2 = 6^2 = 36$, so the mean is based on a sample size of 36
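The relationship $SE_{\bar{x}} = s/\sqrt{n}$ can also be checked by simulation. This sketch assumes, purely for illustration, that wing lengths are normally distributed with mean 60 cm and standard deviation 3 cm; the seed, the 10,000 replicate samples and the data set names are arbitrary choices. The standard deviation of the simulated sample means estimates the standard error and should come out close to 0.5 for n = 36.

```sas
/* Draw 10,000 samples of n = 36 wing lengths from a Normal(60, 3) population */
DATA WINGS;
   CALL STREAMINIT(2024);
   DO REP = 1 TO 10000;
      DO I = 1 TO 36;
         WING = RAND('NORMAL', 60, 3);   /* mean 60, standard deviation 3 */
         OUTPUT;
      END;
   END;
   KEEP REP WING;
RUN;

/* Mean of each sample (WINGS is already in REP order, so BY REP works) */
PROC MEANS DATA=WINGS NOPRINT;
   BY REP;
   VAR WING;
   OUTPUT OUT=SAMPMEANS MEAN=XBAR;
RUN;

/* The SD of the sample means estimates the standard error, 3/sqrt(36) = 0.5 */
PROC MEANS DATA=SAMPMEANS N MEAN STD;
   VAR XBAR;
   OUTPUT OUT=SECHECK STD=SD_XBAR;
RUN;
```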

5. If you roll three dice sequentially, what is the probability that the first shows a 6, the second a 5, and the third a 4? (2 marks)

1/6 x 1/6 x 1/6 = 1/216 (or about 0.00463) is the probability of the outcome above, because the three rolls are independent
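A quick simulation illustrates the multiplication rule for independent events. This is a sketch: the seed and the one million simulated sequences are arbitrary, and each die is generated as CEIL(6*RAND('UNIFORM')), which gives the integers 1 to 6 with equal probability. The simulated proportion should be close to 1/216, about 0.00463.

```sas
/* Simulate rolling three dice in sequence, 1,000,000 times */
DATA DICE;
   CALL STREAMINIT(123);
   DO TRIAL = 1 TO 1000000;
      D1 = CEIL(6*RAND('UNIFORM'));
      D2 = CEIL(6*RAND('UNIFORM'));
      D3 = CEIL(6*RAND('UNIFORM'));
      HIT = (D1 = 6 AND D2 = 5 AND D3 = 4);   /* 1 if the sequence is 6, 5, 4 */
      OUTPUT;
   END;
   KEEP HIT;
RUN;

/* The mean of the 0/1 indicator HIT is the simulated probability */
PROC MEANS DATA=DICE MEAN;
   VAR HIT;
   OUTPUT OUT=DICEP MEAN=PHAT;
RUN;
```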

6. You are told that African elephants have a mean weight of 5000 kilograms at age 20 years. You wish to test this claim, so you randomly sample and weigh (with quite some difficulty) five 20-year-old elephants. Estimate the approximate 95% confidence interval for mean weight.

Does the 95% confidence interval support the claim that their mean weight is 5000 kg? Explain briefly, in one sentence, why it does or does not. (3 marks)

Weight in kg: 5000, 5050, 5900, 5800, 5700

n = 5, $\bar{x} = 5490.0$

Calculate the standard error of the mean:

$s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n-1}$, giving $s^2 = \dfrac{151442500 - 27450^2/5}{4}$

so $s^2 = 185500$, $s = 430.697$ and $SE_{\bar{x}} = s/\sqrt{n} = 430.697/\sqrt{5} = 192.6$

the upper limit of the approximate 95% confidence interval is given by $\bar{x} + 2 \times SE_{\bar{x}}$

= 5490 + 2 x 192.6 = 5875.2

the lower limit of the approximate 95% confidence interval is given by $\bar{x} - 2 \times SE_{\bar{x}}$

= 5490 - 2 x 192.6 = 5104.8

No, the confidence interval doesn’t support the claim, since the interval (5104.8 to 5875.2) does not contain the value 5000, suggesting that 5000 kg is unlikely to be the true mean. Recall that we are 95% confident that an interval constructed this way contains the true mean.
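The interval can be sketched in SAS with PROC MEANS, which reports the standard error directly (STDERR). The data set and variable names are hypothetical. The ±2 SE rule is the approximation the question asks for; with only n = 5, the exact t-based interval (the CLM option of PROC MEANS, printed at the end for comparison) uses a multiplier of about 2.78 rather than 2 and is noticeably wider, so the ±2 SE interval should be treated as a rough guide for samples this small.

```sas
/* Weights (kg) of the five sampled 20-year-old elephants */
DATA ELEPHANTS;
   INPUT WEIGHT @@;
   DATALINES;
5000 5050 5900 5800 5700
;
RUN;

/* Sample mean and standard error of the mean */
PROC MEANS DATA=ELEPHANTS NOPRINT;
   VAR WEIGHT;
   OUTPUT OUT=SUMMARY MEAN=XBAR STDERR=SE;
RUN;

/* Approximate 95% CI: mean +/- 2 standard errors */
DATA CI;
   SET SUMMARY;
   LOWER = XBAR - 2*SE;
   UPPER = XBAR + 2*SE;
   CONTAINS_5000 = (LOWER <= 5000 <= UPPER);   /* 1 if 5000 kg lies in the interval */
RUN;

PROC PRINT DATA=CI;
   VAR XBAR SE LOWER UPPER CONTAINS_5000;
RUN;

/* For comparison: exact t-based 95% confidence limits */
PROC MEANS DATA=ELEPHANTS MEAN STDERR CLM ALPHA=0.05;
   VAR WEIGHT;
RUN;
```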

7. The proportion of blue-eyed people differs among populations from various parts of the world. In Ireland, the proportion of blue-eyed people is 0.6, while in Spain it is 0.2. Considering only marriages between Irish and Spanish people, what is the probability that neither partner in a randomly chosen couple has blue eyes? State the assumption made in determining this value. (2 marks)

Irish not blue-eyed = 1 – 0.6 = 0.4

Spanish not blue-eyed = 1 – 0.2 = 0.8

Probability neither is blue-eyed = 0.4 x 0.8 = 0.32, assuming the eye colours of the two partners are independent (e.g. people do not choose partners on the basis of eye colour)

8. Describe briefly how you would generate the sampling distribution of the coefficient of variation. (2 marks)

Take a random sample of size n from a population.

Calculate the coefficient of variation, given by 100 x s / $\bar{x}$.

Repeat this procedure a very large number of times (in theory, infinitely many).

Construct a histogram of the resulting coefficients of variation; this approximates their sampling distribution.
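The four steps above can be sketched as a SAS simulation. Because a real population cannot be resampled indefinitely, this sketch assumes a hypothetical normal population with mean 60 and standard deviation 3 (population CV = 5%), samples of size n = 10, and 5,000 repetitions standing in for "infinitely many"; all of these values, the seed and the data set names are illustrative choices. The CV statistic of PROC MEANS is 100 x s / mean, matching the definition above.

```sas
/* Steps 1 and 3: draw 5,000 random samples of size n = 10 */
DATA SAMPLES;
   CALL STREAMINIT(12345);
   DO REP = 1 TO 5000;
      DO I = 1 TO 10;
         Y = RAND('NORMAL', 60, 3);   /* hypothetical Normal(60, 3) population */
         OUTPUT;
      END;
   END;
   KEEP REP Y;
RUN;

/* Step 2: coefficient of variation (100 x s / mean) of each sample */
PROC MEANS DATA=SAMPLES NOPRINT;
   BY REP;
   VAR Y;
   OUTPUT OUT=CVS CV=CV;
RUN;

/* Step 4: histogram of the 5,000 coefficients of variation */
PROC SGPLOT DATA=CVS;
   HISTOGRAM CV;
RUN;
```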

9. For each of the following, state the null (Ho) and alternative (Ha) hypotheses clearly indicating if Ha is one-sided or two-sided. (6 marks)

a) You wish to determine whether regular exercise affects the life-span of hamsters. Forty hamsters are given no extra hours of exercise per day, while 40 are given 1 hour of extra exercise per day. You determine the life span of each animal.

Ho: exercise doesn’t affect lifespan (mean lifespan with extra exercise = mean lifespan without extra exercise)

Ha: exercise does affect lifespan (mean lifespan with extra exercise ≠ mean lifespan without extra exercise)

Two-sided

b) Does chewing bark of the sweet magnolia tree reduce tooth decay? You obtain a random sample of 40 volunteers and ask them to chew the bark for 20 minutes per day, while 40 others do not. You count the number of dental cavities each person has after 2 years.

Ho: mean number of cavities of bark chewers = mean number of cavities of non-chewers

Ha: mean number of cavities of bark chewers < mean number of cavities of non-chewers

One-sided

c) You wish to determine whether the expression of a DNA repair gene increases if you expose fruit flies to ultraviolet (UV) light. You expose 10 flies to UV and 10 flies to white light, and then measure their gene expression.

Ho: gene expression with UV = gene expression without UV

Ha: gene expression with UV > gene expression without UV

One-sided

10. Write a single complete SAS program to obtain descriptive statistics of the variable “weight gained” for each of two groups of snails. One group (group A) was fed brown algae, while the other (group B) was fed red algae. The weight (in mg) of each snail was measured at the beginning of the experiment and 1 month later. In the data set below, for each snail I’ve listed the group to which it belonged, followed by its initial weight and then its final weight. Include the data in your program, using exactly the format in which the data is written below (6 marks).

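DATA SNAILS;
   INPUT FOODTYPE $ INITWT FINALWT;
   WTGAIN = FINALWT - INITWT;   /* weight gained (mg) over the month */
   DATALINES;
A 34 40
B 45 44
A 36 36
B 54 52
A 41 43
B 54 50
;
RUN;

/* BY-group processing requires the data to be sorted by FOODTYPE */
PROC SORT DATA=SNAILS;
   BY FOODTYPE;
RUN;

/* Descriptive statistics of weight gained, separately for groups A and B */
PROC UNIVARIATE DATA=SNAILS;
   BY FOODTYPE;
   VAR WTGAIN;
RUN;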