Introductory Statistics 161 Assignment 2

Part A: Analysis of traffic data

In this section you will use the data you collected in Assignment 1.

Car Size (m^3) / Body / Year / Year Group
9.941 / Small Cars / 2003 / Less than 2005
10.251 / Small Cars / 1996 / Less than 2005
10.699 / Small Cars / 2012 / Greater than 2005
9.139 / Medium Cars / 1972 / Less than 2005
10.426 / Medium Cars / 1986 / Less than 2005
11.304 / Medium Cars / 2001 / Less than 2005
12.255 / Medium Cars / 2007 / Greater than 2005
12.332 / Medium Cars / 2007 / Greater than 2005
13.305 / Large Cars / 2004 / Less than 2005
13.387 / Large Cars / 2004 / Less than 2005
13.955 / Large Cars / 2004 / Less than 2005
13.402 / Large Cars / 2011 / Greater than 2005
13.470 / Large Cars / 2011 / Greater than 2005
13.484 / Large Cars / 2010 / Greater than 2005
14.618 / Luxury Cars / 2010 / Greater than 2005
14.987 / Luxury Cars / 2010 / Greater than 2005
13.814 / Luxury Cars / 2002 / Less than 2005
14.278 / Luxury Cars / 2002 / Less than 2005

1. Comparing two means:

a) Confidence interval for difference between means [4 Marks]

(i) Use Minitab to construct a 95% Confidence Interval for the difference between the means of your numeric variable for the different levels of your 2-level categorical variable.

(ii) Interpret your confidence interval in context (remember units).

(iii) Use your confidence interval to draw a conclusion about the difference (if any) between the two levels of your categorical variable.
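
The Minitab interval can be cross-checked by hand. The following is a minimal sketch only, written in Python rather than Minitab. It uses the car sizes tabulated above split by year group, assumes the unpooled (Welch) form of the interval, and takes the critical value 2.131 from a t table for 15 degrees of freedom (the Welch value rounded down). The variable and function names are illustrative, not part of the assignment.

```python
import math
import statistics

# Car sizes (m^3) from the table above, split by year group.
before_2005 = [9.941, 10.251, 9.139, 10.426, 11.304,
               13.305, 13.387, 13.955, 13.814, 14.278]
after_2005 = [10.699, 12.255, 12.332, 13.402, 13.470,
              13.484, 14.618, 14.987]


def welch_se_and_df(a, b):
    """Standard error of mean(a) - mean(b) and the Welch-Satterthwaite df."""
    va = statistics.variance(a) / len(a)  # s_a^2 / n_a
    vb = statistics.variance(b) / len(b)  # s_b^2 / n_b
    se = math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return se, df


diff = statistics.mean(after_2005) - statistics.mean(before_2005)
se, df = welch_se_and_df(after_2005, before_2005)

# Assumption: t(0.975, 15) = 2.131, read from a t table after rounding
# the Welch degrees of freedom down to a whole number.
T_CRIT = 2.131
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se

print(f"difference in means = {diff:.3f} m^3, Welch df = {df:.2f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}) m^3")
```

If the interval printed here contains zero, the data are consistent with no difference between the two year groups at the 95% level.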

b) Two-tailed hypothesis test for the difference between two means [4 Marks]

(i) State the null and alternative hypotheses (in words and symbols) for testing if there is a significant difference between the means of your numeric variable for the different levels of your 2-level categorical variable.

(ii) Use Minitab to carry out the test. State the test statistic and corresponding p-value for these hypotheses.

(iii) Explain whether you have evidence for or against the null hypothesis.

(iv) State your conclusion in a form that a non-statistician would understand.
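
For comparison with the Minitab output, the test statistic and two-tailed p-value can be sketched in Python. This is an illustration under stated assumptions: it uses the unpooled (Welch) test, keeps the degrees of freedom unrounded (so the p-value may differ slightly from software that rounds them), and approximates the t distribution by numerical integration because the Python standard library has no t distribution. All names are hypothetical.

```python
import math
import statistics

# Car sizes (m^3) from the table above, split by year group.
before_2005 = [9.941, 10.251, 9.139, 10.426, 11.304,
               13.305, 13.387, 13.955, 13.814, 14.278]
after_2005 = [10.699, 12.255, 12.332, 13.402, 13.470,
              13.484, 14.618, 14.987]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): Simpson's rule on [0, |t|], then symmetry about zero."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


va = statistics.variance(after_2005) / len(after_2005)
vb = statistics.variance(before_2005) / len(before_2005)

# H0: mu_after - mu_before = 0   versus   H1: mu_after - mu_before != 0
t_stat = (statistics.mean(after_2005) - statistics.mean(before_2005)) / math.sqrt(va + vb)
df = (va + vb) ** 2 / (va ** 2 / (len(after_2005) - 1) + vb ** 2 / (len(before_2005) - 1))
p_value = 2 * t_upper_tail(t_stat, df)  # two-tailed

print(f"t = {t_stat:.3f}, df = {df:.2f}, two-tailed p = {p_value:.3f}")
```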

2. Comparing more means: ANOVA

Use Minitab to test if your numerical variable has the same true mean for the different levels of your 3+ level categorical variable.

a) Write down the null and alternative hypotheses to examine this question using ANOVA. [2 Marks]

b) Assumption checking: [5 Marks]

(i) State the assumptions for the ANOVA test.

(ii) Use Minitab to check whether these assumptions are valid for your data. (Hint: use Normal Probability Plots, compare the sample standard deviations and comment on independence).

(iii) Discuss whether your data needs transforming. If a transformation is needed, select a suitable transformation and transform your data.
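
The comparison of sample standard deviations suggested in the hint can be sketched in Python as a cross-check. This is illustrative only. It uses the car sizes tabulated above grouped by body type, and it assumes a common rule of thumb (not stated in this assignment) that the equal-spread assumption is reasonable when the largest sample standard deviation is no more than about twice the smallest. With groups this small the comparison is rough at best.

```python
import statistics

# Car sizes (m^3) from the table above, grouped by body type.
sizes = {
    "Small": [9.941, 10.251, 10.699],
    "Medium": [9.139, 10.426, 11.304, 12.255, 12.332],
    "Large": [13.305, 13.387, 13.955, 13.402, 13.470, 13.484],
    "Luxury": [14.618, 14.987, 13.814, 14.278],
}

# Sample standard deviation (n - 1 denominator) for each body type.
sds = {body: statistics.stdev(values) for body, values in sizes.items()}
for body, sd in sds.items():
    print(f"{body:7s} n = {len(sizes[body])}  sd = {sd:.3f}")

# Assumed rule of thumb: a ratio above about 2 casts doubt on equal spread.
ratio = max(sds.values()) / min(sds.values())
print(f"largest sd / smallest sd = {ratio:.2f}")
```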

c) Use Minitab to carry out the ANOVA on your (transformed if necessary) data. [3 Marks]

(i) State the test statistic and corresponding p-value.

(ii) Explain whether you have evidence for or against the null hypothesis.

(iii) State your conclusion in a form that a non-statistician would understand.
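
The F statistic reported by Minitab can be reproduced from the sums of squares. The sketch below is a hand calculation in Python, offered as a cross-check rather than a replacement for the Minitab output. It assumes the untransformed car sizes grouped by body type; if a transformation is chosen in part b), it would be applied to the values first. The p-value is left to Minitab or an F table, since the Python standard library has no F distribution.

```python
import statistics

# Car sizes (m^3) from the table above, grouped by body type.
groups = {
    "Small": [9.941, 10.251, 10.699],
    "Medium": [9.139, 10.426, 11.304, 12.255, 12.332],
    "Large": [13.305, 13.387, 13.955, 13.402, 13.470, 13.484],
    "Luxury": [14.618, 14.987, 13.814, 14.278],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between-groups: spread of the group means around the grand mean.
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                 for v in groups.values())
# Within-groups: spread of observations around their own group mean.
ss_within = sum((x - statistics.mean(v)) ** 2
                for v in groups.values() for x in v)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```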

Part B: Transformations and Confidence Intervals

An ecologist has collected data on the height (in cm) of the shrub Melicytus micranthus growing in an area where attempts have been made to control the noxious vine old man’s beard. The data are given below.

Height (cm)
56
67
135
72
104
72
80
120
38
64
115
29
56
72
46
44
50
225
70
46
29
62
135
44
29
34
72
59
93
62
54
93
56
36
48
70
67
72
56
72
96
146
80
82
120
92
36
53
38
44
26

1. Exploratory data analysis [3 Marks]

a) Construct a histogram of the heights. (Minitab hint: Graph: Histograms: Simple)

b) Comment on the shape of the distribution of heights.

c) Find the mean and median height of the shrubs, and comment on the difference between them.
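
The mean and median from Minitab can be checked with a few lines of Python. This is a minimal sketch using the 51 heights listed above; the variable names are illustrative. A mean that sits above the median is what one would expect from a distribution with a long right tail.

```python
import statistics

# Shrub heights (cm), in the order listed above.
heights = [
    56, 67, 135, 72, 104, 72, 80, 120, 38, 64, 115, 29, 56, 72, 46, 44, 50,
    225, 70, 46, 29, 62, 135, 44, 29, 34, 72, 59, 93, 62, 54, 93, 56, 36,
    48, 70, 67, 72, 56, 72, 96, 146, 80, 82, 120, 92, 36, 53, 38, 44, 26,
]

mean_height = statistics.mean(heights)
median_height = statistics.median(heights)

print(f"n = {len(heights)}")
print(f"mean = {mean_height:.2f} cm, median = {median_height} cm")
```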

2. Confidence interval [3 Marks]

a) Construct and interpret a 95% confidence interval for the mean height of Melicytus micranthus in this area.
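
The one-sample t interval can be computed by hand as mean ± t* × s/√n. The sketch below does this in Python as a cross-check on Minitab. It assumes the critical value t* = 2.009, read from a t table for n − 1 = 50 degrees of freedom, and the names used are illustrative.

```python
import math
import statistics

# Shrub heights (cm), in the order listed above.
heights = [
    56, 67, 135, 72, 104, 72, 80, 120, 38, 64, 115, 29, 56, 72, 46, 44, 50,
    225, 70, 46, 29, 62, 135, 44, 29, 34, 72, 59, 93, 62, 54, 93, 56, 36,
    48, 70, 67, 72, 56, 72, 96, 146, 80, 82, 120, 92, 36, 53, 38, 44, 26,
]

n = len(heights)
mean_height = statistics.mean(heights)
std_error = statistics.stdev(heights) / math.sqrt(n)  # s / sqrt(n)

# Assumption: t(0.975, 50) = 2.009, taken from a t table.
T_CRIT = 2.009
lower = mean_height - T_CRIT * std_error
upper = mean_height + T_CRIT * std_error

print(f"mean = {mean_height:.2f} cm, standard error = {std_error:.2f} cm")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) cm")
```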

3. Transforming the data [4 Marks]

a) Create 3 new columns in Minitab containing the following transformations of the shrub heights:

i. square root,

ii. natural log and

iii. inverse (i.e. 1/value)

b) Construct a histogram of the transformed heights for each of the three transformations.

c) Which transformation makes the data appear approximately Normally Distributed?
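
The three transformations can also be sketched outside Minitab. The Python example below builds the transformed columns and, as a numerical complement to the histograms, prints the moment coefficient of skewness for each one. The skewness summary is an addition that the assignment does not ask for: a value near zero suggests a roughly symmetric shape, but it does not replace looking at the histograms. All names are illustrative.

```python
import math
import statistics

# Shrub heights (cm), in the order listed above.
heights = [
    56, 67, 135, 72, 104, 72, 80, 120, 38, 64, 115, 29, 56, 72, 46, 44, 50,
    225, 70, 46, 29, 62, 135, 44, 29, 34, 72, 59, 93, 62, 54, 93, 56, 36,
    48, 70, 67, 72, 56, 72, 96, 146, 80, 82, 120, 92, 36, 53, 38, 44, 26,
]


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 if symmetric)."""
    m = statistics.mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


# One "column" per transformation, plus the raw data for reference.
columns = {
    "none": heights,
    "square root": [math.sqrt(h) for h in heights],
    "natural log": [math.log(h) for h in heights],
    "inverse": [1 / h for h in heights],
}

skew = {name: skewness(values) for name, values in columns.items()}
for name, value in skew.items():
    print(f"{name:12s} skewness = {value:+.2f}")
```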

4. Using the transformation you chose in 3c): [5 Marks]

a) Find the mean and median of the transformed heights.

b) Convert the mean and median of the transformed data back to the original units.

c) Discuss whether the back-transformed mean is closer to the mean or median of the original data, and why.
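
Back-transformation can be illustrated with a short Python sketch. It assumes, purely for illustration, that the natural log was the transformation chosen; for a different choice the inverse function would change (squaring for the square root, taking the reciprocal for the inverse). With the log, the back-transformed mean exp(mean(ln x)) is the geometric mean of the original heights.

```python
import math
import statistics

# Shrub heights (cm), in the order listed above.
heights = [
    56, 67, 135, 72, 104, 72, 80, 120, 38, 64, 115, 29, 56, 72, 46, 44, 50,
    225, 70, 46, 29, 62, 135, 44, 29, 34, 72, 59, 93, 62, 54, 93, 56, 36,
    48, 70, 67, 72, 56, 72, 96, 146, 80, 82, 120, 92, 36, 53, 38, 44, 26,
]

# Assumption: the natural log is the chosen transformation.
log_heights = [math.log(h) for h in heights]
log_mean = statistics.mean(log_heights)
log_median = statistics.median(log_heights)

# Undo the log with exp() to return to centimetres.
back_mean = math.exp(log_mean)      # geometric mean of the heights
back_median = math.exp(log_median)  # equals the original median when n is odd

original_mean = statistics.mean(heights)
original_median = statistics.median(heights)

print(f"log scale: mean = {log_mean:.3f}, median = {log_median:.3f}")
print(f"back-transformed: mean = {back_mean:.1f} cm, median = {back_median:.1f} cm")
print(f"original: mean = {original_mean:.1f} cm, median = {original_median} cm")
```

Because the log is a monotone function, the median survives the round trip unchanged, whereas the mean does not.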

It is known that in areas where the noxious vine old man’s beard has never grown (natural area), the shrub Melicytus micranthus grows to a mean height of 82.5 cm.

5. Using the untransformed data, conduct a hypothesis test to determine if the average height of the shrub in the controlled area is significantly smaller than the average height of the shrub in the natural area. [5 Marks]

(a) State your null and alternative hypotheses in words and symbols.

(b) Use Minitab to calculate the test statistic and corresponding p-value.

(c) Explain whether you have evidence for or against the null hypothesis.

(d) State your conclusion in a form that a non-statistician would understand.
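
The one-sample test statistic is t = (sample mean − 82.5) / (s/√n), with n − 1 degrees of freedom, and the lower-tail p-value is P(T ≤ t). The Python sketch below is a cross-check on the Minitab output, not a substitute for it. Because the standard library has no t distribution, the p-value is approximated by numerical integration; the helper names are hypothetical.

```python
import math
import statistics

# Shrub heights (cm), in the order listed above.
heights = [
    56, 67, 135, 72, 104, 72, 80, 120, 38, 64, 115, 29, 56, 72, 46, 44, 50,
    225, 70, 46, 29, 62, 135, 44, 29, 34, 72, 59, 93, 62, 54, 93, 56, 36,
    48, 70, 67, 72, 56, 72, 96, 146, 80, 82, 120, 92, 36, 53, 38, 44, 26,
]
MU_0 = 82.5  # mean height (cm) in the natural area, from the text above


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


n = len(heights)
df = n - 1
std_error = statistics.stdev(heights) / math.sqrt(n)

# H0: mu = 82.5   versus   H1: mu < 82.5 (one-sided, lower tail)
t_stat = (statistics.mean(heights) - MU_0) / std_error
p_value = t_cdf(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, one-sided p = {p_value:.4f}")
```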
