Exercise on Error bars
The following are replicate observations of % growth at each of four pH values. For each set, give the mean, standard deviation, standard error of the mean and range and arrange them as a table. (It is started for you.)
pH / % Growth / mean / s.d. / s.e. / range
4.0 / 2.6 2.9 2.7 2.7 3.8 2.9 2.6 3.1 2.7 3.8 / 2.98 / 0.459 / 0.145 / 1.2
5.0 / 2.8 3.4 4.9 3.6 3.8 2.7 3.3 3.1 3.5 4.7 / 3.58 / 0.728 / 0.230 / 2.2
6.0 / 3.7 3.6 5.9 4.2 3.6 4.3 / 4.22 / 0.880 / 0.359 / 2.3
7.0 / 4.4 4.1 3.9 4.5 4.2 4.0 5.1 5.3 4.8 / 4.478 / 0.494 / 0.165 / 1.4
a) Having calculated the mean, s.d. and standard error for each set of observations, draw a graph of mean % growth against pH and draw error bars at each point.
b) Do you think a straight line is justified as a fit to these data, or do they need a curve?
(Hint: What does the standard error bar tell you about the position of the true mean % growth at a particular pH?)
Answers to the calculations are given above in italics. The graph (not given) has pH along the horizontal axis and mean % growth on the vertical axis.
a) The point of this is to be sure you can use your calculator in s.d. mode to find the mean and s.d. of a small sample of data in one operation. You have to calculate the standard error from the s.d. yourself (it is not an automatic calculator function): divide the s.d. by the square root of the number of observations. If your s.d. values are all slightly lower than those above, you are using the wrong s.d. (use the σn-1 or s key, NOT σn or σ). Also recall that the range of the data (maximum value - minimum value) is related to the s.d.: the larger the s.d., the larger the range.
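If you want to check your calculator work, the same summary can be sketched in Python. This is only an illustration, not part of the original exercise: the names `growth` and `summarise` are made up here, and the standard library's `stdev` (n-1 divisor) and `pstdev` (n divisor) stand in for the two calculator keys.

```python
from math import sqrt
from statistics import mean, stdev, pstdev

# Replicate % growth observations, keyed by pH (copied from the table above).
growth = {
    4.0: [2.6, 2.9, 2.7, 2.7, 3.8, 2.9, 2.6, 3.1, 2.7, 3.8],
    5.0: [2.8, 3.4, 4.9, 3.6, 3.8, 2.7, 3.3, 3.1, 3.5, 4.7],
    6.0: [3.7, 3.6, 5.9, 4.2, 3.6, 4.3],
    7.0: [4.4, 4.1, 3.9, 4.5, 4.2, 4.0, 5.1, 5.3, 4.8],
}

def summarise(values):
    """Return n, mean, sample s.d., standard error and range for one set."""
    n = len(values)
    sd = stdev(values)                 # n-1 divisor: the one the exercise wants
    return {
        "n": n,
        "mean": mean(values),
        "sd": sd,
        "se": sd / sqrt(n),            # standard error = s.d. / sqrt(n)
        "range": max(values) - min(values),
        "sd_n": pstdev(values),        # n divisor: the "wrong" key, always a bit smaller
    }

summary = {ph: summarise(values) for ph, values in growth.items()}

for ph, s in summary.items():
    print(f"pH {ph}: mean={s['mean']:.3f}  s.d.={s['sd']:.3f}  "
          f"s.e.={s['se']:.3f}  range={s['range']:.1f}  (s.d. with n: {s['sd_n']:.3f})")
```

Comparing the `sd` and `sd_n` columns shows how much lower your answers come out if you press the wrong key.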
b) The chance that the true mean % growth at each pH lies between the top and the bottom of the standard error bar is only about 68%. To be 95% sure that we have caught the true value we would need at least to double the bar length. That being so, a straight line can be drawn that passes through every bar, and there is no evidence against a straight line.
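One way to make this argument concrete is to fit a line and see how far each mean sits from it, measured in standard errors. The sketch below is an assumption-laden illustration, not the handout's method: it uses an ordinary unweighted least-squares line through the four means, and the names `gaps`, `line_within_1se` and `line_within_2se` are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Replicate % growth observations, keyed by pH (copied from the exercise).
growth = {
    4.0: [2.6, 2.9, 2.7, 2.7, 3.8, 2.9, 2.6, 3.1, 2.7, 3.8],
    5.0: [2.8, 3.4, 4.9, 3.6, 3.8, 2.7, 3.3, 3.1, 3.5, 4.7],
    6.0: [3.7, 3.6, 5.9, 4.2, 3.6, 4.3],
    7.0: [4.4, 4.1, 3.9, 4.5, 4.2, 4.0, 5.1, 5.3, 4.8],
}

ph = list(growth)
means = [mean(v) for v in growth.values()]
ses = [stdev(v) / sqrt(len(v)) for v in growth.values()]

# Unweighted least-squares line through the four (pH, mean) points.
x_bar, y_bar = mean(ph), mean(means)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ph, means))
         / sum((x - x_bar) ** 2 for x in ph))
intercept = y_bar - slope * x_bar

# Distance of each mean from the line, in units of its own standard error.
gaps = [abs(y - (intercept + slope * x)) / se
        for x, y, se in zip(ph, means, ses)]

line_within_1se = all(g <= 1 for g in gaps)   # line passes through every s.e. bar
line_within_2se = all(g <= 2 for g in gaps)   # line passes through every doubled bar

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("gaps in s.e. units:", [round(g, 2) for g in gaps])
print("inside all 1 s.e. bars:", line_within_1se)
print("inside all 2 s.e. bars:", line_within_2se)
```

If the line stays inside the bars at every pH, the data give no reason to prefer a curve.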
What’s missing from this graph?
pH / Mean / Mean - Std. Error / Mean + Std. Error
4 / 2.98 / 2.83 / 3.13
5 / 3.58 / 3.35 / 3.81
6 / 4.22 / 3.86 / 4.58
7 / 4.48 / 4.31 / 4.64
Questions
Page 8
1. Plate counts using the same basic medium at various pH levels were obtained (after 8 hours incubation). Unfortunately some plates were contaminated and the sample sizes for them ended up smaller. So the results available were as follows:
Counts at each pH (- = no count available)
pH 5.0 / pH 6.0 / pH 7.0 / pH 8.0
52 / 89 / 110 / 120
49 / 130 / 162 / 154
72 / 75 / 98 / 175
63 / 140 / 148 / -
47 / - / 131 / -
57 / - / 142 / -
i) Calculate the mean and s.d. of the counts for each pH.
ii) Find the standard error of the mean for each pH.
iii) Draw a graph of mean count against pH incorporating error bars.
iv) Calculate the coefficient of variation for the mean count at each pH.
pH / 5.0 / 6.0 / 7.0 / 8.0
Mean / 56.7 / 108.5 / 131.8 / 149.7
S.d. / 9.48 / 31.40 / 24.07 / 27.75
Std. Error / 3.87 / 15.70 / 9.83 / 16.02
Coeff. of Variation / 16.7% / 28.9% / 18.3% / 18.5%
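Because the sample sizes differ, each standard error has to use its own n. A minimal Python sketch of the four calculations follows; the names `counts` and `plate_summary` are invented for the illustration, and the coefficient of variation is taken as s.d. divided by mean, expressed as a percentage.

```python
from math import sqrt
from statistics import mean, stdev

# Plate counts by pH; contaminated plates are simply absent from the lists.
counts = {
    5.0: [52, 49, 72, 63, 47, 57],
    6.0: [89, 130, 75, 140],
    7.0: [110, 162, 98, 148, 131, 142],
    8.0: [120, 154, 175],
}

def plate_summary(values):
    """Mean, sample s.d., standard error and coefficient of variation (%)."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)                  # n-1 divisor
    return {
        "n": n,
        "mean": m,
        "sd": sd,
        "se": sd / sqrt(n),             # uses this pH's own sample size
        "cv_percent": 100 * sd / m,
    }

plate_stats = {ph: plate_summary(values) for ph, values in counts.items()}

for ph, s in plate_stats.items():
    print(f"pH {ph}: n={s['n']}  mean={s['mean']:.1f}  s.d.={s['sd']:.2f}  "
          f"s.e.={s['se']:.2f}  C.V.={s['cv_percent']:.1f}%")
```

Notice that pH 8.0 has a similar s.d. to pH 7.0 but a much larger standard error, purely because only three plates survived.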
Page 9
2. A physiology study took numerous measurements on a large number of people.
a) Below is a random sample of left ear and left foot lengths for males.
Left ear (cm) / Left foot (cm)
6.0 / 28.0
6.5 / 25.0
5.8 / 25.5
7.0 / 27.0
6.8 / 25.2
6.2 / 25.2
5.0 / 26.0
5.5 / 26.0
5.5 / 24.5
Which set of measurements is the more variable? This is intentionally somewhat open-ended. You need to decide which measure of variation to use.
Answer
Measure / Left ear / Left foot
Mean / 6.03 / 25.8
S.d. / 0.658 / 1.09
C.V. / 10.9% / 4.2%
To compare the variation of two sets of measurements that are on different scales or that have very different means, the coefficient of variation should be used. This applies here as the mean lengths are very different. Hence, although the foot has the larger s.d. in absolute terms, we can say that the length of the ear is more variable in relative terms than that of the foot.
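The contrast between absolute and relative variation can be sketched in a few lines of Python. The helper name `cv_percent` is made up for this illustration; it simply divides the sample s.d. by the mean.

```python
from statistics import mean, stdev

ear_cm = [6.0, 6.5, 5.8, 7.0, 6.8, 6.2, 5.0, 5.5, 5.5]
foot_cm = [28.0, 25.0, 25.5, 27.0, 25.2, 25.2, 26.0, 26.0, 24.5]

def cv_percent(values):
    """Coefficient of variation: sample s.d. as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

ear_sd, foot_sd = stdev(ear_cm), stdev(foot_cm)
ear_cv, foot_cv = cv_percent(ear_cm), cv_percent(foot_cm)

# Absolute spread (cm) ranks the foot as more variable ...
print(f"s.d.: ear {ear_sd:.3f} cm, foot {foot_sd:.2f} cm")
# ... but relative spread (%) ranks the ear as more variable.
print(f"C.V.: ear {ear_cv:.1f}%, foot {foot_cv:.1f}%")
```

Which answer you give depends on the measure you choose, which is why the question asks you to decide first.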
Page 9
3. Two methods are in use to determine the % moisture in wood. A sample of wood is divided into 12 pieces. The % moisture is determined 6 times using Method A and 6 times using Method B.
Method A: 42% 51% 43% 50% 54% 44%
Method B: 39% 41% 46% 39% 42% 41%
Which method is the more repeatable?
Answer
Method / Mean / S.d.
Method A / 47.3% / 4.97%
Method B / 41.3% / 2.58%
Method B has the lower s.d. and therefore it is more repeatable. Repeatability relates to the s.d. of the observations rather than the standard error. Anyone who points out that the range in Method B is smaller than that for Method A, and quotes the ranges, has also got it right (and saved work!)
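Both routes to the answer, the s.d. and the quicker range, can be checked with a short Python sketch. The function name `spread` is invented for the example.

```python
from statistics import stdev

# % moisture readings for the six pieces measured by each method.
method_a = [42, 51, 43, 50, 54, 44]
method_b = [39, 41, 46, 39, 42, 41]

def spread(values):
    """Two measures of repeatability: sample s.d. and range."""
    return {"sd": stdev(values), "range": max(values) - min(values)}

spread_a = spread(method_a)
spread_b = spread(method_b)

print(f"Method A: s.d.={spread_a['sd']:.2f}  range={spread_a['range']}")
print(f"Method B: s.d.={spread_b['sd']:.2f}  range={spread_b['range']}")
```

Here the two measures agree, but the range depends only on the two extreme readings, so with larger samples the s.d. is the safer guide.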
Page 9
4. Two methods are in use for vitamin A determination in vitamin pills. To compare methods a solution of known concentration 2 mg/ml was used. Each method was used for 5 repeats on Monday and another 5 repeats on the following Friday. The amount of vitamin A should be identical in each case.
Day / Repeats / Mean / s.d.
Method A
Monday / 2.05 2.10 2.08 2.08 2.06 / 2.074 / 0.01949
Friday / 1.85 1.87 1.93 1.87 1.90 / 1.884 / 0.03130
Method B
Monday / 2.10 1.92 2.07 1.96 1.95 / 2.000 / 0.07969
Friday / 2.06 1.97 2.09 2.01 1.92 / 2.010 / 0.06819
Which method (if any) is apparently
i) accurate: Method B (means very near to 2, the known concentration)
ii) repeatable: Method A (low s.d. within repeat sets)
iii) reproducible: probably Method B (less difference in means from Monday to Friday)
iv) precise? Method A (lower s.d. than B on the same number of observations)
In practical terms there is a problem. Method B is more accurate but less precise than Method A. I would want considerably more data if I had to decide between the methods in real life. If all we want is to compare data to see if there has been an increase or decrease then Method A may be better. If we need to be accurate against an outside standard we may have to stick with Method B and do more repeats each time to increase the precision, since its s.d. is higher.
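The four judgements rest on three simple quantities, which the following Python sketch pulls out. The particular summaries chosen here (`worst_bias`, `worst_sd`, `day_shift`) are one possible way to put numbers on accuracy, repeatability and reproducibility, assumed for this illustration rather than taken from the handout.

```python
from statistics import mean, stdev

KNOWN = 2.0  # mg/ml, the known concentration of the test solution

results = {
    "A": {"Monday": [2.05, 2.10, 2.08, 2.08, 2.06],
          "Friday": [1.85, 1.87, 1.93, 1.87, 1.90]},
    "B": {"Monday": [2.10, 1.92, 2.07, 1.96, 1.95],
          "Friday": [2.06, 1.97, 2.09, 2.01, 1.92]},
}

def assess(days):
    """Summarise one method from its Monday and Friday repeat sets."""
    day_means = {day: mean(values) for day, values in days.items()}
    return {
        # accuracy: how far the worse of the two daily means is from the known value
        "worst_bias": max(abs(m - KNOWN) for m in day_means.values()),
        # repeatability: the larger of the two within-day s.d. values
        "worst_sd": max(stdev(values) for values in days.values()),
        # reproducibility: how much the mean moved between the two days
        "day_shift": abs(day_means["Monday"] - day_means["Friday"]),
    }

report = {method: assess(days) for method, days in results.items()}

for method, r in report.items():
    print(f"Method {method}: worst bias={r['worst_bias']:.3f}  "
          f"worst s.d.={r['worst_sd']:.4f}  Mon-Fri shift={r['day_shift']:.3f}")
```

The printout makes the trade-off visible: one method wins on bias and day-to-day shift, the other on within-day scatter.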
Answers for Confidence intervals for a population mean
Assume that all samples are drawn from normally distributed populations for the questions on this handout. Remember that if the standard deviation has been calculated on a small sample you need to use the t-distribution.
Page 13
1. An analysis was made of peat soils from a number of sites which had similar vegetation. The total phosphate in the soil was as follows:
(mg/100g dry weight)
39.3 46.6 51.7 46.0 68.3 58.0
Calculate the mean and s.d. of the sample and hence find a 95% confidence interval for the mean phosphate content of the soil.
Explain clearly in words what this confidence interval means. WRITTEN answer!!
Solution:
Mean: 51.65
Standard Deviation: 10.27
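Critical Value: page 45 statistical tables.
Identify Column: the column for a 95% interval (2.5% in each tail)
Identify Row: n - 1 = 5 degrees of freedom
Thus: t = 2.571
95% confidence interval: 51.65 ± 2.571 × 10.27/√6 = 51.65 ± 10.78, i.e. 40.87 to 62.43 mg/100g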
95% of the times when we do this type of calculation we will be correct when we claim that the calculated interval includes the true mean.
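The calculation can be sketched in Python as a check on the arithmetic. The critical value 2.571 (two-sided 95%, 5 degrees of freedom) is typed in from standard t-tables, because Python's standard library has no t-distribution; the variable names are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev

phosphate = [39.3, 46.6, 51.7, 46.0, 68.3, 58.0]   # mg/100g dry weight

# Two-sided 95% critical value of t with n - 1 = 5 degrees of freedom,
# read from standard t-tables (an input here, not computed).
T_95_DF5 = 2.571

n = len(phosphate)
sample_mean = mean(phosphate)
sample_sd = stdev(phosphate)          # n-1 divisor
std_error = sample_sd / sqrt(n)
margin = T_95_DF5 * std_error         # half-width of the interval

ci_95 = (sample_mean - margin, sample_mean + margin)

print(f"mean={sample_mean:.2f}  s.d.={sample_sd:.2f}  s.e.={std_error:.3f}")
print(f"95% C.I.: {ci_95[0]:.2f} to {ci_95[1]:.2f} mg/100g")
```

The code keeps full precision throughout and rounds only when printing, which is the habit recommended at the end of this handout.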
Page 13
2. Using specimens from 10 children, determination of the %calcium content of sound teeth gave the following:
36.39 36.19 34.20 35.15 35.47
35.22 36.11 35.63 36.63 35.59
(i) Find 95% and 99% confidence intervals for the mean %calcium of the teeth.
Mean: 35.658
Standard Deviation: 0.7137
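Standard Error: 0.7137/√10 = 0.2257
Degrees of Freedom (Row on page 45 of Tables): 10 - 1 = 9
Critical Values:
95% interval: 2.262
99% interval: 3.250
95% Confidence Interval: 35.658 ± 2.262 × 0.2257 = 35.658 ± 0.51, i.e. 35.15 to 36.17
99% Confidence Interval: 35.658 ± 3.250 × 0.2257 = 35.658 ± 0.73, i.e. 34.92 to 36.39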
(ii) Important question. Explain why you cannot find a 100% confidence interval.
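Because we are assuming an underlying normal distribution, and we can never state a finite range that contains 100% of a normal distribution. (The curve never quite touches the horizontal axis.) The more confident we want to be, the wider the interval has to be, and a 100% interval would have to be infinitely wide.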
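A small Python sketch shows the multiplier growing as the confidence level creeps towards 100%. It uses the normal distribution from the standard library as a stand-in (for this question the t-distribution with 9 degrees of freedom would give larger values still), and the helper name `z_critical` is made up for the illustration.

```python
from statistics import NormalDist, StatisticsError

std_normal = NormalDist()   # mean 0, s.d. 1

def z_critical(confidence):
    """Two-sided critical value leaving (1 - confidence)/2 in each tail."""
    return std_normal.inv_cdf(0.5 + confidence / 2)

levels = [0.90, 0.95, 0.99, 0.999, 0.999999]
multipliers = [z_critical(c) for c in levels]

for c, z in zip(levels, multipliers):
    print(f"{100 * c:.4f}% confidence: multiply the s.e. by {z:.3f}")

# Asking for exactly 100% has no finite answer, so the library refuses.
try:
    z_critical(1.0)
    hundred_percent_possible = True
except StatisticsError:
    hundred_percent_possible = False

print("100% interval possible:", hundred_percent_possible)
```

Each extra "9" of confidence pushes the multiplier higher, with no upper limit.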
Page 13
3. The mean indirect bilirubin level of 16 four-day-old infants was found to be 5.98 mg/100cc. The s.d. was calculated to be 3.5 mg/100cc.
Find 90%, 95% and 99% confidence intervals for the mean bilirubin level of the population.
Critical Values (from t-tables) are based on 15 degrees of freedom:
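90%: 1.753
95%: 2.131
99%: 2.947
Standard Error is: 3.5/√16 = 3.5/4 = 0.875
Confidence intervals are:
90%: 5.98 ± 1.753 × 0.875 = 5.98 ± 1.53, i.e. 4.45 to 7.51 mg/100cc
95%: 5.98 ± 2.131 × 0.875 = 5.98 ± 1.86, i.e. 4.12 to 7.84 mg/100cc
99%: 5.98 ± 2.947 × 0.875 = 5.98 ± 2.58, i.e. 3.40 to 8.56 mg/100cc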
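When only the mean, s.d. and sample size are given, the interval needs nothing more than the formula mean ± t × s.d./√n. The Python sketch below assumes the three critical values quoted above for 15 degrees of freedom; the function name `t_interval` is invented for the example.

```python
from math import sqrt

def t_interval(sample_mean, sd, n, t_crit):
    """Confidence interval from summary statistics and a tabulated t value."""
    std_error = sd / sqrt(n)
    margin = t_crit * std_error
    return sample_mean - margin, sample_mean + margin

# Critical values for 15 degrees of freedom, as quoted from the t-tables.
T_CRIT_DF15 = {90: 1.753, 95: 2.131, 99: 2.947}

intervals = {
    level: t_interval(5.98, 3.5, 16, t_crit)
    for level, t_crit in T_CRIT_DF15.items()
}

for level, (low, high) in intervals.items():
    print(f"{level}% C.I.: {low:.2f} to {high:.2f} mg/100cc")
```

The same mean and s.d. give a wider interval each time the confidence level is raised.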
Page 13
4. A sample of 100 apparently normal adult males, aged 25, had a mean systolic blood pressure of 125. Assuming that the s.d. of the sample is 15 find
(i) a 90% C.I. for the population mean
(ii) a 95% C.I. for the population mean.
“Large” sample size – use Normal distribution rather than t-distribution. This is the last row of the t-tables.
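Standard Error: 15/√100 = 1.5
Critical Value:
90%: 1.645
95%: 1.960
90% confidence interval: 125 ± 1.645 × 1.5 = 125 ± 2.47, i.e. 122.53 to 127.47
95% confidence interval: 125 ± 1.960 × 1.5 = 125 ± 2.94, i.e. 122.06 to 127.94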
For this it is not necessary to use the t-distribution. The normal distribution can be used as the sample size is greater than 30. The z-values can be found either by using Table 5 and looking up the 2.5% point for a 95% interval (the 5% point for a 90% interval), or by using the bottom row of Table 10, which is marked as having an infinite number of degrees of freedom (written as ∞).
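For a large sample the z-values can also be computed rather than looked up. This Python sketch uses `NormalDist.inv_cdf` from the standard library in place of the printed tables; the function name `z_interval` and the variable names are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 100            # sample size
sample_mean = 125  # mean systolic blood pressure
sample_sd = 15     # s.d. of the sample

std_error = sample_sd / sqrt(n)   # 15 / 10 = 1.5

def z_interval(confidence):
    """Normal-based confidence interval for the population mean."""
    # Put (1 - confidence)/2 in each tail, e.g. the 2.5% point for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * std_error, sample_mean + z * std_error

ci_90 = z_interval(0.90)
ci_95 = z_interval(0.95)

print(f"90% C.I.: {ci_90[0]:.2f} to {ci_90[1]:.2f}")
print(f"95% C.I.: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

The computed z-values agree with the tabulated 1.645 and 1.960 to three decimal places.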
General point about the precision of the quoted mean and s.d. (s.e.)
The usual rules are that you give the mean to one more decimal place than the data. The standard deviation and the standard error should also be quoted to one more decimal place than the original. BUT this is as a final answer. You need at least one extra figure of precision while doing the calculations.
For example, in question 1 above you work to 3 d.p. at least but you quote answers rounded to 2 d.p. at the end.
If you do not work to greater precision than your final answer you can get a very inaccurate answer.
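To see the effect, here is a rough Python sketch using the phosphate data from question 1. It compares the lower limit of the 95% interval worked at full precision with the same limit when the s.d. and s.e. are each rounded to 1 d.p. along the way; the choice of 1 d.p. and the variable names are assumptions made for the illustration, and 2.571 is the tabulated t value for 5 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

phosphate = [39.3, 46.6, 51.7, 46.0, 68.3, 58.0]
T_95_DF5 = 2.571                     # from standard t-tables, 5 d.f.
n = len(phosphate)
sample_mean = mean(phosphate)

# Route 1: keep full precision, round only the final answer.
se_full = stdev(phosphate) / sqrt(n)
lower_full = sample_mean - T_95_DF5 * se_full

# Route 2: round every intermediate result to 1 d.p. before carrying on.
sd_rough = round(stdev(phosphate), 1)
se_rough = round(sd_rough / sqrt(n), 1)
lower_rough = sample_mean - T_95_DF5 * se_rough

print(f"lower limit, full precision:   {lower_full:.2f}")
print(f"lower limit, rounded working:  {lower_rough:.2f}")
print(f"difference: {abs(lower_full - lower_rough):.3f}")
```

The two routes disagree in the second decimal place, which is exactly the place you are supposed to be quoting.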