
Environmental Analysis Analytical Chemistry Lecture 3 October 7, 2002

I have placed one copy of my lecture notes in the Chem Cave. The notes are also a text file in the program Share Directory under Handouts.

Also, there is a detailed set of solutions to the first week's homework in the Cave and on the Share. I had prepared a set of solutions that took 2½ pages, but then wrote a more detailed set in addition, which is 8 pages long. I expect you to try the problems before reading the solutions. On the other hand, I don't want you staring at a problem for hours and hours. The solutions are there so you can get help when you get stuck. Remember to work together, discuss the material, and start learning.

Being able to apply concepts to the solution of numerical problems is a major part of this program. When you finish a problem, go back over it and ask yourself: What concepts did I use? What do the concepts say about tackling this kind of problem? How might I use these concepts to tackle similar problems?

Numerical solutions are just part of the answer. We want you to think about your learning and to solve problems actively.

By now you should have had a chance to skim through the text and see how it is laid out.

After these first few weeks we will stop going quite so fast through the material and give you more time to digest the concepts. I hope that much of this is review, but if not, keep in mind that this is a year-long program and that you have a whole year of learning ahead of you. You will struggle with some topics and find others easy.

You may find solubility equilibria difficult now, but there will come a time when you realize that solubility is easy and it is Analysis of Variance that doesn't yet make sense. Because we are a field-application-based program, you will apply nearly every concept we cover to your field work at some time during the year, many of them over and over.


XXXXXX

Homework assignment due Wednesday of Week 3 is the following problems from Harris:

3-12, 4-11, 4-20, 4-22, 4-3, 3-15a,b,g; 1-25, 3-21

Chapter 3 is concerned with ways to deal with and express experimental errors while Chapter 4 is a discussion of statistics in analytical chemistry.

Today I want to give an introduction to experimental data and the use of statistics in data analysis, followed by doing some of these homework problems in workshop.

Errors in Chemical Analyses

It is impossible to perform a chemical analysis that is totally free of errors, or uncertainties.

All one can hope is to minimize these errors and to estimate their size with acceptable accuracy.

Every measurement is influenced by many uncertainties, which combine to produce a scatter of results. Measurement uncertainties can never be completely eliminated, so the true value for any measured quantity is always unknown.

XXXXXX

As the book points out, we can divide measurement errors into two classes. The first is systematic errors, which arise from flaws in your methods or equipment; these can be detected and corrected, although not always easily.

On the other hand, Random errors arise from uncontrollable variabilities in the measurement.

Systematic errors tend to decrease as one gains experience performing an assay, so training is important. Another type of systematic error arises from mistakes in procedure, such as making the wrong calibration standard.

Suppose you are to make a 1.0 mM standard but instead make one that is 0.9 mM. When the instrument reads 1.0, the true concentration is really 0.9. Will your determinations be off on the high side or the low side? On the high side: when you report that a sample has a concentration of 2.0 mM, the real value is 1.8 mM.
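To see the arithmetic, here is a minimal Python sketch; the numbers are just the ones from this example:

    # Calibration-bias sketch: the standard was supposed to be 1.0 mM
    # but is really 0.9 mM, so every reading is inflated by 1.0/0.9.
    intended_std = 1.0   # mM, concentration the standard was supposed to be
    actual_std = 0.9     # mM, its real concentration

    def true_concentration(reported_mM):
        # Undo the bias: scale the reading back by actual/intended.
        return reported_mM * actual_std / intended_std

    print(true_concentration(2.0))   # 1.8 -- a reported 2.0 mM is really 1.8 mM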

So you need to practice methods, be extremely careful with procedures and constantly test your results by running standards and calibration samples to eliminate systematic errors as much as possible.

One of the first questions to answer before beginning an analysis is, "What is the maximum error that I can tolerate in the result?" The answer to this question usually determines the time required to do the work. For example, a tenfold increase in accuracy may take hours, days, or even weeks of added labor.

No one can afford to waste time generating data that are more reliable than needed. On the other hand, data known to only 1 or 2 significant figures are often useless.

In the lab you will generally carry three to five aliquots of a sample through an entire analytical procedure so that you can determine the variation between individual measurements.

We then describe a central or best value for the set of data by determining the mean.

XXXXXX

The mean, arithmetic mean, or average is obtained by dividing the sum of the experimental measurements by the number of measurements in the set.

The median is the middle value when the measurements are ranked in order.

XXXXXX

Suppose we are determining the amount of iron(III) in water samples. Six equal portions of an aqueous solution that contained exactly 20.00 ppm of iron(III) were analyzed in exactly the same way.

Note that the results range from a low of 19.4 ppm to a high of 20.3 ppm of iron(III). The average of the data is 19.8 ppm to three significant figures.

The mean is usually written x̄ (x-bar). The mean of the iron(III) data is 19.78 ppm, and the median, determined by averaging the middle two points, is 19.7 ppm.
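As a quick check of those numbers, here is a short Python sketch. The six values are illustrative, chosen only to be consistent with the low, high, mean, and median quoted above, since the transparency itself is not reproduced in these notes:

    import statistics

    # Hypothetical iron(III) results (ppm) consistent with the summary above.
    data = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

    print(statistics.mean(data))     # 19.78... -> 19.8 to three significant figures
    print(statistics.median(data))   # 19.7, the average of the middle two values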

Precision describes the reproducibility of measurements; that is, the closeness of results that have been obtained in exactly the same way.

Accuracy indicates the closeness of the measurement to its true or accepted value and is expressed by the error in the measurement.

XXXXXX

This transparency illustrates the basic difference between accuracy and precision. Accuracy measures agreement between a result and its true value. Precision describes the agreement among several results that have been measured in the same way.

Precision is determined by simply repeating a measurement. On the other hand, accuracy can never be determined exactly because the true value of a quantity can never be known exactly. An accepted value must be used instead.

Accuracy is expressed in terms of either absolute or relative uncertainty.

Equations for both are given on page 51. Most of Chapter 3 deals with significant figures and how to determine the absolute and relative uncertainties of calculated quantities. Be sure you understand how these are calculated.
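The definitions themselves are standard: absolute error is the measured value minus the true value, and relative error is the absolute error divided by the true value. A minimal sketch in Python (this is not the book's code; the numbers come from the iron(III) example above):

    # Absolute error: measured minus true value (carries units).
    # Relative error: absolute error divided by the true value (unitless).
    measured = 19.78   # ppm, mean of the iron(III) results
    true = 20.00       # ppm, known concentration of the solution

    absolute_error = measured - true           # -0.22 ppm
    relative_error = absolute_error / true     # -0.011, i.e. -1.1 %
    print(absolute_error, 100 * relative_error)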

Let’s look at random errors and their effects on the precision of measurements.

Suppose we have four different random errors that combine to give an overall error of a measurement. We will assume that each error has an equal probability of occurring and that each can cause the final result to be high or low by a fixed amount ± U.

XXXXXX

This transparency shows all the possible ways these four errors can combine to give the indicated deviations from the mean value. Note that only one combination leads to a deviation of + 4 U, four combinations give a deviation of + 2 U, and six give a deviation of 0 U.

XXXXXX

If we plot this deviation from the mean we get the top graph in this transparency. The middle graph is for 10 random errors in an experiment.

We see that the most frequent occurrence is zero deviation from the mean. At the other extreme, a maximum deviation of 10 U occurs only about once in 500 measurements.
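You can reproduce both counts by brute force. Here is a small Python sketch (my own illustration, not from the text) that enumerates every way n equal errors of ±U can combine:

    from itertools import product
    from collections import Counter

    # Count how often each net deviation (in units of U) occurs when
    # n independent errors, each +1 or -1, are added together.
    def deviation_counts(n):
        return Counter(sum(signs) for signs in product((+1, -1), repeat=n))

    print(deviation_counts(4))
    # {0: 6, 2: 4, -2: 4, 4: 1, -4: 1} -- matches the transparency

    # For 10 errors, |deviation| = 10 U occurs 2 times in 2**10 = 1024 trials,
    # i.e. roughly once in 500 measurements.
    counts = deviation_counts(10)
    print((counts[10] + counts[-10]) / 2**10)   # ~0.002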

The bottom graph is for an experiment with a very large number of individual errors; it has a bell-shaped curve called a Gaussian curve or a normal error curve.

XXXXXX

We find empirically that the distribution of replicate data from quantitative analytical experiments approaches that of the Gaussian curve. As an example, consider the data in this Table for 50 replicate calibrations of a 10-mL pipet.

The results vary from a low of 9.969 mL to a high of 9.994 mL. This 0.025 mL spread of data results directly from an accumulation of all of the random uncertainties in the experiment.

XXXXXX

The information in the Table is easier to visualize when the data are rearranged into frequency distribution groups. Here, we tabulate the number of data points falling into a series of adjacent 0.003-mL bins.

This plot is called a histogram. We can imagine that as the number of measurements increases, the histogram would approach the shape of the continuous curve shown as plot B.

This curve is a Gaussian curve derived for an infinite set of data having the same mean and the same precision.
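Since the 50 table entries are not reproduced in these notes, here is a Python sketch with synthetic stand-in data; the mean and spread are my assumptions, chosen only to resemble the quoted range:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the 50 pipet calibrations (mL); loc and scale
    # are assumed values roughly matching the 9.969-9.994 mL spread.
    volumes = rng.normal(loc=9.982, scale=0.0056, size=50)

    # Tally the data into adjacent 0.003-mL bins and print a crude histogram.
    edges = np.arange(9.969, 9.997, 0.003)
    counts, edges = np.histogram(volumes, bins=edges)
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        print(f"{lo:.3f}-{hi:.3f} mL: {'*' * int(n)}")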

So we see that variations in replicate results arise from numerous small and individually undetectable random errors. Such small errors ordinarily tend to cancel one another and thus have a minimal effect so that most of the values are close to the mean.

Occasionally, however, they occur in the same direction and produce a large positive or negative net error.

Now let us turn to statistical treatments of random error. Statistical analysis of analytical data is based upon the assumption that random errors in an analysis follow a Gaussian, or normal distribution.

In statistics, a finite number of experimental observations is called a sample. The sample is treated as a tiny fraction of an infinite number of observations that could, in principle, be made given infinite time. Statisticians call the theoretical infinite number of data a population.

Statistical laws have been derived assuming a population of data; often they must be modified substantially when applied to a small sample because a few data may not be representative of the population.

The population mean is given the symbol μ (mu), while the sample mean is written x̄ (x-bar).

In the absence of systematic error, the population mean is also the true value for the measured quantity.

More often than not, particularly when N is small, μ differs from x̄ because a small sample of data does not exactly represent its population. For example, it is quite possible that if you take only 3 measurements, all three could fall above the actual mean of many measurements.

The probable difference between x̄ and μ decreases rapidly as the number of measurements making up the sample increases; ordinarily, by the time N reaches 20 to 30, this difference is negligible.
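A quick simulation makes this concrete. This is my own illustration with assumed population parameters, not anything from the text:

    import random

    random.seed(1)
    MU, SIGMA = 20.0, 0.2   # assumed population mean and standard deviation

    # The sample mean wanders near mu for small N and settles as N grows.
    for n in (3, 10, 30, 1000):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        xbar = sum(sample) / n
        print(n, round(xbar, 3), round(abs(xbar - MU), 3))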

Unfortunately (or fortunately) it is rarely cost effective to repeat the same measurement in analytical chemistry 30 times.

Three terms are widely used to describe the precision of a set of experimental data: the standard deviation, the variance, and the coefficient of variation.

XXXXXX

The population standard deviation σ (sigma) is a measure of the precision of a population of data and is given by

σ = √( Σ(xi − μ)² / N )
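As a sketch, the formula translates directly into Python (here I treat the six illustrative iron(III) results from earlier as if they were an entire population, purely to exercise the formula):

    import math

    # Population standard deviation: root-mean-square deviation from mu,
    # dividing by N (not N - 1).
    def pop_std(x, mu):
        return math.sqrt(sum((xi - mu) ** 2 for xi in x) / len(x))

    print(pop_std([19.4, 19.5, 19.6, 19.8, 20.1, 20.3], 19.78))   # ~0.32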

XXXXXX

Here are two distribution curves for two populations of data that differ only in their standard deviations. The standard deviation for the data set yielding the broader but lower curve B is twice that for the measurements yielding curve A. So random errors in measurement B are greater than in A.

One of the goals of analytical chemistry is to find methods that have fewer random errors. Thus methods with a low standard deviation are favored over ones with large standard deviations.

XXXXXX

The bottom graph gives another type of normal error curve in which the abscissa is expressed in units of σ by the variable z, which is defined as

z = (x − μ) / σ

Plotted this way, the two curves A and B above become identical. z expresses the deviation from the mean in units of the standard deviation, i.e., of measurement precision.
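In Python the transformation is one line; the values below come from the illustrative iron(III) numbers, with the ~0.32 ppm spread computed earlier:

    # z measures how many standard deviations a value lies from the mean.
    def z_score(x, mu, sigma):
        return (x - mu) / sigma

    print(z_score(20.3, 19.78, 0.32))   # ~1.6: the highest result sits 1.6 sigma high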

XXXXXX

This generalized Gaussian or normal error curve has several important properties.

(1) The mean occurs at the central point of maximum frequency.

(2) There is a symmetrical distribution of positive and negative deviations about the maximum.

(3) There is an exponential decrease in frequency as the magnitude of the deviations increases.

(4) It can be shown that, regardless of its width, 68.3% of the area beneath a Gaussian curve for a population of data lies within one standard deviation (±1σ) of the mean μ.

Thus, 68.3% of the data making up the population lie within one standard deviation of the mean.

Furthermore, 95.5% of all data are within ±2σ of the mean and 99.7% are within ±3σ.

Thus it is important to realize that although most measurements are written as value ± 1 standard deviation, this range includes only about two thirds of the measurements in a normal procedure.

Because of area relationships such as these, the standard deviation of a population of data is a useful predictive tool. For example, we can say that the chances are 68.3 in 100 that the random uncertainty of any single measurement is no more than ±1σ. Similarly, the chances are 95.5 in 100 that the error is less than 2σ, and so forth.
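These areas are easy to verify numerically: for a Gaussian, the fraction within ±kσ is erf(k/√2). A minimal Python check:

    import math

    # Percentage of a Gaussian population lying within +/- k standard deviations.
    for k in (1, 2, 3):
        print(k, round(100 * math.erf(k / math.sqrt(2)), 2))
    # 1 68.27
    # 2 95.45   (the 95.5 quoted above)
    # 3 99.73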

Now let's look at sample statistics. Remember samples are finite data sets. The sample standard deviation is given by the equation

s = √( Σ(xi − x̄)² / (N − 1) )

The term (N – 1) is called the number of degrees of freedom and is 1 less than the number of data points.

The N − 1 term adjusts the math for the fact that a small number of determinations may not be spread evenly over the Gaussian distribution of all possible values.

You can see that the standard deviation of a single measurement is meaningless, since N − 1 = 0 leads to a divide-by-zero error. This is why measurements are repeated a number of times: to get a reasonable N and a mean that more likely represents the true value.

Most scientific calculators have the standard deviation function built in. However, be sure you use the sample standard deviation s, not the population standard deviation σ. On some calculators the difference is denoted by a subscript n or n−1 on the sigma key.
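The same distinction appears in Python's statistics module, which is a handy way to check which convention your calculator uses (the data are the illustrative iron(III) values from earlier):

    import statistics

    data = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

    print(statistics.stdev(data))    # sample s, divides by N - 1 (~0.35)
    print(statistics.pstdev(data))   # population sigma, divides by N (~0.32)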