Chapter 5 Review: Summarizing Bivariate Data

Name ______

Period______

1. During the first 3 centuries AD, the Roman Empire produced coins in the Eastern provinces. Some historians argue that not all of these coins were produced in Roman mints and that local provincial mints struck some of them. Because the "style" of coins is difficult to analyze, historians would like to use metallurgical analysis as one tool to identify the source mints of these coins. Investigators studied 8 coins known to have been produced by the mint in Rome in an attempt to identify a trace-element profile for these coins, and they have identified gold and lead as possible factors in identifying other coins as having been minted in Rome. The gold and lead content of each coin, measured as a percent of the coin's weight, is given in the table at right, and a scatter plot of these data is presented below.

a) What is the equation of the least-squares best-fit line?

b) Graph the best-fit line on the scatter plot.

c) What is the value of the correlation coefficient? Interpret this value.

d) What is the value of the coefficient of determination? Give an interpretation of this value.
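Parts (a), (c), and (d) all come from the same sums of squared deviations. Problem 2 predicts lead content from gold content, so gold is the explanatory variable x and lead the response y. Because the coin table is not reproduced here, this minimal Python sketch runs on made-up numbers, not the coin data; the function name `least_squares` is hypothetical.

```python
import math

def least_squares(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # the line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, r

# Made-up illustration values (NOT the coin data): x = gold %, y = lead %
gold = [0.10, 0.15, 0.20, 0.25, 0.30]
lead = [0.50, 0.62, 0.66, 0.80, 0.87]
a, b, r = least_squares(gold, lead)
print(f"lead-hat = {a:.4f} + {b:.4f}(gold)")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The coefficient of determination is simply r squared: the proportion of the variation in the response that is explained by the linear relationship with the explanatory variable.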

2. Suppose that the coins analyzed in problem 1 are representative of the metallurgical content of coins minted in Rome during the first 300 years AD.

a) If a Roman coin is selected at random and its gold content is 0.20% by weight, calculate the predicted lead content. Be sure to use correct notation and units.

b) One of the coins used to calculate the regression equation has a gold content of 0.200%. Calculate the residual for this coin. Be sure to use correct notation and units.

c) Assess the fit of the line (3 parts).
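Parts (a) and (b) use the fitted line twice: once to produce a predicted value, and once to compare that prediction with an observed value (residual = observed − predicted). Two numerical measures commonly used when assessing a line are r² and the standard deviation about the line, s = √(SSE/(n − 2)), alongside the residual plot. The sketch below uses hypothetical coefficients and observations, not the answers to problem 1; the helper names are illustrative.

```python
import math

def predict(a, b, x):
    """Predicted response: y-hat = a + b*x."""
    return a + b * x

def residual(y_observed, y_hat):
    """Residual = observed y minus predicted y-hat."""
    return y_observed - y_hat

def s_about_line(xs, ys, a, b):
    """Standard deviation about the line: sqrt(SSE / (n - 2))."""
    sse = sum(residual(y, predict(a, b, x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical coefficients and observations, NOT the problem 1 results
a, b = 0.30, 1.90
y_hat = predict(a, b, 0.20)   # predicted lead % for a coin with 0.20 % gold
e = residual(0.70, y_hat)     # residual if that coin's observed lead were 0.70 %
print(f"lead-hat = {y_hat:.3f}%, residual = {e:.3f}%")
print(f"s = {s_about_line([0.10, 0.20, 0.30], [0.45, 0.70, 0.85], a, b):.4f}%")
```

A positive residual means the line underpredicts that coin's lead content; a negative residual means it overpredicts.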

3. The Des Moines Register recently reported the ratings of high school sportsmanship as compiled by the Iowa High School Athletic Association. For each school, referees rated the spectators and the participants, where 1 = superior and 5 = unsatisfactory. A regression analysis of the average scores given to wrestling spectators and wrestlers is shown below.

a) Identify and interpret the correlation between the ratings of spectators and wrestlers.

b) Identify and interpret the coefficient of determination.

c) Identify and interpret the standard deviation about the least-squares line.

d) Identify a possible influential point by circling it on the graph. How could you tell whether this point is actually influential?

e) Identify a possible outlier by putting a box around it on the graph. Estimate the ordered pair of this possible outlier, and find its residual.

f) Describe the difference between an outlier and an influential point for bivariate data.
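A common way to check part (d) is to refit the line without the suspect point and see whether the slope or intercept changes noticeably; an outlier, by contrast, is a point with a large residual. The sketch below illustrates both checks on made-up (spectator, wrestler) rating pairs, since the newspaper data are not reproduced here; the helper names are hypothetical.

```python
def fit(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def refit_without(x, y, i):
    """Refit after deleting point i, to see how much the line moves."""
    return fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])

# Made-up rating pairs; the last point lies far from the others in x
spect = [1.2, 1.4, 1.5, 1.7, 1.8, 2.0, 3.5]
wrest = [1.3, 1.5, 1.4, 1.8, 1.7, 2.1, 1.9]

a_all, b_all = fit(spect, wrest)
a_wo, b_wo = refit_without(spect, wrest, len(spect) - 1)
print(f"with point:    y-hat = {a_all:.3f} + {b_all:.3f}x")
print(f"without point: y-hat = {a_wo:.3f} + {b_wo:.3f}x")

# The largest absolute residual flags a possible outlier
resid = [yi - (a_all + b_all * xi) for xi, yi in zip(spect, wrest)]
worst = max(range(len(resid)), key=lambda k: abs(resid[k]))
print("largest |residual| at index", worst, "=", round(resid[worst], 3))
```

In these made-up numbers the far-right point changes the slope a great deal yet does not have the largest residual, which is typical: an influential point tends to pull the line toward itself.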

4. The preservation of objects made of organic material is a constant concern to those caring for items of historical interest. For example, some delicate fabrics are natural silks; they are made of protein and are biodegradable. Many silks in museum collections are in danger of crumbling, so it would be of great benefit to be able to assess the delicacy of a fabric before deciding whether to display it. One possibility is chemical analysis, which might give some evidence about how brittle a fabric is. To investigate this possibility, biochemical data in the form of a ratio of the amounts of certain amino acids in the fibers were acquired from the linings of sixteen 19th- and early 20th-century Japanese kimonos, and the tenacity (breaking stress) of each fabric was also recorded.

Using the data from the Japanese kimonos, construct the least squares best fit line predicting tenacity using amino acid ratio as a predictor.

a) What is the equation of the least-squares line?

b) Identify and interpret the slope and y-intercept.

c) Approximately what proportion of the variability in tenacity is explained by the amino acid ratio?

Scatter plot: Tenacity vs. Amino Acid Ratio

d) Is a line the best way to summarize the data? Explain.

5. The theory of fiber strength suggests that the relationship between fiber tenacity and amino acid ratio is logarithmic, i.e., $T = a + b\log(R)$, where T is the tenacity and R is the amino acid ratio. Perform the appropriate transformation of variable(s) and fit this logarithmic model to the data.

a) What is the resulting best fit line using this model?

b) For an amino acid ratio of , what is the predicted tenacity?

c) Using your results so far, does it appear that the transformed model in question (5) is no improvement, a slight improvement, or a significant improvement over the linear model in question (4)? Justify your response with an appropriate statistical argument.

Transformed Scatter Plot; Transformed Residual Plot
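For the logarithmic model only the explanatory variable is transformed: regress T on log R, then predict by plugging log R into the fitted line. This sketch assumes base-10 logs (natural logs work the same way if used consistently) and uses made-up ratios and tenacities, since the kimono data are not reproduced here. The ratio R = 2.5 used for the prediction is also hypothetical. Because both fits share the same response T, comparing their r² values (together with the residual plots) is one reasonable basis for part (c).

```python
import math

def fit(x, y):
    """Least-squares intercept a, slope b, and r^2 for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)

# Made-up amino acid ratios R and tenacities T (NOT the kimono data)
R = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]
T = [2.0, 4.1, 5.2, 6.1, 7.3, 8.0, 9.3, 10.1]

a_lin, b_lin, r2_lin = fit(R, T)        # linear model: T on R
logR = [math.log10(r) for r in R]       # transform the x variable only
a_log, b_log, r2_log = fit(logR, T)     # logarithmic model: T on log R

print(f"linear: T-hat = {a_lin:.3f} + {b_lin:.3f}R, r^2 = {r2_lin:.3f}")
print(f"log:    T-hat = {a_log:.3f} + {b_log:.3f}log(R), r^2 = {r2_log:.3f}")

# Prediction at a hypothetical ratio R = 2.5
print(f"predicted T at R = 2.5: {a_log + b_log * math.log10(2.5):.3f}")
```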

6. Paleontology, the study of forms of prehistoric life, can sometimes be aided by modern biology. The study of prehistoric birds depends on fossil information, which typically consists of imprints in stone of a prehistoric creature's remains. To study the productivity of an ancient ecosystem, it would be useful to know the actual mass of individual birds, but this information is not preserved in the fossil record. It seems reasonable that the biomechanics of birds operates much the same today as in the past. For example, the relationship between the wing length and total weight of a bird should be very similar today to what it was in the distant past. The wing lengths of ancient birds are readily obtainable from the fossil record, but their weights are not. Assuming similar biomechanical development for ancient and modern birds, a regression model expressing the relationship between the wing length and total weight of modern birds could be used to estimate the mass of similar prehistoric birds and thus gauge some aspects of the ancient ecosystem.

Data are available for some modern birds of prey. Specifically, data on the mean wing length and mean total weight of species of hawk-like birds of prey are given below.

Wing length and total weight of modern species of birds of prey

Bird species / Wing length (cm) / Total weight (kg)
Gyps fulvus / 69.8 / 7.27
Gypaetus barbatus grandis / 71.7 / 5.39
Catharista atrata / 50.2 / 1.70
Aguila chrysatus / 68.2 / 3.71
Hieraeus fasciatus / 56.0 / 2.06
Helotarsus ecaudatus / 51.2 / 2.10
Geranoatus melanoleucus / 51.5 / 2.12
Circatus gallicus / 53.3 / 1.66
Buteo bueto / 40.4 / 1.03
Pernis apivorus / 45.1 / 0.62
Pandion haliatus / 49.6 / 1.11
Circus aeruginosos / 41.3 / 0.68
Circus cyaneus (female) / 37.4 / 0.472
Circus cyaneus (male) / 33.9 / 0.331
Circus pygargus / 35.9 / 0.237
Circus macrurus / 35.7 / 0.386
Milvus milvus / 50.7 / 0.927

Using these data, construct the least squares best-fit line for predicting total weight using wing length as a predictor.

a) What is the equation of the least-squares line?

b) Approximately what proportion of the variability in weight is explained by the wing length?
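The linear fit in parts (a) and (b) can be checked with a short Python sketch (standard library only) using the seventeen species in the wing-length table; the values below are transcribed from that table, and the function name `fit` is illustrative. The printed r² is the proportion asked for in part (b).

```python
def fit(x, y):
    """Least-squares intercept a, slope b, and r^2 for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)

# Wing length (cm) and total weight (kg), in table order
wing = [69.8, 71.7, 50.2, 68.2, 56.0, 51.2, 51.5, 53.3, 40.4,
        45.1, 49.6, 41.3, 37.4, 33.9, 35.9, 35.7, 50.7]
weight = [7.27, 5.39, 1.70, 3.71, 2.06, 2.10, 2.12, 1.66, 1.03,
          0.62, 1.11, 0.68, 0.472, 0.331, 0.237, 0.386, 0.927]

a, b, r2 = fit(wing, weight)
print(f"weight-hat = {a:.4f} + {b:.4f}(wing length)")
print(f"r^2 = {r2:.3f}")  # proportion of variability in weight explained
```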

7. Biological theory suggests that the relationship between the weight of these birds and their wing length is exponential, i.e., $W = ab^{L}$, or $\log W = \log a + (\log b)L$, where W is the total weight and L is the wing length. Perform the appropriate transformation of variable(s) and fit an exponential model to the data.

a) What is the resulting best fit line using the transformed model?

b) For the wing length of Hieraeus fasciatus (L = 56.0 cm), what is the predicted bird weight? Show your work below.

c) How would you evaluate your transformed model in question (7) to see if it is an improvement over the linear model in question (6)?
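The exponential model becomes a straight line after taking logs of the response only, so the sketch below regresses log W on L, then back-transforms a prediction by raising 10 to the predicted log (base-10 logs are an assumption; natural logs with `math.exp` work the same way). For part (c), one option beyond comparing residual plots is to compute residuals in kilograms for both models, because r² values computed on different response scales (W versus log W) are not directly comparable. Data are transcribed from the wing-length table; the helper name is illustrative.

```python
import math

def fit(x, y):
    """Least-squares intercept and slope for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

wing = [69.8, 71.7, 50.2, 68.2, 56.0, 51.2, 51.5, 53.3, 40.4,
        45.1, 49.6, 41.3, 37.4, 33.9, 35.9, 35.7, 50.7]
weight = [7.27, 5.39, 1.70, 3.71, 2.06, 2.10, 2.12, 1.66, 1.03,
          0.62, 1.11, 0.68, 0.472, 0.331, 0.237, 0.386, 0.927]

# log W = log a + (log b) L: fit a line to (L, log10 W)
log_w = [math.log10(w) for w in weight]
c0, c1 = fit(wing, log_w)
print(f"log(W)-hat = {c0:.4f} + {c1:.5f}L  (a = {10 ** c0:.4f}, b = {10 ** c1:.4f})")

# Part (b): predict at L = 56.0 cm, then back-transform to kilograms
w_hat = 10 ** (c0 + c1 * 56.0)
print(f"predicted weight at L = 56.0 cm: {w_hat:.3f} kg")

# Part (c), one option: sum of squared residuals in kg for both models
a_lin, b_lin = fit(wing, weight)
sse_lin = sum((w - (a_lin + b_lin * L)) ** 2 for L, w in zip(wing, weight))
sse_exp = sum((w - 10 ** (c0 + c1 * L)) ** 2 for L, w in zip(wing, weight))
print(f"SSE (kg^2): linear {sse_lin:.3f}, exponential {sse_exp:.3f}")
```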

8. One of the problems when estimating the size of animal populations from aerial surveys is that animals may bunch together, making it difficult to distinguish and count them accurately. For example, a horse standing alone is easy to spot, but if seven horses are huddled close together, some may be missed, resulting in an undercount. The relative frequency of undercounts is typically reported as a percent. For example, if there are 10 horses in a group, an observer in the plane may typically count fewer than 10 horses 20% of the time. In a recent study, the percent of sightings that resulted in an undercount was related to the size of the "group" of horses and donkeys; the following data were gathered:

% Undercount vs. Group Size for Horses and Donkeys

Group size / % occurrence of undercount
2 / 5
3 / 5
4 / 6
5 / 10
6 / 5
7 / 7
8 / 5
9 / 6
10 / 7
11 / 5
12 / 5
14 / 14
16 / 13
18 / 23

After a straight-line model was fit, significant curvature was detected in the residual plot, so two nonlinear models, the exponential and the power models, were chosen for further analysis. The computer output for these models is given below, and the residual plots are on the next page.

Computer output: Exponential model; Power model

Residual Plots

c) Generally speaking, which of the two models, power or exponential, is better at predicting log(Percent Undercount)? Provide statistical justification for your choice.
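Both nonlinear models predict the same response, log(percent undercount), so their r² values and standard deviations about the line can be compared directly: the exponential model regresses log y on group size, and the power model regresses log y on log(group size). This Python sketch uses the values from the undercount table and assumes base-10 logs; the computer output in the test may use a different base, which changes the intercepts and s but not r². The helper name `fit` is illustrative.

```python
import math

def fit(x, y):
    """Return intercept, slope, r^2, and residuals for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, sxy ** 2 / (sxx * syy), resid

size = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 18]
under = [5, 5, 6, 10, 5, 7, 5, 6, 7, 5, 5, 14, 13, 23]

log_u = [math.log10(u) for u in under]
exp_fit = fit(size, log_u)                            # exponential: log y on x
pow_fit = fit([math.log10(s) for s in size], log_u)   # power: log y on log x

for name, (a, b, r2, resid) in [("exponential", exp_fit), ("power", pow_fit)]:
    s = math.sqrt(sum(e ** 2 for e in resid) / (len(resid) - 2))
    print(f"{name:12s} a = {a:.4f}, b = {b:.4f}, r^2 = {r2:.3f}, s = {s:.4f}")
```

With a shared response, a higher r² goes hand in hand with a smaller s, so the two numbers tell the same story; the residual plots show whether any curvature remains.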

Chapter 5 Test, Form B
