# Linear Regression Problems


Linear Regression Problems

- As Earth’s population continues to grow, the solid waste generated by the population grows with it. Governments must plan for disposal and recycling of ever-growing amounts of solid waste. Planners can use data from the past to predict future waste generation and plan for enough facilities for disposing of and recycling the waste.

Given the following data on the waste generated in Florida from 1990 to 1994, how can we construct a function to predict the waste that was generated in the years 1995-1999? The scatter plot is shown in Figure 1.85.

Year / Tons of Solid Waste Generated (in thousands)

1990 / 19,358

1991 / 19,484

1992 / 20,293

1993 / 21,499

1994 / 23,561

a) Make a scatterplot of the data, letting x represent the number of years since 1990.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the tons of waste generated in 2000 and 2005, and determine which function gives the most realistic predictions.
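The fit-and-compare workflow in parts (b) and (d) can also be sketched in plain Python. This is only an illustration, not the calculator's own routine: the helper names `polyfit`, `evaluate`, and `r_squared` are hypothetical, and the fit solves the least-squares normal equations directly. The power model is left out here because a power fit done through logarithms cannot use the point x = 0.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first (illustrative helper)."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


years_since_1990 = [0, 1, 2, 3, 4]
waste = [19358, 19484, 20293, 21499, 23561]  # thousands of tons, from the table

fits = {name: polyfit(years_since_1990, waste, deg)
        for name, deg in [("linear", 1), ("quadratic", 2), ("cubic", 3)]}

for name, coeffs in fits.items():
    r2 = r_squared(years_since_1990, waste, coeffs)
    # x = 10 corresponds to 2000 and x = 15 to 2005.
    print(f"{name:9s} R^2 = {r2:.4f}  2000: {evaluate(coeffs, 10):,.0f}  2005: {evaluate(coeffs, 15):,.0f}")
```

A higher-degree polynomial never has a lower $R^2$ than a lower-degree one on the same data, so the predictions for 2000 and 2005 are the more useful check on which model is realistic.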

- The numbers of insured commercial banks y (in thousands) in the United States for the years 1987 to 1996 are shown in the table. (Source: Federal Deposit Insurance Corporation).

Year / 1987 / 1988 / 1989 / 1990 / 1991 / 1992 / 1993 / 1994 / 1995 / 1996

y / 13.70 / 13.12 / 12.71 / 12.34 / 11.92 / 11.46 / 10.96 / 10.45 / 9.94 / 9.53

a) Make a scatterplot of the data, letting x represent the number of years since 1987.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the number of insured commercial banks in 2000 and 2005, and determine which function gives the most realistic predictions.

e) Plot the actual data and the model you selected on the same graph. How closely does the model represent the data?

- U.S. Farms. As the number of farms has decreased in the United States, the average size of the remaining farms has grown larger, as shown in the table below.

Year / Average Acreage Per Farm

1910 / 139

1920 / 149

1930 / 157

1940 / 175

1950 / 216

1959 / 303

1969 / 390

1978 / 449

1987 / 462

1997 / 487

a) Make a scatterplot of the data, letting x represent the number of years since 1900.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the average acreage in 2000 and 2010, and determine which function gives the most realistic predictions.

- Sports. The winning times (in minutes) in the women’s 400-meter freestyle swimming event in the Olympics from 1936 to 1996 are given by the following ordered pairs.

a) Make a scatterplot of the data, letting x represent the number of years since 1972.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) Plot the actual data and the model you selected on the same graph. How closely does the model represent the data?

Quadratic Regression Problems

- The following data was obtained by throwing a rubber ball at a CBR.

Time (sec) / Height (m)

0.0000 / 1.03754

0.1080 / 1.40205

0.2150 / 1.63806

0.3225 / 1.77412

0.4300 / 1.80392

0.5375 / 1.71522

0.6450 / 1.50942

0.7525 / 1.21410

0.8600 / 0.83173

a) Use the data above to make a scatterplot, letting x represent the number of seconds elapsed.

b) Next, use a graphing calculator to find the model that best expresses the height and vertical velocity of the rubber ball. We can also use this model to predict the maximum height of the ball and its vertical velocity when it hits the face of the CBR.

c) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

d) Graph the function of best fit with the scatterplot of the data.

e) Determine the maximum height of the ball (in meters).

f) With the model you selected in part (b), determine when the height of the ball is at least 1.5 meters.

- Stopping Distance. A state highway patrol safety division collected the data on stopping distances in Table 2.16.

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the stopping distance for a vehicle traveling at 25 mph.

e) Use the regression model to predict the speed of a car if the stopping distance is 300 ft.

Table 2.16 Highway Safety Division

Speed (mph) / Stopping Distance (ft)

10 / 15.1

20 / 39.9

30 / 75.2

40 / 120.5

50 / 175.9
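Parts (d) and (e) use the regression model in both directions: once to evaluate it and once to invert it. The sketch below assumes a quadratic model is the one selected in part (b), which is an assumption for illustration rather than the answer to that part. The helper `polyfit` and the names `stopping_distance`, `d25`, and `v300` are hypothetical.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first (illustrative helper)."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


speed = [10, 20, 30, 40, 50]                  # mph, from Table 2.16
distance = [15.1, 39.9, 75.2, 120.5, 175.9]   # ft, from Table 2.16

# Assumed model: distance = c0 + c1 * v + c2 * v^2
c0, c1, c2 = polyfit(speed, distance, 2)


def stopping_distance(v):
    return c0 + c1 * v + c2 * v ** 2


# Part (d): evaluate the model at 25 mph.
d25 = stopping_distance(25)

# Part (e): solve c2 v^2 + c1 v + (c0 - 300) = 0 and keep the positive root.
disc = c1 ** 2 - 4 * c2 * (c0 - 300)
v300 = (-c1 + math.sqrt(disc)) / (2 * c2)

print(f"Predicted stopping distance at 25 mph: {d25:.1f} ft")
print(f"Predicted speed for a 300 ft stopping distance: {v300:.1f} mph")
```

The 300 ft question lies beyond the largest tabulated speed, so the second answer is an extrapolation and deserves more caution than the first.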

- Home Schooling Growth. The estimated numbers of U.S. children who were home-schooled in the years from 1992 to 1997 were:

Table 1.13 Home Schooling

Year / Number

1992 / 703,000

1993 / 808,000

1994 / 929,000

1995 / 1,060,000

1996 / 1,220,000

1997 / 1,347,000

(a) Produce a scatter plot of the number of children home-schooled in thousands (y) as a function of years since 1990 (x).

(b) Find the linear regression equation. (Round the coefficients to the nearest 0.01.)

(c) Does the value of $R^2$ suggest that the linear model is appropriate?

(d) Find the quadratic regression equation. (Round the coefficients to the nearest 0.01.)

(e) Does the value of $R^2$ suggest that a quadratic model is appropriate?

(f) Use both curves to predict the number of U.S. children that are home-schooled in the year 2005. How different are the estimates?

(g) Writing to Learn. Use the results of this exploration to explain why it is risky to use regression equations to predict y-values for x-values that are not very close to the data points, even when the curves fit the data points very well.

- Leisure Time. The following table shows the median number of hours of leisure time that Americans had each week in various years.

Year, x / Median Number of Leisure Hours Per Week

1973, 0 / 26.2

1980, 7 / 19.2

1987, 14 / 16.6

1993, 20 / 18.8

1997, 24 / 19.5

Source: Louis Harris and Associates

(a) Make a scatterplot of the data, letting x represent the number of years since 1973, and determine which model best fits the data.

(b) Use a graphing calculator to fit the type of function determined in part (a) to the data.

(c) Graph the equation with the scatterplot. Then, use the function found in part (b) to estimate the number of leisure hours per week in 1978, 1990, and 2005.

- On-line Travel Revenue. With the explosion of Internet use, more and more travelers are booking their travel reservations on-line. The following table lists the total on-line revenue for recent years. Most of the revenue is from airline tickets.

Year / On-Line Travel Revenue (In Millions)

1996 / $ 276

1997 / 827

1998 / 1900

1999 / 3200

2000 / 4700

2001 / 6500

2002 / 8900

*Source: Travel and Interactive Technology 1999*

(a) Create a scatterplot of the data. Let x = the number of years since 1996.

(b) Use a graphing calculator to fit the data with linear, quadratic, and exponential functions. Determine which function has the best fit.

(c) Graph all three functions found in part (b) with the scatterplot in part (a).

(d) Use the functions found in part (b) to estimate the on-line travel revenue in 2010. Which function provides the most realistic prediction?

Quartic Regression Problems

- Consumer Debt. Nonmortgage consumer debt is mounting in the United States, as shown in the table below.

Year / Non-mortgage Debt (In Billions)

1989 / $ 762

1990 / 789

1991 / 783

1992 / 775

1993 / 804

1994 / 902

1995 / 1038

1996 / 1161

1997 / 1216

1998 / 1266

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, cubic, and quartic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict when consumer debt will reach 1400 billion dollars.

- *Declining Number of Farms in the United States.* Today U.S. farm acreage is about the same as it was in the early part of the twentieth century, but the number of farms has shrunk.

Year / Number of Farms (in millions)

1910 / 6.4

1920 / 6.5

1930 / 6.3

1940 / 6.1

1950 / 5.4

1959 / 3.7

1969 / 2.7

1978 / 2.3

1987 / 2.1

1997 / 1.9

Looking at the table above, we note that the data could be modeled with a cubic or a quartic function.

(a) Model the data with both cubic and quartic functions. Let the first coordinate of each data point be the number of years after 1900. That is, enter the data as (10, 6.4), (20, 6.5), and so on. Then, using $R^2$, the coefficient of determination, decide which function is the better fit. The $R^2$-value gives an indication of how well the function fits the data. The closer $R^2$ is to 1, the better the fit.

(b) Graph the function with the scatterplot of the data.

(c) Use the answer to part (a) to estimate the number of farms in 1900, 1975, and 2003.
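For reference, the coefficient of determination mentioned in part (a) is commonly defined as follows. The symbols are notational assumptions, not taken from the worksheet: $y_i$ are the observed values, $\hat{y}_i$ the values the fitted function gives at the same inputs, and $\bar{y}$ the mean of the observed values.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

A perfect fit makes every residual $y_i - \hat{y}_i$ zero, so the fraction vanishes and $R^2 = 1$, which is why values closer to 1 indicate a better fit.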

Exponential Regression Problems

- In the years before the Civil War, the population of the United States grew rapidly, as shown in the following table from the U.S. Bureau of the Census.

Year / Population in Millions

1790 / 3.93

1800 / 5.31

1810 / 7.24

1820 / 9.64

1830 / 12.86

1840 / 17.07

1850 / 23.19

1860 / 31.44

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the population in 1870.

e) Use the regression model to predict the population in 1930. Explain why you do or do not feel this prediction is valid. (Hint: you may want to complete this problem after you finish the problem dealing with Census records after the Civil War.)
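One way to see what an exponential fit does with these data is to fit a straight line to the logarithm of the population. The sketch below assumes the model y = a·b^x with x measured in decades since 1790. The helper `linear_fit` and the names `a`, `b`, and `model` are hypothetical, and a calculator's built-in exponential regression may differ in detail.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


decades_since_1790 = [0, 1, 2, 3, 4, 5, 6, 7]
population = [3.93, 5.31, 7.24, 9.64, 12.86, 17.07, 23.19, 31.44]  # millions, from the table

# ln(y) = ln(a) + x * ln(b), so a line fitted to (x, ln y) gives ln(b) and ln(a).
slope, intercept = linear_fit(decades_since_1790, [math.log(p) for p in population])
a, b = math.exp(intercept), math.exp(slope)


def model(x):
    """Assumed exponential model, x in decades since 1790."""
    return a * b ** x


print(f"y = {a:.3f} * {b:.4f}^x")
print(f"1870 (x = 8):  {model(8):.2f} million")
print(f"1930 (x = 14): {model(14):.2f} million")
```

Comparing the x = 14 output with the 1930 entry of 123.20 million in the post-Civil War table is one way to judge the extrapolation asked about in part (e).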

- Projected Number of Alzheimer’s Patients: German psychiatrist Alois Alzheimer first described the disease, later called Alzheimer’s disease, in 1906. Since life expectancy has significantly increased in the last century, the number of Alzheimer’s patients has increased dramatically. The number of patients in the United States reached 4 million in 2000. The following table lists projected data regarding the number of Alzheimer’s patients in years beyond 2000.

Year, x / Projected Number of Alzheimer’s Patients in the United States (In millions)

2000 / 4.0

2010 / 5.8

2020 / 6.8

2030 / 8.7

2040 / 11.8

2050 / 14.3

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, logistic, and logarithmic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to estimate the number of Alzheimer’s patients in 2005, 2025, and 2100.

- Number of Physicians: The following table contains data regarding the number of physicians in the United States in selected years.

Year / Total Number of Physicians

1950 / 219,997

1955 / 241,711

1960 / 260,484

1965 / 292,088

1970 / 334,028

1975 / 393,742

1980 / 467,679

1985 / 552,716

1990 / 615,421

1994 / 684,414

1995 / 720,325

1996 / 737,764

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the number of physicians in 1975.

e) Use the regression model to estimate the number of physicians in 2000 and 2025.

- Credit Card Volume: The total credit card volume for Visa, MasterCard, American Express, and Discover has increased dramatically in recent years, as shown in the table below. (Source: CardWeb Inc.’s CardData)

Year, x / Credit Card Volume, y (In Billions)

1988 / 261.0

1989 / 296.3

1990 / 338.4

1991 / 361.0

1992 / 403.1

1993 / 476.7

1994 / 584.8

1995 / 701.2

1996 / 798.3

1997 / 885.2

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the credit card volume in 2003 and in 2010.

Logarithmic Regression Problems

1. Forgetting. In an art class, students were tested at the end of the course on a final exam. Then they were retested with an equivalent test at subsequent time intervals. Their scores after time t, in months, are given in the table.

Time, t (in months) / Score, y

1 / 84.9%

2 / 84.6%

3 / 84.4%

4 / 84.2%

5 / 84.1%

6 / 83.9%

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict test scores after 8, 10, 24, and 36 months.

e) After how long will the test scores fall below 82%?
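A logarithmic model y = a + b·ln(t) is linear in ln(t), so it can be fitted with ordinary least squares on the transformed input. The sketch below assumes that model form for illustration. The helper `linear_fit` and the names `a`, `b`, `score`, and `t_82` are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


months = [1, 2, 3, 4, 5, 6]
scores = [84.9, 84.6, 84.4, 84.2, 84.1, 83.9]  # percent, from the table

# Fit score = a + b * ln(t) by regressing the scores on ln(t).
b, a = linear_fit([math.log(t) for t in months], scores)


def score(t):
    """Assumed logarithmic model, t in months."""
    return a + b * math.log(t)


for t in (8, 10, 24, 36):
    print(f"t = {t:2d} months: {score(t):.2f}%")

# Part (e): solve a + b * ln(t) = 82 for t.
t_82 = math.exp((82 - a) / b)
print(f"Model reaches 82% at about t = {t_82:.0f} months")
```

Because the scores fall so slowly, the time found for part (e) lies far outside the six months of data, so it is best read as a rough indication only.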

2. Jamie, a meteorologist, is interested in finding a function that explains the relation between the height of a weather balloon (in kilometers) and the atmospheric pressure (measured in millimeters of mercury) on the balloon. She collects the data shown in Table 10.

a) Using a graphing utility, draw a scatter diagram of the data with atmospheric pressure as the independent variable.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the function in part (b) to predict the height of the weather balloon if the atmospheric pressure is 560 millimeters of mercury.

3. Economics and Marketing. The following data represent the price and quantity supplied in 2005 for IBM personal computers.

Price ($/Computer) / Quantity Supplied

2300 / 180

2000 / 173

1700 / 160

1500 / 150

1300 / 137

1200 / 130

1000 / 113

(a) Using a graphing utility, draw a scatter diagram of the data with price as the dependent variable.

(b) Using a graphing utility, try a variety of function families. Compare the $R^2$ values to find the function that best fits the data.

(c) Using a graphing utility, draw the function found in part (b) on the scatter diagram.

(d) Use the function found in part (b) to predict the number of IBM personal computers that will be supplied if the price is $1650.

Power Regression Problems

- Use the data in the table below to obtain a model for speed p versus distance traveled d. Consider linear, quadratic, exponential, power, and quartic models. Then use the model you selected as the best fit to predict the speed of the ball at impact, given that impact occurs when m.

Table 2.12 Rubber Ball Data from CBR Experiment

Distance (m) / Speed (m/s)

0.00000 / 0.00000

0.04298 / 0.82372

0.16119 / 1.71163

0.35148 / 2.45860

0.59394 / 3.05209

0.89187 / 3.74200

1.25557 / 4.49558

- The length of time that a planet takes to make one complete revolution around the sun is its year. The table shows the length (in earth years) of each planet’s year and the distance of that planet from the sun (in millions of miles). Find a model for this data in which x is the length of the year and y is the distance from the sun.

Planet / Year / Distance

Mercury / .24 / 36.0

Venus / .62 / 67.2

Earth / 1 / 92.9

Mars / 1.88 / 141.6

Jupiter / 11.86 / 483.6

Saturn / 29.46 / 886.7

Uranus / 84.01 / 1783.0

Neptune / 164.79 / 2794.0

Pluto / 247.69 / 3674.5
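A power model y = a·x^b becomes a straight line after taking logarithms of both variables, so one way to fit it is least squares on (ln x, ln y). The sketch below assumes a power model is appropriate for the planetary data. The helper `linear_fit` and the names `a`, `b`, and `distance` are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (length of year in earth years, distance from the sun in millions of miles), from the table
planets = {
    "Mercury": (0.24, 36.0), "Venus": (0.62, 67.2), "Earth": (1, 92.9),
    "Mars": (1.88, 141.6), "Jupiter": (11.86, 483.6), "Saturn": (29.46, 886.7),
    "Uranus": (84.01, 1783.0), "Neptune": (164.79, 2794.0), "Pluto": (247.69, 3674.5),
}

log_year = [math.log(year) for year, _ in planets.values()]
log_dist = [math.log(dist) for _, dist in planets.values()]

# ln(y) = ln(a) + b * ln(x): the slope is the exponent b, the intercept is ln(a).
b, log_a = linear_fit(log_year, log_dist)
a = math.exp(log_a)


def distance(year):
    """Assumed power model: distance in millions of miles for a given year length."""
    return a * year ** b


print(f"y = {a:.2f} * x^{b:.4f}")
for name, (year, dist) in planets.items():
    print(f"{name:8s} table: {dist:8.1f}  model: {distance(year):8.1f}")
```

The fitted exponent can be compared with 2/3, the value Kepler's third law gives for the relation between orbital period and distance.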

- *Cholesterol Level and the Risk of Heart Attack.* The data in the following table show the relationship of cholesterol level in men to the risk of a heart attack.

Cholesterol Level, x / Men, Per 10,000, Who Suffer A Heart Attack, y

100 / 30

200 / 65

250 / 100

275 / 130

300 / 180

(a) Use a graphing calculator to fit a model function to the data. Consider linear, exponential, power, and cubic functions.

(b) Graph the function with the scatterplot of the data.

(c) Use the answer to part (a) to estimate the heart attack rate for men with cholesterol levels of 150, 350, and 400.

Logistic Regression Problems

- After the Civil War, the U.S. population increased, as shown below.

Year / Population in Millions

1870 / 38.56

1880 / 50.19

1890 / 62.98

1900 / 76.21

1910 / 92.23

1920 / 106.02

1930 / 123.20

1940 / 132.16

1950 / 151.33

1960 / 179.32

1970 / 202.30

1980 / 226.54

1990 / 248.72

2000 / 281.42

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data.

c) Use the regression model to predict the population in 1975 and in 2010. Explain why you do or do not feel this prediction is valid.

- Effect of Advertising. A company introduces a new software product on a trial run in a city. The company advertised the product on television and found the following data relating the percent P of people who bought the product after x ads were run.

Number of Ads, x / % Who Bought, P

0 / 0.2

10 / 0.7

20 / 2.7

30 / 9.2

40 / 27

50 / 57.6

60 / 83.3

70 / 94.8

80 / 98.5

90 / 99.6

Draw a scatter plot of the data. Then, fit linear, exponential, power, logistic, and logarithmic functions to the data. By comparing the values of $R^2$, determine the function that best fits the data. Then use the regression model to predict the percent P of people who will buy the software after 100 ads are run.

* Relate what you have discovered in this exercise to what you have observed in television ads. What could the company do to change this pattern?
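A logistic curve can be explored without special software if its ceiling is fixed in advance. The sketch below assumes a ceiling of 100%, since P is a percent, and fits a straight line to ln(P / (100 − P)). That is a simplification: a calculator's logistic regression usually estimates the ceiling as well, so its coefficients may differ. The helper `linear_fit` and the names `CEILING`, `k`, `m`, and `percent_bought` are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ads = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
percent = [0.2, 0.7, 2.7, 9.2, 27, 57.6, 83.3, 94.8, 98.5, 99.6]  # from the table

CEILING = 100.0  # assumption: no more than 100% of people can buy

# If P = CEILING / (1 + exp(-(m + k x))), then ln(P / (CEILING - P)) = m + k x.
logits = [math.log(p / (CEILING - p)) for p in percent]
k, m = linear_fit(ads, logits)


def percent_bought(x):
    """Assumed logistic model with a fixed 100% ceiling."""
    return CEILING / (1 + math.exp(-(m + k * x)))


print(f"P(x) = 100 / (1 + exp(-({m:.3f} + {k:.4f} x)))")
print(f"Predicted percent after 100 ads: {percent_bought(100):.2f}%")
```

The model flattens out near the ceiling, which is the pattern the follow-up question about television advertising asks you to interpret.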

Project AMP, A. Quesada, Director