Linear Regression Problems

  1. As Earth’s population continues to grow, the solid waste it generates grows with it. Governments must plan for the disposal and recycling of ever-growing amounts of solid waste. Planners can use past data to predict future waste generation and to plan enough facilities for disposing of and recycling the waste.

Given the following data on the waste generated in Florida from 1990 to 1994, how can we construct a function to predict the waste that was generated in the years 1995 to 1999? The scatter plot is shown in Figure 1.85.

Year / Tons of Solid Waste Generated (in thousands)
1990 / 19,358
1991 / 19,484
1992 / 20,293
1993 / 21,499
1994 / 23,561

a) Make a scatterplot of the data, letting x represent the number of years since 1990.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the tons of solid waste generated in 2000 and 2005, and determine which function gives the most realistic predictions.
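
Parts (b) and (d) can be sketched in Python with NumPy in place of a graphing calculator (an assumption; the problem itself calls for a calculator). Polynomial fits use least squares, and the power model y = a·x^b is fitted as a straight line on (ln x, ln y). Because ln 0 is undefined, the 1990 point (x = 0) cannot be used in a power fit; dropping it is a choice made for this sketch. Calculators often report R² for the log-transformed power fit, so the power R² printed here (computed on the original scale) may not match a calculator's value.

```python
import numpy as np

# Florida solid-waste data (thousands of tons); x = years since 1990
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([19358, 19484, 20293, 21499, 23561], dtype=float)

def r_squared(y_obs, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear, quadratic, and cubic least-squares fits
fits = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    coeffs = np.polyfit(x, y, degree)
    fits[name] = (coeffs, r_squared(y, np.polyval(coeffs, x)))

# Power model y = a * x**b via ln y = ln a + b ln x (only x > 0 is usable)
mask = x > 0
b, ln_a = np.polyfit(np.log(x[mask]), np.log(y[mask]), 1)
a = np.exp(ln_a)
power_r2 = r_squared(y[mask], a * x[mask] ** b)

for name, (coeffs, r2) in fits.items():
    print(f"{name:9s} R^2 = {r2:.4f}")
print(f"power     R^2 = {power_r2:.4f} (fitted on x > 0 only)")

# Part (d): predictions for 2000 (x = 10) and 2005 (x = 15)
for name, (coeffs, _) in fits.items():
    print(name, np.polyval(coeffs, [10.0, 15.0]).round())
print("power", (a * np.array([10.0, 15.0]) ** b).round())
```

Comparing the printed predictions with the trend in the table is one way to judge which model is "most realistic"; a high R² on five points does not by itself settle that question.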

  2. The numbers of insured commercial banks y (in thousands) in the United States for the years 1987 to 1996 are shown in the table (Source: Federal Deposit Insurance Corporation).

Year / 1987 / 1988 / 1989 / 1990 / 1991 / 1992 / 1993 / 1994 / 1995 / 1996
y / 13.70 / 13.12 / 12.71 / 12.34 / 11.92 / 11.46 / 10.96 / 10.45 / 9.94 / 9.53

a) Make a scatterplot of the data, letting x represent the number of years since 1987.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the number of insured commercial banks in 2000 and 2005, and determine which function gives the most realistic predictions.

e) Plot the actual data and the model you selected on the same graph. How closely does the model represent the data?

  3. U.S. Farms. As the number of farms has decreased in the United States, the average size of the remaining farms has grown, as shown in the table below.

Year / Average Acreage Per Farm
1910 / 139
1920 / 149
1930 / 157
1940 / 175
1950 / 216
1959 / 303
1969 / 390
1978 / 449
1987 / 462
1997 / 487

a) Make a scatterplot of the data, letting x represent the number of years since 1900.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) With each function found in part (b), predict the average acreage in 2000 and 2010, and determine which function gives the most realistic predictions.

  4. Sports. The winning times (in minutes) in the women’s 400-meter freestyle swimming event in the Olympics from 1936 to 1996 are given by the following ordered pairs.

a) Make a scatterplot of the data, letting x represent the number of years since 1972.

b) Use a graphing calculator to fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Graph the function of best fit with the scatterplot of the data.

d) Plot the actual data and the model you selected on the same graph. How closely does the model represent the data?

Quadratic Regression Problems

  1. The following data were obtained by throwing a rubber ball at a CBR (Calculator-Based Ranger motion detector).

Time (sec) / Height (m)
0.0000 / 1.03754
0.1080 / 1.40205
0.2150 / 1.63806
0.3225 / 1.77412
0.4300 / 1.80392
0.5375 / 1.71522
0.6450 / 1.50942
0.7525 / 1.21410
0.8600 / 0.83173

a) Use the data above to make a scatterplot, letting x represent the number of seconds elapsed.

b) Next, use a graphing calculator to find the model that best expresses the height of the rubber ball as a function of time. This model can also be used to predict the maximum height of the ball and its vertical velocity when it hits the face of the CBR.

c) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

d) Graph the function of best fit with the scatterplot of the data.

e) Determine the maximum height of the ball (in meters).

f) With the model you selected in part (b), predict when the height of the ball is at least 1.5 meters.
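
Parts (e) and (f) reduce to facts about a parabola: the maximum of h(t) = at² + bt + c with a < 0 occurs at the vertex t = −b/(2a), and h(t) ≥ 1.5 holds between the two roots of h(t) − 1.5 = 0. The sketch below assumes Python with NumPy in place of the calculator, and it also assumes heights are measured from the CBR face, so impact is taken as the later root of h(t) = 0 and the vertical velocity there comes from the derivative h′(t) = 2at + b.

```python
import numpy as np

# CBR rubber-ball data: time (s) and height (m)
t = np.array([0.0000, 0.1080, 0.2150, 0.3225, 0.4300, 0.5375, 0.6450, 0.7525, 0.8600])
h = np.array([1.03754, 1.40205, 1.63806, 1.77412, 1.80392,
              1.71522, 1.50942, 1.21410, 0.83173])

# Least-squares quadratic h(t) = a t^2 + b t + c (polyfit returns highest power first)
a, b, c = np.polyfit(t, h, 2)

# Part (e): the vertex t = -b / (2a) gives the maximum height
t_max = -b / (2 * a)
h_max = np.polyval([a, b, c], t_max)

# Part (f): since a < 0, h(t) >= 1.5 between the two real roots of h(t) - 1.5
t_low, t_high = np.sort(np.roots([a, b, c - 1.5]).real)

# Assumption: height 0 is the CBR face, so impact is the later root of h(t) = 0;
# the vertical velocity is the derivative h'(t) = 2 a t + b evaluated there
t_hit = np.max(np.roots([a, b, c]).real)
v_hit = 2 * a * t_hit + b

print(f"h(t) = {a:.4f} t^2 + {b:.4f} t + {c:.4f}")
print(f"maximum height about {h_max:.3f} m at t = {t_max:.3f} s")
print(f"h(t) >= 1.5 m for about {t_low:.3f} s <= t <= {t_high:.3f} s")
print(f"estimated impact at t = {t_hit:.3f} s, velocity {v_hit:.2f} m/s")
```

A negative velocity at impact simply means the ball is moving downward.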

  2. Stopping Distance. A state highway patrol safety division collected the data on stopping distances in Table 2.16.

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the stopping distance for a vehicle traveling at 25 mph.

e) Use the regression model to predict the speed of a car if the stopping distance is 300 ft.

Table 2.16 Highway Safety Division

Speed (mph) / Stopping Distance (ft)
10 / 15.1
20 / 39.9
30 / 75.2
40 / 120.5
50 / 175.9
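
Part (d) evaluates the model, while part (e) runs it backward: with a quadratic model d(v) = av² + bv + c, the speed for a 300 ft stopping distance is the positive root of av² + bv + (c − 300) = 0. This is a minimal sketch assuming Python with NumPy and assuming the quadratic turns out to be the best fit in part (b).

```python
import numpy as np

speed = np.array([10, 20, 30, 40, 50], dtype=float)       # mph
distance = np.array([15.1, 39.9, 75.2, 120.5, 175.9])     # ft

# Least-squares quadratic d(v) = a v^2 + b v + c
a, b, c = np.polyfit(speed, distance, 2)

# Part (d): stopping distance at 25 mph
d_25 = np.polyval([a, b, c], 25.0)

# Part (e): solve a v^2 + b v + (c - 300) = 0 and keep the positive real root
roots = np.roots([a, b, c - 300.0])
v_300 = max(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0)

print(f"d(v) = {a:.4f} v^2 + {b:.4f} v + {c:.4f}")
print(f"stopping distance at 25 mph: about {d_25:.1f} ft")
print(f"speed for a 300 ft stop: about {v_300:.1f} mph")
```

Note that 300 ft lies beyond the largest distance in the table, so the answer to part (e) is an extrapolation.
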

  3. Home Schooling Growth. The estimated numbers of U.S. children who were home-schooled in the years 1992 to 1997 are shown in Table 1.13.

Table 1.13 Home Schooling

Year / Number
1992 / 703,000
1993 / 808,000
1994 / 929,000
1995 / 1,060,000
1996 / 1,220,000
1997 / 1,347,000

(a) Produce a scatter plot of the number of children home-schooled in thousands (y) as a function of years since 1990 (x).

(b) Find the linear regression equation. (Round the coefficients to the nearest 0.01.)

(c) Does the value of R² suggest that the linear model is appropriate?

(d) Find the quadratic regression equation. (Round the coefficients to the nearest 0.01.)

(e) Does the value of R² suggest that a quadratic model is appropriate?

(f) Use both curves to predict the number of U.S. children who will be home-schooled in 2005. How different are the estimates?

(g) Writing to Learn. Use the results of this exploration to explain why it is risky to use regression equations to predict y-values for x-values that are not very close to the data points, even when the curves fit the data points very well.
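
The point of part (g) can be seen numerically: two models that both track the 1992 to 1997 data closely can disagree sharply once they are pushed out to 2005 (x = 15). This sketch assumes Python with NumPy; it fits both models to the data in thousands and compares their 2005 estimates.

```python
import numpy as np

x = np.arange(2, 8, dtype=float)                              # years since 1990 (1992 to 1997)
y = np.array([703, 808, 929, 1060, 1220, 1347], dtype=float)  # thousands of children

lin = np.polyfit(x, y, 1)    # [slope, intercept]
quad = np.polyfit(x, y, 2)   # [a, b, c] for a x^2 + b x + c

print("linear coefficients:   ", np.round(lin, 2))
print("quadratic coefficients:", np.round(quad, 2))

# Part (f): extrapolate both models to 2005
x_2005 = 15.0
lin_2005 = np.polyval(lin, x_2005)
quad_2005 = np.polyval(quad, x_2005)
print(f"2005 estimates: linear {lin_2005:.0f}, quadratic {quad_2005:.0f} (thousands)")
print(f"difference: {quad_2005 - lin_2005:.0f} thousand children")
```

The gap comes almost entirely from the small x² term, which is nearly invisible inside the data range but grows quickly outside it.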

  4. Leisure Time. The following table shows the median number of hours of leisure time that Americans had each week in various years.

Year, x (years since 1973) / Median Number of Leisure Hours Per Week
1973, 0 / 26.2
1980, 7 / 19.2
1987, 14 / 16.6
1993, 20 / 18.8
1997, 24 / 19.5

Source: Louis Harris and Associates

(a) Make a scatterplot of the data, letting x represent the number of years since 1973, and determine which model best fits the data.

(b) Use a graphing calculator to fit the type of function determined in part (a) to the data.

(c) Graph the equation with the scatterplot. Then use the function found in part (b) to estimate the number of leisure hours per week in 1978, 1990, and 2005.

  5. On-line Travel Revenue. With the explosion in Internet use, more and more travelers are booking their travel reservations on-line. The following table lists the total on-line travel revenue for recent years. Most of the revenue is from airline tickets.

Year / On-Line Travel Revenue (In Millions)
1996 / $276
1997 / 827
1998 / 1900
1999 / 3200
2000 / 4700
2001 / 6500
2002 / 8900

Source: Travel and Interactive Technology 1999

(a) Create a scatterplot of the data. Let x = the number of years since 1996.

(b) Use a graphing calculator to fit the data with linear, quadratic, and exponential functions. Determine which function has the best fit.

(c) Graph all three functions found in part (b) with the scatterplot in part (a).

(d) Use the functions found in part (b) to estimate the on-line travel revenue in 2010. Which function provides the most realistic prediction?
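
An exponential model y = a·b^x becomes a straight line after taking logarithms, ln y = ln a + x ln b, so it can be fitted with ordinary linear regression on (x, ln y); graphing calculators commonly use this same transform for exponential regression, though that is an assumption about any particular model. The sketch below assumes Python with NumPy and compares the three 2010 (x = 14) estimates asked for in part (d).

```python
import numpy as np

x = np.arange(0, 7, dtype=float)                                     # years since 1996
y = np.array([276, 827, 1900, 3200, 4700, 6500, 8900], dtype=float)  # $ millions

lin = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)

# Exponential y = a * b**x via a linear fit of ln y against x
slope, intercept = np.polyfit(x, np.log(y), 1)
a, b = np.exp(intercept), np.exp(slope)

# Part (d): estimates for 2010
x_2010 = 14.0
pred = {
    "linear": np.polyval(lin, x_2010),
    "quadratic": np.polyval(quad, x_2010),
    "exponential": a * b ** x_2010,
}
print(f"exponential model: y = {a:.1f} * {b:.4f}^x")
for name, value in pred.items():
    print(f"{name:11s} 2010 estimate: ${value:,.0f} million")
```

The three estimates differ by orders of magnitude, which is a concrete starting point for deciding which prediction is "most realistic".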

Quartic Regression Problems

  1. Consumer Debt. Nonmortgage consumer debt is mounting in the United States, as shown in the table below.

Year / Non-mortgage Debt (In Billions)
1989 / $762
1990 / 789
1991 / 783
1992 / 775
1993 / 804
1994 / 902
1995 / 1038
1996 / 1161
1997 / 1216
1998 / 1266

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, cubic, and quartic functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict when consumer debt will reach 1400 billion dollars.

  2. Declining Number of Farms in the United States. Today, U.S. farm acreage is about the same as it was in the early part of the twentieth century, but the number of farms has shrunk.

Year / Number of Farms (in millions)
1910 / 6.4
1920 / 6.5
1930 / 6.3
1940 / 6.1
1950 / 5.4
1959 / 3.7
1969 / 2.7
1978 / 2.3
1987 / 2.1
1997 / 1.9

Looking at the table above, we note that the data could be modeled with a cubic or a quartic function.

(a) Model the data with both cubic and quartic functions. Let the first coordinate of each data point be the number of years after 1900; that is, enter the data as (10, 6.4), (20, 6.5), and so on. Then, using R², the coefficient of determination, decide which function is the better fit. The R²-value gives an indication of how well the function fits the data: the closer R² is to 1, the better the fit.

(b) Graph the function with the scatterplot of the data.

(c) Use the answer to part (a) to estimate the number of farms in 1900, 1975, and 2003.
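
For reference, the coefficient of determination is usually defined as below, where y_i are the data values, ŷ_i are the values predicted by the model, and ȳ is the mean of the data values:

$$
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
$$

For least-squares polynomial fits, raising the degree (for example, from cubic to quartic) can never lower R² on the same data, so a slightly larger R² by itself does not guarantee better estimates outside the data, such as the 1900 and 2003 values in part (c).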

Exponential Regression Problems

  1. In the years before the Civil War, the population of the United States grew rapidly, as shown in the following table from the U.S. Bureau of the Census.

Year / Population in Millions
1790 / 3.93
1800 / 5.31
1810 / 7.24
1820 / 9.64
1830 / 12.86
1840 / 17.07
1850 / 23.19
1860 / 31.44

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the population in 1870.

e) Use the regression model to predict the population in 1930. Explain why you do or do not feel this prediction is valid. (Hint: you may want to complete this problem after you finish the problem dealing with Census records after the Civil War.)

  2. Projected Number of Alzheimer’s Patients: German psychiatrist Alois Alzheimer first described the disease, later called Alzheimer’s disease, in 1906. Since life expectancy has significantly increased in the last century, the number of Alzheimer’s patients has increased dramatically. The number of patients in the United States reached 4 million in 2000. The following table lists projected data regarding the number of Alzheimer’s patients in years beyond 2000.

Year, x / Projected Number of Alzheimer’s Patients in the United States (In millions)
2000 / 4.0
2010 / 5.8
2020 / 6.8
2030 / 8.7
2040 / 11.8
2050 / 14.3

a) Draw a scatter plot of the data.

b) Fit linear, exponential, power, logistic, and logarithmic functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to estimate the number of Alzheimer’s patients in 2005, 2025, and 2100.

  3. Number of Physicians: The following table contains data regarding the number of physicians in the United States in selected years.

Year / Total Number of Physicians
1950 / 219,997
1955 / 241,711
1960 / 260,484
1965 / 292,088
1970 / 334,028
1975 / 393,742
1980 / 467,679
1985 / 552,716
1990 / 615,421
1994 / 684,414
1995 / 720,325
1996 / 737,764

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the number of physicians in 1975.

e) Use the regression model to estimate the number of physicians in 2000 and 2025.

  4. Credit Card Volume: The total credit card volume for Visa, MasterCard, American Express, and Discover has increased dramatically in recent years, as shown in the table below (Source: CardWeb Inc.’s CardData).

Year, x / Credit Card Volume, y (In Billions)
1988 / 261.0
1989 / 296.3
1990 / 338.4
1991 / 361.0
1992 / 403.1
1993 / 476.7
1994 / 584.8
1995 / 701.2
1996 / 798.3
1997 / 885.2

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, cubic, exponential, quartic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict the credit card volume in 2003 and in 2010.

Logarithmic Regression Problems

  1. Forgetting. In an art class, students were tested on a final exam at the end of the course. They were then retested with an equivalent test at subsequent time intervals. Their scores after time t, in months, are given in the table.

Time, t (in months) / Score, y
1 / 84.9%
2 / 84.6%
3 / 84.4%
4 / 84.2%
5 / 84.1%
6 / 83.9%

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the regression model to predict test scores after 8, 10, 24, and 36 months.

e) After how long will the test scores fall below 82%?
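
A logarithmic model y = a + b ln t is a straight line in ln t, so it can be fitted by linear regression of the scores on ln t; part (e) then inverts the model, t = e^((82 − a)/b). This is a minimal sketch assuming Python with NumPy and assuming the logarithmic model is the one chosen in part (b).

```python
import numpy as np

t = np.arange(1, 7, dtype=float)                         # months
score = np.array([84.9, 84.6, 84.4, 84.2, 84.1, 83.9])   # percent

# Logarithmic model y = a + b ln t: a straight-line fit of score against ln t
b, a = np.polyfit(np.log(t), score, 1)
print(f"y = {a:.3f} + ({b:.4f}) ln t")

# Part (d): predicted scores
for months in (8, 10, 24, 36):
    print(f"after {months:2d} months: {a + b * np.log(months):.2f}%")

# Part (e): solve a + b ln t = 82; because b < 0, the score is below 82% for larger t
t_82 = np.exp((82 - a) / b)
print(f"score falls below 82% after about {t_82:.0f} months")
```

Because the fitted decline is so slow, the answer to part (e) lies far outside the six months of data, which is worth keeping in mind when judging it.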

  2. Jamie, a meteorologist, is interested in finding a function that explains the relation between the height of a weather balloon (in kilometers) and the atmospheric pressure (measured in millimeters of mercury) on the balloon. She collects the data shown in Table 10.

a) Using a graphing utility, draw a scatter diagram of the data with atmospheric pressure as the independent variable.

b) Fit linear, quadratic, logarithmic, and power functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Superimpose the regression curve on the scatter plot.

d) Use the function found in part (b) to predict the height of the weather balloon if the atmospheric pressure is 560 millimeters of mercury.

  3. Economics and Marketing. The following data represent the price and quantity supplied in 2005 for IBM personal computers.

Price ($/Computer) / Quantity Supplied
2300 / 180
2000 / 173
1700 / 160
1500 / 150
1300 / 137
1200 / 130
1000 / 113

(a) Using a graphing utility, draw a scatter diagram of the data with price as the dependent variable.

(b) Using a graphing utility, try a variety of function families. Compare the values of R² to find the function that best fits the data.

(c) Using a graphing utility, draw the function found in part (b) on the scatter diagram.

(d) Use the function found in part (b) to predict the number of IBM personal computers that will be supplied if the price is $1650.

Power Regression Problems

  1. Use the data in the table below to obtain a model for speed p versus distance traveled d. Consider linear, quadratic, exponential, power, and quartic models. Then use the model you selected as the best fit to predict the speed of the ball at impact, given that impact occurs when d ≈ 1.80 m.

Table 2.12 Rubber Ball Data from CBR Experiment

Distance (m) / Speed (m/s)
0.00000 / 0.00000
0.04298 / 0.82372
0.16119 / 1.71163
0.35148 / 2.45860
0.59394 / 3.05209
0.89187 / 3.74200
1.25557 / 4.49558
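
A power model p = a·d^b is fitted as a straight line on (ln d, ln p). The starting point (0, 0) has to be left out because ln 0 is undefined. For an object falling from rest, physics gives p = √(2gd), so an exponent near 1/2 is expected. The sketch assumes Python with NumPy, and the impact distance of 1.80 m is an assumption taken from the roughly 1.80 m maximum height in the rubber-ball height table earlier in this document.

```python
import numpy as np

d = np.array([0.00000, 0.04298, 0.16119, 0.35148, 0.59394, 0.89187, 1.25557])  # m
p = np.array([0.00000, 0.82372, 1.71163, 2.45860, 3.05209, 3.74200, 4.49558])  # m/s

# ln 0 is undefined, so drop the (0, 0) point before the log-log fit
mask = d > 0
b, ln_a = np.polyfit(np.log(d[mask]), np.log(p[mask]), 1)
a = np.exp(ln_a)

# Assumed impact distance (about the maximum height in the earlier height data)
d_impact = 1.80
p_impact = a * d_impact ** b

print(f"p = {a:.3f} * d^{b:.4f}")
print(f"predicted speed at d = {d_impact} m: {p_impact:.2f} m/s")
```
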

  2. The length of time that a planet takes to make one complete revolution around the sun is its year. The table shows the length (in Earth years) of each planet’s year and the distance of that planet from the sun (in millions of miles). Find a model for these data in which x is the length of the year and y is the distance from the sun.

Planet / Year (Earth years) / Distance (millions of miles)
Mercury / 0.24 / 36.0
Venus / 0.62 / 67.2
Earth / 1 / 92.9
Mars / 1.88 / 141.6
Jupiter / 11.86 / 483.6
Saturn / 29.46 / 886.7
Uranus / 84.01 / 1783.0
Neptune / 164.79 / 2794.0
Pluto / 247.69 / 3674.5
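
Kepler's third law says the square of a planet's orbital period is proportional to the cube of its orbital radius, so distance should grow roughly like year^(2/3). A power regression can check this: fitting a line to (ln x, ln y) gives ln y = ln a + b ln x. The sketch assumes Python with NumPy in place of a calculator's power regression.

```python
import numpy as np

year = np.array([0.24, 0.62, 1, 1.88, 11.86, 29.46, 84.01, 164.79, 247.69])           # Earth years
distance = np.array([36.0, 67.2, 92.9, 141.6, 483.6, 886.7, 1783.0, 2794.0, 3674.5])  # millions of miles

# Power model y = a * x**b via a straight-line fit on the logarithms
b, ln_a = np.polyfit(np.log(year), np.log(distance), 1)
a = np.exp(ln_a)

print(f"distance = {a:.1f} * year^{b:.4f}")
print(f"compare the exponent with 2/3 = {2 / 3:.4f}")
```

The coefficient a should land close to Earth's distance, since an Earth year is x = 1.
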

  3. Cholesterol Level and the Risk of Heart Attack. The data in the following table show the relationship of cholesterol level in men to the risk of a heart attack.

Cholesterol Level, x / Men, Per 10,000, Who Suffer A Heart Attack, y
100 / 30
200 / 65
250 / 100
275 / 130
300 / 180

(a) Use a graphing calculator to fit a model function to the data. Consider linear, exponential, power, and cubic functions.

(b) Graph the function with the scatterplot of the data.

(c) Use the answer to part (a) to estimate the heart attack rate for men with cholesterol levels of 150, 350, and 400.

Logistic Regression Problems

  1. After the Civil War, the U.S. population increased, as shown below.

Year / Population in Millions
1870 / 38.56
1880 / 50.19
1890 / 62.98
1900 / 76.21
1910 / 92.23
1920 / 106.02
1930 / 123.20
1940 / 132.16
1950 / 151.33
1960 / 179.32
1970 / 202.30
1980 / 226.54
1990 / 248.72
2000 / 281.42

a) Draw a scatter plot of the data.

b) Fit linear, quadratic, exponential, power, logarithmic, and logistic functions to the data. By comparing the values of R², determine the function that best fits the data.

c) Use the regression model to predict the population in 1975 and in 2010. Explain why you do or do not feel these predictions are valid.

  2. Effect of Advertising. A company introduces a new software product on a trial run in a city. The company advertised the product on television and found the following data relating the percent P of people who bought the product after x ads were run.

Number of Ads, x / % Who Bought, P
0 / 0.2
10 / 0.7
20 / 2.7
30 / 9.2
40 / 27
50 / 57.6
60 / 83.3
70 / 94.8
80 / 98.5
90 / 99.6

Draw a scatter plot of the data. Then fit linear, exponential, power, logistic, and logarithmic functions to the data. By comparing the values of R², determine the function that best fits the data. Then use the regression model to predict the percent P of people who will buy the software after 100 ads are run.

* Relate what you have discovered in this exercise to what you have observed in television ads. What could the company do to change this pattern?
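
A logistic model P = c/(1 + A·e^(−kx)) normally needs a nonlinear fit, and a calculator's logistic regression estimates c as well. If c is fixed in advance, though, the model linearizes: ln(P/(c − P)) = −ln A + kx. The sketch below assumes Python with NumPy and, as a labeled assumption, fixes c = 100 because a percentage cannot exceed 100; its coefficients may therefore differ somewhat from a calculator's.

```python
import numpy as np

ads = np.arange(0, 100, 10, dtype=float)   # 0, 10, ..., 90 ads
percent = np.array([0.2, 0.7, 2.7, 9.2, 27, 57.6, 83.3, 94.8, 98.5, 99.6])

# Assumption: carrying capacity c = 100, so ln(P / (c - P)) is linear in x
c = 100.0
k, intercept = np.polyfit(ads, np.log(percent / (c - percent)), 1)
A = np.exp(-intercept)

def logistic(x):
    """Fitted logistic curve P(x) = c / (1 + A e^(-k x))."""
    return c / (1 + A * np.exp(-k * x))

print(f"P(x) = {c:.0f} / (1 + {A:.1f} e^(-{k:.4f} x))")
print(f"predicted P after 100 ads: {logistic(100):.2f}%")
```

The prediction sits just under 100%, which reflects the leveling off that the final question asks you to relate to television advertising.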

Project AMP A Quesada Director Project AMP