Regression Exercise #1
Dr. Lambert
Rsch 6110
UNC Charlotte
- Design a study to determine whether the temperature, as recorded by the thermometer in your backyard, agrees with the temperature reported by the local cable station. Assume the cable station is reporting the current temperature as recorded at the local airport 10 miles from your house.
- Now suppose that you implement the study you designed and collect the airport temperature and your backyard temperature for 30 days. You record the airport temperature in degrees Celsius and Fahrenheit. You record the backyard temperature only in degrees Fahrenheit.
- Without looking at any data, what would you expect the regression equation to be if you used the airport degrees Celsius values to predict the airport degrees Fahrenheit?
- What would you expect r² to be?
- Suppose the temperature in your backyard is typically 5 degrees hotter than at the airport. What would you expect the regression equation to be if you used airport degrees Fahrenheit to predict backyard degrees Fahrenheit?
- Suppose the temperature in your backyard is typically 3 degrees colder than at the airport. What would you expect the regression equation to be if you used airport degrees Fahrenheit to predict backyard degrees Fahrenheit?
- What would you expect r² to be for the situations described in questions 5 & 6?
- Now open the file called Regression Exercise #1 Data on the website. Use the data contained on the Temperature Problem tab to find the regression equations and r² values for the following problems:
Airport Celsius predicting Airport Fahrenheit
Airport Celsius predicting Backyard Fahrenheit
Airport Fahrenheit predicting Backyard Fahrenheit
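One way to check your expectations before opening the data file is to fit a line to temperatures you make up yourself. The sketch below is only an illustration: the readings are invented, the backyard is assumed to be exactly 5 degrees warmer every day, and `fit_line` is a hypothetical helper written for this example (it is not part of Excel or of the exercise file).

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return intercept, slope, r2


# Invented airport readings in Celsius (NOT the Temperature Problem data).
airport_c = [10.0, 12.5, 15.0, 18.0, 21.5, 25.0, 30.0]
# Exact unit conversion: F = 1.8 * C + 32.
airport_f = [1.8 * c + 32 for c in airport_c]
# Assumption: the backyard is always exactly 5 degrees F warmer.
backyard_f = [f + 5 for f in airport_f]

c_to_f = fit_line(airport_c, airport_f)
f_to_backyard = fit_line(airport_f, backyard_f)

print("Celsius -> Fahrenheit: intercept=%.2f slope=%.2f r2=%.3f" % c_to_f)
print("Airport F -> Backyard F: intercept=%.2f slope=%.2f r2=%.3f" % f_to_backyard)
```

Because both relationships here are exact, the fitted lines reproduce them perfectly. Real readings include measurement error and day-to-day variation, so the values you find in the data file will probably not be this clean.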
Regression Exercise #2
- You are the manager of a small manufacturing unit within a larger company. Your unit makes a special kind of cable used in the telecommunications industry. Your supervisor asks you to determine how much it costs to make a single unit so that he can use the information to bid on some potential business. Open the file called Regression Exercise #2 Data on the website. The data contained on the tab labeled Productivity Problem represents your weekly costs and number of units produced for the past year. Use descriptive statistics only to estimate the cost of producing a single unit.
- Now use regression to estimate the cost of producing a single unit.
- Interpret each component of the regression equation. What does the y-intercept mean in the context of this problem? What does the slope mean in the context of this problem? How can they help you give your supervisor a more complete picture of the cost of production?
- How much will it cost to produce 643 units in one week?
- Use the table of residuals to calculate r².
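The questions above can be sketched in a few lines of code. The weekly figures below are invented for illustration and are not the Productivity Problem data; `fit_line` is a hypothetical helper. The sketch compares the descriptive estimate (total cost divided by total units) with the regression slope, predicts the cost of 643 units, and computes r² from the residuals as 1 minus the residual sum of squares divided by the total sum of squares.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented weekly figures (NOT the exercise data).
units = [520, 580, 610, 640, 700, 750]
cost = [15400, 16350, 17100, 17500, 18900, 19700]

# Descriptive estimate: average cost per unit, with any fixed costs folded in.
average_cost_per_unit = sum(cost) / sum(units)

# Regression estimate: the intercept is the cost at zero units,
# the slope is the added cost of each additional unit.
intercept, slope = fit_line(units, cost)

# Table of residuals: observed cost minus predicted cost.
predicted = [intercept + slope * u for u in units]
residuals = [c - p for c, p in zip(cost, predicted)]

# r2 = 1 - SS(residual) / SS(total)
mean_cost = sum(cost) / len(cost)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((c - mean_cost) ** 2 for c in cost)
r2 = 1 - ss_residual / ss_total

cost_643 = intercept + slope * 643

print("Average cost per unit: %.2f" % average_cost_per_unit)
print("Intercept: %.2f  Slope: %.2f" % (intercept, slope))
print("Predicted cost of 643 units: %.2f" % cost_643)
print("r2 from residuals: %.4f" % r2)
```

When the intercept is positive, the average cost per unit is larger than the slope, because the average spreads the fixed weekly cost over every unit produced.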
Regression Exercise #3
1. Use the data contained in the file called Regression Exercise #3 to answer the following question: Can an individual's Blood Alcohol Content (BAC) be predicted given knowledge of how many beers they have consumed? The data was collected by randomly assigning each one of 15 volunteer college students to drink a specific number of beers in a designated time period. The BAC was then measured for all subjects.
2. Begin by calculating descriptive statistics on the data and looking for outliers.
3. Perform a regression analysis using the Data Analysis function in Excel.
4. What would you conclude about the relationship between Beers consumed and BAC?
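The descriptive step in item 2 can also be sketched outside of Excel. The example below uses Python's standard `statistics` module on invented values (not the data in the exercise file). The 1.5 × IQR rule used to flag outliers is one common rule of thumb and is an assumption of this sketch, not something the exercise prescribes; `describe` and `flag_outliers` are hypothetical helper names.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def flag_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]


# Invented values for illustration only (NOT the exercise data).
beers = [1, 2, 3, 3, 4, 5, 5, 6, 7, 9]
bac = [0.01, 0.03, 0.04, 0.02, 0.05, 0.06, 0.10, 0.095, 0.09, 0.19]

for name, column in (("Beers", beers), ("BAC", bac)):
    print(name, describe(column), "possible outliers:", flag_outliers(column))
```

A flagged value is only a candidate for a closer look. Whether it is an error or a legitimate observation is a judgment to make from the context of the study.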
Regression Exercise #4
1. Use the data contained in the file Regression Exercise #4 to answer the following question. Can the weight in grams of an infant in the neonatal intensive care unit (NICU) be predicted given knowledge of their gestational age at the time of the observation? The data was collected from all of the babies who were in the NICU of a local hospital during the last calendar year.
2. Begin by calculating descriptive statistics on each variable. What do you observe? Are there any outliers?
3. Perform a regression analysis using the Data Analysis function in Excel.
4. What would you conclude about the relationship between gestational age at observation and weight for this population?
5. Now examine the scatterplot and residual plot. What do you observe?
6. Suppose you present your findings to a group of health care providers at the local hospital. In the course of the conversation, it comes out that the data you were given includes many infants who are not typical NICU patients. Many of these children have been in the NICU for quite a while, are considerably heavier than other babies in the unit, and have major health complications. You recommend that the hospital staff identify the babies that represent the more typical patients so you can redo the analysis. The new data contains only infants who were truly born prematurely, have not been in the NICU for an extended period of time, and do not have major health complications. Perform another regression analysis using the second set of observations.
7. How do your conclusions differ from those following the first analysis?
8. Create a table that shows your predicted weight for an infant at each of the following gestational ages: 24, 26, 28, 30, and 32.
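The table in item 8 is the regression equation evaluated at each listed age. The sketch below shows the mechanics only. The intercept and slope are placeholder numbers chosen for illustration, not results from the exercise data, and the sketch assumes gestational age is measured in weeks. Substitute the coefficients from your own Excel output.

```python
# Placeholder coefficients for illustration only (NOT results from the data).
# Replace them with the intercept and slope from your regression output.
INTERCEPT_GRAMS = -2000.0
SLOPE_GRAMS_PER_WEEK = 110.0


def predict_weight(gestational_age_weeks):
    """Predicted weight in grams from the fitted line."""
    return INTERCEPT_GRAMS + SLOPE_GRAMS_PER_WEEK * gestational_age_weeks


ages = [24, 26, 28, 30, 32]
table = [(age, predict_weight(age)) for age in ages]

print(f"{'Gestational age':>16}  {'Predicted weight (g)':>20}")
for age, weight in table:
    print(f"{age:>16}  {weight:>20.1f}")
```

Predictions are most trustworthy for ages inside the range covered by the observations used to fit the line.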
Running Regression Analyses Through Microsoft Excel
1. If the Analysis ToolPak has not been installed, open Excel and click on Tools on the top tool bar. Next, click on Add-Ins…, then select Analysis ToolPak. It will then proceed with installation of the data analysis module. It does not take very long. If you are running Excel on a network, the network will retrieve the necessary files from the server. If you are running Excel on a desktop machine, you may have to insert the installation CDs that came from Microsoft.
2. Open the file called Regression Exercise #3.xls on the desktop of your machine. This file provides an example of regression output that we will recreate together. This example is from the Regression chapter in the Yates book.
3. Next, enter the data to be analyzed into an Excel spreadsheet. The Beer data is already entered into the spreadsheet. It is easiest to enter that data in a data matrix format so that each column contains a separate variable and each row contains an individual observation. I typically ask the students to use some exploratory and descriptive data analysis methods at this point to become familiar with the data.
4. To run the regression analysis, click on Tools, then click on Data Analysis, then select Regression.
5. Enter the range of cells that contains the data for the X and Y variables. For example, if the predictor data are contained in the first column and there are 10 observations and a column heading, then you would enter A2:A11 as the X range.
6. Select all of the options under Residuals. This will produce a scatterplot and a residual plot as well as standardized residuals.
7. The regression output will appear on a separate spreadsheet. Click on the tab at the bottom of the data sheet that represents the new sheet.
8. The output will not necessarily look very pretty. I usually have the students reformat the column widths to 15. I also suggest that you have them write a note at the top of the page to remind themselves about the X and Y variables. It also helps to reformat the number of decimal places shown on the output to a standard number like 3 or 4. The plots can easily be resized and reformatted as well.
9. The links called Regression Exercise #1 - #4 on my website give examples of class exercises I use to introduce the topic of regression.
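For readers who want to see what the residual options in step 6 compute, here is a minimal sketch in Python. It uses the data matrix layout from step 3 (one row per observation, one column per variable) with invented numbers. Dividing each residual by the sample standard deviation of the residuals is one common convention and is an assumption of this sketch, so the values may not match Excel's standardized residuals exactly.

```python
import math

# Invented data matrix: each row is one observation, columns are (X, Y).
rows = [
    (1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1),
    (6, 11.7), (7, 14.2), (8, 15.9), (9, 18.3), (10, 19.8),
]
x = [row[0] for row in rows]
y = [row[1] for row in rows]
n = len(rows)

# Least-squares slope and intercept.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual output: predicted Y, residual, and standardized residual.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
# Assumed scaling: sample standard deviation of the residuals (n - 1).
residual_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 1))
standardized = [e / residual_sd for e in residuals]

print("Observation  Predicted Y  Residual  Standardized")
for i, (p, e, s) in enumerate(zip(predicted, residuals, standardized), start=1):
    print(f"{i:>11}  {p:>11.3f}  {e:>8.3f}  {s:>12.3f}")
```

Plotting the residuals against X gives the same kind of residual plot that Excel produces, which is a quick way to spot curvature or unusual observations.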