Stat 280 Lab 3: Correlation and Regression

Stat 280 Lab 3: Normal Distribution and Correlation

Objectives: This lab reviews the chi-square goodness-of-fit test and normality, and introduces correlation along with the issues that arise when describing relationships between variables.

Directions: Follow the instructions below, answering all questions. Your answers should be in the form of a brief report (MS Word), to be handed in to the instructor before you leave. Please include plots and descriptive statistics in your report.

1. Data Description – bloodtype.MTW

We are interested in the proportions of the four blood types (ABO system). Suppose we believe that 34% of people have blood type A, 15% blood type B, 23% blood type AB, and 28% blood type O. We go out on campus, collect a sample of 100 students, and find the following counts:

A: 12 B: 56 AB: 2 O: 30

a. What is the null hypothesis in this experiment?

b. What are the expected counts for the four blood types (A, B, AB, O) in this experiment? Do they satisfy the large-sample criteria?

c. What test statistic will you use to test the hypothesis? What is its approximate distribution in a large sample?

d. Plot a grouped chart of the expected and observed frequencies. Comment on what you see.

Graph -> Chart: select Frequency in Y and Blood in X. In Data Display, click For each and choose Group; select Code in Group variables. Click Options, select Cluster, and select Code in Cluster. Click OK.

e. Carry out the chi-square goodness-of-fit test. Are the data compatible with your null hypothesis?

How to do a chi-square goodness-of-fit test in Minitab:

Name c7 "deviation". Choose Calc -> Calculator, store the result in c7, and enter the expression (c2-c3)**2/c3.

Then choose Calc -> Calculator again, store the result in c8 (name it Chisq), and enter the expression sum(c7).

Compute 1 - p-value: choose Calc -> Probability Distributions -> Chi-Square, click Cumulative probability, enter the appropriate degrees of freedom, select c8 (Chisq) as the Input column, and enter c9 (name it '1-pvalue') in Optional storage. Click OK.

Display your result: choose Manip -> Display Data, select c1-c3 and c7-c9 in Display, and click OK.
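For checking the Minitab steps by hand, the same calculation can be sketched in Python. This is only an illustration under stated assumptions: it assumes c2 holds the observed counts and c3 the expected counts, and the helper chi2_cdf_3df is a hypothetical name for a closed-form cumulative probability that is valid only for 3 degrees of freedom (four categories minus one).

```python
import math

# Hypothesized proportions and observed counts from the lab description.
proportions = {"A": 0.34, "B": 0.15, "AB": 0.23, "O": 0.28}
observed = {"A": 12, "B": 56, "AB": 2, "O": 30}

n = sum(observed.values())  # sample size (100 students)

# Expected count for each blood type under the null hypothesis: n * p.
expected = {blood: n * p for blood, p in proportions.items()}

# Per-category contribution, mirroring (c2-c3)**2/c3 in Minitab
# (assumption: c2 = observed counts, c3 = expected counts).
deviation = {b: (observed[b] - expected[b]) ** 2 / expected[b] for b in observed}

chisq = sum(deviation.values())  # test statistic, like sum(c7)


def chi2_cdf_3df(x):
    """Cumulative probability of a chi-square variable with exactly 3 df.

    This closed form holds only for 3 degrees of freedom; it is not a
    general chi-square CDF.
    """
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


one_minus_p = chi2_cdf_3df(chisq)  # what the lab stores as '1-pvalue'
p_value = 1 - one_minus_p          # may round to 0 for a very large statistic

for b in observed:
    print(f"{b:>2}  observed={observed[b]:3d}  expected={expected[b]:6.2f}  "
          f"deviation={deviation[b]:8.3f}")
print(f"Chisq = {chisq:.3f}, 1 - p-value = {one_minus_p:.6f}, p-value = {p_value:.6g}")
```

The printed table corresponds to what Display Data shows for c1-c3 and c7-c9, so it can be compared line by line with the Minitab output.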

2. Data Description – reg1spou.MTW

This data set contains information on 193 husband-wife families whose state of residence in March 1989 was in the Northeast region of the United States. Variables in the data set include the ages, educational levels, and personal incomes for the husbands and wives, as well as family size and family income.

Educational Level Codes:

 0  Did not complete first grade
 1  First grade
 2  Second grade
 3  Third grade
 4  Fourth grade
 5  Fifth grade
 6  Sixth grade
 7  Seventh grade
 8  Eighth grade
 9  Ninth grade
10  Tenth grade
11  Eleventh grade
12  Twelfth grade
13  One year of college
14  Two years of college
15  Three years of college
16  Four years of college
17  Five years of college
18  Six or more years of college

In Minitab, produce a scatter plot of the wife's educational level (Eduwife) vs. the husband's educational level (Eduhusb) and answer the following questions. (Graph -> Plot)

a. What are the average and the standard deviation of the wife's educational level and of the husband's educational level? (Stat -> Basic Statistics -> Display Descriptive Statistics)

b. What is the correlation coefficient between the two variables, and how would you describe the relationship between the wife's educational level and the husband's educational level? (Stat -> Basic Statistics -> Correlation)

c. Is the answer in part b) what you would expect, and why?
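To see what Minitab computes in parts a) and b), here is a minimal Python sketch of the mean, sample standard deviation, and Pearson correlation coefficient. The education codes below are hypothetical values made up for illustration; they are not taken from reg1spou.MTW, and pearson_r is a helper name introduced here.

```python
import math
import statistics

# Hypothetical education codes for six couples (NOT from reg1spou.MTW).
eduhusb = [12, 16, 12, 14, 18, 10]
eduwife = [12, 14, 12, 16, 16, 11]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# statistics.stdev is the sample standard deviation (divides by n - 1).
for name, values in (("Eduhusb", eduhusb), ("Eduwife", eduwife)):
    print(f"{name}: mean = {statistics.mean(values):.2f}, "
          f"SD = {statistics.stdev(values):.2f}")

r = pearson_r(eduhusb, eduwife)
print(f"Correlation of Eduwife and Eduhusb = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear association, while a value near 0 indicates little linear association; the scatter plot should always be checked alongside the number.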

Data Description

A student in an Economics class collected the following data on 32 movies released in the period 1997-1998.

Column  Name     Count  Description
C1-T    Movie    32     Title of the movie
C2      Opening  32     Gross receipts for the weekend after the movie was released (in millions of dollars)
C3      Budget   32     The total budget for the movie (in millions of dollars)
C4-T    Star?    32     Whether or not the movie has a superstar; Star or NoStar
C5-T    Summer?  32     Whether or not the movie was released in the summer; Summer or NoSummer

3. In Minitab, open worksheet movies.mtw.

a. Determine the relationship between movie budget and opening weekend sales (e.g. use a scatter plot and the correlation coefficient).

b. Construct a dotplot of Opening by the variable Star?. What is the relationship between opening sales and whether or not there is a star in the movie? (Graph -> Dotplot)

c. What are some confounding factors that may affect the relationship between opening sales and a star being in the movie?
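The idea behind a dotplot by group can be sketched in Python as follows. The opening values are hypothetical and are not taken from movies.mtw; text_dotplot is a helper name introduced here, and unlike a real Minitab dotplot it does not stack dots that fall in the same position.

```python
from collections import defaultdict
import statistics

# Hypothetical (Opening in $ millions, Star?) pairs; NOT from movies.mtw.
movies = [
    (32.5, "Star"), (12.1, "NoStar"), (45.0, "Star"),
    (8.4, "NoStar"), (20.3, "NoStar"), (27.8, "Star"),
]

# Split the opening grosses into one list per group.
by_star = defaultdict(list)
for opening, star in movies:
    by_star[star].append(opening)


def text_dotplot(values, lo=0, hi=50, width=51):
    """Return one text line with an 'o' at each value's scaled position.

    Values outside [lo, hi] are clamped to the ends; coinciding dots
    overwrite each other instead of stacking.
    """
    cells = [" "] * width
    for v in values:
        pos = round((v - lo) / (hi - lo) * (width - 1))
        pos = min(max(pos, 0), width - 1)
        cells[pos] = "o"
    return "".join(cells)


for star, openings in sorted(by_star.items()):
    print(f"{star:>6} |{text_dotplot(openings)}| "
          f"mean = {statistics.mean(openings):.1f}")
```

Comparing the two rows shows whether one group tends to sit further right (higher opening sales), but a difference between groups does not by itself rule out confounding factors such as budget or release season.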