Chapter 4: More on Two-Variable Data

Section 4.1: Transforming Relationships

Nonlinear relationships between two quantitative variables can sometimes be changed into linear relationships by transforming one or both of the variables. When the variable being transformed takes only positive values, the power transformations are all monotonic, so each has an inverse transformation that recovers the original data from the transformed values. Transformation is particularly effective when there is reason to think that the data are governed by some mathematical model. We can fit exponential growth and power models to data by finding the least-squares regression line for the transformed data (log y versus x for exponential growth; log y versus log x for a power model), then applying the inverse transformation.
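As a minimal sketch (Python with numpy; the data are made up), fitting an exponential model y = a * b**x reduces to an ordinary least-squares fit of log y on x, followed by the inverse transformation:

    import numpy as np

    # Hypothetical data that grow roughly exponentially
    x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    y = np.array([2.1, 4.2, 7.9, 16.5, 31.0, 65.2])

    # An exponential model y = a * b**x becomes linear after transforming y:
    # log(y) = log(a) + x * log(b)
    slope, intercept = np.polyfit(x, np.log(y), 1)

    # The inverse transformation (exponentiating) recovers the original model
    a, b = np.exp(intercept), np.exp(slope)
    print(f"exponential fit: y = {a:.2f} * {b:.2f}**x")

    # A power model y = a * x**p instead becomes linear in log(x) and log(y)
    p, log_a = np.polyfit(np.log(x), np.log(y), 1)
    print(f"power fit:       y = {np.exp(log_a):.2f} * x**{p:.2f}")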

Section 4.2: Cautions about Correlation and Regression

Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you must be aware of their limitations, beginning with the fact that correlation and regression describe only linear relationships. Also remember that the correlation r and the least-squares regression line are not resistant. One influential observation or incorrectly entered data point can greatly change these measures.
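A quick sketch (Python with numpy; the data are invented) of how a single incorrectly entered point can change r:

    import numpy as np

    # Ten points lying close to the line y = 2x, so r is nearly 1
    x = np.arange(1.0, 11.0)
    y = 2 * x + np.array([0.3, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.3, -0.4, 0.1])
    print(f"r with clean data: {np.corrcoef(x, y)[0, 1]:.3f}")

    # Suppose the last y-value, 20.1, is mistyped as 201 -- one bad point
    y_typo = y.copy()
    y_typo[-1] = 201.0
    print(f"r with one typo:   {np.corrcoef(x, y_typo)[0, 1]:.3f}")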

Extrapolation:

Extrapolation is the use of a regression line for prediction far outside the domain of values of the explanatory variable x that you used to obtain the line or curve. Such predictions are often wildly inaccurate; a line fit to a child's heights between ages 3 and 8, for example, will give an absurd predicted height at age 30.

Lurking Variable:

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

Confounding:

Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.

The most important lesson of this section is that even a very strong association between two variables is not by itself good evidence that there is a cause-and-effect link between the variables. High correlation does not imply causation.

Homework: #’s 4.27 – 4.37

Section 4.3: Relations in Categorical Data

To this point we have concentrated on relationships between quantitative variables; now we turn to relationships between two or more categorical variables. To analyze categorical data, we use the counts or percents of individuals that fall into various categories, organized in a two-way table. A two-way table describes two categorical variables: the rows correspond to the categories of one variable, the columns to the categories of the other, and each cell holds the count of individuals in that combination of categories.

The distribution of either variable alone, obtained from the row totals or the column totals, is called a marginal distribution.

First variable (its categories label the columns):

Second variable | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Total
Cat 1           | X1-1  | X1-2  | X1-3  | X1-4  | R1
Cat 2           | X2-1  | X2-2  | X2-3  | X2-4  | R2
Total           | C1    | C2    | C3    | C4    | n

The row totals R1 and R2 give the marginal distribution of the second variable; the column totals C1 through C4 give the marginal distribution of the first. Each is usually expressed as a percent of the grand total n.
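As a minimal sketch (Python with numpy; the counts are hypothetical), the marginal distributions come directly from the row and column totals:

    import numpy as np

    # Hypothetical two-way table: 2 row categories x 4 column categories
    table = np.array([[10, 20, 30, 40],
                      [15, 25, 35, 45]])
    n = table.sum()  # grand total

    # Marginal distribution of the row variable: row totals / grand total
    print("row marginal:   ", table.sum(axis=1) / n)

    # Marginal distribution of the column variable: column totals / grand total
    print("column marginal:", table.sum(axis=0) / n)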

The distribution of one variable among only the individuals in a single category of the other variable is called a conditional distribution; it is found by dividing each cell count by that category's row or column total.

Conditional distributions given the first variable (divide each cell by its column total; each column of the result sums to 1):

Second variable | Cat 1   | Cat 2   | Cat 3   | Cat 4
Cat 1           | X1-1/C1 | X1-2/C2 | X1-3/C3 | X1-4/C4
Cat 2           | X2-1/C1 | X2-2/C2 | X2-3/C3 | X2-4/C4
Total           | C1      | C2      | C3      | C4

Conditional distributions given the second variable (divide each cell by its row total; each row of the result sums to 1):

Second variable | Cat 1   | Cat 2   | Cat 3   | Cat 4   | Total
Cat 1           | X1-1/R1 | X1-2/R1 | X1-3/R1 | X1-4/R1 | R1
Cat 2           | X2-1/R2 | X2-2/R2 | X2-3/R2 | X2-4/R2 | R2
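The same idea as a short sketch (Python with numpy, reusing the hypothetical counts from above):

    import numpy as np

    table = np.array([[10, 20, 30, 40],
                      [15, 25, 35, 45]])

    # Condition on the column variable: each column divided by its total,
    # so every column of the result sums to 1
    print(table / table.sum(axis=0, keepdims=True))

    # Condition on the row variable: each row divided by its total,
    # so every row of the result sums to 1
    print(table / table.sum(axis=1, keepdims=True))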

Simpson’s Paradox:

Simpson’s Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.
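A small numeric illustration (plain Python; the hospital names and counts are invented) in which hospital A has the better survival rate within each patient group, yet the worse rate once the groups are combined:

    # Hypothetical (survived, treated) counts for two hospitals,
    # broken down by how sick patients were on arrival
    groups = {
        "good condition": {"A": (590, 600), "B": (870, 900)},
        "poor condition": {"A": (210, 400), "B": (30, 100)},
    }

    totals = {"A": [0, 0], "B": [0, 0]}
    for group, hospitals in groups.items():
        for name, (survived, treated) in hospitals.items():
            totals[name][0] += survived
            totals[name][1] += treated
            print(f"{group}, hospital {name}: {survived}/{treated} = {survived/treated:.1%}")

    # Combining the groups reverses the comparison: B now looks better
    for name, (survived, treated) in totals.items():
        print(f"combined, hospital {name}: {survived}/{treated} = {survived/treated:.1%}")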

Homework: #’s 4.62 – 4.70

Chapter Review:

Homework: #’s 4.72a-c, 4.73 – 4.75, 4.81