Alka Indurkhya EPI 826 10/4/99

Lecture 10: Interactions and Confounders

Kleinbaum (Chapter 6)

The above strategy involves backward elimination according to well-defined criteria. There are other strategies, such as forward selection.

Kleinbaum’s recommended strategy.

1. Variable selection

A. Start with D, E, and the C’s. Remember the goal: the model should provide a valid estimate of the disease-exposure relationship. So you need to specify which variable in a particular model is considered the main exposure of interest.

B. Choose V’s from C’s based on prior research or theory, taking into account potential statistical problems, e.g., collinearity; the simplest choice is to let the V’s be the C’s themselves. V’s, in his terminology, are the variables that will be entered as main effects or as interactions among themselves, but not with the exposure.

C. Choose W’s from C’s to be either V’s or products of two V’s; the usual recommendation is to let the W’s be the C’s themselves or some subset of the C’s. W’s are the components of the terms that measure the effect of another variable in the model on the disease-exposure relationship. They enter the model as product terms with the exposure, and if the coefficient for such a term is significant we call that W an effect modifier.

2. Interaction Assessment

There are several issues to consider in interaction assessment.

A. Hierarchically well-formulated (HWF) models

a. Definition: given any variable in the model, all lower-order components must also be in the model.

b. Examples of models that are and are not hierarchically well formulated.

c. Rationale: If the model is not hierarchically well formulated, then tests for significance of the highest-order variables in the model may change with the coding of the variables tested; such tests should be independent of coding.
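As an illustration of the definition, the following Python sketch checks whether a list of model terms is hierarchically well formulated. It is not taken from Kleinbaum: the helper name is_hwf and the "A*B" string notation for product terms are assumptions made for this example, and it only handles products of distinct variables (not powers such as AGE*AGE). The variable names are borrowed from the Evans example used in these notes.

```python
from itertools import combinations


def is_hwf(terms):
    """Return True if every lower-order component of each term is in the model.

    Hypothetical helper: each term is a string such as "CAT" or "CAT*AGE".
    """
    model = {frozenset(t.split("*")) for t in terms}
    for term in model:
        # Check every non-empty proper subset of the variables in this term.
        for size in range(1, len(term)):
            for sub in combinations(term, size):
                if frozenset(sub) not in model:
                    return False
    return True


# Fully saturated model: all main effects and all products are present.
full = ["CAT", "AGE", "ECG", "CAT*AGE", "CAT*ECG", "AGE*ECG", "CAT*AGE*ECG"]
# Three-way product without its two-way components.
no_two_way = ["CAT", "AGE", "ECG", "CAT*AGE*ECG"]
# Product term without the exposure main effect.
no_exposure = ["AGE", "CAT*AGE"]

print(is_hwf(full))         # True
print(is_hwf(no_two_way))   # False
print(is_hwf(no_exposure))  # False
```

Only the first model is HWF; the other two each contain a product term with a missing lower-order component.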

Look at the printout to see what happens when we keep HWF models but change the coding for BP from (0,1) to (2,1), where 2 is for low BP, and 1 = MB. This type of coding is common for yes/no variables: yes is usually coded as 1, but no is sometimes coded as 0 and sometimes as 2.
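The effect of recoding can also be sketched numerically without a printout. The Python example below is an assumption-laden illustration, not the lecture's BP example: it uses AGE from the Evans data in these notes (collapsed over ECG) in place of BP, and it relies on the fact that a saturated HWF model, logit = b0 + b1*CAT + b2*A + b3*CAT*A, reproduces the observed logits exactly, so the coefficients can be solved for directly. The function name saturated_coefs is hypothetical.

```python
from math import exp, log

# (CHD cases, number at risk) indexed by (CAT, AGE group), collapsed over ECG.
cells = {
    (0, 0): (24, 333),
    (0, 1): (20, 154),
    (1, 0): (4, 25),
    (1, 1): (23, 97),
}

# Observed log odds of CHD in each cell.
L = {key: log(cases / (total - cases)) for key, (cases, total) in cells.items()}


def saturated_coefs(code_young, code_old):
    """Solve logit = b0 + b1*CAT + b2*A + b3*CAT*A exactly,
    where A is AGE coded as code_young / code_old."""
    step = code_old - code_young
    b2 = (L[0, 1] - L[0, 0]) / step
    b0 = L[0, 0] - b2 * code_young
    b3 = ((L[1, 1] - L[0, 1]) - (L[1, 0] - L[0, 0])) / step
    b1 = (L[1, 0] - L[0, 0]) - b3 * code_young
    return b0, b1, b2, b3


b_01 = saturated_coefs(0, 1)  # (0,1) coding
b_21 = saturated_coefs(2, 1)  # (2,1) coding

print("product term :", b_01[3], b_21[3])
print("CAT term     :", b_01[1], b_21[1])
print("OR for CAT in the younger group:", exp(b_01[1]), exp(b_21[1] + 2 * b_21[3]))
```

Under these assumptions the product-term coefficient only changes sign between the two codings, so a test of that highest-order term gives the same answer either way. The CAT coefficient, by contrast, changes value (it equals the (0,1) value plus twice the product coefficient), even though the implied odds ratio in each age group is unchanged. This is one way to see why lower-order terms should stay in the model and not be tested while their product terms are present.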

B. The hierarchical backward elimination approach

a. Flow diagram representation.

b. Flow description: evaluate EViVj terms first, then EVi terms, then Vi terms last.

c. Use statistical testing for interaction terms, but decisions about Vi terms should not involve testing.

d. The hierarchical backward elimination approach is also recommended by Selvin on p. 220.

“When a logistic model is used to explore the relationships in a set of data, there are two typical ways to begin: the simplest model or the most complex model. In the first case, variables are added to the model until a useful description is achieved (“forward”). In the second case, variables are removed from the most complex model until a simpler but satisfactory statistical structure is found (“backward”). In this section the most complicated (most parameters) model serves as a starting point for analyzing the risk of a disease at k levels of a categorical risk factor at two levels of another variable, a 2 x 2 x K table.”

Confounding: Refer to Lecture 3

Review Complete, Conditional, Partial Independence relationships between categorical variables.

SAS code for examples in Kleinbaum:

libname a 'a:';

data a.evans;
  /* one record per CAT x AGE x ECG stratum (events/trials form) */
  input strata CHD total CAT AGE ECG;
  label CHD   = 'CHD cases'
        total = 'number at risk in stratum'
        CAT   = 'catecholamine level'
        AGE   = 'Age Group'
        ECG   = 'ECG Abnormality';
  /* product terms used in the interaction models */
  CATAGE   = CAT*AGE;
  CATECG   = CAT*ECG;
  AGEECG   = AGE*ECG;
  CATAGECG = CAT*AGE*ECG;
  datalines;
1 17 274 0 0 0
2 15 122 0 1 0
3 7 59 0 0 1
4 5 32 0 1 1
5 1 8 1 0 0
6 9 39 1 1 0
7 3 17 1 0 1
8 14 58 1 1 1
;

proc format;
  value catfmt 0 = 'low'
               1 = 'high';
  value agefmt 0 = ' < 55 '
               1 = '>= 55';
  value ecgfmt 0 = 'normal'
               1 = 'Abnormal';
run;

proc logistic data=a.evans;

model CHD/total=CAT AGE ECG CATAGE CATECG AGEECG CATAGECG;

Title 'Fully saturated model that computes strata specific OR';

run;

/* Test if ECG is a potential confounder in the CHD-CAT relationship */

proc logistic data=a.evans;

model CHD/total=CAT AGE ECG;

title 'Main effects model';

run;

proc logistic data=a.evans;

model CHD/total=CAT AGE;

title 'No ECG in model';

run;

proc logistic data=a.evans;

model CHD/total = CAT;

Title 'Crude OR for CHD-CAT';

run;
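As a rough cross-check on what the saturated and crude models above should report, the odds ratios can be computed by hand from the eight strata. The Python sketch below is an illustration only: the helper names are hypothetical, and the Mantel-Haenszel summary is a swapped-in stratified estimate, not the model-adjusted odds ratio that the main effects logistic model produces.

```python
# (CHD cases, number at risk, CAT, AGE, ECG), copied from the datalines above.
rows = [
    (17, 274, 0, 0, 0),
    (15, 122, 0, 1, 0),
    (7, 59, 0, 0, 1),
    (5, 32, 0, 1, 1),
    (1, 8, 1, 0, 0),
    (9, 39, 1, 1, 0),
    (3, 17, 1, 0, 1),
    (14, 58, 1, 1, 1),
]


def two_by_two(subset):
    """Collapse rows into (a, b, c, d): exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases, with exposure = CAT."""
    a = sum(chd for chd, n, cat, *_ in subset if cat == 1)
    b = sum(n - chd for chd, n, cat, *_ in subset if cat == 1)
    c = sum(chd for chd, n, cat, *_ in subset if cat == 0)
    d = sum(n - chd for chd, n, cat, *_ in subset if cat == 0)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Crude OR: ignore AGE and ECG entirely.
crude_or = odds_ratio(*two_by_two(rows))

# Stratum-specific ORs (what the saturated model reproduces) and the
# Mantel-Haenszel summary OR over the four AGE x ECG strata.
stratum_or = {}
mh_num = mh_den = 0.0
for age in (0, 1):
    for ecg in (0, 1):
        subset = [r for r in rows if r[3] == age and r[4] == ecg]
        a, b, c, d = two_by_two(subset)
        n = a + b + c + d
        stratum_or[(age, ecg)] = odds_ratio(a, b, c, d)
        mh_num += a * d / n
        mh_den += b * c / n
mh_or = mh_num / mh_den

print("crude OR:", round(crude_or, 2))
for key, value in stratum_or.items():
    print("AGE, ECG =", key, "OR:", round(value, 2))
print("Mantel-Haenszel OR:", round(mh_or, 2))
```

With these data the crude odds ratio (about 2.86) is larger than every stratum-specific odds ratio (roughly 1.6 to 2.2), which is the kind of discrepancy the crude versus adjusted model comparison is meant to reveal.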
