Epid 200B

Winter 2008

Practice Final Exam Answer Key

Questions 1-7 address: Pednekar M et al. Joint effects of tobacco use and body mass on all–cause mortality in Mumbai India: results from a population based cohort study. AJE 2008; 167:330-340

Question 1 (5 points)

The objective of the study was to explore the role of tobacco use and body mass on all cause mortality.

a. What is the source population for this cohort study?

All persons >=35 years of age who lived in the main city of Mumbai, India, between 1991 and 1997 (with voters' lists as the sampling frame), except for those at polling stations of upper-middle-class housing complexes (i.e., mostly affluent housing complexes).

b. List two important reasons why the researchers chose a cohort design.

They are able to study mortality even after a short follow-up period of 5 years since it is not a rare event in a cohort of subjects older than age 35.

It would be hard to assess tobacco use or BMI validly in retrospect unless records documenting them existed for the entire population. This is highly unlikely, given that death certificates do not even seem to be collected for the entire population, and reports by proxies for dead subjects may be unreliable and/or biased.

c. Why do you think the researchers did not investigate specific causes of deaths such as types of cancers?

There does not even seem to be a central death registration system for all deaths in this population. While the door-to-door follow-up survey allows the investigators to ascertain deaths, such a procedure would probably not provide valid data on causes of death. Even death certificates filled out by doctors are known to be error prone, and causes of death would be even more likely to be misreported by, or unknown to, a proxy informant.

Question 2 (6 points)

The investigators listed 11,588 subjects as not being available for interview at follow-up and a group of 7,265 subjects whose vital status was unknown at follow-up, mostly due to the demolition of building units.

a. What do you call these groups of subjects?

We would consider these subjects as ‘lost to follow-up’.

b. State, how the authors dealt with each of these groups of subjects in their analyses.

They counted the 11,588 subjects who were not available for interview as alive (because they were known to be alive) and censored them at the last date of attempted contact, i.e., their person-time counted until that last attempt. However, it remains unclear from the paper how they censored the 7,265 subjects whose vital status was unknown, i.e., whether they contributed person-time only in the first year or throughout follow-up. Observation time for these subjects should stop at the last time they were seen alive. Subjects with no observation time at all have to be excluded from the study altogether.

c. Would you be concerned about the impact the group of 7,265 subjects has on the results presented? Explain.

Heavy loss to follow-up can introduce selection bias in a cohort study. As a rule of thumb, important selection bias in a cohort is more likely if more than 10% of the population is lost to follow-up; thus it seems unlikely that this loss had a large impact, even if these subjects were not lost at random. Reasons for loss to follow-up other than demolition of homes, such as being lost for no apparent reason, may be of more concern.

On the other hand, if loss to follow-up is associated with both the exposure and the end point, the results will be biased. Social factors may be linked to loss to follow-up: it is not the best houses that are demolished.

Question 3 (2 points)

Describe two important potential sources of information bias in the study and the expected direction of the bias.

Mis-reporting by subjects (1) and mis-recording by interviewers (2) of risk factor data such as tobacco use and BMI, but also of confounders. The resulting bias is mostly non-differential (and thus typically toward the null), since neither the subjects nor the interviewers are likely to have anticipated that the subject would die soon, and so they would not be expected to report or record differentially according to the anticipated outcome. (One could, however, argue that a subject who died during early follow-up was feeling or looking ill and that this influenced the reporting of, or probing for, the behaviors of interest.) It is unlikely that a death is misreported, unless there are financial incentives for reporting a missing person as dead.

Question 4 (5 points)

a. List two important reasons why the investigators excluded residents who were very ill or bedridden from this cohort.

Practical and ethical reasons since these subjects would be hard to interview and measure. These are also subjects who may have an advanced stage disease that caused them to change weight and/or health behaviors due to illness (reverse causation).

b. What would be one major advantage and one major disadvantage of further restricting cohort membership, for example to also exclude subjects, who at baseline report having been diagnosed with a cancer?

Major advantage: these subjects may have lost weight or changed behaviors because of their disease, so their weight and reported behaviors may reflect illness; excluding them avoids the problem of reverse causation. The high mortality among cohort members with a low BMI strongly indicates that some of them had lost weight because of a disease.

Major disadvantage: there may be differences in who may know such diagnoses and who may not know them due to differential access to health care and, thus, we would exclude only a subset of cases and these may then also differ with regard to behaviors and weight. Excluding subjects with a severe disease that affects mortality based upon a standard medical examination would have been ideal.

Another disadvantage: If cancer and non-cancer mortality share risk factors, you could accidentally induce collider bias by conditioning on cancer (see DAG). Conditioning on non-cancer is less bias prone.

[DAG: Smoking -> Cancer <- (U) -> Mortality; Smoking -> Mortality]

Question 5 (5 points)

Suppose the authors ascertained cause of death in cohort members, and you decide to conduct a nested case control study of cooking fuel use (wood versus gas stove) and oral cancer mortality in women at the end of follow-up. You will need to send interviewers back to the homes of cases and selected controls to ascertain the cooking fuel type.

a. Would you recommend addressing the issues of tobacco use in women in the sampling of cases and controls via an approach of restriction, matching, or neither? Please provide an explanation for your choice.

Restriction: restricting to lifelong non-using women only would probably be the best way to ensure that tobacco use does not confound the association (provided you trust their data on non-use), and one would not need to adjust for it either. However, this would reduce the sample size unless very few women are active users.

Matching: match on tobacco use status, then control for tobacco use more finely in the analysis, distinguishing by type and duration of tobacco use. This might be the most efficient way to assess the association without excluding users.

Neither: do not match, but control for tobacco use in detail in the analysis. We may end up with residual confounding if we are unable to adjust for tobacco use as finely as necessary.

b. Describe one advantage and one disadvantage of matching cases and controls on polling station.

Advantage: logistical ease of revisiting cases and controls from the same areas/neighborhoods; it may also match on access to health care in the community and, thus, on diagnostic accuracy by area of residence.

Disadvantage: we may be seriously overmatching on fuel type if neighbors tend to use the same type of cooking fuel. In addition, incomplete matched sets have to be dropped.

Question 6 (10 points)

a. Calculate the crude death rate in both never-users and cigarette smoking men who are 1) extremely thin and 2) overweight. Then compare the crude to the adjusted rates reported in Table 2. What do you conclude from this?

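Rates per 1,000:

BMI group / Never-users (male), crude / Never-users (male), age-adj. / Smokers (male), crude / Smokers (male), age-adj.
Extremely thin / 56.85 (= 202/3,553 × 1,000) / 41.80 / 67.70 (= 497/7,341 × 1,000) / 55.60
Overweight / 13.55 (= 369/27,238 × 1,000) / 12.97 / 19.42 (= 427/21,985 × 1,000) / 17.91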

Age adjustment makes more of a difference (i.e., is more important) for the group of extremely thin men.
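The crude rates above can be checked with a few lines of Python. This is a minimal sketch: the death counts, denominators, and age-adjusted rates are the ones quoted in this answer, while the function and variable names are illustrative and not taken from the paper.

```python
# Deaths, denominators, and age-adjusted rates as quoted in the answer above.
# Names such as `groups` and `crude_rate_per_1000` are illustrative only.
groups = {
    ("extremely thin", "never-user"): (202, 3553, 41.80),
    ("extremely thin", "smoker"): (497, 7341, 55.60),
    ("overweight", "never-user"): (369, 27238, 12.97),
    ("overweight", "smoker"): (427, 21985, 17.91),
}


def crude_rate_per_1000(deaths, denominator):
    """Crude death rate expressed per 1,000 units of the denominator."""
    return 1000 * deaths / denominator


results = {}
for key, (deaths, denominator, age_adjusted) in groups.items():
    crude = crude_rate_per_1000(deaths, denominator)
    # Store the crude rate and how far it sits above the age-adjusted rate.
    results[key] = (crude, crude - age_adjusted)
    print(key, round(crude, 2), round(crude - age_adjusted, 2))
```

The second number printed for each group is the gap between the crude and the age-adjusted rate, which is the quantity the conclusion compares across BMI groups.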

b. Calculate both the crude RR and the age-only adjusted RR for being an extremely thin, never-tobacco using man compared to an overweight, never-tobacco using man and compare these two estimates to the adjusted estimate presented in Table 4. What do you conclude from this?

Crude RR = 56.85/13.55 = 4.20

Age-adjusted RR = 41.80/12.97 = 3.22

Table 4 adjusted RR = 2.68

It is important to adjust not only for age but also for the other potential confounding variables included in the adjusted analyses in Table 4; i.e., the crude RR for this comparison is likely confounded not only by age but also by socio-economic factors captured by religion, education, and language.
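The comparison of the three estimates can be sketched in Python as follows. The rates and the Table 4 value are those quoted in this answer; the variable names are illustrative.

```python
# Rates per 1,000 for never-tobacco-using men, as quoted in Question 6a.
crude_thin, crude_overweight = 56.85, 13.55
age_adj_thin, age_adj_overweight = 41.80, 12.97
table4_rr = 2.68  # fully adjusted RR quoted from Table 4

crude_rr = crude_thin / crude_overweight
age_adj_rr = age_adj_thin / age_adj_overweight

print("Crude RR:", round(crude_rr, 2))
print("Age-adjusted RR:", round(age_adj_rr, 2))
print("Table 4 adjusted RR:", table4_rr)
```

Each additional layer of adjustment moves the estimate closer to 1, which is the pattern the conclusion rests on.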

c. In Tables 3 and 4, the investigators chose to present models adjusted for several potential confounders: 1) explain why you think they adjusted for religion; and 2) suppose the investigators ascertained active tuberculosis status in all subjects at baseline, would you suggest that they adjust for it? Supply a DAG for each answer to illustrate and support your choices.

[DAG: religion -> (BMI, tobacco use); religion -> death; (BMI, tobacco use) -> death]

Religion may well be a surrogate for SES or for other types of behavior that may contribute to mortality, such as alcohol use, diet, and sexual behaviors. These may all be related to tobacco use and/or BMI, which makes religion a proxy for confounding due to these underlying factors.

[DAG: (BMI, tobacco use) -> active TB -> death; (BMI, tobacco use) -> death]

BMI and tobacco use could have activated TB; TB might thus be an intermediate on the causal pathway and should not be adjusted for. Students who make a credible argument for it also being a confounder can argue for including it nevertheless, but they have to acknowledge the potential bias.

[DAG with nodes TB, smoking, religion, and death; arrows not recoverable]

Question 7 (5 points)

In Table 6, the investigators compare the joint effect of tobacco use and BMI to an expected RR.

a.  Specify what kind of interaction effect they observed for extremely thin females who never use and who use smokeless tobacco in terms of multiplicativity of effects compared to the expected value.

Based on the point estimates provided in Table 6: for extremely thin never-using women the investigators estimated a more than multiplicative RR, while for the tobacco-using women the RR estimate is almost the same as (or could be called sub-multiplicative compared to) the expected RR, given a multiplicative interaction model. However, since we have not been given confidence intervals for these estimates, it is also likely that the 95% CIs of the observed and expected RRs overlap strongly, and we can call these interactions possibly multiplicative.

b. Describe what influence smoking of tobacco has on mortality in 1) extremely thin men, and 2) obese men in terms of a multiplicative interaction.

Tobacco smoking seems to increase the risk of dying during follow-up more dramatically in extremely thin men (3.36 compared to 2.68) than in obese men (1.56 compared to 1.34).
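On a multiplicative scale, the comparison can be expressed as the ratio of the two RRs within each BMI group. The sketch below assumes that each quoted pair is ordered as (RR in smokers, RR in never-users) for the same BMI category; that ordering and the variable names are assumptions made for illustration.

```python
# RR pairs quoted in the answer above.
# Assumed ordering: (RR in smokers, RR in never-users) per BMI group.
rr_pairs = {
    "extremely thin": (3.36, 2.68),
    "obese": (1.56, 1.34),
}

# Ratio of the two RRs: how much larger the smokers' RR is, multiplicatively.
ratios = {
    bmi_group: rr_smokers / rr_never
    for bmi_group, (rr_smokers, rr_never) in rr_pairs.items()
}

for bmi_group, ratio in ratios.items():
    print(bmi_group, round(ratio, 2))
```

A larger ratio in the extremely thin group than in the obese group is consistent with the statement that smoking has the stronger relative effect in thin men.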

Question 8 (2 points)

A study shows that fire fighters have a 20% lower mortality of cardiovascular disease (SMR=0.80) than the general population. The authors consider this to reflect the beneficial effects of physical exercise. Suggest alternative explanations for this low risk and discuss the most important methodological problem(s) in using this comparison group.

To be a firefighter you need to be in good health (the healthy worker selection), whereas the general population contains people with severe diseases that make them unfit for the job market (the sick population effect); or, put differently, the comparison violates the counterfactual principle (the stupid investigator’s effect).

Question 9 (4 points)

A case control study among pregnant nurses on exposure to chemotherapeutic drugs and risk of spontaneous abortions showed the following results:

Exposure / Cases (spontaneous abortion) / Controls
+ / 10 / 10
- / 90 / 190
Total / 100 / 200

a. What is the association between the exposure and spontaneous abortions?

OR = (10/90) / (10/190) = 2.11
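The odds ratio above can be reproduced with a small helper. This is a minimal sketch; the function name and argument names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds among cases divided by exposure odds among controls."""
    case_odds = exposed_cases / unexposed_cases
    control_odds = exposed_controls / unexposed_controls
    return case_odds / control_odds


# Counts from the table in Question 9.
or_observed = odds_ratio(10, 90, 10, 190)
print(round(or_observed, 2))
```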

b. In the study all cases accepted the invitation, but only 80% of the invited controls took part. What is the needed exposure distribution among non responding controls to fully explain away the increased risk observed in the table (to obtain an OR of 1)?

Exposure / Cases (spontaneous abortion) / Controls
+ / 10 / X
- / 90 / 250-X
Total / 100 / 250 (= 200/0.8)

OR = 1 = (10/90) / (X/(250-X))

X = 25

15 out of 50 non-responding controls (30%) have to be exposed, compared with 5% (10 out of 200) among responding controls.

Exposure / Cases (spontaneous abortion) / Controls
+ / 10 / 25
- / 90 / 225
Total / 100 / 250

OR = (10/90) / (25/225) = 1.0
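The algebra behind X can be sketched in Python using exact fractions. Setting the control odds X/(N-X) equal to the case odds and solving gives X = N × odds / (1 + odds), where N is the number of invited controls. The variable names are illustrative.

```python
from fractions import Fraction

# Counts from the table in Question 9 and the stated 80% control response.
cases_exposed, cases_unexposed = 10, 90
responding_controls, responding_exposed = 200, 10
response_rate = Fraction(8, 10)

invited_controls = responding_controls / response_rate  # 250 invited in total

# OR = 1 requires the control odds X/(N - X) to equal the case odds.
case_odds = Fraction(cases_exposed, cases_unexposed)
x = invited_controls * case_odds / (1 + case_odds)

non_responders = invited_controls - responding_controls
exposed_non_responders = x - responding_exposed
share_exposed_non_responders = exposed_non_responders / non_responders
share_exposed_responders = Fraction(responding_exposed, responding_controls)

print(x, exposed_non_responders, non_responders)
print(float(share_exposed_non_responders), float(share_exposed_responders))
```

Exact fractions are used here only to avoid floating-point rounding in the intermediate steps.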

c. Say that all truly non-exposed subjects are classified as non-exposed, but only 90% of the exposed subjects report this in the interview. Assume this error applies to cases as well as controls. If you assume that the table provides non-misclassified data, what would the table with the expected data (including misclassified data) look like and what is the OR?

Exposure / Cases (spontaneous abortion) / Controls
+ / 10-1 = 9 / 10-1 = 9
- / 90+1 = 91 / 190+1 = 191
Total / 100 / 200

OR = (9/91) / (9/191) = 2.10
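The misclassified table can be generated from the true table as sketched below, assuming 90% sensitivity and perfect specificity in both cases and controls, as stated in the question. The helper name is illustrative.

```python
from fractions import Fraction


def apply_sensitivity(exposed, unexposed, sensitivity):
    """Move the unreported share of truly exposed subjects to the unexposed row."""
    reported_exposed = exposed * sensitivity
    reported_unexposed = unexposed + (exposed - reported_exposed)
    return reported_exposed, reported_unexposed


sensitivity = Fraction(9, 10)  # 90% of truly exposed subjects report exposure

cases = apply_sensitivity(10, 90, sensitivity)
controls = apply_sensitivity(10, 190, sensitivity)

or_misclassified = (cases[0] / cases[1]) / (controls[0] / controls[1])
print(cases, controls, round(float(or_misclassified), 2))
```

Because the same sensitivity applies to cases and controls, the error is non-differential, and the OR moves only slightly toward 1.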


Question 10 (4 Points)

To investigate the short-term effect of alcohol consumption on myocardial infarction, investigators questioned 200 persons with a recent myocardial infarction about their alcohol consumption in the 24 hours preceding the onset of symptoms (the “window” period). For comparison, the cases were also questioned about their alcohol use during the 24 hours immediately preceding the window period (the “reference” period). Results are presented in the table below: