Intermediate Methods in Epidemiology

2008

Exercise No. 2 - Measures of Disease Frequency and Association

Main topics covered in this laboratory exercise:

I. The relationship of incidence rates and ratios to prevalence rates and ratios and how these relate to the interpretation of cross-sectional studies.

II. Calculation of

a. Relative Risk and Relative Odds: Similarities and differences

b. Relative odds in matched and non-matched samples

c. Confidence limits for relative odds and relative risks

III. Case-Control Studies

a. Assumptions

b. Controls

1. Non-cases

2. Population sample

IV. Random variability vs. bias

V. Matched studies

a. Advantages and disadvantages

b. Assumptions

VI. When and how to use a statistical significance trend test.

Department of Epidemiology - Johns Hopkins University - Copyright 1999

This laboratory exercise assumes familiarity with the following concepts:

Incidence and prevalence

Case fatality rate

Relative risk and relative odds

Cohort and case-control study design

Matched case-control design

Confounding

Matching as a method of confounder control

Sampling variability

How to label graphs

1

PART I: MEASURES OF DISEASE FREQUENCY

1. BACKGROUND INFORMATION

The material for this part of the exercise is taken from a long-term study of tuberculosis in Muscogee County, Georgia. In 1946, the county had a population of about 100,000, consisting of a city of 75,000 and the surrounding suburban and rural areas. The city is industrial in character and is adjacent to a large military post. Thirty per cent of the population was black.

Survey and Follow-up Procedures

In May and June 1946, a combined tuberculosis-venereal disease survey aimed primarily at persons over 15 years of age was conducted by the local and state health departments with federal aid. Three months later a special census of the county population was made. This census permitted a better estimate of the completeness of coverage than the Federal Census figures, both because individuals enumerated in the special census could be matched against those reached in the survey and because the 1940 Federal Census was badly out of date.

A 70 mm photofluorograph[1] was offered to each participant over the age of 12 years. Each photofluorograph was interpreted independently by two readers, and all suspects of either reader were requested to return for a standard 14 x 17 inch chest radiograph.

Persons classified as tuberculosis cases or suspects after the 14 x 17 inch X-ray were advised to remain under observation for at least five years. During this period examinations were to be repeated at least every three months. Only a few persons with clear-cut evidence that the suspected abnormality was not tuberculous were discharged from follow-up before the end of the five-year period.

Tuberculin tests were done on most cases and suspects, and skin tests with fungal antigens were done where indicated. Sputum examinations, including cultures, were done on about one-half of the cases and suspects, including nearly all who were producing sputum or were suspected of having active disease. Gastric washings were not obtained.

1

On average, each patient had more than 12 clinic visits, with radiographs, up to July 1, 1952. Sixty per cent of the persons initially classified as having tuberculosis were examined in the 5th year after the survey; 11 per cent had died; 8 per cent had moved away; and 21 per cent were not examined in that year. Almost all of the last group were known to be living and apparently well.

Case-finding Subsequent to the Survey

In 1950, 75,000 persons were radiographed in a second mass survey of the metropolitan area exclusive of the military post. In addition, approximately 22,000 screening films were made each year as a result of the pre-employment and food-handler requirements, prenatal services, referral by private physicians, or other resources. Death certificates, hospital records and other sources of medical information were continuously scanned for information on known cases and suspects, as well as for possible unreported or unrecognized cases. Reporting is quite complete in this community, and it can be assumed that all diagnosed tuberculosis came to the attention of the health department, except for persons who had moved away prior to their diagnosis.

During the five-year period following the 1946 survey, new cases of tuberculosis were matched to the file of 1946 survey participants. Observations on these cases were continued to June 1952 to give even the most recently diagnosed case a follow-up period of at least one year. Tuberculosis deaths among the cases diagnosed in the 1946 survey were also counted to June 1952.

Migration from the Community

To estimate the number of persons remaining in the area in 1952, a sample, systematically selected, was visited in 1961 as part of a study of blood pressure levels. The proportion of persons who had left the county is shown in Table 1. The very few individuals whose whereabouts could not be ascertained were counted as having moved.

(1) From the data in Table 1, briefly describe the differences between individuals who emigrate and those who do not in this population. What implications could these differences have for epidemiologic studies based on this population?

In addition, the residence status of a systematic sample of participants in the 1950 survey who were then over 20 years of age was investigated in 1964. Sixty-six percent were found to be still residing in the study area, with no differences between those who reacted to tuberculin in 1950 and those who did not. (A reactor in this instance was defined as a person having five or more mm of induration to 5 TU of tuberculin.)

1

Table 1

Emigration between Sept. 1, 1946 and July 1, 1961 in a sample of the surveyed population, by race, sex, age and subcutaneous fatness in 1946, Muscogee County, Georgia.

Characteristics / Number in sample / Per cent emigrated
Total / 464 / 30
Race-Sex
White Males / 95 / 35
White Females / 183 / 36
Black Males / 67 / 24
Black Females / 119 / 27
Age in 1946
15-34 / 262 / 39
35-54 / 168 / 23
55+ / 34 / 15
Fat thickness in mm. over trapezius muscle*
0-4 mm. / 157 / 35
5-9 mm. / 216 / 32
10+ mm. / 91 / 22

* Percentages adjusted to race-sex-age composition of total examined population.

1

PREVALENCE vs. INCIDENCE

Strictly speaking, the term relative risk is defined as the ratio of two incidence risks. However, the term is commonly applied to quantities that approximate this ratio, such as the ratio of two rates, odds, or prevalences. This laboratory exercise uses the term "relative risk" in this looser sense; it is therefore important to note how each estimate of the relative risk is calculated. More specific uses of these and related terms are described in the relevant lecture.

Prevalence of Tuberculosis

Table 2 presents the number of persons screened, by race and age, and the cases of pulmonary tuberculosis determined at the end of the 5-year observation period to have had tuberculosis when surveyed. The tuberculosis deaths among these cases to June 1952 are also given.

Table 2

Prevalence of pulmonary tuberculosis in 1946 survey, by race and age, Muscogee County, Georgia.

Race and age / Total / White <45 / White 45+ / Black <45 / Black 45+
Population screened in survey / 38,190 / 17,699 / 4,939 / 12,336 / 3,216
Cases of tuberculosis / 568 / 134 / 245 / 119 / 70
Prevalence/1000 / 14.9 / 7.6 / 49.6 / ____ / 21.8
Prevalence Ratio / - / 1.0 / 6.53 / ____ / 2.87
Tuberculosis deaths to June 1952 / 34 / 2 / 5 / 16 / 11
Case fatality, % / 6.0 / 1.49 / 2.04 / ____ / 15.71

1

Incidence of New Cases

In similar fashion, the number of new cases known to have developed among the screened population during the next five years, together with the tuberculosis deaths among them and the midpoint populations at risk are shown in Table 3.

Table 3

Incidence of new cases of pulmonary tuberculosis per 1000 estimated survey-negative population during a 5-year period, 1946-1951, Muscogee County, Georgia.

Race and age / Total / White <45 / White 45+ / Black <45 / Black 45+
Estimated midpoint population / 33,656 / 15,554 / 4,348 / 10,915 / 2,839
New cases of tuberculosis / 110 / 22 / 12 / 66 / 10
Incidence/1000/year / 0.65 / 0.28 / 0.55 / ____ / 0.70
Relative Risk / - / 1.0 / 1.96 / ____ / 2.50
Tuberculosis deaths to June 1952 / 31 / 3 / 1 / 22 / 5
Case fatality, % / 28.2 / 13.6 / 8.3 / ____ / 50.0

(2) Calculate the prevalence, incidence and case fatality for younger blacks in Tables 2 and 3. Note that for Table 3 the new cases of tuberculosis were accumulated over a five-year period, so the calculation of the annual incidence rate must account for this. Do you observe the same patterns of association of race and age with tuberculosis prevalence and case fatality in Table 2 (cross-sectional data) as you do with tuberculosis incidence and case fatality in Table 3 (prospective data)?
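The arithmetic behind Tables 2 and 3 can be sketched as follows. The sketch uses the White, under-45 column, whose results are already printed in the tables, so the column for younger blacks remains an exercise. The function names are illustrative only, and the annual incidence is assumed to be the five-year case count divided by the midpoint population and by the five years of observation.

```python
def prevalence_per_1000(cases, screened):
    """Prevalent cases per 1000 persons screened."""
    return 1000 * cases / screened


def annual_incidence_per_1000(new_cases, midpoint_population, years):
    """New cases per 1000 persons per year, spreading the count over `years`."""
    return 1000 * new_cases / (midpoint_population * years)


def case_fatality_pct(deaths, cases):
    """Deaths as a percentage of cases."""
    return 100 * deaths / cases


# White, <45 column of Table 2 (cross-sectional) and Table 3 (prospective)
prev_white_young = prevalence_per_1000(134, 17699)
inc_white_young = annual_incidence_per_1000(22, 15554, years=5)
cf_prevalent_cases = case_fatality_pct(2, 134)
cf_incident_cases = case_fatality_pct(3, 22)

print(f"Prevalence/1000:      {prev_white_young:.1f}")
print(f"Incidence/1000/year:  {inc_white_young:.2f}")
print(f"Case fatality (T2) %: {cf_prevalent_cases:.2f}")
print(f"Case fatality (T3) %: {cf_incident_cases:.1f}")
```

Dividing by the number of years is what converts the five-year accumulation of cases into an annual rate.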

1

A high degree of tuberculin sensitivity is also considered to be a risk factor for tuberculosis in many populations. Data on this characteristic from Muscogee County, Georgia can also be used to compare results of a cross-sectional and a prospective study. As mentioned earlier, a community-wide screening project was carried out in 1950. In this project, participants were given a standard tuberculin skin test and a chest photofluorograph. Persons with abnormal photofluorographs were recalled for a 14 x 17 inch radiograph and a clinical examination. Table 4 shows the frequency of pulmonary tuberculosis among the screened population by size of tuberculin reaction (diameter of induration in mm). Table 5 shows the rate of new tuberculosis developing during the next 14 years among the population with normal chest radiographs in 1950, according to the size of their tuberculin reaction in 1950.

Table 4

Cases of pulmonary tuberculosis detected in surveyed population by size of tuberculin reaction, Muscogee County, Georgia, 1950.

Induration (mm) / No. of Cases / Cases per 1000 / Relative Risk
Total / 496 / 11.8 / -
0-2 / 34 / 4.1 / 1.0
3-4 / 38 / 4.5 / 1.1
5-7 / 71 / 8.5 / 2.1
8-12 / 142 / 16.9 / ____
13+ / 211 / 25.2 / 6.1

1

Table 5

Cases of pulmonary tuberculosis among surveyed population developed during a 14-year period (1950-1964), by size of tuberculin reaction in 1950, Muscogee County, Georgia.

Induration (mm) / No. of Cases / Cases per 1000 / Relative Risk
Total / 239 / 26.6 / -
0 / 20 / 8.7 / 1.0
1-4 / 43 / 16.8 / 1.9
5-9 / 66 / 26.9 / 3.1
10-14 / 72 / 61.9 / ____
15+ / 38 / 77.0 / 8.9

(3) Complete Tables 4 and 5, i.e., calculate the missing relative risks, using persons with the smallest tuberculin reactions as the reference group. Note that relative risk is calculated as the rate in the exposed divided by the rate in the unexposed (e.g., 0 mm or 0-2 mm induration). Is the association of tuberculosis with size of reaction similar in the cross-sectional and prospective studies?
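As a minimal sketch of this calculation, the snippet below divides each rate by the rate in the reference group. It includes only the rows of Tables 4 and 5 whose relative risks are already printed, so the two blank rows are left to the reader. The helper name `relative_risks` and the dictionary layout are assumptions made for illustration.

```python
def relative_risks(rates_per_1000, reference):
    """Divide every group's rate by the rate in the reference group."""
    ref_rate = rates_per_1000[reference]
    return {group: rate / ref_rate for group, rate in rates_per_1000.items()}


# Cases per 1000 by induration (mm); rows with blank relative risks omitted
table4_rates = {"0-2": 4.1, "3-4": 4.5, "5-7": 8.5, "13+": 25.2}
table5_rates = {"0": 8.7, "1-4": 16.8, "5-9": 26.9, "15+": 77.0}

rr4 = relative_risks(table4_rates, reference="0-2")
rr5 = relative_risks(table5_rates, reference="0")

for label, rr in (("Table 4", rr4), ("Table 5", rr5)):
    for group, value in rr.items():
        print(f"{label}  {group:>5} mm  RR = {value:.1f}")
```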

1

[To assist you with answering the above question, it would be wise to plot the relative risk values from the two tables onto a single piece of semi-log graph paper (x-axis = mm induration, y-axis = relative risk). Label the graph succinctly but completely.]

(4) When are associations observed from cross-sectional (prevalence) studies similar to those observed from prospective studies?

1

PART II: RELATIVE RISK AND RELATIVE ODDS

For this part of the exercise, a defined population has been selected among persons identified in a private census of Washington County, Maryland, as of 15 July 1963, namely white males and females who were aged 45 through 64 inclusive on the census date. This subset includes over 95% of the white population in the county in that age range. Ethnic groups other than whites have been excluded because there is some evidence that their experience with the illness of interest (cancer of the colon) is different from that of whites, and there are too few non-whites in Washington County to allow reliable estimates of their experience. A further simplifying assumption is that there were no losses (e.g., migration) from the population.

White males and females aged 45 through 64 on 15 July 1963 who could be identified in the 1963 census lists were the source population for this study. Cases were persons listed in the county cancer register as having had a diagnosis of cancer of the colon first made in the 12-year period 15 July 1963 through 14 July 1975. It is believed that identification of cancer cases is very nearly complete, but this cannot be known for certain. For the purposes of this exercise, assume that ascertainment was complete.

(5) Under what circumstances would incomplete ascertainment affect estimates of relative risks or relative odds?

Cancer of the colon has been reported to be more common in urban populations than in rural populations, and more common in high socio-economic groups (as measured by average education levels in the area in which they reside) than in lower socio-economic groups.

Table 6 shows the information needed to estimate the relative risks of developing colon cancer by urban residence and high socio-economic status (defined as having completed 13 or more grades of school). Urban includes Hagerstown suburbs.

1

Table 6

Number and rate per 1000 white residents of Washington County, MD aged 45 through 64 years on 15 July 1963, of cases of cancer of the colon diagnosed 15 July 1963 through 14 July 1975, by residence and grades of school completed.

Initial characteristic / Population / Cases, N / Cases, rate (per 1000) / Relative risk / 95% Confidence limits** / Relative odds
Total / 18,125 / 116 / 6.4 / - / - / -
Residence
Urban / 9,351 / 50 / 5.3 / 0.71 / 0.49 - 1.03 / 0.71
Rural / 8,774 / 66 / 7.5 / 1.00 / - / 1.00
Grades completed
13+ / 2,418 / 23 / ____ / ____ / ____ / ____
<13, NS* / 15,707 / 93 / 5.9 / 1.00 / - / 1.00

* NS: not stated.

** 95% confidence limits for the Relative Risk, see formula below.

See lecture handout entitled "Measures of Association" for method of calculating the confidence limits for the relative risk (Katz et al, Biometrics 34: 469-74, 1978). Briefly, the variance of the natural log of the relative risk can be approximated as follows (Katz et al. 1978, Kahn & Sempos, pp. 62-63):[2]
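As a hedged illustration, one common statement of the Katz approximation is Var(ln RR) = 1/a - 1/n1 + 1/c - 1/n0, where a and c are the numbers of cases among the exposed and unexposed and n1 and n0 are the corresponding population totals. This labelling is an assumption and may not match the 2x2 notation of the lecture handout. The sketch below applies it to the urban versus rural row of Table 6, for which the limits are already printed.

```python
import math


def katz_rr_ci(a, n1, c, n0, z=1.96):
    """Relative risk with log-scale (Katz) confidence limits.

    a, n1: cases and population in the exposed group (assumed notation)
    c, n0: cases and population in the unexposed (reference) group
    """
    rr = (a / n1) / (c / n0)
    var_ln_rr = 1 / a - 1 / n1 + 1 / c - 1 / n0
    half_width = z * math.sqrt(var_ln_rr)
    lower = math.exp(math.log(rr) - half_width)
    upper = math.exp(math.log(rr) + half_width)
    return rr, lower, upper


# Urban (exposed) vs rural (reference), Table 6
rr_urban, lower_urban, upper_urban = katz_rr_ci(50, 9351, 66, 8774)
print(f"RR = {rr_urban:.2f}, 95% limits {lower_urban:.2f} - {upper_urban:.2f}")
```

The limits are computed on the log scale and then exponentiated, which is why they are not symmetric around the relative risk.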

1

(6) Complete the blanks in Table 6. Do the relative risks and odds ratios differ from each other to any meaningful degree? Be sure you know under what conditions they would be expected to be similar and markedly dissimilar.

A sample of controls was selected from members of the source population who were never identified as cases. Pertinent information was abstracted from the listing and entered on the computer file in the same way as had already been done for the 116 cases. Table 7 shows the results.
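To make the comparison concrete, the sketch below computes both measures for the urban row of Table 6 (values already printed) and for a second set of counts in which the disease is common. The second set is purely hypothetical and is not taken from the study; the function names are likewise illustrative.

```python
def relative_risk(a, n1, c, n0):
    """Ratio of risks: (a/n1) / (c/n0)."""
    return (a / n1) / (c / n0)


def relative_odds(a, n1, c, n0):
    """Ratio of odds of disease: [a/(n1-a)] / [c/(n0-c)]."""
    return (a / (n1 - a)) / (c / (n0 - c))


# Urban vs rural, Table 6: disease is rare (about 6 per 1000)
rr_rare = relative_risk(50, 9351, 66, 8774)
or_rare = relative_odds(50, 9351, 66, 8774)

# Hypothetical counts, for illustration only: disease is common
rr_common = relative_risk(500, 1000, 250, 1000)
or_common = relative_odds(500, 1000, 250, 1000)

print(f"Rare disease:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"Common disease: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```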

Table 7

Residence and grades of school completed for cases of cancer of the colon and a sample of controls, white males and females aged 45 through 64, identified in the 1963 census of Washington County, MD

Initial characteristic / Cases / Controls / Relative odds / 95% Confidence limits
Total / 116 / 116 / - / -
Residence
Urban / 50 / 66 / ____ / ____
Rural / 66 / 50 / 1.00 / -
Grades completed
13+ / 23 / 12 / 2.14 / 1.01 - 4.55
<13, NS* / 93 / 104 / 1.00 / -

* NS: Not stated

(7) What type of epidemiologic study is this?

1

See lecture handout entitled "Measures of Association" for method of calculating these confidence limits (Woolf B: Ann Hum Gen 10:251-253, 1955). Briefly, the variance of the natural log of the odds ratio can be approximated as follows (Woolf, 1955, Kahn & Sempos, pp. 56-58):

Var(ln OR) = 1/a + 1/b + 1/c + 1/d

where a, b, c, and d are the entries in the 2x2 table (see footnote on page 12).
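A minimal sketch of the Woolf calculation follows, applied to the education row of Table 7, for which the answer is already printed. It assumes the cell labels a = exposed cases, b = exposed controls, c = unexposed cases and d = unexposed controls, which may differ from the labelling in the handout footnote; the variance itself does not depend on that choice.

```python
import math


def woolf_or_ci(a, b, c, d, z=1.96):
    """Odds ratio (ad/bc) with Woolf log-scale confidence limits."""
    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_ln_or)
    upper = math.exp(math.log(odds_ratio) + z * se_ln_or)
    return odds_ratio, lower, upper


# Grades completed, Table 7: 13+ (exposed) vs <13 or not stated (reference)
or_educ, lower_educ, upper_educ = woolf_or_ci(a=23, b=12, c=93, d=104)
print(f"OR = {or_educ:.2f}, 95% limits {lower_educ:.2f} - {upper_educ:.2f}")
```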

(8) What is the total reference population from which the control group was selected?

(9) Calculate the relative odds and its 95% confidence limits for residence in Table 7.

(10) Do the results in this table differ from those in Table 6? Why?

In this exercise, you have the unique opportunity of seeing how closely the control odds (urban/rural; 13+/0-12, NS) reflect the true odds which can be obtained from Table 6. Only rarely in the real world will you have such an opportunity. In the future, always keep in mind that your findings may result as much or even more from sampling variation than from any true association. Even at very low p-values, chance may have produced your findings. Rare events are happening all the time!
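The role of sampling variation can be illustrated with an exact binomial tail probability. The sketch below asks how often a random sample of 116 persons would contain 66 or more urban residents if half of the source population were urban. The proportion 0.5 is an arbitrary assumption chosen for illustration and is not the value obtainable from Table 6; the function name is also illustrative.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed (hypothetical) true proportion of urban residents: 0.5
p_tail = binomial_upper_tail(n=116, k=66, p=0.5)
print(f"P(66 or more urban in a sample of 116) = {p_tail:.3f}")
```

Replacing 0.5 with the proportion derived from Table 6 gives the probability relevant to the control sample actually drawn.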

(11) From Table 6, what are the expected numbers of urban and rural controls in Table 7? How might you explain the discrepancy with the number of controls actually selected (Table 7)?

1

As it happened, in the initial selection of controls, one case was included in the 116 persons. When this was noted, searching of the lists was resumed at the point where the case was located, and the next person who met the criteria for controls and who was not a case was substituted for the case. The control group, therefore, is a sample of non-cases in the population.

If the case had been retained in the sample, however, the group would not have been a sample of non-cases but rather a sample of the total study population. Table 8 shows a sample of the study population, which in this particular instance happened to include 1 case.

Table 8

Residence and grades of school completed for cases of cancer of the colon and a sample of white males and females aged 45 through 64 years who were identified in the 1963 census of Washington County, MD

Initial characteristic / Cases / Sample
Total / 116 / 116
Residence
Urban / 50 / 58 / ____
Rural / 66 / 58 / 1.00
Grades completed
13+ / 23 / 9 / ____
<13, NS* / 93 / 107 / 1.00

* NS: Not stated

(12) When the cross-products approach used in Table 7 is applied to Table 8, what measure of association is produced?

1

When cases are compared in this way with a sample of the population, there are no simple ways of obtaining confidence limits. If, however, the disease is rare, the same formula used for confidence limits of relative odds will give limits that are suitable for most practical purposes.

A second set of controls matched to the cases was drawn. For each case, a non-case of the same race, sex and year of age was selected. The results are shown in Table 9 as counts of individual controls, not as matched pairs.

Table 9

Residence and grades of school completed for cases of cancer of the colon and randomly selected controls matched to cases by race, sex and year of age, white males and females aged 45 through 64 identified in the 1963 census of Washington County, MD

Initial characteristic / Cases / Matched Controls / Relative odds
Total / 116 / 116 / -
Residence
Urban / 50 / 55 / ____
Rural / 66 / 61 / 1.00
Grades completed
13+ / 23 / 16 / ____
<13, NS* / 93 / 100 / 1.00

* NS: Not stated

(13) Why do the results in Table 9 differ from those in Tables 6 and 7? Which of the two control sets is likely to be most appropriate for the assessment of risk associated with education level attained? Why?

1

While calculating the relative odds in this way (i.e., by pooling the matched pairs) yields an estimate that is likely to be closer to the truth than if matching had not been done, it is not a socially accepted method. Some epidemiologists and biostatisticians even become rather violent in their objections. With matched controls, a different method of calculating the relative odds should be employed. Not only is the arithmetic simpler, but the resulting odds ratio is unbiased. The table is set up as illustrated in Table 10 for residence. Use the data in Table 11 to obtain the numbers for years of school completed.
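A widely used estimator for one-to-one matched pairs, offered here as an assumption about the method intended rather than a statement of it, is the ratio of the two kinds of discordant pairs: pairs in which only the case is exposed divided by pairs in which only the control is exposed. The pair counts in the sketch below are hypothetical and are not taken from Table 10 or Table 11.

```python
def matched_pair_odds_ratio(case_only_exposed, control_only_exposed):
    """Odds ratio from discordant pairs in a 1:1 matched design."""
    if control_only_exposed == 0:
        raise ValueError("No pairs with only the control exposed; ratio undefined")
    return case_only_exposed / control_only_exposed


# Hypothetical pair counts, for illustration only
pairs = {
    "both_exposed": 10,          # concordant, does not enter the estimate
    "case_only_exposed": 20,     # discordant
    "control_only_exposed": 10,  # discordant
    "neither_exposed": 60,       # concordant, does not enter the estimate
}

matched_or = matched_pair_odds_ratio(
    pairs["case_only_exposed"], pairs["control_only_exposed"]
)
print(f"Matched-pair odds ratio = {matched_or:.2f}")
```

Only the discordant pairs contribute, which is why the arithmetic is simpler than the cross-products calculation.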

Table 10

Numbers of pairs of cases of cancer of the colon and matched controls by residence and schooling classification of case and control in each pair