Case-Control and Cohort Studies - FAQs

This FAQ will assist you as you complete the tasks related to Case-control and Cohort studies.

Case-Control Studies
What is a Case-control study and how is it typically used in epidemiology? / A case-control study typically starts with people who have become sick (or who have some disease), and you are trying to learn more about what caused them to get sick/diseased.
A case-control study compares people who have the disease (cases) to people who don’t (controls) to find which exposure variable(s) are different between the two groups. If a higher percentage of cases than controls were exposed to a certain variable (e.g., a food, an environment, a product), it gives an indication that the variable may be related to the illness.
It’s important to note, though, that the results of a case-control study won’t tell you the cause of a disease. The results can only show you the association between the disease and an exposure variable, or the odds that someone who is sick or diseased was exposed to the variable. The results help guide decisions about where further investigation should be directed.
What is a case? / A case is someone who is sick (has the disease, which is sometimes referred to as the “outcome”). Other criteria may also be used to define a case (e.g., the person must have become ill within a certain time period or been present at an event). See Developing Outbreak Case Definitions for more information on defining cases.
What is a control? / A control is someone who looks similar to a case except that they are not sick (or diseased).
Once you have determined who your cases are, you have to select a comparison group of controls. The controls are people who are similar to cases on some important measures (e.g., same age, gender, income level, or city of residence) but DO NOT have the sickness/disease. If a person has the sickness/disease, they cannot be included as a control even if they don’t fit other criteria of the case definition. Controls must be similar to cases on these other measures so that more accurate comparisons can be made between the two groups. If the controls aren’t similar to cases on important measures, confounding variables may be introduced that can influence the results of the study. See Matching Case-Controls FAQ for more information on matching controls to cases.
What are the steps involved in a Case-control study? / A case-control study identifies a group of people with a disease of interest and compares it to a group of people without the disease to determine what exposures or risk factors may have contributed to acquiring the disease. Here are the steps in completing a case-control study:
  1. Define a case (those with the disease) and gather all of the known cases you’re going to include in your study.
  2. Select controls that are similar to your cases, except that they don’t have the disease (match them for age, gender, income, employment, etc.).
  3. Identify the potential exposure(s) that you think may be associated with the disease or illness and find out who has been exposed to the different variables through administration of a questionnaire.
  4. Calculate the Odds Ratio (OR) for each exposure variable you are interested in studying using a 2x2 table (see the next question for more details).
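As a rough illustration of steps 3 and 4, the sketch below tallies questionnaire answers into the four cells of a 2x2 table, using the cell labels A, B, C and D from the table in the next question. The function name `tally_2x2` and the sample responses are hypothetical, made up only to show the counting.

```python
from collections import Counter

def tally_2x2(records):
    """Count (exposed, diseased) pairs into the 2x2 cells A, B, C and D."""
    counts = Counter((bool(exposed), bool(diseased)) for exposed, diseased in records)
    return {
        "A": counts[(True, True)],    # exposed, disease
        "B": counts[(True, False)],   # exposed, no disease
        "C": counts[(False, True)],   # not exposed, disease
        "D": counts[(False, False)],  # not exposed, no disease
    }

# Hypothetical questionnaire responses: (exposed, diseased)
responses = [
    (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False),
]
table = tally_2x2(responses)
print(table)
```

One table like this would be built for each exposure variable on the questionnaire.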

What is a 2X2 table and how does it help me calculate an Odds Ratio? / Here is a common 2X2 table used in epidemiology, also called a 2X2 contingency table:
|             | Disease | No Disease | Total   |
|-------------|---------|------------|---------|
| Exposed     | A       | B          | A+B     |
| Not exposed | C       | D          | C+D     |
| Total       | A+C     | B+D        | A+B+C+D |
Note: Depending on the design of the study, the 2x2 table can be set up in many different ways.
You can calculate the OR from the 2X2 table values A, B, C and D using the formula below. (Important note: in the formula below, the middle expression shows the detailed formula, while the right-hand expression shows the simplified formula obtained by using basic algebra.)

$$
\mathrm{OR} = \frac{\text{Odds of exposure in disease group}}{\text{Odds of exposure in no disease group}} = \frac{\dfrac{A/(A+C)}{C/(A+C)}}{\dfrac{B/(B+D)}{D/(B+D)}} = \frac{A/C}{B/D}
$$
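The OR formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the counts are hypothetical, and the detailed version is included only to show that it agrees with the simplified one.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio from the 2x2 cells, simplified form: OR = (A/C) / (B/D)."""
    if min(b, c, d) == 0:
        raise ValueError("OR is undefined when B, C, or D is zero")
    return (a / c) / (b / d)

def odds_ratio_detailed(a, b, c, d):
    """Same quantity via the detailed form, using the probabilities in each group."""
    odds_disease = (a / (a + c)) / (c / (a + c))
    odds_no_disease = (b / (b + d)) / (d / (b + d))
    return odds_disease / odds_no_disease

# Hypothetical counts, for illustration only.
a, b, c, d = 30, 10, 20, 40
simple = odds_ratio(a, b, c, d)             # (30/20) / (10/40) = 6.0
detailed = odds_ratio_detailed(a, b, c, d)  # agrees up to floating-point rounding
print(simple, math.isclose(simple, detailed))
```

The (A+C) and (B+D) terms cancel, which is why the simplified form needs only the four cell counts.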
What does the term “odds” mean? How is it different from probability? / The probability that an event will occur is a fraction based on the number of times you expect to see that event divided by the total number of trials. The odds of an event occurring is the probability that it will occur divided by the probability that it will not occur. Probabilities always range between 0 and 1, while odds can take any value from 0 upward and can be expressed in different ways (e.g., 7:2 or “7 to 2”).
Examples - The probability of rolling a 1 on a six-sided die is 1/6. The probability of not rolling a 1 is 5/6. The ODDS of rolling a 1, however, are 1/5, or 1:5, or simply 0.2. In the above 2x2 table, the probability of a person with the disease being exposed to a variable is A/(A+C). The probability of a person with the disease not being exposed to the variable is C/(A+C). So, the odds of exposure in the disease group are (A/(A+C)) / (C/(A+C)).
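The relationship between probability and odds can be checked with a short Python sketch. The helper names below are hypothetical, and exact fractions are used so the die example comes out as 1/5 rather than a rounded decimal.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Odds = probability the event occurs / probability it does not occur."""
    if p == 1:
        raise ValueError("odds are undefined when the probability is 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

p_roll_one = Fraction(1, 6)                      # probability of rolling a 1
odds_roll_one = odds_from_probability(p_roll_one)
print(odds_roll_one, float(odds_roll_one))       # 1/5 0.2
```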
What is an Odds Ratio (OR) and what does it tell me about exposure to a certain variable and the likelihood of becoming sick/diseased? / The OR tells you the ratio of the odds that people in the Disease group were exposed to a variable vs. the odds that someone in the No Disease group was exposed to the variable.
Example

$$
\mathrm{OR} = \frac{\text{Odds in disease group}}{\text{Odds in no disease group}} = \frac{A/C}{B/D} = \frac{2.1}{2.0} = 1.05
$$
In the above example, the odds of exposure in the Disease group are similar to the odds of exposure in the No Disease group, so the OR is very close to 1. This means that the people in the Disease group were about as likely to have the exposure as those in the No Disease group. This can be interpreted to mean that having the disease is not associated with the exposure, because people who have the disease and people who don’t have the disease have nearly the same odds of exposure to the variable in question.
If the OR is greater than 1 (e.g., 8), then you may conclude that there is an association between being exposed to the variable and having the disease. The Disease group would have 8-fold greater odds of exposure than the No Disease group. If the OR is less than 1, you may conclude that there is an association between not being exposed to a variable and having the disease (e.g., not being exposed to vitamin C and developing scurvy). The further an OR is from 1, the stronger the association. Note: In the above formula, the numerator would become the denominator if you changed what is classified as disease and no disease, and the OR would change accordingly. Therefore, an OR less than 1 has the same associative strength as its reciprocal OR (e.g., an OR of 0.125, or 1/8, has the same associative strength as an OR of 8, or 8/1).
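The reciprocal relationship described in the note can be illustrated with a small sketch. The counts are hypothetical, chosen so that the OR comes out to 8. Comparing logarithms is one common way to see that 8 and 1/8 are equally far from 1; that comparison is an addition here, not something the FAQ itself prescribes.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio from the 2x2 cells: OR = (A/C) / (B/D)."""
    return (a / c) / (b / d)

# Hypothetical counts chosen so that OR = 8.
a, b, c, d = 40, 10, 20, 40
or_original = odds_ratio(a, b, c, d)  # (40/20) / (10/40) = 8.0

# Swap the "Disease" and "No Disease" columns: A<->B and C<->D.
or_swapped = odds_ratio(b, a, d, c)   # (10/40) / (40/20) = 0.125

# On a log scale the two values are the same distance from "no association".
same_strength = math.isclose(math.log(or_original), -math.log(or_swapped))
print(or_original, or_swapped, same_strength)
```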
What is one example of a Case-control study? / There was an outbreak of Giardiasis (symptoms: malodorous diarrhea, nausea, fatigue and weight loss) among workers at a local hotel. Many hotel employees called in sick and hotel management began an immediate investigation to try to determine the source of the outbreak. They did lab tests on all of the employees and gathered all of the cases (those who were diagnosed with Giardiasis). They then matched the cases with controls from the staff. They surveyed both the cases and controls and calculated the Odds Ratios for the following exposures:
  • Those who swam in the pool (OR=1.1)
  • Those whose kids went to the onsite daycare (OR=5.8)
  • Those who ate at the hotel restaurant (OR=1.2)
  • Those who drank the hotel tap water (OR=1.0)
In the case of the tap water, an OR=1 means that those who were sick had the same odds of drinking the hotel tap water as those who weren’t sick, which means the tap water isn’t likely the problem. The ORs for the pool and the restaurant were also close to 1, meaning it is unlikely that either of them is the source of infection. The hotel daycare, however, had an OR of 5.8, meaning that those who were sick had nearly 6 times greater odds of having kids who went to the onsite daycare than those who weren’t sick. It is important to keep in mind that calculating an OR does not prove causation, and further investigation would still be needed to verify the source of infection (e.g., environmental testing at the daycare).
Cohort Studies
What is a cohort study? / A cohort study involves observation of a group of people over time to measure their exposure to different variables and see how it relates to their outcomes. With a cohort study, you typically have a more defined population than in a case-control study, and you know who has been exposed to certain variables. You’re looking to see what happens to this group over time in comparison to what happens to the group that has not been exposed.
In other words, something potentially bad happens (exposure) and you anticipate it may cause some illness or disease, so you follow the exposed group and compare them to a similar group that wasn’t exposed. You look at what happens to each group over time and make a comparison that will tell you the risk of getting a specific illness or disease in relation to the exposure.
Cohort studies can be prospective (following a group for a period of time to look at their exposure and outcomes) or retrospective (gathering data from things such as historical records or surveys to determine whether a person was exposed to the variable or not).
In a cohort study you begin by looking at people who have been exposed to a variable and then investigate whether they develop the disease whereas in a case-control study you begin by looking at people who have a certain disease and investigate what variables they have been exposed to.
What is a “cohort” or cohort group? / The cohort is a group of people who share a common characteristic or who all experience a particular event at a given time.
Examples - A classic “cohort” is everyone who was born in a certain timeframe (say, between 1980 and 1985). Other possible cohorts include everyone who attended an event (e.g., a wedding, a party, a rock concert), a group of co-workers who worked in the same facility, or everyone who purchased a certain product.
What two groups do you compare in a cohort study? / A cohort study identifies a group of people who have been exposed to a variable (a cohort) and compares them to a group of people who have not been exposed to the variable to determine what illness(es) or disease(s) exposure to the variable may be associated with. The comparison group (unexposed) is frequently matched to the cohort group (exposed) as much as possible (excluding exposure) to minimize the influence of confounding variables.
In a cohort study, how do you compare the two groups? / At a high level, here are the steps in completing a cohort study:
  1. Define the cohort.
  2. Define the exposure(s) of interest and the outcome (disease) you are trying to study.
  3. Decide how you will evaluate the exposure and outcome (e.g., surveys, lab tests, medical records, exams).
  4. Set up a 2X2 table for each exposure variable you are interested in studying and calculate the Relative Risk (RR) for each variable (see the next question for more details).

What is a 2X2 table and how does it help me calculate Relative Risk (RR)? / As in the case-control study, here is a common 2X2 table used in epidemiology (also called a 2X2 contingency table):
|             | Disease | No Disease | Total   |
|-------------|---------|------------|---------|
| Exposed     | A       | B          | A+B     |
| Not exposed | C       | D          | C+D     |
| Total       | A+C     | B+D        | A+B+C+D |
You can calculate the RR from the 2X2 table values A, B, C and D using the formula below.
$$
\mathrm{RR} = \frac{\text{Risk of disease in exposed group}}{\text{Risk of disease in unexposed group}} = \frac{A/(A+B)}{C/(C+D)}
$$
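A minimal Python sketch of the RR calculation follows. The function name and the counts are hypothetical and serve only to show how the four cells feed into the formula.

```python
def relative_risk(a, b, c, d):
    """Relative risk from the 2x2 cells: RR = (A/(A+B)) / (C/(C+D))."""
    if a + b == 0 or c + d == 0:
        raise ValueError("both the exposed and unexposed groups must be non-empty")
    risk_exposed = a / (a + b)      # risk of disease in the exposed group
    risk_unexposed = c / (c + d)    # risk of disease in the unexposed group
    if risk_unexposed == 0:
        raise ValueError("RR is undefined when no one in the unexposed group is diseased")
    return risk_exposed / risk_unexposed

# Hypothetical counts, for illustration only.
rr = relative_risk(25, 75, 25, 175)  # (25/100) / (25/200) = 0.25 / 0.125 = 2.0
print(rr)
```

Unlike the OR, the RR divides by the row totals (A+B and C+D), so it only makes sense when those totals reflect how many exposed and unexposed people were actually followed.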
What is Relative Risk and what does it tell me about exposure to a certain variable? / Relative risk tells you the risk of an outcome (e.g., disease) occurring in those exposed to a certain variable versus the risk of getting the disease in those who weren’t exposed to the variable.
Example - If the probability of getting sick with a certain illness after eating contaminated spinach was 1 in 200 (0.005) and the probability of getting the same illness for other reasons was only 1 in 1000 (0.001) for those who didn’t eat the spinach, then the RR = 5.

$$
\mathrm{RR} = \frac{\text{Risk of disease in exposed group}}{\text{Risk of disease in unexposed group}} = \frac{1/200}{1/1000} = \frac{0.005}{0.001} = 5
$$
This would be interpreted to mean that you were five times more likely to get sick if you ate the contaminated spinach than if you didn’t eat it.
But what if the RR was 1.2?
An RR close to 1 means that the risk of disease or illness in the exposed group was about the same as the risk in the unexposed group. This can be interpreted to mean that the exposure you are studying is likely not associated with getting the disease, because it appears that people who were exposed and unexposed became diseased or ill at about the same rate.
Note: As with Odds Ratios, an RR less than 1 has the same associative strength as its reciprocal RR greater than 1 (e.g., an RR of 0.2, or 1/5, has the same associative strength as an RR of 5). This, again, is related to how “exposed” can be conceptualized in different ways, which moves the groups from the numerator to the denominator.
What is one example of a cohort study? / There are many different types of cohort studies, but here are the highlights of one cohort study that looked at samples of residents of Framingham, Massachusetts:
  • Began in 1948, when researchers sought to determine which biological and environmental factors contributed to death and illness from heart disease.
  • Studied healthy male and female residents between 30 and 60 years old (had to be free of heart disease).
  • Every 2-4 years participants are given extensive exams and surveys to track identified risk factors (e.g., obesity, level of physical activity) and various heart disease conditions. The study continues today with remaining members of the original cohort.
  • Study evaluates several different hypotheses (e.g., increased physical activity is associated with a decrease in the development of heart disease).
The cohort is determined by separating out individuals into two groups for each risk factor or exposure under investigation (e.g., level of physical activity). In this example, the exposure group(s) would be based on the amount of physical activity (low or moderate).
So, those with a low level of physical activity (sedentary) are compared to those with a moderate level of physical activity to determine the risk of heart disease in each group. If the researchers found the RR in this example to be 2.3, then they could say that those who had a low level of physical activity were 2.3 times more likely to develop heart disease than those with a moderate level of physical activity.
For more information about this study, go to:
General Questions
What are some ways I can determine if I should run a case-control or a cohort study? / You should first start with the question you want to ask.
  • If you want to know the risk of developing a disease based on the exposure, then you would do a cohort study.
  • If you know the disease or outcome and you want to know what exposures might be associated with it, then you would do a case-control study.
  • If you have a rare exposure with multiple possible outcomes, then you would run a cohort study.
  • If you have a rare disease, then you would run a case-control study.
  • If other factors are equal and maintaining low costs is a concern, then you would run a case-control study.

What’s the difference between a Case-Control Study and a retrospective Cohort Study? / In some ways a retrospective cohort study can look much like a case-control study. The investigation in both types of studies can begin when a researcher learns that people became ill after an event and then begins to look at exposures. There are, however, differences that separate the two types of studies. If the cohort were very large, many people in the cohort were unavailable, not very many people became ill, or costs were an issue, you might do a case-control study rather than a cohort study. As in all case-control studies, you would classify cases by the presence of illness (using a case definition). You would then randomly select controls who fit the matching criteria; in this case, the matching criterion would be that they were a part of the cohort (e.g., attended an event). The study would then look for what people in both groups were exposed to.
Beginning with the same information, that some members of a cohort became ill, a retrospective cohort study could also be done. If the cohort weren’t very large, then by the time you were done selecting enough controls to give a case-control study enough power, nearly the entire cohort might have been contacted. Because the cohort is of a finite size, the cost of doing a retrospective cohort study would be similar to that of a case-control study. The difference is that you would attempt to contact all, or many, of the people in the cohort and group them based on exposure (e.g., people who drank the punch vs. those who didn’t). Even though the study began because some people were ill, those people would be grouped based on their exposures and NOT based on disease status. The study would group based on exposure, and then researchers would look to see whether or not people became ill (outcome).
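The difference in grouping can be made concrete with a small sketch that analyzes the same set of records both ways. The records are hypothetical (attendees of one event, tracked by whether they drank the punch and whether they became ill). For simplicity the case-control view uses every person who did not become ill as a control, whereas a real case-control study would usually select only a sample of them.

```python
from fractions import Fraction

# Hypothetical records for one event: (drank_punch, became_ill).
records = (
    [(True, True)] * 12 + [(True, False)] * 8 +
    [(False, True)] * 3 + [(False, False)] * 17
)

def count(rows, exposed, ill):
    """Number of records with the given exposure and illness status."""
    return sum(1 for e, i in rows if e == exposed and i == ill)

a = count(records, True, True)     # exposed, ill
b = count(records, True, False)    # exposed, not ill
c = count(records, False, True)    # not exposed, ill
d = count(records, False, False)   # not exposed, not ill

# Retrospective cohort view: group by exposure, then compare risk of illness.
risk_exposed = Fraction(a, a + b)        # 12/20
risk_unexposed = Fraction(c, c + d)      # 3/20
rr = risk_exposed / risk_unexposed       # 4

# Case-control view: group by illness, then compare odds of exposure.
odds_cases = Fraction(a, c)              # 12/3
odds_controls = Fraction(b, d)           # 8/17
odds_ratio = odds_cases / odds_controls  # 17/2

print(rr, odds_ratio)
```

The two views answer different questions (risk of illness given exposure vs. odds of exposure given illness), so the RR and the OR need not be equal even when they are computed from the same people.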