Biostatistics Case Study 1:
Exploratory Data Analysis Techniques
Instructor’s version / July 30, 2009

BIOSTAT Case Study 1: Exploratory Data Analysis Techniques

Time to Complete Exercise: 30 minutes

LEARNING OBJECTIVES

At the completion of this Case Study, participants should be able to:

·  Access TB surveillance data from the CDC Web site

·  Generate box-and-whiskers plots, stem-and-leaf diagrams, and histograms

·  Generate percentile values and measures of central tendency and dispersion for skewed distributions

·  Describe the magnitude of the TB incidence (new case) rates in the United States

·  Describe the differences in TB incidence rates by sex/gender and state across the United States

ASPH A. BIOSTATISTICS COMPETENCIES ADDRESSED IN THIS CASE STUDY

A.5. Apply descriptive techniques commonly used to summarize public health data

A.8. Apply basic informatics techniques with vital statistics and public health records in the description of public health characteristics and in public health research and evaluation

ASPH INTERDISCIPLINARY/CROSS-CUTTING COMPETENCIES ADDRESSED IN THIS CASE STUDY: F. COMMUNICATION AND INFORMATICS

F.8. Use information technology to access, evaluate, and interpret public health data


Introduction

Control of tuberculosis (TB) in the United States is an important public health responsibility. Effective TB control requires a complex system that merges elements of laboratory science, investigative work, public health, surveillance, and clinical care.

The Tuberculosis Information Management System (TIMS) is one example of a public health surveillance system and one of the main sources of descriptive data regarding TB in the United States. TIMS includes information on all cases of TB that have been reported to the Division of TB Elimination (DTBE) at the Centers for Disease Control and Prevention (CDC). This information is reported to CDC by the 50 states, the District of Columbia, New York City, Puerto Rico, and other jurisdictions in the Pacific and Caribbean.

Data on person, place, and time relating to TB in the United States are gathered using TIMS. CDC analyzes and publishes these data annually; they may be accessed through the CDC Web site as TB Surveillance Reports at http://www.cdc.gov/nchstp/tb/surv/Surv.htm and through the Online Tuberculosis Information System (OTIS) at http://wonder.cdc.gov/tb.html. If you were to access OTIS and request TB case reports by sex and state for the period 2001 to 2005, you would obtain the data below: new TB case rates per 100,000 population for males and females (person) in the 50 states and the District of Columbia (DC) (place) during the years 2001 to 2005 (time).

TB Case Rates per 100,000 Population

Place Females Males

Alabama 3.4 7.2

Alaska 6.6 9.4

Arizona 3.5 6.6

Arkansas 3.4 6.5

California 7.1 10.6

Colorado 2.1 3.0

Connecticut 2.5 3.6

Delaware 2.7 4.6

DC 8.2 19.0

Florida 4.3 8.5

Georgia 4.5 7.8

Hawaii 7.9 12.6

Idaho 0.9 1.1

Illinois 4.1 6.0

Indiana 1.6 2.7

Iowa 1.2 1.8

Kansas 2.1 3.2

Kentucky 2.1 4.7

Louisiana 3.7 7.9

Maine 1.2 2.0

Maryland 4.5 6.0

Massachusetts 3.5 5.0

Michigan 2.4 3.2

Minnesota 3.9 4.7

Mississippi 2.9 6.1

Missouri 1.5 3.1

Montana 0.7 2.1

Nebraska 1.5 2.5

Nevada 3.6 5.1

New Hampshire 1.2 1.3

New Jersey 5.0 6.8

New Mexico 2.3 2.8

New York 5.6 9.5

North Carolina 3.2 5.9

North Dakota 0.8 1.0

Ohio 1.6 2.9

Oklahoma 3.5 6.4

Oregon 2.3 3.9

Pennsylvania 2.2 3.3

Rhode Island 4.0 5.5

South Carolina 4.2 8.2

South Dakota 1.7 2.1

Tennessee 3.4 6.9

Texas 4.9 9.5

Utah 1.2 1.7

Vermont 1.5 0.9

Virginia 3.9 4.9

Washington 3.3 5.0

West Virginia 1.0 2.0

Wisconsin 1.2 1.7

Wyoming 0.5 0.7



Exploratory data analysis techniques are often used to organize, summarize, and describe clinical and epidemiologic data. These techniques include stem-and-leaf plots and box plots. To make the plots easier to construct by hand, the data are listed below in ascending order of rate, separately for females and males.


Female TB Case Rates per 100,000 Population

1.  Wyoming 0.5

2.  Montana 0.7

3.  North Dakota 0.8

4.  Idaho 0.9

5.  West Virginia 1.0

6.  Iowa 1.2

7.  Maine 1.2

8.  New Hampshire 1.2

9.  Utah 1.2

10.  Wisconsin 1.2

11.  Missouri 1.5

12.  Nebraska 1.5

13.  Vermont 1.5

14.  Indiana 1.6

15.  Ohio 1.6

16.  South Dakota 1.7

17.  Colorado 2.1

18.  Kansas 2.1

19.  Kentucky 2.1

20.  Pennsylvania 2.2

21.  New Mexico 2.3

22.  Oregon 2.3

23.  Michigan 2.4

24.  Connecticut 2.5

25.  Delaware 2.7

26.  Mississippi 2.9

27.  North Carolina 3.2

28.  Washington 3.3

29.  Alabama 3.4

30.  Arkansas 3.4

31.  Tennessee 3.4

32.  Arizona 3.5

33.  Massachusetts 3.5

34.  Oklahoma 3.5

35.  Nevada 3.6

36.  Louisiana 3.7

37.  Minnesota 3.9

38.  Virginia 3.9

39.  Rhode Island 4.0

40.  Illinois 4.1

41.  South Carolina 4.2

42.  Florida 4.3

43.  Georgia 4.5

44.  Maryland 4.5

45.  Texas 4.9

46.  New Jersey 5.0

47.  New York 5.6

48.  Alaska 6.6

49.  California 7.1

50.  Hawaii 7.9

51.  District of Columbia 8.2

Male TB Case Rates per 100,000 Population

1.  Wyoming 0.7

2.  Vermont 0.9

3.  North Dakota 1.0

4.  Idaho 1.1

5.  New Hampshire 1.3

6.  Utah 1.7

7.  Wisconsin 1.7

8.  Iowa 1.8

9.  Maine 2.0

10.  West Virginia 2.0

11.  Montana 2.1

12.  South Dakota 2.1

13.  Nebraska 2.5

14.  Indiana 2.7

15.  New Mexico 2.8

16.  Ohio 2.9

17.  Colorado 3.0

18.  Missouri 3.1

19.  Kansas 3.2

20.  Michigan 3.2

21.  Pennsylvania 3.3

22.  Connecticut 3.6

23.  Oregon 3.9

24.  Delaware 4.6

25.  Kentucky 4.7

26.  Minnesota 4.7

27.  Virginia 4.9

28.  Massachusetts 5.0

29.  Washington 5.0

30.  Nevada 5.1

31.  Rhode Island 5.5

32.  North Carolina 5.9

33.  Illinois 6.0

34.  Maryland 6.0

35.  Mississippi 6.1

36.  Oklahoma 6.4

37.  Arkansas 6.5

38.  Arizona 6.6

39.  New Jersey 6.8

40.  Tennessee 6.9

41.  Alabama 7.2

42.  Georgia 7.8

43.  Louisiana 7.9

44.  South Carolina 8.2

45.  Florida 8.5

46.  Alaska 9.4

47.  New York 9.5

48.  Texas 9.5

49.  California 10.6

50.  Hawaii 12.6

51.  District of Columbia 19.0

Biostatistics Case Study 1
Exploratory Data Analysis Techniques
Instructor’s version / June 17, 2009

Question 1

Generate separate stem-and-leaf diagrams of these case rates for males and females and describe the distribution of these data. (Hint: use the whole-number part of each rate as the stem and the tenths digit as the leaf.)

Answer Key

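Female TB Case Rates per 100,000    Male TB Case Rates per 100,000

Stem | Leaf                         Stem | Leaf
  19 |                                19 | 0
  18 |                                18 |
  17 |                                17 |
  16 |                                16 |
  15 |                                15 |
  14 |                                14 |
  13 |                                13 |
  12 |                                12 | 6
  11 |                                11 |
  10 |                                10 | 6
   9 |                                 9 | 455
   8 | 2                               8 | 25
   7 | 19                              7 | 289
   6 | 6                               6 | 00145689
   5 | 06                              5 | 00159
   4 | 0123559                         4 | 6779
   3 | 234445556799                    3 | 0122369
   2 | 1112334579                      2 | 00115789
   1 | 022222555667                    1 | 013778
   0 | 5789                            0 | 79

Key: 8 | 2 represents a rate of 8.2 per 100,000.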
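The stem-and-leaf construction can also be scripted. The following is a minimal Python sketch, not necessarily how the answer key was produced; the function name `stem_and_leaf` and the list names are illustrative choices. It uses the whole-number part of each rate as the stem and the tenths digit as the leaf, and it keeps empty stems so that gaps in the distribution remain visible.

```python
# Rates per 100,000 (2001-2005), copied from the sorted lists above.
female_rates = [
    0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5,
    1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2, 2.3, 2.3, 2.4, 2.5, 2.7, 2.9,
    3.2, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0,
    4.1, 4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2,
]
male_rates = [
    0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0, 2.1, 2.1, 2.5,
    2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 3.9, 4.6, 4.7, 4.7,
    4.9, 5.0, 5.0, 5.1, 5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8,
    6.9, 7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0,
]


def stem_and_leaf(rates):
    """Map each stem (whole number) to its string of leaves (tenths digits)."""
    # Work in integer tenths so floating-point noise cannot shift a leaf.
    tenths = sorted(round(r * 10) for r in rates)
    top_stem = tenths[-1] // 10
    # Largest stem first, as in the answer key; empty stems are kept.
    rows = {stem: "" for stem in range(top_stem, -1, -1)}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        rows[stem] += str(leaf)
    return rows


for label, rates in (("Females", female_rates), ("Males", male_rates)):
    print(label)
    for stem, leaves in stem_and_leaf(rates).items():
        print(f"{stem:>4} | {leaves}")
```

Because each sex is scaled to its own largest stem, the female display stops at stem 8, whereas the answer key extends both displays to stem 19 for side-by-side comparison.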

Question 2

Describe the distributions. Are they approximately normal, skewed to the right, or skewed to the left?

Answer Key

Both distributions are skewed to the right: most rates cluster at the low end, with a long tail of higher rates.
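One quick numerical check on the direction of skew is to compare the mean with the median: a mean noticeably above the median is consistent with a right-skewed distribution. The Python sketch below applies that rule of thumb to the rates listed earlier. It is an informal heuristic rather than a formal test, and the helper name `skew_direction` is hypothetical.

```python
from statistics import mean, median

# Rates per 100,000 (2001-2005), copied from the sorted lists above.
female_rates = [
    0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5,
    1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2, 2.3, 2.3, 2.4, 2.5, 2.7, 2.9,
    3.2, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0,
    4.1, 4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2,
]
male_rates = [
    0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0, 2.1, 2.1, 2.5,
    2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 3.9, 4.6, 4.7, 4.7,
    4.9, 5.0, 5.0, 5.1, 5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8,
    6.9, 7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0,
]


def skew_direction(rates):
    """Rule of thumb: mean above median suggests right skew, below suggests left."""
    m, md = mean(rates), median(rates)
    if m > md:
        return "right"
    if m < md:
        return "left"
    return "none"


for label, rates in (("Females", female_rates), ("Males", male_rates)):
    print(f"{label}: mean {mean(rates):.2f}, median {median(rates):.1f}, "
          f"suggested skew: {skew_direction(rates)}")
```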

Question 3

What is the median TB case rate among females and among males? What are the 25th and 75th percentiles? The interquartile (IQ) range? The range?

Answer Key

Females: Median 2.9; 25th percentile 1.5; 75th percentile 4.0; IQ range 2.5; Range 7.7

Males: Median 4.7; 25th percentile 2.5; 75th percentile 6.8; IQ range 4.3; Range 18.3
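These summary values can be reproduced with Python's standard library. The sketch below is one possible approach; the function name `summarize` is hypothetical. It assumes the "exclusive" quantile method, which places the p-th quantile at position (n + 1)p in the sorted data. With n = 51 the quartiles fall exactly on the 13th, 26th, and 39th sorted values, which matches the answer key. Other software conventions interpolate differently and may give slightly different quartiles.

```python
from statistics import quantiles

# Rates per 100,000 (2001-2005), copied from the sorted lists above.
female_rates = [
    0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5,
    1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2, 2.3, 2.3, 2.4, 2.5, 2.7, 2.9,
    3.2, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0,
    4.1, 4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2,
]
male_rates = [
    0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0, 2.1, 2.1, 2.5,
    2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 3.9, 4.6, 4.7, 4.7,
    4.9, 5.0, 5.0, 5.1, 5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8,
    6.9, 7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0,
]


def summarize(rates):
    """Median, quartiles, IQ range, and range, rounded to one decimal place."""
    # method="exclusive" locates each quartile at position (n + 1) * p.
    q1, med, q3 = quantiles(rates, n=4, method="exclusive")
    return {
        "median": round(med, 1),
        "q1": round(q1, 1),
        "q3": round(q3, 1),
        "iqr": round(q3 - q1, 1),
        "range": round(max(rates) - min(rates), 1),
    }


print("Females:", summarize(female_rates))
print("Males:  ", summarize(male_rates))
```

Rounding to one decimal place matches the precision of the published rates and hides floating-point residue in the subtractions.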

Question 4

Draw/generate a histogram and a box-and-whiskers plot describing the rates for males and females. Which states/locations have unusually high or low (outlier) rates?

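Answer Key

[Figures not reproduced: "Distributions" panels, each with a histogram and a box-and-whiskers plot, for (1) female TB case rate per 100,000 population and (2) male TB case rate per 100,000 population.]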

Females: Hawaii and DC

Males: DC
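The outliers named here can be checked numerically. The sketch below assumes the common box-plot convention that a value is an outlier when it lies more than 1.5 IQ ranges below the 25th percentile or above the 75th percentile; the source does not state which rule its plotting software applied, so treat the multiplier `k=1.5` as an assumption. The names `RATES` and `outliers` are illustrative.

```python
from statistics import quantiles

# (place, female rate, male rate) per 100,000, copied from the table above.
RATES = [
    ("Alabama", 3.4, 7.2), ("Alaska", 6.6, 9.4), ("Arizona", 3.5, 6.6),
    ("Arkansas", 3.4, 6.5), ("California", 7.1, 10.6), ("Colorado", 2.1, 3.0),
    ("Connecticut", 2.5, 3.6), ("Delaware", 2.7, 4.6), ("DC", 8.2, 19.0),
    ("Florida", 4.3, 8.5), ("Georgia", 4.5, 7.8), ("Hawaii", 7.9, 12.6),
    ("Idaho", 0.9, 1.1), ("Illinois", 4.1, 6.0), ("Indiana", 1.6, 2.7),
    ("Iowa", 1.2, 1.8), ("Kansas", 2.1, 3.2), ("Kentucky", 2.1, 4.7),
    ("Louisiana", 3.7, 7.9), ("Maine", 1.2, 2.0), ("Maryland", 4.5, 6.0),
    ("Massachusetts", 3.5, 5.0), ("Michigan", 2.4, 3.2), ("Minnesota", 3.9, 4.7),
    ("Mississippi", 2.9, 6.1), ("Missouri", 1.5, 3.1), ("Montana", 0.7, 2.1),
    ("Nebraska", 1.5, 2.5), ("Nevada", 3.6, 5.1), ("New Hampshire", 1.2, 1.3),
    ("New Jersey", 5.0, 6.8), ("New Mexico", 2.3, 2.8), ("New York", 5.6, 9.5),
    ("North Carolina", 3.2, 5.9), ("North Dakota", 0.8, 1.0), ("Ohio", 1.6, 2.9),
    ("Oklahoma", 3.5, 6.4), ("Oregon", 2.3, 3.9), ("Pennsylvania", 2.2, 3.3),
    ("Rhode Island", 4.0, 5.5), ("South Carolina", 4.2, 8.2),
    ("South Dakota", 1.7, 2.1), ("Tennessee", 3.4, 6.9), ("Texas", 4.9, 9.5),
    ("Utah", 1.2, 1.7), ("Vermont", 1.5, 0.9), ("Virginia", 3.9, 4.9),
    ("Washington", 3.3, 5.0), ("West Virginia", 1.0, 2.0),
    ("Wisconsin", 1.2, 1.7), ("Wyoming", 0.5, 0.7),
]


def outliers(rate_by_place, k=1.5):
    """Places whose rate lies more than k IQ ranges outside the quartiles."""
    q1, _, q3 = quantiles(rate_by_place.values(), n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [place for place, rate in rate_by_place.items()
            if rate < low or rate > high]


female = {place: f for place, f, _ in RATES}
male = {place: m for place, _, m in RATES}
print("Females:", outliers(female))
print("Males:  ", outliers(male))
```

Under this rule the upper limit is 4.0 + 1.5 × 2.5 = 7.75 for females and 6.8 + 1.5 × 4.3 = 13.25 for males, which is why Hawaii (12.6) is flagged for females but not for males. The lower limits are negative, so no location can be a low outlier.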

Question 5

Describe the differences in the TB case rates for males and females.

Answer Key

·  During this period, males tended to have higher TB case rates than females across states in the US.

·  The distributions for both sexes are skewed to the right by a few locations with high rates; DC has unusually high TB case rates for both males and females.
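A place-by-place comparison makes the male/female difference concrete. The sketch below, with illustrative names `RATES` and `compare_sexes`, counts the locations where the male rate exceeds the female rate, lists any exceptions, and finds the location with the largest gap. It is a descriptive tally only and makes no claim about statistical significance.

```python
# (place, female rate, male rate) per 100,000, copied from the table above.
RATES = [
    ("Alabama", 3.4, 7.2), ("Alaska", 6.6, 9.4), ("Arizona", 3.5, 6.6),
    ("Arkansas", 3.4, 6.5), ("California", 7.1, 10.6), ("Colorado", 2.1, 3.0),
    ("Connecticut", 2.5, 3.6), ("Delaware", 2.7, 4.6), ("DC", 8.2, 19.0),
    ("Florida", 4.3, 8.5), ("Georgia", 4.5, 7.8), ("Hawaii", 7.9, 12.6),
    ("Idaho", 0.9, 1.1), ("Illinois", 4.1, 6.0), ("Indiana", 1.6, 2.7),
    ("Iowa", 1.2, 1.8), ("Kansas", 2.1, 3.2), ("Kentucky", 2.1, 4.7),
    ("Louisiana", 3.7, 7.9), ("Maine", 1.2, 2.0), ("Maryland", 4.5, 6.0),
    ("Massachusetts", 3.5, 5.0), ("Michigan", 2.4, 3.2), ("Minnesota", 3.9, 4.7),
    ("Mississippi", 2.9, 6.1), ("Missouri", 1.5, 3.1), ("Montana", 0.7, 2.1),
    ("Nebraska", 1.5, 2.5), ("Nevada", 3.6, 5.1), ("New Hampshire", 1.2, 1.3),
    ("New Jersey", 5.0, 6.8), ("New Mexico", 2.3, 2.8), ("New York", 5.6, 9.5),
    ("North Carolina", 3.2, 5.9), ("North Dakota", 0.8, 1.0), ("Ohio", 1.6, 2.9),
    ("Oklahoma", 3.5, 6.4), ("Oregon", 2.3, 3.9), ("Pennsylvania", 2.2, 3.3),
    ("Rhode Island", 4.0, 5.5), ("South Carolina", 4.2, 8.2),
    ("South Dakota", 1.7, 2.1), ("Tennessee", 3.4, 6.9), ("Texas", 4.9, 9.5),
    ("Utah", 1.2, 1.7), ("Vermont", 1.5, 0.9), ("Virginia", 3.9, 4.9),
    ("Washington", 3.3, 5.0), ("West Virginia", 1.0, 2.0),
    ("Wisconsin", 1.2, 1.7), ("Wyoming", 0.5, 0.7),
]


def compare_sexes(rates):
    """Tally where male rates exceed female rates and find the widest gap."""
    male_higher = [place for place, f, m in rates if m > f]
    exceptions = [place for place, f, m in rates if m <= f]
    # Largest absolute gap (male minus female), in cases per 100,000.
    widest = max(rates, key=lambda row: row[2] - row[1])[0]
    return male_higher, exceptions, widest


male_higher, exceptions, widest = compare_sexes(RATES)
print(f"Male rate higher in {len(male_higher)} of {len(RATES)} locations")
print("Exceptions:", exceptions)
print("Widest male-female gap:", widest)
```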

Histograms, box plots, and summary statistics in the Instructor’s Version were produced using JMP 7.0 (SAS Institute Inc.).