Problem Set 4: HRP/STAT 261: Due February 22, 2012
1. A study was undertaken to examine the prevalence of abnormal hematologic (blood cell) profiles in elite cross-country skiers at the 2001 World Ski Championships. Abnormal hematologic profiles, measured as increased red blood cells or hemoglobin, may indicate blood doping. Sixty-eight percent of all skiers and 92% of those finishing in the top 10 places were tested. Hemoglobin levels in the athletes were compared against established reference data (hemoglobin concentration is normally distributed). Values more than 2 SD (but less than 3 SD) above the reference mean were classified as “abnormal,” and values more than 3 SD above the reference mean were classified as “highly abnormal.” Results for the top 50 finishers in each of the 9 races of the Championships are represented in the figure below (assume each skier competed only once):
Each of the 9 races is represented by a column. The hematologic result for an athlete is placed in the position of their race result (1st to 50th) in each column. A black oval indicates a “highly abnormal” (>3 SD) hematologic profile in the skier who obtained that race result. A speckled oval indicates an “abnormal” (>2 SD but <3 SD) hematologic profile in the skier who obtained that race result. A white oval indicates a “normal” hematologic profile in the skier who obtained that race result. A blank area indicates that a sample was not obtained from the athlete who achieved that race result.
Note: Missing data are missing at random and can be ignored. (Selection for drug testing is random, but top finishers have a higher probability of being selected.)
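For context on the two cutoffs, the sketch below computes how often a value from a normal distribution falls more than 2 SD or more than 3 SD above its mean. It is a minimal Python illustration, not part of the assignment; the helper name `upper_tail` is hypothetical, and the calculation assumes an individual drawn from the same normal reference population.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Assumption: values come from the normal reference distribution itself.
p_above_2sd = upper_tail(2.0)                 # about 0.0228
p_above_3sd = upper_tail(3.0)                 # about 0.00135
p_abnormal_only = p_above_2sd - p_above_3sd   # between 2 and 3 SD above the mean

print(f"P(>2 SD) = {p_above_2sd:.5f}")
print(f"P(>3 SD) = {p_above_3sd:.5f}")
print(f"P(2 to 3 SD) = {p_abnormal_only:.5f}")
```

These reference proportions give a baseline against which the observed proportions in the skiers can be judged.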
I’ve converted these data for you into a SAS-usable form (shown at the end of this document). You can import the data from the Excel file posted on the course website.
(a) What is the probability of having a “highly abnormal” test result in each “decade” of finishing place (1-10, 11-20, 21-30, 31-40, 41-50)?
What is the probability of having an “abnormal” test result in each decade of finishing place (1-10, 11-20, 21-30, 31-40, 41-50)?
(b) Plot decade of finishing place against the logit of the outcome “highly abnormal” test result. (Note: a hand-drawn sketch is fine. If you want to plot this in SAS, you can adapt the logit plot macro from lab 4, but it will not work directly because the data are grouped.)
(c) Calculate the odds ratio that represents the increase in the odds of a “highly abnormal” test result for every one-unit increase in finishing place (e.g., going from 15th place to 16th place or from 40th to 41st place). Note: use “abnormal” (speckled ovals in the figure) and “normal” skiers combined as the reference group.
(d) Calculate the odds ratio that represents the increase in the odds of a “highly abnormal” test result for every ten-unit increase in finishing place (e.g., going from 15th place to 25th place or from 38th to 48th place).
(e) Calculate the odds ratio that represents the increase in the odds of a “highly abnormal” test result for every jump in “decade” of finishing place (e.g., going from an 11-20 finisher to a 21-30 finisher).
(f) Compare the odds ratios in (d) and (e). Explain why they differ.
(g) Calculate the odds ratio that represents the increase in the odds of having a “highly abnormal” test result for top-ten finishers (compared to all other finishers).
(h) Calculate the odds ratio that represents the increase in the odds of being in the top ten given that you have an “abnormal” test result.
(i) Calculate the odds ratio that represents the increase in the odds of being in the top ten given that you have a “highly abnormal” test result.
(j) Briefly interpret these results.
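The quantities named in the questions above (odds, logit, odds ratio from a 2x2 table, and an odds ratio rescaled to a larger step) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the counts are made up and are NOT the skier data, and the rescaling assumes a model that is linear in the logit, so that the odds ratio for a k-unit change is exp(k * beta).

```python
import math

def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds of a probability."""
    return math.log(odds(p))

def odds_ratio_2x2(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return (a * d) / (b * c)

def rescale_or(or_per_unit, k):
    """Odds ratio for a k-unit change implied by a per-unit odds ratio,
    assuming linearity in the logit: exp(k * beta) = exp(beta) ** k."""
    return or_per_unit ** k

# Hypothetical counts for illustration only (not the skier data).
example_or = odds_ratio_2x2(20, 30, 10, 40)   # (20 * 40) / (30 * 10)
print(f"odds(0.2)   = {odds(0.2):.3f}")
print(f"logit(0.5)  = {logit(0.5):.3f}")
print(f"example OR  = {example_or:.3f}")
print(f"OR per 10 units if per-unit OR is 0.97: {rescale_or(0.97, 10):.3f}")
```

In SAS, a per-unit odds ratio comes from exponentiating the fitted slope of a logistic regression. The helpers here only show the arithmetic that links these quantities.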
DATA FOR SAS:
Place Ab HiAb Frequency
1 0 0 6
1 0 1 3
2 0 0 4
2 0 1 4
3 0 0 2
3 1 0 1
3 0 1 5
4 0 0 1
4 1 0 4
4 0 1 2
5 0 0 5
5 0 1 4
6 0 0 4
6 1 0 1
6 0 1 3
7 0 0 4
7 0 1 5
8 0 0 3
8 1 0 3
8 0 1 2
9 0 0 4
9 0 1 4
10 0 0 5
10 0 1 3
11 0 0 6
11 1 0 1
12 0 0 6
12 1 0 1
13 0 0 5
13 0 1 1
14 0 0 5
14 1 0 2
15 0 0 8
15 1 0 1
16 0 0 4
16 1 0 1
16 0 1 1
17 0 0 5
17 1 0 2
18 0 0 4
18 1 0 4
19 0 0 2
19 1 0 2
19 0 1 1
20 0 0 4
20 0 1 1
21 0 0 5
21 1 0 1
22 0 0 2
22 1 0 1
22 0 1 3
23 0 0 4
24 0 0 4
24 1 0 1
24 0 1 1
25 0 0 1
25 1 0 3
25 0 1 1
26 0 0 5
26 1 0 1
27 0 0 2
27 1 0 2
28 0 0 3
28 1 0 2
29 0 0 4
29 1 0 1
29 0 1 2
30 0 0 3
30 0 1 1
31 0 0 4
31 1 0 2
32 0 0 3
32 0 1 1
33 0 0 4
34 0 0 4
34 1 0 1
35 0 0 4
35 0 1 1
36 0 0 3
36 1 0 1
36 0 1 2
37 0 0 4
37 1 0 1
38 0 0 4
38 1 0 3
39 0 0 4
39 0 1 1
40 0 0 3
41 0 0 6
41 1 0 1
42 0 0 3
42 1 0 1
43 0 0 3
43 1 0 1
43 0 1 1
44 0 0 1
44 1 0 2
45 0 0 5
46 0 0 2
46 1 0 2
47 0 0 3
47 1 0 2
48 0 0 6
49 0 0 3
49 1 0 1
50 0 0 3
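
The grouped layout above has one row per combination of Place, Ab, and HiAb, with Frequency giving the number of skiers in that combination. Any summary therefore has to be weighted by Frequency. The Python sketch below shows one way to read such rows and compute a frequency-weighted proportion per decade of finishing place. The function names are hypothetical, and the sample holds only four rows from the listing, so its printed proportions are not answers to the problem.

```python
from collections import defaultdict

def parse_rows(text):
    """Parse whitespace-separated 'Place Ab HiAb Frequency' rows, skipping the header."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # header or blank line
        place, ab, hiab, freq = map(int, parts)
        rows.append((place, ab, hiab, freq))
    return rows

def decade(place):
    """Map finishing place 1-10 -> 1, 11-20 -> 2, ..., 41-50 -> 5."""
    return (place - 1) // 10 + 1

def proportion_by_decade(rows, flag_index):
    """Frequency-weighted proportion of tested skiers with the given flag set
    (flag_index 1 = Ab column, 2 = HiAb column) in each decade."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        d = decade(row[0])
        total[d] += row[3]
        hits[d] += row[flag_index] * row[3]
    return {d: hits[d] / total[d] for d in sorted(total)}

# Four rows only, to show the mechanics; not the full data set.
sample = """
Place Ab HiAb Frequency
1 0 0 6
1 0 1 3
11 0 0 6
11 1 0 1
"""
rows = parse_rows(sample)
hiab_by_decade = proportion_by_decade(rows, 2)
ab_by_decade = proportion_by_decade(rows, 1)
print(hiab_by_decade)
print(ab_by_decade)
```

In SAS, the same weighting is usually handled by naming the count variable in a FREQ or WEIGHT statement rather than by expanding the rows.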