Statistics Overview: 23 Feb 2015

Bill LeBlanc

  1. Parametric Statistics
       1. Comparison of Means
       2. Fixed Effects
       3. Random Effects
       4. Trends
       5. Repeated Measures
       6. Growth Curves
       7. Power
  2. Non-Parametric Statistics
       1. Counting Observed vs Expected
       2. Risk Factor x Primary Outcome
       3. Prospective – Cohort Studies: use the risk factor to look for the primary outcome (use Relative Risk)
       4. Retrospective – Case-Control Studies: use the primary outcome to look for the risk factor (use the Odds Ratio)
       5. Experimental Design vs Observational Design

Parametric Statistics

I’ll break statistics into two branches: parametric and non-parametric. To illustrate the parametric branch, I’ll use an example you often need – power calculations. At one time or another, I’m sure you have all asked Miriam for a power calculation for a grant proposal. These requests all boil down to a common question – how many people do I need?

Why is N so important?

Here is an example. Don Nease is in the end stages of a study in which he administered a questionnaire to (let’s say) 100 people. This questionnaire measured a trait called PAM13. His intention was to measure these people, perform an intervention, and then measure them again to see if there was a change. This is the crux of why we study statistics – we want to COMPARE two measures and see if there is a change. And think of CHANGE as a difference between what is observed and what is expected. Stamp that into your mind – “A CHANGE FROM WHAT IS EXPECTED”.

Let’s make a plot of Don’s data:

[Figure: normal curve of PAM13 sample means, centered at 55; x-axis ticks at 50, 55, and 60]

This is a distribution of PAM13 means from multiple samples of 100 people. Of course we only have one sample (and that is always because we have limited time, money, staff, facilities, etc.; otherwise we would have more data). But from that one sample, we can create this plot of what, theoretically, it would look like if we were to draw, say, 1000 samples of 100 people who have not undergone Don’s intervention. The mean of our one sample is 55 and the standard deviation is 5 (technically, this is the standard error). The mean and standard error from our sample are used to estimate the parameters (population characteristics) of the population of samples of size 100. This is where the “parametric” in “Parametric Statistics” comes from. There is a sound mathematical justification for taking these two numbers, 55 and 5, and drawing this curve. It is called the Central Limit Theorem.
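If you want to watch the Central Limit Theorem at work, here is a minimal sketch in Python. The individual-score standard deviation of 50 is a hypothetical value, chosen only so that the standard error of the mean comes out to 50 / √100 = 5, matching the numbers above.

```python
# Draw many samples of N = 100 and watch the sample means pile up
# in a normal curve around the population mean (Central Limit Theorem).
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 55, 50, 100   # hypothetical population values

means = [rng.normal(pop_mean, pop_sd, n).mean() for _ in range(1000)]

print(np.mean(means))   # close to 55
print(np.std(means))    # close to 50 / sqrt(100) = 5, the standard error
```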

In “Non-Parametric Statistics”, we are not talking about populations -- basically we are just “counting things”, and from those counts, we “expect” certain patterns to emerge (there’s that word EXPECT again).

[Figure: the same normal curve centered at 55, with the 2-standard-error points marked; x-axis ticks at 45, 55, and 65]

This plot shows us what we can expect from any draw of 100 people’s PAM13 scores. 95% of the time, we will get a mean somewhere within 2 standard errors of 55 – in other words, if we drew an infinite number of samples of 100 people, we would get a mean between 45 and 65 (55 – 2×5, 55 + 2×5) 95% of the time. Approximately 2.5% of the time, we’d get a sample mean greater than 65 (and 2.5% of the time, one less than 45). That point, 2 standard errors above or below the mean, coincides with our traditional cutoff for statistical significance; the tail probability beyond it is what we call “ALPHA”. If Don, at the outset of the project, set alpha at .05, then he was saying: if he applies the intervention, measures the 100 people afterward, and finds a mean PAM13 score greater than 65 (or less than 45), he is going to conclude that his intervention significantly changed the PAM13 trait in the people who underwent it. But if his intervention really was ineffective, he would still see a mean greater than 65 about 2.5% of the time (and one less than 45 another 2.5% of the time), and in those cases he would mistakenly call it effective. This mistake is known as “alpha error”. He is willing to be mistaken 5% of the time.
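As a sketch of where those cutoffs come from, assuming the normal model above (scipy is my choice of tool here, not the text’s):

```python
# Under the null hypothesis the sample mean is ~ Normal(55, 5).
# Splitting alpha = .05 into two 2.5% tails puts the cutoffs about
# 2 standard errors out -- the "45 and 65" of the text.
from scipy.stats import norm

mu, se, alpha = 55, 5, 0.05
lo = norm.ppf(alpha / 2, mu, se)        # about 45.2
hi = norm.ppf(1 - alpha / 2, mu, se)    # about 64.8
print(lo, hi)

print(1 - norm.cdf(65, mu, se))         # about 0.023: chance of a mean > 65
```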

To illustrate this with a more concrete example (I’m assuming that flipping coins is more familiar to you than measuring PAM13 scores): you are flipping a coin. Your NULL HYPOTHESISis that the coin is fair. Don’s null hypothesis is that his intervention is ineffective. In general, a null hypothesis is the expectation (there’s that word again) that nothing will happen, or that the status quo will remain the status quo, or that the coin is fair and will produce 50% heads over multiple flips. For the coin flipping example, you are prepared to flip the coin 10 times. You set an alpha equal to 8 heads. In other words, you will remain assuming that the coin is fair if you see 3 to 8 heads. If you see 9 or 10 heads (or 0 or 1 head), you will reject the null hypothesis that the coin is fair and conclude that the coin is not fair.
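Here is a minimal sketch of what that decision rule actually costs in alpha error, using the binomial distribution:

```python
# With 10 flips of a fair coin, rejecting on 0, 1, 9, or 10 heads:
# how often would we wrongly call a fair coin biased?
from scipy.stats import binom

n, p = 10, 0.5                            # 10 flips, fair coin (the null)
alpha = binom.cdf(1, n, p) + (1 - binom.cdf(8, n, p))
print(alpha)                              # about 0.021 -- stricter than .05
```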

Let’s say you see 7 heads. You fail to reject your null hypothesis that the coin is fair and conclude that the coin is indeed fair. One of 2 things is true:

  1. the coin is indeed fair, and you witnessed a not-unusual event which led you to correctly conclude that the coin was fair; or
  2. the coin is indeed biased, and you failed to detect it, thus incorrectly concluding that the coin is fair when it is not.

Let’s say you see 9 heads. You reject your null hypothesis that the coin is fair and conclude that the coin is biased. One of 2 things is true:

  1. the coin is indeed fair, and you witnessed a rare event which led you to incorrectly conclude that the coin was not fair (you just made an alpha error); or
  2. the coin is indeed biased, and you correctly concluded that the coin is biased (this is called “POWER” – remember, I said we were going to discuss POWER?).

or in a diagram:

                               True State of Affairs
  Your Decision                Null Hypothesis True    Null Hypothesis False
  Retain Null Hypothesis       Correct                 Beta error
  Reject Null Hypothesis       Alpha error             Correct: Power (1 – beta)
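To put one number on the “Power” cell, here is a sketch for the coin example. The text never fixes how biased the biased coin is, so the p = 0.8 below is a purely hypothetical alternative:

```python
# Power of the 10-flip test against one specific biased coin.
from scipy.stats import binom

n = 10
reject = [0, 1, 9, 10]                   # rejection region from the text
power = sum(binom.pmf(k, n, 0.8) for k in reject)   # p = 0.8 is hypothetical
print(power)                             # about 0.38, so beta error ~ 0.62
```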

So, now we’re in the homestretch. Let’s look at the table above in graphical form. We have 2 distributions: one centered at 55, representing the state of affairs when the null hypothesis is true, and one centered at 62, representing the state of affairs when Don’s intervention really changed people. When we retest the 100 people, we will be drawing our sample from this changed (i.e., different) population, whose center is at 62. Let’s say the mean of the re-sample is 62.

So, if people really are changed, and we want to conclude that from this sample whose mean is 62, we need to change this figure of two overlapping graphs into the 2 separate-distribution figure below. There are two ways of doing that (i.e., there are two ways of increasing POWER):

  1. One is to create an intervention so powerful that it produces a population whose mean is far to the right of the original null-hypothesis distribution. Then there will be no overlap, and all the samples of size 100 taken from the new population will have a mean far to the right of the alpha cutoff (65) of the null-hypothesis distribution.
  2. The other is to shrink the spread of the distributions so that the alpha cutoff point shrinks from 65 to some smaller value closer to 55. This is the technique that underlies the question to Miriam: “How many people do I need?” The standard error of the distribution of means is inversely proportional to the square root of the sample size. In English: if we increase our sample from 100 to 400 people, that four-fold increase will cut the standard error in half (i.e., 2.5 instead of 5). Now our alpha cutoff is 55 + 2×2.5, or 60 instead of 65 (see the sketch after this list).
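A minimal sketch of that sample-size effect, assuming an individual-score SD of 50 (so that N = 100 gives a standard error of 5), a true post-intervention mean of 62, and the cutoff at 2 standard errors above 55 used above:

```python
# How increasing N shrinks the cutoff and raises power.
from math import sqrt
from scipy.stats import norm

sd, null_mean, true_mean = 50, 55, 62    # sd = 50 is a hypothetical value
for n in (100, 400):
    se = sd / sqrt(n)                    # standard error shrinks with sqrt(N)
    cutoff = null_mean + 2 * se          # 65 when N = 100, 60 when N = 400
    power = 1 - norm.cdf(cutoff, true_mean, se)
    print(n, cutoff, round(power, 2))    # power rises from ~0.27 to ~0.79
```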

[Figure: the two distributions pulled apart – the null distribution centered at 55 with the alpha cutoff at 60, the alternative centered at 62]

So now we’ve covered several important points:

  1. We have an idea of what “parametric” statistics means
  2. We see that by using a distribution, we can “see” what it means to “expect” an outcome
  3. We can conceptualize when a comparison results in a difference that is or is not “expected”
  4. We can “see” the two ways of being wrong (alpha and beta error) and the 2 ways of being right (“nothing good ever happens to me”, and power)
  5. We now know that by manipulating conditions, we have an experimental design as opposed to an observational design.

Non-Parametric Statistics

It is not always the case, but today’s example of non-parametric statistics will be illustrated by an observational design. In an observational design, we do not manipulate characteristics of the design (manipulating them would put us in the business of making predictive claims rather than correlational ones). We just record and analyze what we see.

This is a stretch, but let’s equate non-parametric with simple probabilities that can be ascertained by counting. Let’s use the toss of 2 dice as an example. One die is red and the other is green. There are 6 faces on each die, and each face has a different number of spots, from 1 to 6. So we have G1, G2, G3, G4, G5, and G6, and R1, R2, R3, R4, R5, and R6 representing the number of spots on each die. Rolling them together, we get a combination of green and red spots that totals into a “count”: e.g., G2 and R4, which results in a total count of 6 spots. The minimum count is 2 (G1 + R1) and the maximum is 12 (G6 + R6). There is only one way to get a “2”: G1 + R1. But there are two ways to get a “3”: G1 + R2 and G2 + R1. There are 3 ways to get a “4”, 4 ways to get a “5”, and so on. The table below shows all 36 outcomes:

         R1   R2   R3   R4   R5   R6
   G1     2    3    4    5    6    7
   G2     3    4    5    6    7    8
   G3     4    5    6    7    8    9
   G4     5    6    7    8    9   10
   G5     6    7    8    9   10   11
   G6     7    8    9   10   11   12

By simply counting all the outcomes, we can determine the probability of getting a specific outcome. For example, the probability of getting a 9 or greater is (4 + 3 + 2 + 1) / 36: there are 4 ways of getting a 9, 3 ways of getting a 10, 2 ways of getting an 11, and 1 way of getting a 12, so 10/36 = .28. We would therefore “expect” (there’s that word again!) to get a 9 or greater 28% of the time if we repeatedly rolled 2 dice. So instead of getting an expected percent from a mathematical formula, as we did in the parametric example above using the normal distribution, we get our expected percent from probabilities derived by counting outcomes. This will be our “poor man’s definition” of non-parametric statistics.
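This kind of counting is easy to hand to a computer. A minimal sketch that enumerates all 36 outcomes:

```python
# Reproduce P(total >= 9) by brute-force counting of all 36 rolls.
from itertools import product

totals = [g + r for g, r in product(range(1, 7), repeat=2)]
p = sum(t >= 9 for t in totals) / len(totals)
print(p)   # 10/36, about 0.28
```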

So now we will perform a non-parametric observational study. We can do this one of 2 ways: retrospectively or prospectively. Either way, we are looking at 2 things: an exposure and an outcome. In a retrospective study, the exposures and outcomes have already occurred. We are looking at the results, trying to attribute a correlational relationship. In a prospective study, the outcome has not yet occurred – we start with intact subjects and follow them for a period to see if some distinguishing exposure factor at the outset leads to a defined outcome.

Let’s start with a retrospective study. Last month, someone brought cupcakes to the departmental meeting. A day later, 12 people were out sick, and we want to know whether the cupcakes were involved. We collect the following observational data (we are not manipulating any factors):

                             Outcome
   Exposure                  Got sick   Did not get sick   Total
   Ate cupcakes                 10             40            50
   Did not eat cupcakes          2             98           100
   Total                        12            138           150

Just by eyeballing this table, you would probably recommend passing on the cupcakes next time. Let’s quantify this decision. The top-left and bottom-right interior cells (10, 98) are logically appealing in their implications: you eat the cupcakes and you get sick, or you don’t eat the cupcakes and you don’t get sick. We’ll call that “signal” – that’s the way things should work (if the cupcakes are bad). The other 2 interior cells, top-right and bottom-left (40, 2), are “noise” – you eat the cupcakes and don’t get sick, or you don’t eat the cupcakes and you get sick – that’s not right! So let’s compare the “right” cells to the “not right” cells with a little arithmetic: we multiply the “right” cells together, do the same for the “not right” cells, and then divide the former by the latter: (10 × 98) / (2 × 40) = 12.25. We are going to call this the “odds ratio”. You can see that if the ratio of signal to noise is close to 1, there is not much of a relationship between exposure and outcome – it’s all noise, numerator and denominator. In our case, however, the large value of 12.25 indicates that the “signal” cells (10 and 98) outweigh the “noise” cells (2 and 40), supporting our first, eyeball impression.
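The same arithmetic as a minimal sketch in Python (the cell labels a, b, c, d are my own, not the text’s):

```python
# Odds ratio from the cupcake table: "signal" cells over "noise" cells.
a, b = 10, 40     # ate cupcakes:  got sick, did not get sick
c, d = 2, 98      # did not eat:   got sick, did not get sick

odds_ratio = (a * d) / (b * c)
print(odds_ratio)  # 12.25
```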

This retrospective study is also called a case-control study. When choosing our “not exposed” group, we could have attempted to “match” it to the exposed group. For example, if all the people who ate cupcakes were men, we might have chosen only men from the group of 100 DFM members who did not eat the cupcakes. The control group would then be more similar to the exposed group, which might reduce the variability in the outcomes.

Now, we just took these numbers as they occurred – we did no randomizing or controlling for confounding variables, so we cannot generalize to a population. If we could use some design manipulations to make these groups more representative of a population, we would slightly modify our calculations and compute a quantity called the relative risk. We define the relative risk of getting sick as the ratio of the percentages (incidences) of people in each category (“ate” and “didn’t eat”) who got sick. For those who ate, 10/50 or 20% got sick. For those who didn’t eat, 2/100 or 2% got sick. The relative risk is the ratio of these two percentages: 20% / 2% = 10. This means that those who ate cupcakes were 10 times more likely to get sick. The relative risk is also called the “risk ratio”, and it is a little more intuitive than the odds ratio.
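And the corresponding sketch for the risk ratio, using the same hypothetical cell labels:

```python
# Risk ratio: incidence among the exposed over incidence among the unexposed.
a, b = 10, 40     # ate cupcakes:  got sick, did not get sick
c, d = 2, 98      # did not eat:   got sick, did not get sick

risk_exposed = a / (a + b)             # 10/50 = 20%
risk_unexposed = c / (c + d)           # 2/100 = 2%
print(risk_exposed / risk_unexposed)   # 10.0
```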

The Odds Ratio and Risk Ratio are similar in that they both measure association between “exposure” and “outcome”. When the incidence of the outcome is small (eg, <10%) compared to the non-outcome, the formula for relative risk approaches the formula for the odds ratio. Generally, the odds ratio is used for retrospective studies and the risk ratio is used for prospective studies.
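To see that convergence numerically, here is a sketch with made-up counts chosen only so that the outcome is rare in both groups:

```python
# With a rare outcome, the odds ratio approaches the risk ratio.
a, b = 2, 498     # exposed:   2 sick out of 500 (hypothetical counts)
c, d = 1, 999     # unexposed: 1 sick out of 1000

risk_ratio = (a / (a + b)) / (c / (c + d))
odds_ratio = (a * d) / (b * c)
print(risk_ratio, odds_ratio)   # 4.0 vs about 4.01 -- nearly equal
```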

In summary, we saw that there are two major divisions in statistics: Parametric and Nonparametric. We “expect” our sample results, under the null hypothesis, to resemble our population parameter(s). We have a benchmark value for how much we are willing to tolerate a deviation from that expected value before we conclude that our observed results describe a population that is different from that of our null hypothesis. This benchmark value is usually found in one of the many tables at the back of any statistics text.

Power refers to the ability to detect a difference between observed and expected when there really is a difference. This represents a “correct” decision. The other type of “correct” decision is when we conclude there is NO difference, when there really is NO difference. The two types of errors we can make are: concluding that there IS a difference when there really is NO difference (alpha error – which we can control by declaring an alpha value, usually 0.05, at the outset of the experiment) and beta error, which is failing to detect a difference when there really is one. We can enhance power either by magnifying the effect of our experimental intervention (which may not be much under our control) or by increasing the N of the experimental groups (which may be more controllable).

Nonparametric statistics are “counting” statistics. Enumerating the possible outcomes lets us compute probabilities of occurrence and determine whether the observed outcome differs from the expected outcome by a specified amount. In observational studies (as opposed to experimental studies), we tabulate the joint exposure/outcome counts and determine the probability of such a pattern occurring. For prospective studies, we start with 2 groups of subjects who are and are not exposed to some factor, follow them for a period of time, and then count the outcomes in each group. From those counts, we compute the relative risk, or risk ratio. For retrospective studies, where we start with groups that do and do not have the outcome and sort them into exposed/not exposed categories, we calculate the odds ratio to determine the relative odds that each group will exhibit the outcome. If the incidence of the outcome event is low in each group, the odds ratio numerically approaches the risk ratio.
