Power Analysis
We will use the Power for Genetic Association Analysis (PGA) package (Menashe et al., 2008), a modelling tool in which we can test various assumptions about our data and see what impact they have on our ability to detect associations with a phenotype.
This package was developed in MATLAB, a mathematics software program, so it is necessary to first install the MATLAB Runtime Component (MRC), mrcinstaller.exe, to create the environment in which PGA (pga.exe) can run. Both are available in the course folder, together with the paper by Menashe et al. (2008) and the PGA Readme.pdf describing the package.
First double-click on mrcinstaller.exe and follow the instructions, then double-click on pga.exe and follow the instructions.
Power in this context is the probability that we will detect an association if it exists. For example, there is a known association between APOL1 alleles and resistance to Human African Trypanosomiasis. What is the chance that we will detect that known association if we sample a certain number of people? This is different from the confidence in the association if we detect it. We normally accept a 5% risk of a type 1 error (rejecting the null hypothesis of no association when it is in fact true, leading to a false positive). Because we are selecting our samples at random, we might not detect an association even if it exists. The power is the probability that we will detect that association at the 5% significance level (or another level that we choose). The minimum acceptable power for an experiment is commonly set at 80%, meaning that there is an 80% chance of detecting a true association at the 5% level if it is there. It is better to have more power in the experiment, for example a 90% chance of detecting a true association.
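The idea that random sampling can hide a real association can be illustrated with a small simulation. The sketch below is not how PGA calculates power; it simply draws many random case-control samples and counts how often a simple two-sided allelic test is significant. The function name, the allele frequencies and the sample sizes are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(n_cases, n_controls, case_freq, control_freq,
                   alpha=0.05, n_reps=500, seed=1):
    """Estimate power as the fraction of simulated studies that reach significance.

    case_freq and control_freq are the assumed true risk-allele frequencies
    in cases and controls (hypothetical values chosen for illustration).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    a, b = 2 * n_cases, 2 * n_controls             # two alleles per person
    hits = 0
    for _ in range(n_reps):
        # Draw the number of risk alleles observed in each group.
        x1 = sum(rng.random() < case_freq for _ in range(a))
        x0 = sum(rng.random() < control_freq for _ in range(b))
        pooled = (x1 + x0) / (a + b)
        se = sqrt(pooled * (1 - pooled) * (1 / a + 1 / b))
        if se > 0 and abs(x1 / a - x0 / b) / se > z_crit:
            hits += 1
    return hits / n_reps


# A true difference in allele frequency: most, but not all, studies detect it.
print("power with a real association:", simulate_power(200, 200, 0.30, 0.20))
# No true difference: the detection rate is the type 1 error rate.
print("false positive rate:", simulate_power(200, 200, 0.20, 0.20))
```

Increasing n_cases and n_controls in the first call raises the estimated power, which is the effect explored with PGA in the rest of the exercise.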
In the exercise we will assume an additive genetic model. That means that two risk alleles of a SNP (homozygous) have twice the effect of one risk allele (heterozygous). Click on Co-Dominant (1df) in the Genetic Model box.
We will test the power of two different study designs.
1) A Genome Wide Association Study (GWAS) with 1 million SNP, 5,000 cases and 5,000 controls.
2) A QTL mapping study on samples of known ancestry with 200 cases and 200 controls genotyped at 200 loci.
We will start with the GWAS, so enter 5000 in the Case Number box.
As we are planning a million independent SNP assays, we need to adjust our significance level to allow for this. Enter 1000000 in the EDF (Effective Degrees of Freedom) box; this is equivalent to a Bonferroni correction. (To make this correction, the normal 5% significance level for a single test is divided by the number of tests that are done.) Press “Run”.
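The arithmetic of the Bonferroni correction described above can be checked with a few lines of Python. This is only a sketch of the calculation; the function name is a hypothetical one chosen for illustration and is not part of PGA.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


per_test_alpha = bonferroni_alpha(0.05, 1_000_000)
# Two-sided critical value of a standard normal test statistic at that level.
z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)

print("per-test significance level:", per_test_alpha)
print("critical z value:", round(z_crit, 2))
```

The per-test threshold is far stricter than 0.05, which is why a GWAS needs many more samples than a study of a single SNP.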
Look at the plot and make sure that you understand it. You should see the marker allele frequency on the x-axis and the detectable relative risk on the y-axis. You can see that the higher the marker allele frequency, the lower the relative risk that is detectable.
What is the minimum marker allele frequency at which we will detect any association with a relative risk of 2?
What is the minimum relative risk that we can detect with any marker allele frequency?
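The shape of the curve, and the two questions above, can be explored with a rough calculation. The sketch below is not the PGA algorithm. It assumes a simple two-sided allelic test with a normal approximation, treats the per-allele relative risk as an odds ratio (reasonable only for a rare disease), and ignores prevalence and imperfect linkage, so its numbers will differ from those PGA reports. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_allelic(n_cases, n_controls, maf, rr, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    nd = NormalDist()
    odds = rr * maf / (1 - maf)          # assumption: per-allele RR treated as an odds ratio
    p1, p0 = odds / (1 + odds), maf      # risk-allele frequency in cases and controls
    a, b = 2 * n_cases, 2 * n_controls   # two alleles per person
    p_bar = (a * p1 + b * p0) / (a + b)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / a + 1 / b))
    se_alt = sqrt(p1 * (1 - p1) / a + p0 * (1 - p0) / b)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


def min_detectable_rr(n_cases, n_controls, maf, alpha, power=0.9, max_rr=2.0):
    """Smallest RR in (1, max_rr] that reaches the target power, or None."""
    if power_allelic(n_cases, n_controls, maf, max_rr, alpha) < power:
        return None                      # not detectable within the plotted range
    lo, hi = 1.0, max_rr
    for _ in range(50):                  # bisection: power rises with RR
        mid = (lo + hi) / 2
        if power_allelic(n_cases, n_controls, maf, mid, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


alpha = 0.05 / 1_000_000                 # Bonferroni-corrected level for the GWAS
for maf in (0.01, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(maf, min_detectable_rr(5000, 5000, maf, alpha))
```

A result of None means that no relative risk up to max_rr is detectable at that allele frequency, which corresponds to the region to the left of the curve in the PGA plot.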
Below is a table showing the parameters of the model and the answers to those two questions for the first set of conditions. Each time you change a parameter, rerun the model, enter the new conditions in the table and note the answers to the questions.
Model / R2 / EDF / Cases / Control:Cases / Prevalence / Power / Alpha / Max RR / Min MAF / Min RR
CD1 / 1 / 50 / 500 / 1 / 0.01 / 0.9 / 0.05 / 2 / 0.09 / 1.6
Disease Prevalence
Next we will test the effect of using a more realistic disease prevalence. Let us assume that we are studying a disease that has a prevalence of 0.2%. If you are interested in a particular phenotype with a different prevalence, enter that instead. Run the model and enter the parameters and results in the table. What difference did the change make?
Linkage
We are not testing all the SNP in the genome, only marker SNP that are assumed to be linked to the disease SNP. We cannot assume that linkage is perfect, so we will test the effect of reducing the value of r2 between marker and disease SNP. Click on R2 so that the program knows that you are using that notation, and enter 0.9 in the box. Rerun the model and record your observations. Repeat this for linkages of 0.8, 0.7, 0.6 and 0.5. Make a note of the results, as we will need to look at them again when we are selecting SNP.
If you want to use fewer than 1,000,000 markers in a GWAS, then the linkage between markers and the causative polymorphism is likely to be significantly less than 0.9, reducing the power of the experiment.
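A commonly used rule of thumb, which is an approximation and not necessarily the calculation PGA performs, is that testing a marker with a given r2 to the disease SNP is roughly equivalent to testing the disease SNP itself in a sample r2 times as large. The sketch below applies that rule; the function names are hypothetical.

```python
from math import ceil


def effective_cases(n_cases, r2):
    """Approximate number of cases the study is 'worth' when only a marker is typed."""
    return n_cases * r2


def cases_needed(n_cases_at_perfect_linkage, r2):
    """Approximate number of cases needed to keep the same power at a given r2."""
    return ceil(n_cases_at_perfect_linkage / r2)


for r2 in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    print(r2, effective_cases(5000, r2), cases_needed(5000, r2))
```

Under this approximation, halving r2 has about the same effect on power as halving the sample size.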
Power
If we are prepared to reduce the chance of detecting real associations, we may detect smaller effects. Change the Power from 0.9 to 0.8 (80%), run the model and record your observations.
Population size
Halve your population size and see what effect that has on the relative risk that can be detected.
Genetic Model
Few associations with a recessive mode of inheritance have been found for genetically complex conditions. Click on “Recessive” in the Genetic Model box, run the model and record your observations. Do you think we might find associations under a recessive model?
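One reason that recessive effects are harder to detect is that few people carry two copies of a risk allele. The sketch below assumes Hardy-Weinberg equilibrium, which is an assumption made here for illustration and is not stated in the exercise; the function name is hypothetical.

```python
def genotype_frequencies(risk_allele_freq):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    q = risk_allele_freq
    p = 1 - q
    return {
        "no risk alleles": p * p,
        "heterozygous": 2 * p * q,
        "homozygous risk": q * q,
    }


for freq in (0.05, 0.1, 0.2, 0.5):
    print(freq, genotype_frequencies(freq))
```

Under a recessive model only the "homozygous risk" group is at increased risk, so at low allele frequencies very few sampled individuals are informative.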
Saving your work
Before repeating the exercise for the QTL mapping study, you might like to save your plot using Export Data and Export Figure in the File menu. These would be useful data to include in your report or publication to demonstrate that you have thought about the design of your experiment.
QTL mapping
Repeat the exercise using 200 cases and 200 markers to test the power of a QTL mapping study.
Saving your work again
The next exercise will involve changing the scale on the plot, which will wipe it clear. Before doing that, you might like to save your plot using Export Data and Export Figure in the File menu.
Population Size needed to detect a RR of 1.2
In complex diseases it is rare to find alleles that have RR > 1.5. If we wanted to find a significant association with an allele that had a relative risk of 1.2, how many cases would we need to sample? It will be easier to do this if you reduce Maximum RR in the box at the bottom to 1.5. Increase the population size until a relative risk of 1.2 is detectable with a marker allele that has a frequency of 0.2. How many cases would be needed? Would this be practicable?
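As a rough cross-check on the answer you obtain from PGA, the number of cases can be estimated with the standard sample size formula for comparing two proportions. This is a sketch under stated assumptions and not the PGA calculation: a two-sided allelic test, equal numbers of cases and controls, the per-allele relative risk treated as an odds ratio, perfect linkage, and prevalence ignored. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_required(maf, rr, alpha, power):
    """Approximate cases needed (with an equal number of controls)."""
    nd = NormalDist()
    odds = rr * maf / (1 - maf)              # assumption: RR treated as an odds ratio
    p1, p0 = odds / (1 + odds), maf          # risk-allele frequency in cases and controls
    p_bar = (p1 + p0) / 2
    z_alpha = nd.inv_cdf(1 - alpha / 2)      # significance threshold
    z_power = nd.inv_cdf(power)              # target power
    alleles_per_group = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    ) ** 2 / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)       # two alleles per person


gwas_alpha = 0.05 / 1_000_000                # Bonferroni-corrected level for 1 million SNP
print("RR 1.2:", cases_required(0.2, 1.2, gwas_alpha, 0.9))
print("RR 1.5:", cases_required(0.2, 1.5, gwas_alpha, 0.9))
```

Because the required sample size grows roughly with the inverse square of the difference in allele frequency, small relative risks need far more cases than large ones.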
Summary
What factors do you think are most important for designing an experiment to detect associations? Which ones were less important and had a smaller effect on power?
Minor Allele Frequencies of SNP in our data
You have seen that allele frequency has an important effect on power. Therefore it is useful to know the allele frequency distribution in the dataset. We will do that in the introductory PLINK exercises.
- Menashe, I., Rosenberg, P. S. & Chen, B. E. PGA: power calculator for case-control genetic association analyses. BMC Genet. 9, 36 (2008).