Version 2.1 (Mar, 2008)
User Manual for Software MPDA
The Developing Group of Software MPDA
Table of Contents:
1. Introduction
2. Software Installation and Initialization
3. Description of Working Directories
4. MPDA Interface, Functions and Operating Procedures
5. Data Input Format
6. Examples
7. Troubleshooting
8. MPDA Version Upgrade
Appendix A – CPA Estimate
Appendix B – AF Estimate
Appendix C – Association Analysis
Appendix D – Allelic Imbalance Analysis
1. INTRODUCTION
MPDA (Microarray Pooled DNA Analyzer) is a software tool for the analysis of pooled DNA data. It can be downloaded from the MPDA website (Figure 1-1). MPDA has four main functions: (1) estimating the CPA (coefficient of preferential amplification) and its standard error (s.e.); (2) estimating the AF (allele frequency) and its s.e.; (3) association analysis, including a single-point pooled DNA association test and a multipoint pooled DNA association test; and (4) allelic imbalance analysis, including single-point and multipoint allelic imbalance detection.
MPDA was developed on the MATLAB® platform and provides user-friendly interfaces for Windows systems (Windows 98/2000/XP). For users who have not installed MATLAB®, we have also developed standalone executables generated with the MATLAB® compiler (a feature added in MPDA Version 2.0). In this manual, Section 2 outlines the download and setup procedures. Section 3 describes the working directories of MPDA. Section 4 introduces the interfaces, analytic functions and operating procedures of MPDA. Section 5 explains the data input formats for different genotyping platforms and calling algorithms. Section 6 gives detailed running procedures for the two examples included with MPDA. Section 7 covers troubleshooting for software installation and execution. Section 8 gives information on MPDA version upgrades. Appendices A – D introduce the statistical methods used in MPDA. In addition, MPDA provides eight examples; the data and the documents describing the operating procedure of each example can be downloaded from the MPDA website.
Figure 1-1. The MPDA website
2. SOFTWARE INSTALLATION AND INITIALIZATION
MPDA was developed in MATLAB® and adapted to MS Windows® 98/ME/NT/2000/XP/2003. MPDA can be run under different versions of MATLAB®. In addition, standalone executables of MPDA were developed; the standalone MPDA can be run on machines on which MATLAB® is not installed. We illustrate the installation procedures of the two versions below.
2.1 MPDA with a user-friendly interface (MATLAB® is required)
Installation procedure for MPDA using MATLAB® version R2006a:
• Step 1 – Download the compressed archive ‘MPDA.rar’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm (Figure 2-1).
• Step 2 – Extract the file ‘MPDA.rar’ to get the main directory ‘MPDA’ (Figure 2-2).
• Step 3 – Move the directory ‘MPDA’ to the directory where MATLAB® was installed, e.g., ‘C:\Program Files\MATLAB\R2006a’ (Figure 2-3).
• Step 4 – Start MATLAB® and enter its user interface (Figure 2-4).
• Step 5 – Click ‘File’ in the menu bar of the ‘MATLAB Command Window’, and select ‘Set Path’ (Figure 2-5).
• Step 6 – Click the button ‘Add Folder’, and select the working directory (i.e., ‘MPDA’) in the designated directory to add it to the path. Click the button ‘Save’ to save the path (Figure 2-6).
• Step 7 – Click the button ‘Add Folder’ again, and select the subdirectory ‘Database’ in the directory ‘MPDA’ to add it to the path. Click the button ‘Save’ to save the path (Figure 2-7).
• Step 8 – Key in the command ‘MPDA’ at the command line in the ‘MATLAB Command Window’ to enter the MPDA environment (Figure 2-8).
After finishing the eight steps, the welcome page of MPDA will be shown (Figure 2-9). Two further interfaces are provided, one for the association analysis (Figure 2-10) and one for the allelic imbalance analysis (Figure 2-11).
2.2 MPDA with standalone executables (MATLAB® is not required)
The installation procedure for the standalone MPDA and the MATLAB® Component Runtime is as follows:
• Step 1 – Download the executable file ‘MCRInstaller.exe’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm (Figure 2-12).
• Step 2 – Run the executable file ‘MCRInstaller.exe’ to install the MATLAB® Component Runtime (Figure 2-13). (If the MATLAB® Component Runtime cannot be installed successfully, please refer to Section 7.1 for troubleshooting.)
• Step 3 – Download the compressed archive ‘MPDA_mcr.rar’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm (Figure 2-14).
• Step 4 – Extract the archive ‘MPDA_mcr.rar’ to a destination directory (e.g., ‘C:\Program Files\MATLAB\R2006a\MPDA’) (Figure 2-15).
• Step 5 – Click ‘Start’ and select ‘Run’ to open the ‘Run’ window (Figure 2-16).
• Step 6 – Type the command ‘cmd’ to open the MS-DOS Prompt window (Figure 2-17). (If the MS-DOS Prompt window is not displayed properly, please refer to Section 7.2 for troubleshooting.)
• Step 7 – Change the current directory to the main directory ‘MPDA’ (Figure 2-18).
• Step 8 – Type ‘MPDA’ and press the [Enter] key to run the executable ‘MPDA.exe’ (Figure 2-19).
After finishing the eight steps, the welcome page of the standalone MPDA will be shown (Figure 2-20).
Figure 2-1. Step 1 – Download the compressed archive ‘MPDA.rar’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm.
Figure 2-2. Step 2 – Extract the file ‘MPDA.rar’ to get the main directory ‘MPDA’.
Figure 2-3. Step 3 – Move the directory ‘MPDA’ to the directory where MATLAB® was installed, e.g., ‘C:\Program Files\MATLAB\R2006a’.
Figure 2-4. Step 4 – Start MATLAB® and enter its user interface.
Figure 2-5. Step 5 – Click ‘File’ in the menu bar of the ‘MATLAB Command Window’, and select ‘Set Path’.
Figure 2-6. Step 6 – Click the button ‘Add Folder’, and select the working directory (i.e., ‘MPDA’) in the designated directory to add it to the path. Click the button ‘Save’ to save the path.
Figure 2-7. Step 7 – Click the button ‘Add Folder’ again, and select the subdirectory ‘Database’ to add it to the path. Click the button ‘Save’ to save the path.
Figure 2-8. Step 8 – Key in the command ‘MPDA’ at the command line in the ‘MATLAB Command Window’ to enter the MPDA environment.
Figure 2-9. Interface 1 of MPDA – Main interface
Figure 2-10. Interface 2 of MPDA – Association analysis
Figure 2-11. Interface 3 of MPDA – Allelic imbalance analysis
Figure 2-12. Step 1 – Download the executable file ‘MCRInstaller.exe’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm.
Figure 2-13. Step 2 – Run the executable file ‘MCRInstaller.exe’ to install the MATLAB® Component Runtime.
Figure 2-14. Step 3 – Download the compressed archive ‘MPDA_mcr.rar’ from the MPDA website http://www.stat.sinica.edu.tw/hsinchou/genetics/pooledDNA/mpda.htm.
Figure 2-15. Step 4 – Extract the archive ‘MPDA_mcr.rar’ to a destination directory ‘C:\Program Files\MATLAB\R2006a\MPDA’.
Figure 2-16. Step 5 – Click ‘Start’ and select ‘Run’ to open the ‘Run’ window.
Figure 2-17. Step 6 – Type the command ‘cmd’ to open the MS-DOS Prompt window.
Figure 2-18. Step 7 – Change the current directory to the main directory ‘MPDA’.
Figure 2-19. Step 8 – Type ‘MPDA’ and press the [Enter] key to run the executable ‘MPDA.exe’.
Figure 2-20. The welcome page of the standalone MPDA shown in an MS-DOS Prompt window.
3. DESCRIPTION OF WORKING DIRECTORIES
3.1 MPDA with a user-friendly interface (MATLAB® is required)
The main directory of MPDA is named ‘MPDA’. It contains four directories, the MPDA license and several program files (Figure 3-1).
• Directory ‘Input’ – All input data must be saved in this directory.
• Directory ‘Output’ – All output results will be saved automatically in this directory.
• Directory ‘Example’ – Two illustrative examples discussed in the MPDA paper are provided in this directory.
• Directory ‘Database’ – Reference databases of CPA and databases of AF mean and prediction error for the analyses of the Affymetrix GeneChip Human Mapping 100K Set and 500K Set data are saved in this directory.
• File ‘License.txt’ – The MPDA license.
• Files *.p – The program files of MPDA (MATLAB® P-code).
Figure 3-1. Working directories and program files of MPDA
3.2 MPDA with standalone executables (MATLAB® is not required)
The main directory of MPDA is named ‘MPDA’. It contains four directories, the MPDA license and two program files.
• Directory ‘Input’ – All input data must be saved in this directory.
• Directory ‘Output’ – All output results will be saved automatically in this directory.
• Directory ‘Example’ – Two illustrative examples discussed in the MPDA paper are provided in this directory.
• Directory ‘Database’ – Reference databases of CPA and databases of AF mean and prediction error for the analyses of the Affymetrix GeneChip Human Mapping 100K Set and 500K Set data are saved in this directory.
• File ‘License.txt’ – The MPDA license.
• File ‘MPDA.exe’ – The executable file of standalone MPDA.
• File ‘MPDA.ctf’ – The component technology file (CTF) archive containing the MATLAB® functions and data that define MPDA.
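The layout described in this section can be checked before running an analysis. The following is a minimal sketch in Python, not part of MPDA: the helper name `missing_mpda_items` is hypothetical, and only the directory and file names common to Sections 3.1 and 3.2 are taken from this manual.

```python
from pathlib import Path

# Names taken from Section 3; the helper itself is hypothetical, not part of MPDA.
REQUIRED_DIRS = ("Input", "Output", "Example", "Database")
REQUIRED_FILES = ("License.txt",)


def missing_mpda_items(main_dir):
    """Return the expected directories/files that are absent from main_dir."""
    root = Path(main_dir)
    # Check the four working directories first, then the license file.
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Example path from Section 2; adjust to the actual installation directory.
    print(missing_mpda_items(r"C:\Program Files\MATLAB\R2006a\MPDA"))
```

An empty list means that all four directories and the license file were found. Program files (*.p, or ‘MPDA.exe’ and ‘MPDA.ctf’) differ between the two versions and are deliberately not checked here.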
4. MPDA INTERFACE, FUNCTIONS AND OPERATING PROCEDURES
4.1 MPDA with a user-friendly interface (MATLAB® is required)
MPDA has three interfaces: a welcome page and two main components. The welcome page, i.e., the main interface, gives a short introduction to MPDA (Figure 2-9). The first component was developed for association analysis (Figure 2-10). The second component was developed for allelic imbalance analysis (Figure 2-11).
4.1.1 Welcome page
The welcome page provides a short introduction to MPDA and its functions. It contains one item: users can check the box ‘Component 1’ to carry out association analysis or the box ‘Component 2’ to carry out allelic imbalance analysis.
4.1.2 Component 1 – Association analysis
The purpose of this component is to provide a whole-genome association analysis for artificially pooled DNA data. The analysis is useful in identifying loci associated with a particular trait of interest. This component has seven main items (Figure 2-10), as follows.
• Item 1 – Input/Output directory
Users should provide directories for data input files (e.g., ‘C:\Program Files\MATLAB\R2006a\MPDA\Input’) and result output files (e.g., ‘C:\Program Files\MATLAB\R2006a\MPDA\Output’). MPDA will automatically read data from the specified input directory and save outputs in the specified output directory.
• Item 2 – Number of groups studied
Component 1 in MPDA provides a one-group analysis (CPA and AF) or a two-group analysis (CPA, AF and association analysis) for artificially pooled DNA data. Users can check the box ‘One group’ to estimate CPA and choose whether to calculate adjusted AF, or check the box ‘Two groups’ to estimate CPA and choose whether to calculate adjusted AF and carry out association tests. In addition, users should specify which CPA calibration method is applied in the subsequent association analysis: check the box ‘Yes’ to apply constant CPAs or ‘No’ to use unequal CPAs between the two groups.
• Item 3 – Data type for CPA estimation
MPDA permits three input data types for CPA estimation as follows:
(1) Affymetrix format: Data files of hybridization intensities, which are obtained from the software GDAS, GCOS, CNAT and BAT (Affymetrix, CA, USA), should be provided. First, users should select which type of Affymetrix gene chips is used, i.e., 100K or 500K. Second, users can check ‘All autosomes’ to carry out whole-genome analysis or check only some specific chromosomes of interest.
(2) Raw CPA/heterozygote ratio: CPA data and/or the corresponding s.e. should be provided. First, users should select which type of Affymetrix gene chips is used, i.e., 100K or 500K. Second, users can provide their own CPA reference or directly use the CPA reference datasets provided by MPDA (CPA from a combined population and CPA from the Taiwanese population).
(3) Peak intensity: For non-Affymetrix users, data can be reformatted into pairs of peak intensities for each heterozygote individual to calculate CPA. Three CPA estimators (arithmetic mean, geometric mean, and bias-correction CPA) are provided in the analysis of MPDA. The statistical formulae are shown in Appendix A.
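As a rough illustration of the first two estimators named in (3), the Python sketch below computes an arithmetic-mean and a geometric-mean CPA from pairs of peak intensities. It is not MPDA code and it makes assumptions: the per-individual ratio is taken as the peak intensity of the first allele divided by that of the second, and the function name `cpa_estimates` is hypothetical. The bias-correction estimator is omitted because its formula is given only in Appendix A.

```python
import math


def cpa_estimates(pairs):
    """Return (arithmetic mean, geometric mean) of per-individual peak ratios.

    pairs: list of (peak_first_allele, peak_second_allele) for heterozygous
    individuals. The ratio direction is an assumption made for illustration.
    """
    if not pairs:
        raise ValueError("at least one heterozygous individual is required")
    ratios = [first / second for first, second in pairs]
    arithmetic = sum(ratios) / len(ratios)
    # The geometric mean is computed on the log scale for numerical stability.
    geometric = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return arithmetic, geometric


if __name__ == "__main__":
    # Hypothetical peak intensities for three heterozygous individuals.
    print(cpa_estimates([(1200.0, 1000.0), (1500.0, 1100.0), (900.0, 950.0)]))
```

The geometric mean is less sensitive than the arithmetic mean to a few individuals with very large ratios, which is one reason both estimators may be worth comparing on the same data.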
• Item 4 – Calculation of the bootstrapped s.e. of the CPA estimate
MPDA calculates the s.e. of the CPA estimate with a parametric bootstrap resampling procedure, in which composite relative allele signals (CRAS) are modeled by a beta distribution. Users can check the box ‘Yes’ and then key in the number of bootstrap replications (between 10 and 1,000) to calculate the s.e., or check the box ‘No’ to omit the calculation. The detailed procedure can be found in Appendix A.
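The general idea of a parametric bootstrap with a beta model can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Appendix A: the beta parameters are fitted by the method of moments, the signals are assumed to lie strictly between 0 and 1, and the CPA is assumed, for illustration only, to be mean/(1 - mean) of the signals. All function names are hypothetical; only the limits on the number of replications (10 to 1,000) come from this manual.

```python
import random
import statistics


def fit_beta_moments(values):
    """Method-of-moments beta fit (an assumed fitting method, not MPDA's)."""
    m = statistics.mean(values)
    v = statistics.variance(values)
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("sample variance too large for a beta model")
    return m * common, (1 - m) * common


def cpa_from_signals(values):
    """Assumed statistic for illustration: mean signal / (1 - mean signal)."""
    m = statistics.mean(values)
    return m / (1 - m)


def bootstrap_se(values, n_boot, seed=0):
    """Standard deviation of the statistic over n_boot simulated samples."""
    if not 10 <= n_boot <= 1000:
        raise ValueError("number of replications must be between 10 and 1,000")
    rng = random.Random(seed)
    alpha, beta = fit_beta_moments(values)
    replicates = []
    for _ in range(n_boot):
        # Draw a new sample of the same size from the fitted beta distribution.
        sample = [rng.betavariate(alpha, beta) for _ in values]
        replicates.append(cpa_from_signals(sample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    signals = [0.52, 0.55, 0.58, 0.60, 0.49, 0.57]  # hypothetical values
    print(bootstrap_se(signals, n_boot=200))
```

More replications give a more stable s.e. at the cost of longer run time, which is the trade-off behind the choice of the number of bootstrap replications.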
• Item 5 – Estimation of adjusted AF
Users can check the box ‘Yes’ and then select the Affymetrix-format hybridization intensities or pairs of peak intensities to calculate adjusted AFs. Or, users can check the box ‘No’ to omit the calculation. The estimation procedure can be found in Appendix B.
• Item 6 – Single-point pooled DNA association test
Users can check the box ‘Yes’ and then key in the experimental s.e. to carry out a single-point pooled DNA association test. Or, users can check the box ‘No’ to omit the procedure. The detailed testing procedure can be found in Appendix C.
• Item 7 – Multipoint pooled DNA association test
Users can check the box ‘Yes’ and set seven options to conduct a multipoint pooled DNA association test. The seven options are as follows.
(1) Data type for the association test: Users can select the Affymetrix-format hybridization intensities, pairs of peak intensities, or raw p-values obtained from previous single-point association tests for a multipoint association test.
(2) Map information: Users can check the box ‘Yes’ to input marker positions for the subsequent graphical display of multipoint p-values. If users check the box ‘No’ to omit the inter-marker distances, the graph will be drawn with equal inter-marker distances.
(3) Weight function: Users can check ‘Equal weight’ to assign equal weights to all marker loci. Or, users can check ‘User-specified weight’ and then provide a set of weights.
(4) Threshold value of truncation: Users can specify a truncation threshold between 0 and 1. Markers whose p-values from the previous single-point association test are greater than the threshold are excluded.
(5) Number of Monte Carlo simulations: Users should enter the number of Monte Carlo simulations. The number of simulations must be between 500 and 10,000.