The Simplest Way to Obtain Allele- and Parent-Specific Copy Number (As Defined in the Paper)

First, note that PLASQ works only on Affymetrix 100K Set .cel files in ASCII text format. Binary-format files may be converted (version 4 to version 3) using Affymetrix's CEL File Conversion Tool, available at http://www.affymetrix.com/support/developer/tools/devnettools.affx

The simplest way to obtain allele- and parent-specific copy number (as defined in the paper) is to proceed as follows. I assume that, in addition to the test samples you want to analyze, you have roughly 8-15 normal diploid samples with which to "calibrate" the model. Fewer than this MAY be too few to obtain accurate parameter estimates, and more than this MAY cause memory problems in R. I also assume that you are familiar and comfortable with R.

1) Put the .cel files from the normal samples into two separate directories, say HND and XND, one for the Hind files and the other for the Xba files. If two different files are from the same sample (i.e. one for Hind and one for Xba), they should have EXACTLY the same name.

2) Put the .cel files from the test sample(s) into two separate directories, say HTD and XTD, one for the Hind files and the other for the Xba files. If two different files are from the same sample (i.e. one for Hind and one for Xba), they should have EXACTLY the same name.
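Because files from the same sample must share EXACTLY the same name, a quick check before the long run can catch a misnamed or missing file early. The sketch below uses only base R. The helper name checkPairedCel is hypothetical and not part of PLASQ, and it assumes the files end in .cel (in any letter case).

```r
# Hypothetical helper (not part of PLASQ): TRUE when the Hind and Xba
# directories contain exactly the same set of .cel file names.
checkPairedCel <- function(hindDir, xbaDir) {
  # File names only (no paths); the .cel extension is matched in any case
  hind <- list.files(hindDir, pattern = "\\.cel$", ignore.case = TRUE)
  xba  <- list.files(xbaDir,  pattern = "\\.cel$", ignore.case = TRUE)
  if (length(hind) == 0) message("No .cel files found in ", hindDir)
  if (length(xba) == 0)  message("No .cel files found in ", xbaDir)
  onlyHind <- setdiff(hind, xba)  # Hind files with no identically named Xba file
  onlyXba  <- setdiff(xba, hind)  # Xba files with no identically named Hind file
  if (length(onlyHind) > 0)
    message("No Xba partner for: ", paste(onlyHind, collapse = ", "))
  if (length(onlyXba) > 0)
    message("No Hind partner for: ", paste(onlyXba, collapse = ", "))
  length(onlyHind) == 0 && length(onlyXba) == 0
}

checkPairedCel("HND", "XND")  # normal samples
checkPairedCel("HTD", "XTD")  # test samples
```

The name comparison is exact, so "s1.CEL" and "s1.cel" are reported as unpaired.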

3) In R, load PLASQ and enter the following command:

library(PLASQ)
PSCN <- pscn("HTD", "XTD", "HND", "XND",
             computeBetas = TRUE,
             normFile = "tmp", betasFile = "betas.Rdata", rawCNfile = "rawCN.Rdata")

Note that here "tmp", "betas.Rdata", and "rawCN.Rdata" may be replaced by whatever file names you choose, as may the object name PSCN.

This will take a VERY long time to run, but it prints progress messages as it goes. When it finishes, PSCN will be a matrix whose rows are the SNP sites and whose columns are: SNPID, followed by major and minor chromosome copy numbers for the test sample(s). The SNP IDs may be mapped to their genomic locations using the SNPinfo matrix, loaded by entering

data(SNPinfo)
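As a minimal sketch of that mapping, the base R function below attaches the SNPinfo annotation to a PSCN result by matching SNP IDs. It assumes that the first column of both objects holds the SNP ID in the same format. Check colnames(SNPinfo) first and adjust the column indices if that assumption does not hold. The helper name annotateSNPs is hypothetical.

```r
# Hypothetical helper (not part of PLASQ). ASSUMPTION: column 1 of both `cn`
# (the pscn or ascn output) and `info` (SNPinfo) holds the SNP ID.
annotateSNPs <- function(cn, info) {
  info <- as.data.frame(info, stringsAsFactors = FALSE)
  idx <- match(as.character(cn[, 1]), as.character(info[, 1]))
  if (anyNA(idx))
    warning(sum(is.na(idx)), " SNP ID(s) not found in the annotation")
  # Rows stay in the order of `cn`; annotation columns other than the ID are
  # appended on the right (NA where an ID had no match)
  out <- cbind(as.data.frame(cn, stringsAsFactors = FALSE),
               info[idx, -1, drop = FALSE])
  rownames(out) <- NULL
  out
}

# Usage, after data(SNPinfo):
# PSCNann <- annotateSNPs(PSCN, SNPinfo)
```

Using match() rather than merge() keeps the rows in the original pscn order, which matters if the result is later plotted along the genome.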

The PSCNs may also be plotted using the command

PSCNplot(mat)

where mat is a 3-column matrix of PSCNs (as obtained with the command pscn) whose columns are: SNPID, and minor and major chromosome copy numbers of the test sample you wish to plot.
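When pscn was run on several test samples, the 3-column matrix for one sample can be cut out of its output. The sketch below assumes that, after the SNPID column, each test sample occupies two adjacent columns in the order pscn writes them, so sample k sits in columns 2k and 2k + 1. That layout is an assumption, so check it against your own output. The helper name pscnSample is hypothetical.

```r
# Hypothetical helper (not part of PLASQ). ASSUMPTION: after the SNPID column,
# each test sample occupies two adjacent columns, in the order pscn writes
# them, so sample k is in columns 2k and 2k + 1.
pscnSample <- function(pscnOut, k) {
  nSamples <- (ncol(pscnOut) - 1) / 2
  stopifnot(nSamples == round(nSamples), k >= 1, k <= nSamples)
  # Keep the column order unchanged so PSCNplot sees it as pscn produced it
  pscnOut[, c(1, 2 * k, 2 * k + 1), drop = FALSE]
}

# Usage, after running pscn as above:
# PSCNplot(pscnSample(PSCN, 1))   # plot the first test sample
```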

Note that if you run more test samples later, you need not recompute the parameters; you can instead use the command

PSCN <- pscn("HTD", "XTD", "HND", "XND",
             betasInfile = "betas.Rdata",  # or whatever you called the file above
             normFile = "tmp", rawCNfile = "rawCN.Rdata")

This should take less time to run, because the parameters are read from the saved file instead of being estimated again.
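If you script repeated runs, one way to choose between the two forms of the pscn call is to test whether the saved betas file already exists. The hypothetical helper below builds the argument list from the arguments shown above (computeBetas, betasFile, betasInfile, normFile, rawCNfile) for use with do.call. It assumes nothing about pscn beyond those argument names.

```r
# Hypothetical helper (not part of PLASQ): build the argument list for pscn,
# estimating the parameters only if no saved betas file exists yet.
pscnArgs <- function(betas = "betas.Rdata", normFile = "tmp",
                     rawCNfile = "rawCN.Rdata",
                     dirs = c("HTD", "XTD", "HND", "XND")) {
  # The four directories are passed positionally, as in the calls above
  args <- c(as.list(dirs), list(normFile = normFile, rawCNfile = rawCNfile))
  if (file.exists(betas)) {
    args$betasInfile <- betas   # reuse parameters saved by an earlier run
  } else {
    args$computeBetas <- TRUE   # first run: estimate the parameters ...
    args$betasFile <- betas     # ... and save them for later runs
  }
  args
}

# Usage, with PLASQ loaded:
# PSCN <- do.call(pscn, pscnArgs())
```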

4) If you want (SNP) allele-specific copy numbers, enter the commands:

load("rawCN.Rdata")  # or whatever file name you gave the rawCNfile argument above
ASCN <- ascn(rawCN, PSCN)

where PSCN is the object obtained from the pscn command.
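The commands above rely on the saved file containing an object named rawCN. Because load() returns the names of the objects it restored, that can be checked explicitly. The sketch below loads into a private environment so nothing in your workspace is overwritten. The helper name loadObject is hypothetical.

```r
# Hypothetical helper (not part of PLASQ): load one named object from an
# .Rdata file into a private environment and return it, failing loudly if the
# file does not contain an object with that name.
loadObject <- function(file, name = "rawCN") {
  env <- new.env()
  loaded <- load(file, envir = env)  # load() returns the names it restored
  if (!name %in% loaded)
    stop(file, " has no object named '", name, "'; found: ",
         paste(loaded, collapse = ", "))
  get(name, envir = env)
}

# Usage (use the file name you gave rawCNfile):
# rawCN <- loadObject("rawCN.Rdata")
# ASCN <- ascn(rawCN, PSCN)
```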

Now ASCN will be a matrix whose rows are the SNP sites and whose columns are: SNPID followed by allele A and allele B copy numbers for the test sample(s). The SNP IDs may be mapped to their genomic locations using the SNPinfo matrix, loaded by entering

data(SNPinfo)
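Each SNP carries some number of copies of allele A and of allele B. Their sum is the total copy number at that SNP, and their difference measures allelic imbalance. The hypothetical helper below computes both for one test sample. It assumes that sample k occupies columns 2k (allele A) and 2k + 1 (allele B) after SNPID, and it converts with as.numeric in case the result is stored as a character matrix.

```r
# Hypothetical helper (not part of PLASQ). ASSUMPTION: after SNPID, sample k
# occupies columns 2k (allele A) and 2k + 1 (allele B) of the ascn output.
alleleSummary <- function(ascnOut, k) {
  # as.numeric handles a character matrix (SNP IDs force that type in R)
  a <- as.numeric(ascnOut[, 2 * k])
  b <- as.numeric(ascnOut[, 2 * k + 1])
  data.frame(SNPID = as.character(ascnOut[, 1]),
             total = a + b,           # total copies at the SNP
             imbalance = abs(a - b),  # 0 when both alleles have equal copies
             stringsAsFactors = FALSE)
}

# Usage:
# s1 <- alleleSummary(ASCN, 1)
```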