Supplemental Methods
Brief Report: Somatic Mutations and Ancestry Markers in Hispanic Lung Cancer Patients
Gimbrone et al.
Patients and Tissues
Three tissue sources contributed to this study (Table 1). Patient sex, smoking status, and tumor histology were collected from patient medical records, as available (Supplemental Table S1). Patient age, tumor stage, and outcome were not available for all patients. DNA sequencing and ancestry analysis were performed on as many samples as possible; however, DNA was limiting in some cases. Statistical comparisons were made on the largest subgroups possible, as indicated within each Figure or Table. Tissue samples designated “Moffitt” were acquired from patients diagnosed primarily with NSCLC who consented to Moffitt Cancer Center’s Total Cancer Care™ (TCC) program between April 2006 and August 2010. The University of South Florida Institutional Review Board (Protocols 00001222 and 00011723) approved this study. Tissue samples designated “Puerto Rico” were acquired through the Puerto Rico BioBank. The Ponce Health Sciences University Institutional Review Board (Protocol 080121-IF) approved this study. Tissue samples designated “Perú” were from untreated patients diagnosed with lung adenocarcinoma who had surgery during 2013-2014 in Lima, Perú, where institutional approval was obtained locally for the collection of tumor blocks and clinical data.
DNA isolation. TCC™-provided DNAs were extracted from fresh-frozen, macro-dissected tumor tissues using Qiagen DNeasy Blood & Tissue Kits according to previously described methods and consent protocols.¹ DNA samples acquired through the Puerto Rico BioBank (PRBB) originated either from lung biopsies collected by fine-needle aspiration in PAXgene Tissue Containers (Qiagen) or from archived tissue blocks. Samples from Perú were all from archived tissue blocks. Candidate tissue blocks were cut, H&E-stained, and reviewed by the tissue procurement core pathologist to verify the diagnosis, histology, and amount of tumor present. Tumor margins were marked, and each tumor specimen was excised in Moffitt’s Tissue Core Facility. DNA quantity and quality were assessed using a Qubit and a TapeStation, respectively.
DNA sequencing. Sequencing information across the exons of 1,321 genes for an initial 28 H/L samples was obtained via our TCC™ institutional sequencing protocol.¹ Subsequently, another 76 samples were sequenced with a targeted Agilent panel matching the original 1,321 genes of the TCC™ protocol, and another 11 samples were sequenced with an Illumina whole-exome sequencing panel. The processed sequencing reads were analyzed on Moffitt’s cluster. Alignment and refinement were performed using BWA² and Picard against reference hs37d5, and variant calling was done using GATK³ through a Bash pipeline. The VCF file was annotated with ANNOVAR⁴ and read and exported using VarSifter 1.8.⁵ Heterozygous or homozygous variants were exported only if they resulted in a change in protein sequence; synonymous and intronic mutations were excluded from analysis. Additionally, since matching normal DNA samples were not available for many of the tumor samples, 25 unmatched normal samples from healthy Puerto Rican blood donors were sequenced and used to filter the sequenced tumors: any mutation occurring in the normal samples was removed. Quality filters were applied, and any variant call in the VQSR tranche 99.90-100.00 was removed. To further account for the unmatched normal samples and eliminate false-positive mutation calls, the Broad Institute ExAC database of SNPs was downloaded. The ExAC database reports each variant's global population allele frequency (AF_GLOBAL) and its maximum frequency in any ethnic subpopulation (AF_POPMAX). Any variant call from our cohort that exceeded 1% in either AF_GLOBAL or AF_POPMAX was removed. Finally, to account for batch effects arising from the multiple sequencing types, identical nucleotide variants that were associated with a single sequencing run and occurred at a frequency greater than that of the most common canonical mutation were excluded.
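The per-variant filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Variant` record, its field names, the effect labels, and the tranche label string are hypothetical stand-ins for values that would be read from the annotated VCF. The run-specific batch-effect filter is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical record layout; field names and effect labels are illustrative.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str                   # e.g. "nonsynonymous", "synonymous", "intronic"
    vqsr_tranche: Optional[str]   # tranche label, or None if the call passed
    af_global: Optional[float]    # ExAC global frequency, None if absent
    af_popmax: Optional[float]    # ExAC maximum subpopulation frequency

    @property
    def key(self) -> Tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)

EXCLUDED_EFFECTS = {"synonymous", "intronic"}  # no change in protein sequence
WORST_TRANCHE = "99.90to100.00"                # assumed label for the removed tranche
MAX_POP_FREQ = 0.01                            # 1% threshold from the text

def keep_variant(v: Variant, normal_keys: Set[Tuple[str, int, str, str]]) -> bool:
    """Return True if the tumor variant survives every filter."""
    if v.effect in EXCLUDED_EFFECTS:
        return False
    if v.key in normal_keys:                   # seen in an unmatched normal sample
        return False
    if v.vqsr_tranche is not None and WORST_TRANCHE in v.vqsr_tranche:
        return False
    for af in (v.af_global, v.af_popmax):      # common population polymorphism
        if af is not None and af > MAX_POP_FREQ:
            return False
    return True

def filter_tumor_variants(tumor: Iterable[Variant],
                          normals: Iterable[Variant]) -> List[Variant]:
    normal_keys = {n.key for n in normals}
    return [v for v in tumor if keep_variant(v, normal_keys)]
```

Variants absent from ExAC (frequency `None`) are retained in this sketch, which is an assumption about how missing frequencies would be handled.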
Global ancestry estimations. A set of 106 single nucleotide polymorphisms (SNPs) that discriminate Indigenous American, African, and European ancestry was used to estimate the proportion of genetic ancestry. The SNPs were chosen to maximize information for more than one ancestral population pairing, with a large difference in allele frequency between ancestral populations. They are widely spaced throughout the genome and have a well-balanced distribution across all 22 autosomal chromosomes. Genotyping was performed using multiplex PCR coupled with single-base extension methodology, with allele calls made using a Sequenom analyzer. The AIMs panel and reference populations have been described previously.⁶ For each sample, genetic ancestry was estimated using ADMIXTURE.⁷
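As an illustration of what "a large difference in allele frequency between ancestral populations" means for a candidate marker, the sketch below computes the absolute allele-frequency difference for each population pairing and lists the pairings that a SNP helps discriminate. It is not the selection procedure used to build the panel, which is described in the cited reference; the function names, the example frequencies, and the 0.3 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def pairwise_deltas(freqs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Absolute allele-frequency difference for every pair of populations."""
    return {
        (a, b): abs(freqs[a] - freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }

def informative_pairs(freqs: Dict[str, float],
                      min_delta: float = 0.3) -> List[Tuple[str, str]]:
    """Population pairings that this SNP separates by at least min_delta.

    min_delta is an illustrative threshold, not a value from the study.
    """
    return [pair for pair, d in pairwise_deltas(freqs).items() if d >= min_delta]

# Made-up allele frequencies for a single candidate SNP.
example_snp = {"African": 0.90, "European": 0.10, "Indigenous American": 0.15}
pairs = informative_pairs(example_snp)
```

In this made-up example the SNP separates African from European and African from Indigenous American ancestry, so it is informative for more than one pairing, but it does not separate European from Indigenous American ancestry.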
Supplemental References
1. Fenstermacher DA, Wenham RM, Rollison DE, Dalton WS. Implementing personalized medicine in a cancer center. Cancer J. Nov-Dec 2011;17(6):528-536.
2. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. Jul 15 2009;25(14):1754-1760.
3. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. May 2011;43(5):491-498.
4. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. Sep 2010;38(16):e164.
5. Teer JK, Green ED, Mullikin JC, Biesecker LG. VarSifter: visualizing and analyzing exome-scale sequence variation data on a desktop computer. Bioinformatics. Feb 15 2012;28(4):599-600.
6. Fejerman L, John EM, Huntsman S, et al. Genetic ancestry and risk of breast cancer among U.S. Latinas. Cancer Res. Dec 1 2008;68(23):9723-9728.
7. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. Sep 2009;19(9):1655-1664.
8. Pradhan A, Lambert QT, Griner LN, Reuther GW. Activation of JAK2-V617F by components of heterodimeric cytokine receptors. J Biol Chem. May 28 2010;285(22):16651-16663.