Supplementary figure legends

Supplementary Figure 1. Stability of cluster assignments from unsupervised clustering of array CGH data by non-negative matrix factorization (NMF). Cluster assignments were stable when patterns of associated copy number alterations (CNAs) were used to define 2 or 3 groups; clustering into more than 3 groups did not yield stable assignments. Co-clustering is shown as a 199 x 199 sample matrix summarizing twenty iterations of NMF. Red: stable co-clustering. Blue: no co-clustering.
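The red/blue matrix summarizes co-clustering: for each pair of samples, the fraction of NMF runs in which both samples were assigned to the same cluster. A minimal pure-Python sketch of that calculation follows. It assumes a non-negative features x samples matrix (how the CGH log-ratios were made non-negative is not described here), uses the Lee-Seung multiplicative updates for the squared-error objective, and assigns each sample to the factor with the largest coefficient in H. The helper names `nmf` and `consensus_matrix` are hypothetical, and the original analysis may have used a different NMF variant, initialization, or assignment rule.

```python
import random


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative X (features x samples) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        WtX = [[sum(W[i][a] * X[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(m)] for a in range(k)]
        H = [[H[a][j] * WtX[a][j] / (WtWH[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), using the updated H
        XHt = [[sum(X[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(n)]
        W = [[W[i][a] * XHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def consensus_matrix(X, k, runs=20):
    """Fraction of NMF runs (one random seed per run) in which each pair
    of samples is assigned to the same cluster."""
    m = len(X[0])
    counts = [[0] * m for _ in range(m)]
    for seed in range(runs):
        _, H = nmf(X, k, seed=seed)
        # Assign each sample to the factor with the largest coefficient
        labels = [max(range(k), key=lambda a: H[a][j]) for j in range(m)]
        for i in range(m):
            for j in range(m):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]
```

With samples sorted by cluster, a stable partition gives a nearly block-diagonal matrix with values close to 0 or 1; when k is too large, intermediate values appear because some sample pairs co-cluster in only a subset of runs.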

Supplementary Figure 2. Two-cluster partition of array CGH data from unsupervised NMF clustering, with data shown for each individual sample. The top part of the figure is the same as the top panel of Figure 2.

Supplementary Figure 3. Three-cluster partition of array CGH data from unsupervised NMF clustering, with data shown for each individual sample. The top part of the figure is the same as the bottom panel of Figure 2.

Supplementary Figure 4. Impact of EGFR mutation status on overall survival. EGFR mutation did not have a significant effect on overall survival in this patient cohort (p=0.18 for the difference in survival curves; p=0.159 by univariate Cox proportional hazards regression; hazard ratio 1.66). This is not entirely unexpected, as the favorable prognostic impact of EGFR mutation (in the absence of EGFR kinase inhibitor therapy) is more evident in advanced-stage lung adenocarcinoma than in patients with resectable early-stage tumors (Marks et al. 2008).
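The legend reports a p-value for the difference between survival curves without naming the test; the log-rank test is a common choice for this comparison and is assumed here. The sketch below (hypothetical helper `logrank_test`, standard library only) computes the two-group log-rank chi-square statistic with 1 degree of freedom from follow-up times and event indicators (1 = death, 0 = censored). It does not reproduce the Cox proportional hazards model or the hazard ratio.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if death observed, 0 if censored.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    # Pool both groups as (time, event, group), with group 0 = A and 1 = B
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # sum over event times of observed minus expected deaths in A
    variance = 0.0   # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                   # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)      # at risk, group A
        d = sum(1 for tt, e, _ in data if tt == t and e)             # deaths at time t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = o_minus_e ** 2 / variance
    # Chi-square with 1 df: P(X > chi2) = P(|Z| > sqrt(chi2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, `logrank_test(t_mut, e_mut, t_wt, e_wt)`, called with hypothetical per-patient vectors for EGFR-mutant and wild-type cases, returns the pair `(chi2, p)`.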

Supplementary Figure 5. EGFR copy number versus expression, annotated for EGFR mutation status. Samples with EGFR alterations are indicated in the legend. The correlation between copy number and expression at the EGFR locus is highly significant (Pearson's product-moment correlation, R=0.52, p<0.0001). For a given EGFR copy number, EGFR-mutant samples generally have higher EGFR expression than samples lacking EGFR mutation. Likewise, among samples without evidence of EGFR amplification, EGFR expression is higher in mutant cases.
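For reference, Pearson's product-moment correlation and the usual t statistic for testing zero correlation can be computed as below. The helper names are hypothetical, the inputs would be per-sample EGFR copy number and expression values, and the p-value uses a normal approximation to the t distribution with n - 2 degrees of freedom, which is reasonable only for large samples; an exact p-value needs the t distribution (for example, `scipy.stats.pearsonr`).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0 (t distribution, n - 2 df); undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation to t (large n only)."""
    return math.erfc(abs(t) / math.sqrt(2))
```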

Supplementary Figure 6. Unsupervised clustering of U133A expression data. Unsupervised clustering analysis was performed separately on two parts of the expression profiling dataset because of subtle differences between the Affymetrix U133A and U133A 2.0 array platforms. A well-defined sub-cluster of EGFR-mutated cases was present among the samples studied on Affymetrix U133A arrays (n=106). This sub-cluster was less apparent in the unsupervised analysis of samples studied on Affymetrix U133A 2.0 arrays (not shown). KRAS-mutated cases did not cluster well in either dataset, consistent with the more limited gene expression differences detected by supervised analysis of KRAS mutation status (Supplementary Table 5).

Supplementary Figure 7. Venn diagram of the inter-relationships among EGFR mutation, DUSP4 loss, and p16/CDKN2A loss. Significant relationships are indicated. Among the 152 DUSP4-diploid tumors, EGFR mutations and p16/CDKN2A deletions showed no association (p=0.77), suggesting that these inter-relationships are driven primarily by co-selection of DUSP4 loss with each of the other two alterations.
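The legend does not state which test produced p=0.77. For a 2 x 2 comparison of two binary alterations (here, EGFR mutation versus p16/CDKN2A deletion among DUSP4-diploid tumors), Fisher's exact test is a common choice and is assumed in the sketch below. The helper name is hypothetical, and because the cell counts are not given in this legend, any example input is a toy table rather than the study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table are counted
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```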

Supplementary Figure 8. DUSP4 induction upon EGF stimulation of human bronchial epithelial cells (HBECs) transfected with mutant or wild-type EGFR constructs. HBECs stably transfected with mutant EGFR show greater transcriptional upregulation of DUSP4 upon EGF stimulation than HBECs transfected with wild-type EGFR. The bimodal response of HBECs transfected with EGFR carrying an exon 19 deletion is consistent with a system containing a negative feedback loop. DUSP4 mRNA levels were quantified on the Sequenom platform (see Supplementary Materials and Methods). The EGF concentration was 50 ng/ml. RNA was extracted from parallel cell cultures at different time points to establish the time course.

Supplementary Figure 9. A. Confirmation of DUSP4-GFP protein expression. Western blotting with a GFP antibody shows the expected DUSP4-GFP fusion protein 24 hours after cDNA transfection (as noted elsewhere, we have found commercial DUSP4 antibodies to be unreliable). B. Confirmation of appropriate subcellular localization of the DUSP4-GFP protein. Fluorescence microscopy shows nuclear DUSP4-GFP fusion protein 24 hours after cDNA transfection, consistent with the expected nuclear localization of DUSP4.

Supplementary Figure 10. Loss of DUSP4-GFP-positive H1650 cells upon antibiotic selection. H1650 cells were transfected by electroporation with vectors encoding GFP or a DUSP4-GFP fusion protein. Twenty-four hours after transfection, cells were selected by growth in 500 µg/ml G418 for one week. The percentage of GFP-positive cells was estimated by FACS analysis before and after G418 selection. After one week of antibiotic selection, DUSP4-GFP expression was silenced in H1650 cells, whereas GFP expression was not.

List of supplementary tables

Supplementary Table 1. Summary statistics for 199 cases of lung adenocarcinoma (table follows in main manuscript file).

Supplementary Table 2. Notable minimal common regions (MCRs) of gain or loss (table follows in main manuscript file).

Supplementary Table 3. Associations of EGFR and KRAS mutation status with unsupervised clusters of CNA data (table follows in main manuscript file).

Supplementary Table 4. Expression profiles of selected MCRs of gain and outlier expression profiles of selected broad gains (provided as Supplementary Excel file).

Supplementary Table 5. Expression profiles associated with EGFR, KRAS, and TP53 mutation (provided as Supplementary Excel file). Note: FDR cutoffs were selected to yield signatures of equivalent size.

List of files provided online at

1. Sample annotation table for complete n=199 sample set

2. Raw expression microarray data (Affymetrix CEL files)

3. GCnorm version of the Agilent aCGH data