Supplementary Methods

Clinical cohorts and patient sample characteristics

Clinical and demographic information for 68 surgically resected, mostly early-stage, KRAS-mutant lung adenocarcinomas (LUACs) included in the TCGA cohort can be obtained from the TCGA portal. Level 3 somatic mutation, copy-number (GISTIC2.0), RNA-Seq, miRNA and protein expression (RPPA) data for these tumors were downloaded directly from the same source and used for subsequent analyses.

Clinical and somatic mutation data for KRAS, TP53 and STK11 for 47 KRAS-mutant LUACs included in the Chitale et al. cohort have been reported previously (16).

Clinical and mRNA expression data from the JBR.10 trial have been previously described (51) and were accessed through a publicly available database (GSE14814). KRAS-mutation status for tumors with available mRNA expression data was kindly provided by Dr Ming-Sound Tsao, Dr Frances Shepherd and Dr Chang-Qi Zhu.

The PROSPECT (Profiling of Resistance Patterns and Oncogenic Signaling Pathways in Evaluation of Cancers of the Thorax and Therapeutic Target Identification) dataset includes tumor tissue collected from patients who underwent surgical resection of NSCLC with curative intent between 1996 and 2008 at the UT MD Anderson Cancer Center (Houston, Texas). Tissue was collected after informed consent under an IRB-approved protocol (20). Only tumors classified as adenocarcinomas, resected from patients who had not received neoadjuvant chemotherapy or radiation therapy, were included in the current study. A total of 41 KRAS-mutant LUACs with available mRNA expression data were included in the expression-based clustering, and whole-exome sequencing data were available for 19 of these tumors.

BATTLE-2 (Biomarker-Integrated Approaches of Targeted Therapy for Lung Cancer Elimination) is a multicenter, biomarker-integrated, biopsy-mandated, randomized phase 2 trial of targeted therapy (NCT01248247) (17). It enrolls patients with advanced (Stage 3B or 4) NSCLC whose disease has relapsed or progressed either after at least one front-line metastatic chemotherapy regimen or within 6 months of adjuvant therapy or therapy for locally advanced disease. Patients enrolled in BATTLE-2 therefore represent a population with advanced, incurable disease that is refractory to platinum-based cytotoxic chemotherapy. The trial protocol was approved by the IRBs at MD Anderson Cancer Center and all other participating institutions.

Following informed consent and prior to randomization, eligible patients undergo protocol-mandated fresh tumor sampling (FNA or core biopsy). KRAS mutation status (codons 12, 13 and 61) is then assessed in real time in a CLIA-certified laboratory. KRAS mutation status and the 8-week disease control rate (the primary endpoint of the trial) guide early (Stage 1) adaptive randomization into four treatment arms:

1) erlotinib;
2) erlotinib plus MK-2206 (pan-AKT inhibitor);
3) AZD6244 (MEK1/2 inhibitor) plus MK-2206;
4) sorafenib.

In Stage 2, the adaptive randomization model is further refined to incorporate the most predictive biomarkers identified during Stage 1. Data in the current study come from 41 KRAS-mutant LUACs enrolled in Stage 1 of the BATTLE-2 protocol. Tumor tissue surplus to diagnostic requirements underwent comprehensive exploratory molecular analyses. All 41 tumors underwent massively parallel targeted exome sequencing of cancer-related genes. Array-based gene expression profiling on the Affymetrix platform was performed for 36 of them.

DNA sequencing and data processing

TCGA tumors

Whole-exome sequencing of the 68 KRAS-mutant lung adenocarcinomas in the TCGA cohort was performed as previously described (12). Level 3 somatic mutation data for these tumors were downloaded directly from the TCGA portal. The same data were also downloaded for 77 additional KRAS-mutant tumors, which were analyzed subsequently to assess the observed versus expected rates of STK11/LKB1 and TP53 co-mutations.

PROSPECT tumors

Details regarding sample collection and processing have been reported previously (20). Lung adenocarcinomas and matched normal lung tissues underwent paired whole-exome sequencing at the W.M. Keck Facility at Yale University. Total DNA was isolated using the AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's protocol. Genomic DNA was quantified using the Quant-iT™ PicoGreen® dsDNA Reagent (Life Technologies) according to the manufacturer's instructions. For library preparation, DNA was sheared by sonication, adaptor-ligated, fractionated and amplified as previously described (52). Genomic DNA was captured on the NimbleGen 2.1M human exome array and subjected to 75-bp paired-end sequencing on the Illumina HiSeq2000 platform as previously described (52). Sequence reads were mapped to the reference genome (hg19) using ELAND as previously described (53). Off-target reads were filtered out, and coverage was calculated from the remaining on-target reads. Mean coverage was >200× for tumors and >100× for normal tissues.

Somatic mutations were identified by using Fisher's exact test to compare reference and non-reference read counts in each tumor with those in its matched normal lung tissue. Thresholds for Fisher's exact test were computed by estimating the null distribution (54). ELAND was also used to detect small insertions and deletions (indels) (53).

To identify genes with a somatic mutation burden significantly higher than expected by chance, we used an in-house pipeline (53). It adjusts the probability of observing mutations in each gene for gene size, expression level in lung tissue, and the ratio of silent to non-silent variants. In addition, mutations were randomly permuted at least 10⁶ times across each gene's covered base pairs, respecting trinucleotide context. The mutation burden score of each randomized instance was then calculated (InVEx algorithm (55)). P was defined as the fraction of randomized instances whose mutation burden score was equal to or greater than the observed burden. Genes that passed the genome-wide threshold of P = 2.4 × 10⁻⁶ were considered significant. A subset of these (KRAS, STK11 and TP53) was used for subsequent analyses in the present study.

KRAS mutation status for the larger group of 41 PROSPECT tumors included in the current study was determined by pyrosequencing and confirmed using the Sequenom MassARRAY® platform.
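The two statistical steps above can be sketched in Python. This is a minimal, hypothetical illustration, not the in-house pipeline:

- `fisher_one_sided` computes a one-sided Fisher's exact P value (hypergeometric upper tail) for excess non-reference reads in the tumor.
- `permutation_pvalue` moves each mutation to a random covered position with the same trinucleotide context and scores each randomized instance.

The burden score used here is an assumption: a sum of per-position impact weights, for example 1 for a non-silent coding position and 0 otherwise. The sketch does not reproduce the adjustments for gene size, expression and silent/non-silent ratios (53), or the null-distribution thresholds (54).

```python
import math
import random
from collections import defaultdict


def fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref):
    """One-sided Fisher's exact test for excess non-reference reads in the tumor.

    Returns P(tumor alt count >= observed) under the hypergeometric
    distribution implied by the fixed margins of the 2x2 read-count table.
    """
    total_alt = tumor_alt + normal_alt
    tumor_n = tumor_alt + tumor_ref
    n = tumor_n + normal_alt + normal_ref
    denom = math.comb(n, tumor_n)
    p = 0.0
    for k in range(tumor_alt, min(total_alt, tumor_n) + 1):
        p += math.comb(total_alt, k) * math.comb(n - total_alt, tumor_n - k) / denom
    return p


def burden_score(positions, weight):
    # Hypothetical score: sum of per-position impact weights at mutated sites.
    return sum(weight[pos] for pos in positions)


def permutation_pvalue(mutations, context_of, weight, n_perm=10_000, seed=0):
    """Context-preserving permutation test for one gene.

    mutations:  list of mutated positions in the gene
    context_of: dict mapping every covered position to its trinucleotide context
    weight:     dict mapping every covered position to an impact weight
    """
    rng = random.Random(seed)
    positions_by_context = defaultdict(list)
    for pos, ctx in context_of.items():
        positions_by_context[ctx].append(pos)
    observed = burden_score(mutations, weight)
    hits = 0
    for _ in range(n_perm):
        # Move each mutation to a random covered base with the same context.
        shuffled = [rng.choice(positions_by_context[context_of[m]]) for m in mutations]
        if burden_score(shuffled, weight) >= observed:
            hits += 1
    # Fraction of randomized instances scoring at least as high as observed.
    return hits / n_perm
```

With `n_perm` of at least 10⁶, as in the study, P values at the 2.4 × 10⁻⁶ genome-wide threshold become resolvable. The default here is kept small for speed.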

BATTLE-2 tumors

Massively parallel sequencing of all coding exons from 287 cancer-related genes and selected introns from 19 genes was based on the FoundationOne™ test (Foundation Medicine®) and was performed as previously described (56). Briefly, ≥50 ng of tumor DNA extracted from FFPE tumor biopsies was used for sequencing library construction. Hybridization-based targeted capture of 4,557 exons and 47 introns of known cancer-related genes followed. Captured libraries then underwent 49×49 paired-end sequencing on the Illumina HiSeq2000 platform to >500× mean coverage. DNA sequence data were analyzed with an in-house pipeline that enables accurate identification of base substitutions, short insertions/deletions (indels), focal amplifications, bi-allelic deletions and specific gene fusions.

Chitale et al cohort

Direct sequencing of all exons of TP53 and LKB1, as reported by Chitale et al., was performed at Agencourt Biosciences (Boston, MA) as previously described (16). Sequenom assays were applied in parallel to confirm the absence of the prevalent nt109C/T_Q37* and nt508C/T_Q170* mutations in STK11 exons 1 and 5, respectively. KRAS mutation status was also assessed by direct sequencing, with additional confirmation of specific base substitutions by Sequenom MassARRAY-based assays.

Copy number analysis

Copy-number alterations in TCGA tumors were assessed by Affymetrix SNP 6.0 array profiling of tumor and matched control DNA as previously described (12). GISTIC 2.0 was used to identify significant focal copy-number changes (21). Level 3 data were downloaded from the TCGA portal and used for subsequent analyses.

NMF consensus clustering

To identify the most discriminatory genes within our dataset, we first selected genes with expression levels ≥0 in at least 5 samples and set all negative expression values to 0. The bimodality index (BI), mean and standard deviation (SD) of each gene were then calculated across all samples. The 384 genes with BI ≥1.4, mean ≥2 and SD at or above the 70th percentile of all genes were selected for the NMF algorithm. Unsupervised NMF consensus clustering was performed using the R package NMF, version 0.5.06 (14). The consensus matrix was derived by averaging the connectivity matrices over 200 clustering runs.
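The gene filter and the consensus step can be sketched as follows. The bimodality index is computed as BI = sqrt(π(1 − π)) · |μ1 − μ2| / σ from a two-component Gaussian mixture with a shared standard deviation. This follows the usual BI definition, but two choices here are assumptions: fitting the mixture with a plain EM loop, and computing the SD percentile over the genes that pass the first filter. `consensus_matrix` averages 0/1 connectivity matrices over repeated runs. The NMF factorization itself, done with the R NMF package, is not reproduced, and all function names are hypothetical.

```python
import math
import statistics


def bimodality_index(x, n_iter=200):
    """BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma for a two-component
    Gaussian mixture with a shared SD, fitted by a basic EM loop."""
    xs = sorted(x)
    half = len(xs) // 2
    mu1, mu2 = statistics.fmean(xs[:half]), statistics.fmean(xs[half:])
    var = max(statistics.pvariance(xs[:half]) + statistics.pvariance(xs[half:]), 1e-6) / 2
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 2, computed in log space.
        resp = []
        for v in x:
            d = (math.log(1 - pi) - (v - mu1) ** 2 / (2 * var)) - (
                math.log(pi) - (v - mu2) ** 2 / (2 * var))
            resp.append(1.0 / (1.0 + math.exp(max(min(d, 700.0), -700.0))))
        # M-step: update mixing weight, means and shared variance.
        w2 = sum(resp)
        w1 = len(x) - w2
        if w1 < 1e-9 or w2 < 1e-9:
            return 0.0  # mixture collapsed into a single component
        pi = w2 / len(x)
        mu1 = sum((1 - r) * v for r, v in zip(resp, x)) / w1
        mu2 = sum(r * v for r, v in zip(resp, x)) / w2
        var = max(sum((1 - r) * (v - mu1) ** 2 + r * (v - mu2) ** 2
                      for r, v in zip(resp, x)) / len(x), 1e-6)
    return math.sqrt(pi * (1 - pi)) * abs(mu1 - mu2) / math.sqrt(var)


def select_genes(expr, min_samples=5, bi_min=1.4, mean_min=2.0):
    """expr: dict gene -> expression values across samples."""
    # Keep genes with values >= 0 in at least min_samples samples; clip negatives to 0.
    kept = {g: [max(v, 0.0) for v in vals]
            for g, vals in expr.items() if sum(v >= 0 for v in vals) >= min_samples}
    sds = {g: statistics.pstdev(v) for g, v in kept.items()}
    sd_cut = statistics.quantiles(sds.values(), n=10)[6]  # 70th percentile of SDs
    return sorted(g for g, v in kept.items()
                  if statistics.fmean(v) >= mean_min and sds[g] >= sd_cut
                  and bimodality_index(v) >= bi_min)


def consensus_matrix(label_runs):
    """Average the 0/1 connectivity matrices (same cluster or not) over runs."""
    n = len(label_runs[0])
    cons = [[0.0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                cons[i][j] += labels[i] == labels[j]
    return [[c / len(label_runs) for c in row] for row in cons]
```

The SD and mean cut-offs matter because a gene with two tight but barely separated groups can still reach a high BI.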

microRNA sequencing

miRNA-Seq of LUACs included in the TCGA cohort was performed as previously described and level 3 data were downloaded directly from the TCGA portal (12).

Expression signature derivation

The cluster assignment signature was derived using the ClaNC (Classification to Nearest Centroids) algorithm. ClaNC was developed by Dabney et al. and has been used to determine cluster membership from high-throughput microarray gene expression data (15, 14). The source code is publicly available. ANOVA and Tukey's test were applied to compare gene expression among the three LUAC clusters. We selected genes meeting three criteria: significance at an FDR of 0.05, P ≤0.05 in at least two pairwise comparisons (Tukey's post-hoc test), and an expression fold change (FC) ≥2. ClaNC was then used to establish a final list of 18 genes (6 per cluster) with the lowest cross-validation (CV) error.
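A simplified nearest-centroid assignment in the spirit of ClaNC might look like the sketch below. Each cluster centroid is the mean of its members over the signature genes. Distances are standardized by each gene's pooled within-cluster SD, and a sample is assigned to the nearest centroid. This is an illustrative stand-in, not the ClaNC code. It omits ClaNC's class-specific gene selection and cross-validation, which produced the 18-gene list, and the function names are hypothetical.

```python
import math
import statistics


def train_centroids(samples, labels):
    """samples: expression vectors over the signature genes; labels: cluster per sample.
    Returns per-cluster centroids and the pooled within-cluster SD of each gene."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    centroids = {}
    for c in classes:
        members = [s for s, lab in zip(samples, labels) if lab == c]
        centroids[c] = [statistics.fmean(m[g] for m in members) for g in range(n_genes)]
    dof = len(samples) - len(classes)
    pooled_sd = []
    for g in range(n_genes):
        ss = sum((s[g] - centroids[lab][g]) ** 2 for s, lab in zip(samples, labels))
        pooled_sd.append(math.sqrt(ss / dof) or 1e-9)  # guard against zero SD
    return centroids, pooled_sd


def assign(sample, centroids, pooled_sd):
    """Return the cluster whose centroid is nearest in SD-standardized distance."""
    def dist(c):
        return sum(((x - m) / sd) ** 2 for x, m, sd in zip(sample, centroids[c], pooled_sd))
    return min(centroids, key=dist)
```

Standardizing by the pooled SD keeps genes with large absolute expression ranges from dominating the distance.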

Reverse Phase Protein Array (RPPA) analysis

The RPPA methodology and data analysis pipeline have been described previously (12). For TCGA, level 3 data were downloaded directly from the TCGA portal and used in subsequent analyses. PROSPECT tumors were processed, and their data analyzed, following the same general scheme with minor modifications (12, 57).

In brief, tumors were lysed in RPPA lysis buffer [1% Triton X-100, 50 mM HEPES (pH 7.4), 150 mM NaCl, 1.5 mM MgCl2, 1 mM EGTA, 100 mM NaF, 10 mM NaPPi and 10% glycerol, supplemented with fresh PMSF (1 mM final concentration), Na3VO4 (1 mM final concentration) and protease (cOmplete) and phosphatase (PhosSTOP) inhibitor cocktails (Roche Applied Science)]. Five serial dilutions of each protein lysate were printed on nitrocellulose-coated slides using an Aushon Biosystems 2470 arrayer (Burlington, MA). Slides were stained sequentially with primary and secondary antibodies in an autostainer (BioGenex). Signal was detected using a signal amplification system and a DAB-based colorimetric reaction. Spot intensities were assessed using MicroVigene software (VigeneTech) and an in-house R package. The SuperCurve method was applied to estimate protein levels in each sample. For comparisons, data were log2-transformed and median-centered across antibodies to correct for protein loading.
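The final normalization step can be illustrated as follows. The sketch assumes the input is a per-sample table of linear-scale protein level estimates (for example, from SuperCurve). Each value is log2-transformed, and the sample's median across antibodies is subtracted. Differences in total protein loading therefore shift all antibodies of a sample equally and are removed. Function and variable names are hypothetical.

```python
import math
import statistics


def normalize_rppa(levels):
    """levels: dict sample -> dict antibody -> linear-scale protein level.
    Returns log2 values median-centred within each sample across antibodies."""
    out = {}
    for sample, by_ab in levels.items():
        logged = {ab: math.log2(v) for ab, v in by_ab.items()}
        med = statistics.median(logged.values())  # per-sample loading offset
        out[sample] = {ab: v - med for ab, v in logged.items()}
    return out
```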

Clonality analysis

Variant allele frequency (VAF) data and segmented log2 ratio values were obtained from the TCGA website. The ABSOLUTE algorithm was applied to all point mutations to estimate sample purity and ploidy and to infer the cancer cell fraction (CCF) of each mutation, as described previously (22). Mutations were classified as clonal or subclonal based on the posterior probability that their CCF exceeded 0.95.
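The clonality call can be approximated with the simplified Bayesian calculation sketched below. It is a stand-in for ABSOLUTE, not its implementation, and rests on three assumptions: one mutated copy, a diploid normal component, and a local tumor copy number CN. Under these assumptions, the expected VAF at cancer cell fraction c is purity · c / (purity · CN + 2 · (1 − purity)). A binomial read-count likelihood over a grid of c values, with a uniform prior, gives the posterior probability that CCF > 0.95. The 0.5 cut-off on that probability is an assumption, since the text above does not state one. Function names are hypothetical.

```python
import math


def ccf_posterior_clonal(alt, depth, purity, tumor_cn, threshold=0.95, grid_step=0.01):
    """Posterior probability that CCF > threshold for one mutation (uniform prior on a grid)."""
    grid = [round(i * grid_step, 10) for i in range(1, int(round(1 / grid_step)) + 1)]
    log_lik = []
    for c in grid:
        # Expected VAF for a single mutated copy present in a fraction c of cancer cells.
        f = purity * c / (purity * tumor_cn + 2 * (1 - purity))
        f = min(max(f, 1e-9), 1 - 1e-9)
        log_lik.append(alt * math.log(f) + (depth - alt) * math.log(1 - f))
    top = max(log_lik)
    weights = [math.exp(ll - top) for ll in log_lik]  # unnormalized posterior
    above = sum(w for c, w in zip(grid, weights) if c > threshold)
    return above / sum(weights)


def is_clonal(alt, depth, purity, tumor_cn, min_prob=0.5):
    """Call a mutation clonal when P(CCF > 0.95) exceeds min_prob (assumed cut-off)."""
    return ccf_posterior_clonal(alt, depth, purity, tumor_cn) > min_prob
```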

Cell Viability Assay and IC50 estimation

Cell viability was assessed using the CellTiter-Glo® Luminescent assay (Promega) according to the manufacturer's protocol, with minor modifications. Cells in exponential growth phase were washed once with PBS and trypsinized. They were then passed five times through a 21G needle to generate a single-cell suspension and counted using a Countess™ automated cell counter (Invitrogen). An optimized number of viable cells for each cell line was plated in opaque 384-well plates (Greiner Bio-One), in triplicate wells for each experimental condition. Cell numbers were chosen so that untreated cells remained in logarithmic growth phase for the duration of the assay. Cells were allowed to attach overnight and were then exposed to seven drug concentrations (serial three-fold dilutions) in a final volume of 40 µL of medium per well. Plates were spun at 500 rpm for 30 seconds to ensure even distribution of the drug and returned to the incubator for 72 hours. Resuspended CellTiter-Glo® reagent (11 µL) was then added to each well, and contents were mixed on an orbital shaker for 15 minutes. Bioluminescence was then recorded using a FLUOstar OPTIMA multi-mode microplate reader (BMG LABTECH).

Average readings from triplicate wells were expressed as a percentage of the average bioluminescence from 7-14 vehicle-treated control wells. Control wells received DMSO at 0.347% (v/v), the highest DMSO concentration present in drug-treated wells. IC50 values were estimated from cell viability data by fitting dose-response models in R. Multiple models from the DoseFinding and drc packages were fitted, and the best model was selected based on the residual standard error (RSE). Median IC50 values for each cell line from 3 to 5 independent experiments were used for statistical comparisons.
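The normalization and IC50 steps can be sketched as below. The study fitted several dose-response models with the R packages DoseFinding and drc and kept the one with the lowest RSE. This sketch instead makes an assumed simplification: log-linear interpolation between the two tested concentrations that bracket 50% viability. Helper names are hypothetical.

```python
import math
import statistics


def percent_viability(treated_wells, vehicle_wells):
    """Mean of replicate treated wells as a percentage of the mean vehicle (DMSO) control."""
    return 100.0 * statistics.fmean(treated_wells) / statistics.fmean(vehicle_wells)


def serial_dilution(top, n=7, factor=3.0):
    """Concentrations from a serial dilution, highest first (seven three-fold steps by default)."""
    return [top / factor ** i for i in range(n)]


def ic50_by_interpolation(concs, viability):
    """Concentration at 50% viability by log-linear interpolation between bracketing doses.
    Returns None if viability never crosses 50% within the tested range."""
    pts = sorted(zip(concs, viability))  # ascending concentration
    for (c1, v1), (c2, v2) in zip(pts, pts[1:]):
        if v1 >= 50.0 >= v2 and v1 != v2:
            t = (v1 - 50.0) / (v1 - v2)
            return 10 ** (math.log10(c1) + t * (math.log10(c2) - math.log10(c1)))
    return None
```

Per-experiment estimates would then be summarized per cell line with `statistics.median`, matching the use of median IC50 values from 3 to 5 independent experiments.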

Drugs

17-AAG (tanespimycin), AUY922 and ganetespib (STA-9090) were purchased from Selleck Chemicals and dissolved in DMSO to a final concentration of 10 mM (17-AAG, AUY922) or 5 mM (ganetespib). Drug aliquots were stored at -80 °C, and each aliquot was used only once. 3,3'-Methylenebis(4-hydroxycoumarin) (dicumarol) was purchased from Sigma.

Quantitative RT-PCR

Total RNA was isolated from retrovirally transduced stable A549 and H460 cell lines using TRIzol (Life Technologies) according to the manufacturer's protocol. These lines expressed either full-length wild-type LKB1 or empty vector. RNA quality and quantity were assessed by NanoDrop (Thermo Scientific). Total RNA (1 μg) was reverse-transcribed using iScript™ Reverse Transcription Supermix for RT-qPCR (Bio-Rad). The resulting cDNA was analyzed by quantitative PCR using SYBR Green (Life Technologies) according to the manufacturer's protocol, with the following primers: ATF4 forward: GTTCTCCAGCGACAAGGCTA; ATF4 reverse: ATCCTGCTTGCTGTTGTTGG; sXBP1 forward: CTGAGTCCGAATCAGGTGCAG; sXBP1 reverse: ATCCATGGGGAGATGTTCTGG; ACTB forward: GCGAGCACAGAGCCTCGCCTTTG; ACTB reverse: CGACGACGAGCGCGGCGATAT. The comparative Ct method was used to calculate the abundance of ATF4 and spliced XBP1 (sXBP1) transcripts relative to ACTB.
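The comparative Ct (2^−ΔΔCt) calculation can be written as a short function. In this sketch, ΔCt is the Ct of ATF4 or sXBP1 minus the Ct of ACTB. The calibrator is assumed to be the empty-vector line, and amplification efficiency is assumed to be ~100% (a factor of 2 per cycle). In practice, Ct values from replicate wells would be averaged first. The function name is hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl, efficiency=2.0):
    """Fold change of a target transcript by the comparative Ct (2^-ddCt) method.

    ct_target, ct_ref:           Ct of target and ACTB in the sample of interest
    ct_target_ctrl, ct_ref_ctrl: Ct of target and ACTB in the calibrator sample
    """
    delta_sample = ct_target - ct_ref          # normalize target to ACTB
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # same for the calibrator
    return efficiency ** -(delta_sample - delta_ctrl)
```

For example, a target Ct of 25 against an ACTB Ct of 20, compared with 27 against 20 in the calibrator, gives ΔΔCt = -2 and a 4-fold increase.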

Western blotting

Western blot analysis of whole-cell extracts was performed using standard methods. Cells were washed with PBS, and 300 µL of ice-cold lysis buffer was added directly to each 10 cm dish. The buffer contained 1% Triton X-100, 50 mM HEPES (pH 7.4), 150 mM NaCl, 1.5 mM MgCl2, 1 mM EGTA, 100 mM NaF, 10 mM Na pyrophosphate, 1 mM Na3VO4 and 10% glycerol. Immediately prior to lysis, it was supplemented with phenylmethylsulfonyl fluoride (1 mM final concentration) and cOmplete protease inhibitor and PhosSTOP phosphatase inhibitor cocktails (Roche Applied Science). Cells were then scraped using a disposable cell lifter (Fisher Scientific). For suspension and semi-adherent cell lines, the supernatant was also collected in 15 mL polypropylene tubes and spun at 1000 rpm for 5 min. The pellet was washed once in PBS prior to lysis with 100-300 µL of lysis buffer. Lysates were spun at 14,000 rpm for 15 minutes at 4 °C in a refrigerated microcentrifuge (Mikro 200R, Hettich), and the cleared supernatant was collected. Protein concentration was quantified using the colorimetric DC™ Protein Assay (Bio-Rad). Total protein (35 µg) was resolved on 4-20% pre-cast gradient gels (Bio-Rad). Proteins were transferred to PVDF membranes using the Trans-Blot Turbo transfer system and Trans-Blot Turbo RTA transfer kit (Bio-Rad) according to the manufacturer's protocol.
Membranes were blocked in 5% non-fat dry milk (Bio-Rad) in 0.1% TBS-Tween (150 mM NaCl, 10 mM Tris-HCl, pH 8) for 1 hour at room temperature (RT). They were then incubated with the following primary antibodies in 5% BSA in 0.1% TBS-Tween:

- LKB1 (27D10) (1:1000, #3050, Cell Signaling Technology)
- phospho-p70 S6 kinase (Thr389) (1:1000, #9205, Cell Signaling Technology)
- phospho-S6 ribosomal protein (Ser235/236) (1:5000, #2211, Cell Signaling Technology)
- IRE1α (14C10) (1:1000, #3294, Cell Signaling Technology)
- phospho-eIF2α (Ser51) (119A11) (1:1000, #3597, Cell Signaling Technology)
- phospho-4EBP1 (Ser65) (174A9) (1:1000, #9456, Cell Signaling Technology)
- phospho-p44/p42 (Thr202/Tyr204) (E10) (1:1000, #9106, Cell Signaling Technology)
- phospho-SRC (Tyr416) (1:1000, #2101, Cell Signaling Technology)
- SRC (32G6) (1:1000, #2123, Cell Signaling Technology)
- c-MYC (1:1000, #9402, Cell Signaling Technology)
- CHK1 (2G1D5) (1:1000, #2360, Cell Signaling Technology)
- CA-IX (H-120) (1:500, sc-25599, Santa Cruz Biotechnology)
- NQO1 (A180) (1:3000, #3187, Cell Signaling Technology)
- BiP (1:1000, #3183, Cell Signaling Technology)
- ATF4 (1:8000, AV37017, Sigma)
- vinculin (hVIN-1) (1:4000, V9131, Sigma)

HRP-conjugated secondary antibodies were applied at a dilution of 1:3000 in 5% non-fat dry milk for 1 hour at RT. Signal was developed with picoLUCENT™ PLUS-HRP (G-Biosciences) or ECL (Amersham) detection reagents.

Establishment of isogenic cell lines

Stable expression of wild-type LKB1 cDNA in the naturally LKB1-deficient A549 and H460 cell lines was achieved by lentiviral transduction with pLenti-GIII-CMV-hSTK11-GFP-2A-Puro Lentiviral/pLenti-CMV-GFP-2A-Puro-Blank vectors (Applied Biological Materials Inc.). Stable shRNA-mediated LKB1 knockdown in CALU-6 was achieved in the same way with GIPZ Lentiviral human STK11 shRNA (Clone V3LHS_348649)/GIPZ Non-silencing Lentiviral shRNA Control vectors (Open Biosystems). Viral vectors were first co-transfected with second-generation lentiviral packaging plasmids (psPAX2 and pMD2.G) into 293T cells using Lipofectamine® 2000 Reagent with PLUS™ Reagent (Life Technologies). Viral supernatant was collected 3 days after transfection. Viral particles were concentrated overnight at 4 °C with PEG-it™ Virus Precipitation Solution (System Biosciences) according to the manufacturer's protocol. NSCLC cells were then incubated overnight at 37 °C with medium containing viral particles supplemented with 8 µg/mL polybrene (Sigma). Fluorescence-activated cell sorting (FACS) was used to single-cell sort GFP-expressing cells into separate wells of 96-well plates. Single-cell-derived colonies were expanded in medium containing 2 µg/mL puromycin (Gibco).

Immunohistochemistry

Formalin-fixed, paraffin-embedded blocks containing representative tumor were selected for each case, and 4-µm-thick tissue sections were cut for immunohistochemistry (IHC). IHC staining was performed on an automated staining system (Leica Bond Max, Leica Microsystems, Vista, CA, USA) using previously optimized IHC parameters. The antibodies used in this study were PD-L1, PD-1, CD3, CD4, CD8, CD45RO, CD57, granzyme B and FOXP3. Antibody clones and dilutions are listed in the table below. All antibodies were detected with the Leica Bond Polymer Refine detection kit (Leica Microsystems), which uses a diaminobenzidine reaction to visualize antibody labeling and includes hematoxylin counterstaining.

For quantification, stained slides were digitally scanned using the Aperio® ScanScope Turbo slide scanner (Leica Microsystems), and digital images were captured at ×200 magnification. Images were visualized using ImageScope™ software (Leica Microsystems), and digital image analysis was performed using the Aperio Image Toolbox (Aperio, Leica Microsystems). For PD-L1 and immune profiling analysis, 5 randomly selected 1 mm² square regions in the core of the tumor were analyzed for each case. The same regions were used for every marker. The cell membrane staining algorithm was used to obtain the H-score (0-300) for PD-L1, with both cancer and non-cancer cells scored. For immune profiling analysis, the density of each immune cell type was assessed in each region.
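The two quantities reported above can be expressed as follows. The sketch uses the standard H-score definition, in which weak, moderate and strong membrane staining are weighted 1, 2 and 3, giving a 0-300 range. It averages immune-cell densities over the five 1 mm² regions. Averaging across regions is an assumption, the per-region output of the Aperio algorithms is not reproduced, and the function names are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Standard H-score (0-300) from the percentages of cells with 1+, 2+ and 3+ staining."""
    if min(pct_weak, pct_moderate, pct_strong) < 0 or pct_weak + pct_moderate + pct_strong > 100:
        raise ValueError("staining percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def mean_density(cell_counts, area_mm2=1.0):
    """Average immune-cell density (cells/mm^2) across the analysed regions."""
    return sum(c / area_mm2 for c in cell_counts) / len(cell_counts)
```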