Gene Expression and Statistical Analysis

Supplemental Methods

Gene expression and statistical analysis

OCI-LY3 and Daudi cells were infected for 48 hours with lentivirus expressing p100-, p105- or Luciferase (Luc)-shRNA. After two weeks of selection with puromycin, Western blot analysis confirmed knockdown of p100 or p105. Triplicate samples of each shRNA-expressing cell line were used for comparative gene expression analysis. RNA samples were extracted from 50×10^6 cells using QIAshredder and the RNeasy Mini kit (Qiagen) following the manufacturer’s protocol. RNA quality was checked with an Agilent 2100 Bioanalyzer (GSE24020).

The raw data were processed using R software and BioConductor packages to identify the full p100- and p105-targeted gene lists. In brief, microarray hybridization data were prepared by Cogenics, Inc. using an Agilent 4X44 platform. Scanning and image analyses were performed using an Agilent 2100 Bioanalyzer and an Agilent MR-2 DNA Microarray Scanner (Agilent Technologies, Inc.). Raw data were deposited in the Gene Expression Omnibus (accession number GSE24020). The raw data were first log2-transformed and quantile-normalized (13). Differential expression among the cells expressing p100, p105 or Luc shRNA was assessed by fitting a linear model with empirical Bayes statistics (limma package) (14, 15). Using a false discovery rate of 0.001 applied to p-values adjusted for multiple testing by the Benjamini-Hochberg method, we selected the genes whose expression levels were highly affected after silencing p100 or p105 compared with control cells, creating p100 and p105 target gene lists for each cell line (16). The gene lists obtained from the OCI-LY3 and Daudi cell lines were subsequently combined to generate a common p100 or p105 target gene list. The microarray data analyzed in this study were previously deposited in the NIH Gene Expression Omnibus database under the accession number GSE24020.
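The preprocessing and multiple-testing steps described above can be sketched as follows. This is a minimal standard-library illustration, not the R/limma code used in the study: the function names and toy values are hypothetical, ties in quantile normalization are broken arbitrarily rather than averaged, and the moderated statistics that limma computes are not reproduced here (the p-values are simply given).

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # ties are broken arbitrarily in this sketch
        normalized.append(out)
    return normalized

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw intensities: two samples (columns), three probes each.
raw = [[4.0, 16.0, 64.0], [8.0, 2.0, 32.0]]
log2_data = [[math.log2(v) for v in col] for col in raw]
normalized = quantile_normalize(log2_data)

# Hypothetical per-gene p-values; genes are kept when adjusted p < 0.001.
adjusted = benjamini_hochberg([0.0001, 0.02, 0.03, 0.5])
selected = [i for i, q in enumerate(adjusted) if q < 0.001]
```

In this toy example only the first gene passes the 0.001 threshold after adjustment.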

The raw data from a cohort of DLBCL patients, which were included in a previously reported microarray dataset (GSE4475), were log2-transformed and quantile-normalized prior to selecting probes contained in the p100 and p105 gene list. A complete linkage agglomerative hierarchical clustering analysis demonstrated two sets of tumors with an expression pattern specific to each pathway (training set). To evaluate genes that were robust for predicting pathway dysregulation, we performed a significance analysis of microarrays (SAM) of the training gene expression dataset. Different delta values (delta values of 7 for genes down-regulated in the p100 list and 4.5 for those down-regulated in the p105 list; 9.5 for up-regulated genes in the p105 list and 7 for those up-regulated in the p100 list, Figure 1B) and a false discovery rate of zero yielded 80 significant genes. Next, we performed a complete linkage agglomerative hierarchical clustering analysis of the DLBCL microarray training dataset. The gene list was then filtered to exclude genes with probes that failed to cluster with a corresponding classifier, leaving a final list of 48 genes that were equally distributed between the p100 and p105 classifiers.
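As an illustration of complete linkage agglomerative clustering, the sketch below merges tumor expression profiles bottom-up until two groups remain. It is a naive standard-library version for small inputs, not the made4 routine used in the study. The Euclidean distance, the function names and the toy profiles are assumptions made for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(profiles, n_clusters=2):
    """Merge clusters bottom-up until n_clusters remain; return lists of indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical normalized expression of three classifier genes in four tumors.
tumors = [
    [2.0, 2.1, -1.9],
    [1.8, 2.2, -2.0],
    [-2.0, -1.9, 2.1],
    [-2.2, -2.0, 1.8],
]
groups = complete_linkage(tumors)
```

Here the first two tumors and the last two tumors form the two groups, mirroring the two pathway-specific expression patterns described in the text.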

Finally, we applied both gene lists to previously published gene expression datasets (GSE10846 (17) and the complete population of DLBCL cases included in GSE4475 (18)). The raw data in all datasets were preprocessed and analyzed using the methods described above.

RNA interference constructs

RNA interference hairpins were expressed under the control of the human U6 promoter in pLKopuro.1 (provided by S. Stewart, Washington University). Complementary shRNA oligos were annealed, cloned into vectors digested with AgeI and EcoRI, and confirmed by sequence analysis. The sequences of the sense shRNA oligonucleotide probes were as follows: p105: CCTTCCGCAAACTCAGCTTTA, p100: GCTGCTAAATGCTGCTCAGAA, Rel A: CGGATTGAGGAGAAACGTAAA and Rel B: AGCCCGTCTATGACAAGAAAT. The Luciferase shRNA plasmid was kindly provided by S. Stewart.

Western blot analysis

Cells were lysed with cell lysis buffer (50mM Tris-Cl, pH 8, 5mM EDTA, 100mM NaCl, 0.5% Triton X-100 and protease and phosphatase inhibitors). Immunoblotting was performed as described previously (1). The following antibodies were used: p100 (sc-7386), p105 (sc-7178), Rel A (sc-372), Rel B (sc-226), glyceraldehyde 3-phosphate dehydrogenase (GAPDH, sc-137179), CARD11 (sc-166910) and IKK (sc-71333), all from Santa Cruz Biotechnology. TAK1 (5206S), actin (4970S) and phospho-antibodies for TAK1 (4531S), p100 serines 866/870 (4810S) and CARD11 (5189S) were from Cell Signaling. Phospho-IKK was purchased from Abcam and phospho-p100 serine 707 (07-1829) from Millipore. Proteins were visualized in an SRX101A imager (Konica Minolta) using Immobilon chemiluminescent HRP substrate (Millipore).

Flow cytometry

We used a BD LSRII (BD Biosciences, San Jose, CA, USA) for flow cytometric analysis, followed by data analysis with FlowJo Software (Tree Star Inc). For GFP-expressing cells, analysis was performed on FITC(+)-gated cells only. The following antibodies, obtained from BD Biosciences, were used: IgM (cat#314520), CD10 (cat#340698), CD27 (cat#340425), CD38 (cat#561823), CD44 (cat#555478) and CD95 (cat#559773). For studies with GFP-expressing cells, a CD10 antibody from eBiosciences was used (cat#56-0106-41).

qPCR and mRNA analysis

Single-strand cDNA was synthesized using 5 µg of total RNA and the QuantiTect reverse transcription kit (Qiagen). Relative gene expression levels were measured using Power SYBR Green Master Mix (Bio-Rad) and an Applied Biosystems 7500 Fast sequence detection system. IRF4 and LMO2 primers were obtained from Qiagen; all others were obtained from Integrated DNA Technologies (Supplemental Table 1A). The amplification efficiency of each primer pair was determined before performing qPCR. The relative expression level of each gene was measured by qPCR as described by Pfaffl (2). GAPDH was used as the reference gene.
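For reference, the efficiency-corrected relative quantification model of Pfaffl (2) computes the ratio E_target^(ΔCt_target) / E_ref^(ΔCt_ref), where each ΔCt is the control Ct minus the sample Ct and E is the amplification efficiency (2 for perfect doubling per cycle). The sketch below uses a hypothetical function name and made-up Ct values, with GAPDH as the reference gene as in the text.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Efficiency-corrected expression ratio of a target gene, sample vs control."""
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)

# Hypothetical Ct values: the target amplifies 3 cycles earlier in the sample,
# while the reference gene (GAPDH) is unchanged.
ratio = pfaffl_ratio(
    e_target=2.0, ct_target_control=25.0, ct_target_sample=22.0,
    e_ref=2.0, ct_ref_control=18.0, ct_ref_sample=18.0,
)
```

With both efficiencies equal to 2 the expression reduces to the familiar 2^(ΔΔCt) form; in this example the ratio is 8.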

Tissue Microarrays

127 retrospective de-identified DLBCL samples were obtained from the institutional review board-approved hematology tissue acquisition and procurement bank programs at Stanford University and Emory University Schools of Medicine. Three independent pathologists confirmed the pathological diagnosis of all samples. 127 0.5-mm cores from diagnostic areas of each DLBCL sample were used to generate a single-recipient paraffin block using a tissue arrayer (Beecher Instruments, Silver Spring, Maryland).

Immunofluorescence (IF) studies

The methodology used to perform IF in cell cultures has been previously described (1). For primary tissue studies, 5-micron sections of the tissue microarray were deparaffinized by incubating in an 80°C water bath three times for 20 minutes followed by three 5-minute incubations in xylene and a series of ethanol solutions (100%, 90%, 75% and 50%). After washing with distilled water, antigen retrieval was performed by immersing the slides in a microwave solution (9 ml of 0.01M citric acid, 41 ml of 0.01 mM sodium citrate and 450 ml of water) and microwaving at low power three times for 5 minutes. Slides were pre-treated with blocking solution (10% goat serum/3% BSA/0.5% gelatin/PBS) for 1 hour to block non-specific binding sites. Primary antibodies for RelA (sc-372) and RelB (sc-226) were applied at 1:250 dilutions in 50 mM Tris-Cl (pH 7.4) with 3% goat serum overnight. After washing, secondary Alexa Fluor 488-conjugated antibodies (Molecular Probes) were applied for 1 hour. After further washing, slides were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) for nuclear detection.

The nCounter™ System ChIP assay

Fifty million OCI-LY3 cells were harvested before and after 60 minutes of treatment with doxorubicin alone or in combination with rituximab, fixed with 1.1% formaldehyde and quenched with 0.125M glycine. Chromatin was isolated by sequentially adding 3 different lysis buffers (LB1: 50mM Hepes-KOH, pH 7.5, 140mM NaCl, 1mM EDTA, 10% Glycerol, 0.5% NP-40, 0.25% Triton X-100; LB2: 10mM Tris-HCl, pH 8.0, 200mM NaCl, 1mM EDTA, pH 8.0, 0.5mM EGTA, pH 8.0; and LB3: 10mM Tris-HCl, pH 8.0, 200mM NaCl, 1mM EDTA, pH 8.0, 0.5mM EGTA, pH 8.0, 0.1% Na-Deoxycholate, 0.5% N-lauroylsarcosine) followed by disruption with a Dounce homogenizer. Using a Branson Sonifier cell disrupter 205 with an output setting of 3 and constant power (Branson Ultrasonics, CT), lysates were sheared under cold conditions to an average length of 300-500bp. An aliquot of chromatin (30 µg) was precleared with protein A agarose beads (Invitrogen). Genes with κB binding sites in the promoter region were isolated from the genomic DNA using an antibody against p105/p50 (Abcam, ab7971) or p100/p52 (Abcam, ab7972). Following incubation at 4°C overnight, protein A agarose beads were used to isolate the immune complexes. Subsequently, complexes were washed, eluted from the beads with a 1% SDS/50nM Tris/10 mM EDTA buffer, and subjected to RNase and proteinase K treatment. The antibody/chromatin complex was reversed by incubation overnight at 65°C, and ChIP DNA was purified by phenol-chloroform extraction and ethanol precipitation. Length of the genomic DNA was evaluated using a Micro-Volume UV-Vis Spectrophotometer (NanoDrop 2000, Thermo Scientific, DE).

To measure enriched binding at each gene locus, we used a recently described method, ChIP-String (3). To leverage the nCounter Analysis System platform, we selected, when available, a probe set containing the κB binding site for each locus complementary to the genes contained in the classifiers (Supplemental Table 2, NanoString Technologies). According to the subunit and treatment, we measured the percent enrichment by comparing each gene level to a control level of enrichment of untranscribed regions. Quantification of DNA molecules by the nCounter Analysis System was performed by NanoString Technologies.

nCounter-Chromatin immunoprecipitation (ChIP) assay analysis

nCounter-ChIP data (ChIP-String) provided digital counts of each probe across all experiments (no treatment or 1-hour treatment with rituximab, doxorubicin, or the combination of doxorubicin and rituximab, Supplemental Table 1). The two technical replicates of the reference sample (IgG control, “Mock”) were used to calculate an average value, and we used this average value as our reference to estimate the enrichment across the p105 and p100 conditions. The counts were log2-transformed to reduce the effect of extreme outliers in the dataset. The resulting data of the genes selected as regulated by p100 or p105 (both dependent and suppressed) were used to generate empirical distributions.
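One way to express enrichment against the averaged Mock reference is sketched below. The text does not give the exact formula, so the difference of log2 counts used here is an assumption, as are the function name, the probe names and the counts. All counts are assumed to be positive.

```python
import math

def log2_enrichment(chip_counts, mock_replicates):
    """Per-probe log2 enrichment of ChIP counts over the mean of the Mock replicates."""
    result = {}
    for probe, count in chip_counts.items():
        # Average the technical replicates of the IgG ("Mock") control.
        reference = sum(rep[probe] for rep in mock_replicates) / len(mock_replicates)
        # Assumed enrichment measure: difference of log2-transformed counts.
        result[probe] = math.log2(count) - math.log2(reference)
    return result

# Hypothetical digital counts for two probes.
mock = [{"geneA": 90, "geneB": 50}, {"geneA": 110, "geneB": 50}]
p105_chip = {"geneA": 400, "geneB": 50}
enrichment = log2_enrichment(p105_chip, mock)
```

In this toy case geneA shows a four-fold (log2 = 2) enrichment over Mock and geneB shows none.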

The nCounter™ System assay

We performed a targeted gene expression analysis of the p100 and p105 target genes in the 39 DLBCL samples from Winship Cancer Institute of Emory University (IRB#IRB00041889) with an nCounter Analysis System (NanoString Technologies) using previously reported technology (18). We performed the nCounter™ assay using 100 ng of total RNA. The CodeSet contained probe pairs for 47 test genes and 3 control genes. Detailed sequence information for the target regions and reporter probes is listed in Supplemental Table 2. All 47 genes and controls were assayed simultaneously in multiplex. Because the original 3 reference control genes (tubulin, β-actin and GAPDH) fluctuated significantly across the experimental conditions, we accounted for differences in hybridization and purification efficiency by quantile-normalizing the log2 data. Subsequently, a supervised complete linkage agglomerative hierarchical clustering analysis was performed.

Correlation of nCounter ChIP and expression data

We used nCounter ChIP and expression data obtained from the OCI-LY3 cells, which were treated as described above. Based on changes in the gene expression observed under different experimental conditions, we compared the enrichment of p100 or p105 κB sites in the target gene list directly to changes in gene expression. We calculated the ratio of the p105 ChIP condition raw data to the p100 ChIP condition raw data, and then we log2-transformed and quantile-normalized the ratio. The expected relationship among the conditions for each p100 gene followed control < Dox1 < DR1, and for each p105 gene, the relationship followed control < DR1 < Dox1 (Dox1 = cells treated with doxorubicin for 1 hour, DR1 = cells treated with the combination of doxorubicin and rituximab for 1 hour, and control = no treatment). For the p105-dependent genes and the p100-suppressed genes, the gene expression values were expected to decrease from Dox1 to DR1; for the p100-dependent and p105-suppressed genes, the gene expression values were expected to increase from Dox1 to DR1. Thus, for p105-dependent genes, the DNA binding ratio of p105/p100 should mirror the direction of the p105-gene expression data (>1). By contrast, for p100-dependent genes, the DNA binding ratio of p105/p100 should mirror the direction of the p100-gene expression data (<1). The inverse relationship should be observed for suppressed genes. To assess the correlation between the ChIP-String and nCounter expression counts of each gene, we selected conditions (control, Dox1, and DR1) and genes common to both datasets and calculated Pearson’s correlation coefficient between the ChIP ratio and the nCounter expression values for each gene. A p105-targeted gene was expected to have a positive correlation between the ChIP-String ratios and gene expression values, whereas a p100-targeted gene was expected to have a negative correlation.
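The per-gene correlation and the resulting designation can be sketched as follows. This is a simplified illustration: the quantile normalization of the ratios is omitted, the function names and counts are hypothetical, and a gene whose ratios or expression values are constant across conditions would need separate handling because its correlation is undefined.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def designate(chip_p105, chip_p100, expression):
    """Label a gene by the sign of the correlation between the log2 p105/p100
    ChIP ratio and its expression across conditions."""
    ratios = [math.log2(a / b) for a, b in zip(chip_p105, chip_p100)]
    r = pearson(ratios, expression)
    return ("p105" if r > 0 else "p100"), r

# Hypothetical values in the order control, Dox1, DR1.
label_a, r_a = designate([100, 400, 200], [100, 100, 100], [5.0, 7.0, 6.0])
label_b, r_b = designate([100, 100, 100], [100, 200, 400], [5.0, 6.0, 7.0])
```

The first toy gene follows control < DR1 < Dox1 in both binding ratio and expression and is labeled p105; the second has a falling p105/p100 ratio while expression rises and is labeled p100.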

Using Fisher's exact test, we calculated the probability of obtaining the observed data and all datasets with more extreme deviations. A small probability would indicate that our observed data are unlikely to occur under the null hypothesis, namely that the probability of a gene falling in cluster 1 is the same as the probability of a gene falling in cluster 2 for the genes identified as targets of either p100 or p105, as defined using the correlation between the ChIP-String and gene expression data. Using a significance threshold of 0.05 for the exact probability, we determined the significance of the association between the gene designation according to the correlation coefficient and the gene classifier.
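A two-sided Fisher's exact test on a 2×2 table can be computed directly from the hypergeometric distribution, as in the sketch below. The table layout (rows: correlation-based designation; columns: classifier cluster) and the counts are hypothetical, and the function is a plain standard-library version rather than the R routine used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical counts: rows are genes designated p105 / p100 by correlation,
# columns are genes in classifier cluster 1 / cluster 2.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
significant = p_value < 0.05
```

With only six genes, even a perfectly concordant table gives p = 0.1, which does not reach the 0.05 threshold.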

Agreement between nCounter expression data and AI Rel A and Rel B nuclear intensity

The nCounter expression data of the p100 and p105 classifiers and the AI of Rel A and Rel B were determined in 39 tumor samples as described above. Activation of the pathway detected by our gene classifier was used as the gold standard. A receiver operating characteristic (ROC) analysis was performed to estimate the predictive power of the AI ratio of RelA/RelB to detect the status of activation identified by the gene classifier. An optimal cutoff point for the AI ratio of RelA/RelB was estimated to maximize the sum of its sensitivity and specificity for detecting activation of each NF-κB pathway. Agreement between the two methods was measured as follows:

Agreement = (N1 × Sen + N2 × Spec) / N

where N1 is the total number of p105 samples and N2 is the total number of p100 samples according to our gene classifier, N is the total number of samples analyzed, Sen is the sensitivity and Spec is the specificity.
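The cutoff selection and agreement calculation can be sketched as below. Two assumptions are made here that the text does not state: agreement is taken to be (N1 × Sen + N2 × Spec) / N, which is consistent with the symbol definitions above, and a higher RelA/RelB ratio is taken to indicate the p105 class. The function names and values are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """labels: True for p105-classified samples; score >= cutoff predicts p105."""
    tp = sum(1 for s, l in zip(scores, labels) if l and s >= cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if not l and s < cutoff)
    n1 = sum(labels)
    n2 = len(labels) - n1
    return tp / n1, tn / n2

def best_cutoff(scores, labels):
    """Cutoff maximizing sensitivity + specificity over the observed scores."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(scores, labels, c)))

def agreement(scores, labels, cutoff):
    """Assumed agreement measure: (N1 * Sen + N2 * Spec) / N."""
    sen, spec = sens_spec(scores, labels, cutoff)
    n1 = sum(labels)
    n2 = len(labels) - n1
    return (n1 * sen + n2 * spec) / len(labels)

# Hypothetical RelA/RelB ratios; True marks samples called p105 by the gene classifier.
scores = [2.5, 1.8, 1.5, 1.0, 0.9, 0.7, 1.2]
labels = [True, True, True, True, False, False, False]
cutoff = best_cutoff(scores, labels)
agree = agreement(scores, labels, cutoff)
```

For these toy values the selected cutoff is 1.5 (sensitivity 0.75, specificity 1.0), giving an agreement of 6/7.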

Multivariate analysis

Using the groups of patients with canonical (p105 classifier) or noncanonical (p100 classifier) NF-κB pathway activation identified in the GSE10846 data set (Supplemental Figure 2A), we performed multiple logistic regression analysis of the interaction effect between DLBCL subtype and NF-κB pathways.

Analysis software

The statistical software R and the related BioConductor function packages were used to analyze the significance of microarray and nCounter gene expression data. To identify genes with variable expression in the cell line experiments that generated the p100 and p105 gene lists, we used linear modeling approaches and empirical Bayesian statistics (limma package) (4, 5).

Patient gene expression profiles were normalized using quantile normalization methods. The p100 and p105 classifiers were generated by performing SAM using the samr package (6). Heatmap analysis with dendrograms and complete linkage agglomerative hierarchical clustering analysis were performed using the made4 package (7).

Cell lines

OCI-LY3, OCI-LY10, OCI-LY2, OCI-LY1, HBL1, SUDHL2, SUDHL4, SUDHL6, BJAB and RCK8 were kindly provided by Dr. Izidore S. Lossos, University of Miami, FL. SUDHL10, U2932, SUDHL9 and WSU-NHL were kindly provided by Dr. Sandeep S. Dave, Duke University, NC. Ramos (CRL-1596) and Daudi (CCL-213) cells were purchased from American Type Culture Collection (ATCC). OCI-LY3, OCI-LY10, SUDHL6, SUDHL4, OCI-LY1, BJAB, RCK8, Ramos and Daudi were recently authenticated by short tandem repeat profiling.

References

1. Bernal-Mizrachi L, Lovly CM, and Ratner L. The role of NF-κB-1 and NF-κB-2-mediated resistance to apoptosis in lymphomas. PNAS. 2006;103(24):9220-5.

2. Pfaffl M. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research. 2001;29(9):2003-7.

3. Ram O, Goren A, Amit I, Shoresh N, Yosef N, Ernst J, Kellis M, Gymrek M, Issner R, Coyne M, et al. Combinatorial Patterning of Chromatin Regulators Uncovered by Genome-wide Location Analysis in Human Cells. Cell. 2011;147(7):1628-39.

4. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology. 2004;3(1):Article 3.

5. Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004;5(10):R80.

6. Tusher VG, Tibshirani R, and Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(9):5116-21.

7. Culhane A, Thioulouse J, Perriere G, and Higgins D. MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics. 2005;21(11):2789-90.