Supplementary The Cotton Fiber Transcriptome Defined by Expressed Sequence Tags Offers a Unique Perspective on Cellular Dynamics During Rapid Cell Elongation

A. Bulak Arpat1, Mark Waugh2, John P. Sullivan2, Callum Bell2,4, Bill Beavis2, David Frisch3,5, Dorrie Main3, Todd Wood3,6, Geoffrey Routh1,7, Anna Leslie1,8, Rod A. Wing3,9, and Thea A. Wilkins1*

1Department of Agronomy and Range Science, University of California, Davis, CA, 2National Center for Genome Resources, Santa Fe, NM, and 3Clemson University Genomics Institute, Clemson University, Clemson, SC

*Corresponding Author:

Dr. Thea A. Wilkins

Department of Agronomy and Range Science

University of California

One Shields Avenue

Davis, CA 95616-8515

Tel: (530) 752-0614

Fax: (530) 752-4361

E-mail:

4Current Address: Callum Bell

EmerGen Inc.

390 Wakara Way

Salt Lake City, UT 84108

5Current Address: David Frisch

Genome Center of Wisconsin

445 Henry Mall, Genetics Bldg B19

University of Wisconsin

Madison, WI 53706

6Current Address: Todd Wood

Center for Origins Research and Education

Bryan College

P. O. Box 7000

Dayton, TN 37321

7Current Address: Geoffrey Routh

QIAGEN

Valencia, CA

8Current Address: Anna Leslie

CA&ES Genomics Facility

University of California

One Shields Ave.

Davis, CA 95616

9Current Address: Rod A. Wing

Arizona Genomics Institute

Dept. of Plant Sciences

University of Arizona

303 Forbes Bldg.

Tucson, AZ 85721

Abstract

Cotton fibers are single-celled seed trichomes of major economic importance in cultivated species. However, the molecular mechanisms that underpin the fiber phenotype and its agronomic properties are poorly understood. Functional genomic approaches were undertaken to identify the transcriptome in rapidly elongating cotton fibers, and to investigate dynamic changes in gene expression at key developmental stages using long oligonucleotide (70-mer) microarrays. Gene discovery was facilitated by generating 46,603 expressed sequence tags (ESTs) from fiber cDNAs that relied on deep sampling of non-normalized and normalized arrayed libraries from Gossypium arboreum L., a cultivated diploid species used to minimize redundancy due to polyploidy. The fiber transcriptome of ~14,000 genes is conservatively estimated to represent 35-40% of the cotton genome. In silico expression analysis, coupled with functional binning of the genes using gene ontology (GO), revealed that developing fibers undergoing rapid turgor-driven cell expansion exhibit significant metabolic activity. Dynamic changes in gene expression accompanied the developmental switch from primary to secondary cell wall synthesis. Oligonucleotide microarrays containing 12,227 elements also showed the fiber dbEST to be highly stage-specific for primary cell wall synthesis and expansion. Expression profiling revealed a large contingent of housekeeping genes as well as a subset of differentially and developmentally expressed fiber genes that appear to play a prevalent functional role in a stage-specific manner. Of the 2,553 genes down-regulated concomitant with the termination of fiber expansion, the majority are metabolism-related genes. In sharp contrast, only a small subset of 78 fiber genes, most of which represent rare transcripts in our fiber dbEST, were up-regulated during secondary cell wall synthesis, including genes involved in energy/carbohydrate metabolism and cell wall biogenesis. Secondary cell wall-specific fiber genes (GhCesA1, GhCesA2 and GhFbL2A), which were not found in our fiber dbEST and were therefore used as stage-specific developmental controls, were strongly induced in 24 dpa fibers. This work provides the first view of the genetic complexity of the transcriptome in a single cell undergoing rapid cell expansion, and lays the groundwork for future efforts to study fundamental processes in plant biology with applications in agricultural biotechnology.

Keywords: cell expansion, cell wall biogenesis, expression profiles, Gossypium arboreum, oligonucleotide microarrays, trichome

Abbreviations: EST, expressed sequence tag; NCGR, National Center for Genome Resources; PCW, primary cell wall; qPCR, quantitative real-time RT-PCR; SCW, secondary cell wall; XGI, X Genome Initiative.

INTRODUCTION

Plant trichomes are found on vegetative and reproductive organs throughout the plant kingdom, and exhibit considerable diversity in terms of size and morphology, distribution, and origin (1). Seed trichomes (“fibers”) of cultivated cotton (Gossypium) are single cells that are virtually unrivaled in the plant kingdom: they are perhaps the longest cells in higher plants, with highly exaggerated growth rates well above average. As a major crop species, cultivated cotton provides a natural fiber of major economic importance that fulfills one of man’s basic needs. Cultivated species include the Asiatic diploid species G. arboreum L. and G. herbaceum L. (2n=26, A genome) and the allotetraploids G. hirsutum L. and G. barbadense L. (2n=4x=52, AD genome). The allotetraploids arose from a polyploidization event that took place ~1.5-2 mya between an Old World A-genome progenitor, as the maternal parent, and a New World D-genome progenitor species (2,3).

The primary (PCW) and secondary (SCW) cell walls of mature cotton fibers differ significantly in structure and composition. The thin PCW (0.2-0.4 μm) contains <30% cellulose, while the thick SCW (8-10 μm) is composed of >94% cellulose (Meinert and Delmer, 1977). The cellulose microfibrils also vary with respect to the degree of polymerization, being <5,000 in PCW vs. ~14,000 in SCW (Marx-Figini 1966). Developmental programs regulate the temporal synthesis of PCW and SCW in fibers. Commencing on the day of anthesis (0 days post-anthesis [dpa]), the PCW is made over a period of ~21 days in rapidly elongating fibers via a biased diffuse-growth mechanism that directs polarized growth (6,7). The rate and duration of cell expansion are dictated by developmental programs that coordinately regulate cell turgor and cell wall extension (7, 8, 9, 10). Our current model contends that expression of developmentally regulated fiber genes closely parallels the rate of expansion, reaching peak levels at ~12-13 dpa (7,10). The increase in fiber strength coincident with the termination of fiber elongation at ~21 dpa (11) presumably occurs as a result of cross-linking of cellulose microfibrils and non-cellulosic matrices typical of dicot PCWs (12). Termination of fiber elongation is accompanied by loss of a major fraction (~36%) of abundant HMW noncellulosic polymers (13). A similar loss of xyloglucans is reported coincident with termination of auxin-induced expansion in pea hypocotyls (14). The transition from primary to secondary cell wall synthesis between ~16 and 21 dpa occurs during the final stage of fiber expansion. The switch in developmental programs is distinguished by a number of dynamic cellular and molecular changes, including the re-orientation of microtubules and cellulose microfibrils to steeply pitched helical arrays (16).
Occasional re-alignment of cellulose microfibrils produces reversals in the patterns of layered SCW, resulting in structural modifications that impart cotton fiber with its unique textile properties. Expression of CesA genes encoding the catalytic subunit of cellulose synthase dramatically increases in parallel with the rate of cellulose synthesis to reach peak levels at 24 dpa (4, 17). CesA genes therefore serve as convenient stage-specific markers for secondary cell wall synthesis in developing cotton fibers (7). Indeed, developing cotton fibers have been instrumental in providing novel insight into the mechanism of cellulose biosynthesis, and continue to offer a unique subject for studying fundamental cellular and biological processes in plants (7, 15, 17-20).

In multicellular eukaryotes, one of the next major challenges in functional genomics will be innovations in high-throughput expression profiling of individual cell types (21), especially within the context of the complex cellular organization of intact tissues. Cotton fibers therefore offer a unique opportunity to query the transcriptome of a single cell at discrete developmental stages, which promises novel insight into fundamental biological processes in plants. One of our long-term interests has been elucidating fiber gene function in a developmental context as a means for manipulating output traits in the genetic improvement of fiber properties (22). Yet at last count, fewer than 50 fiber genes have been isolated and partially characterized using traditional molecular approaches. Here we report on the genetic complexity of the cotton fiber transcriptome during rapid cell expansion based on more than 46,000 ESTs. Functional binning, in silico expression analysis and transcription profiling are all consistent with a highly metabolically active cell type that requires a plethora of cellular and biological processes to support the rapid elongation of developing cotton fibers. A unique view of the cellular dynamics that accompany the developmental switch from primary to secondary cell wall synthesis was obtained from expression signatures derived from oligonucleotide microarrays fabricated from non-redundant fiber ESTs.

MATERIALS AND METHODS

Plant Material and RNA Isolation

Developing fibers were harvested from greenhouse-grown cotton as described elsewhere (10), using established criteria to control for biological variability. PolyA RNA was purified from quality-controlled total RNA (23) using Promega’s PolyAttract kit to construct a fiber cDNA library from Gossypium arboreum L. cv. AKA 8401. For microarrays, 10 and 24 dpa fibers were harvested from plants (G. hirsutum L. cv. TM-1) grown in a randomized complete block design. Total RNA was isolated using a modified hot borate procedure (24). LiCl-precipitated RNA suspended in 10 mM Tris/1 mM EDTA buffer was purified using RNeasy mini-spin columns (Qiagen, Chatsworth, CA) as per manufacturer’s instructions, followed by ethanol precipitation. The concentration of total RNA was determined spectrophotometrically and quality-controlled by agarose gel electrophoresis. Plants were pooled into three groups to create biological pools. RNA was isolated independently at least two times from each pool to create technical replicates.

Cotton Fiber ESTs

A high-quality directional cotton fiber cDNA library (1.2 x 10^6 pfu/ml) containing <0.5% non-recombinant phage was constructed using Stratagene’s λZAP Express cDNA cloning kit. Following mass-excision of phagemids, kanamycin-resistant bacterial colonies (92,160) were arrayed in 384-well microtiter plates containing LB-glycerol freezing media and stored at –80°C. Automated DNA sequencing of >50,000 cDNAs from the 5’- and 3’-termini using universal T3 and T7 primers was performed using the Big Dye Terminator sequencing kit and Applied Biosystems (ABI) 377 or 3700 automated sequencers with a success rate of >70%. The ESTs averaged 757 nucleotides (NT) in length, with an average high-quality sequence score of 416 NT after removal of vector sequences. The minimum quality score for EST sequences released to GenBank () was 100 high-quality NT.
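As a rough illustration of the release criterion described above, the following sketch counts the high-quality bases in a vector-trimmed read and retains reads with at least 100 of them. The Phred cutoff of 20 used to define a "high-quality" base, and all function and variable names, are assumptions made for this example; the text does not state how high-quality bases were scored.

```python
# Hypothetical sketch of the GenBank release filter (>= 100 high-quality NT).
# ASSUMPTION: a base is "high quality" when its Phred score is >= 20;
# this cutoff is not given in the text.
MIN_HIGH_QUALITY_NT = 100   # release threshold stated in the text
ASSUMED_PHRED_CUTOFF = 20   # illustrative only

def count_high_quality(quals, cutoff=ASSUMED_PHRED_CUTOFF):
    """Count bases whose quality score meets or exceeds the cutoff."""
    return sum(1 for q in quals if q >= cutoff)

def passes_release_filter(quals, min_hq=MIN_HIGH_QUALITY_NT):
    """True when the read carries enough high-quality bases to be released."""
    return count_high_quality(quals) >= min_hq

# Two made-up reads: one with 416 high-quality bases, one with only 80.
good_read = [30] * 416 + [10] * 50
short_read = [30] * 80 + [10] * 300
released = [passes_release_filter(r) for r in (good_read, short_read)]
```

Here the first read would be released and the second withheld; a different definition of a high-quality base would only change `count_high_quality`.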

The Ga (G. arboreum) Cotton Fiber dbEST () consists of four discrete data sets: 1) Ga_Ea (Ea) cDNAs (12,767) randomly sequenced from the 5’-terminus; 2) Ga_Eb (Eb) sequences (13,613) obtained from the 5’-terminus following one round of normalization to remove redundant Ea sequences; 3) Ga_Ed (Ed) sequences (14,915) obtained from the 5’- and 3’-termini following a second round of normalization to remove redundant Ea and Eb sequences; and 4) Ga_Ec (Ec) sequences (3,026), a subset of Ea cDNAs sequenced from the 3’-terminus. Normalization of the fiber cDNA library was performed by sequential hybridization of high-density filter arrays to remove the most redundant (Ea and Eb) gene sequences. Fiber cDNA clones (92,160) from the arrayed library were spotted in duplicate onto six high-density nylon membranes (18,432 clones per 22 x 22 cm² filter) in 4x4 arrays as described (25). Radiolabeled probes were generated from heterogeneous gene pools (20-25 cDNAs/pool) containing equal amounts of DNA from purified PCR products of unrelated gene sequences. Following hybridization (26), scanned images of autoradiographs, imported as TIFF files, were divided into six fields per image for analysis. Output files containing total signal and background intensities were obtained for hybridized spots using ImaGene 4.2 software (BioDiscovery, Los Angeles). Software programs written for data quantification, analysis and automated identification of the corresponding plate address for each gene were used to create files for robotic re-arraying of the cDNA library, minus the redundant sequences, before random sequencing was resumed. Normalization probes were generated from representatives of 75 Ga_Ea gene clusters (6 ESTs/cluster) and 96 Ga_Eb gene clusters (7 ESTs/cluster) for the first and second rounds of normalization, respectively (Tables 1 and 2, published as supporting documentation on the PNAS web site).
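The re-arraying step described above can be sketched as follows: clones whose duplicate spots both hybridize to the normalization probes are flagged as redundant, and the remaining plate addresses form the re-array list. The calling rule (signal above the mean plus 2 standard deviations of background, in both duplicates), the address format, and all names are assumptions for illustration; the in-house software is not described in enough detail to reproduce.

```python
# Hypothetical sketch of building a re-array list after filter hybridization.
# ASSUMPTION: a clone is called redundant when BOTH duplicate spots exceed
# mean + 2 SD of the background intensities; the actual rule is not stated.
from statistics import mean, stdev

def redundant_clones(signals, background, n_sd=2.0):
    """Return the plate addresses whose duplicate spots both exceed threshold.

    signals:    dict mapping plate address -> (spot1, spot2) intensities
    background: list of background intensities from the same filter
    """
    threshold = mean(background) + n_sd * stdev(background)
    return {addr for addr, (a, b) in signals.items()
            if a > threshold and b > threshold}

def rearray_list(signals, background):
    """Sorted addresses of clones to keep (those not flagged as redundant)."""
    flagged = redundant_clones(signals, background)
    return sorted(addr for addr in signals if addr not in flagged)

# Made-up intensities for three clones on one filter.
background = [100, 110, 90, 105, 95]
signals = {
    "P001-A01": (900, 850),  # positive in both duplicates: redundant
    "P001-A02": (102, 98),   # background level: keep
    "P001-A03": (700, 101),  # only one duplicate positive: keep
}
keep = rearray_list(signals, background)
```

Requiring agreement between duplicate spots is one conservative way to avoid discarding a clone on the strength of a single spurious signal.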

Raw EST sequence files were imported from an FTP site for processing, annotation and analysis using XGI™, an automated pipeline for high-throughput sequencing projects that runs on the Sybase™ DBMS platform at the National Center for Genome Resources (NCGR) (ww.ncgr.org/xgi). Only high-quality processed ESTs that passed through filters in the Vector Screener stage of the pipeline were annotated using BLASTX against NCBI’s nonredundant protein database to generate the consensus sequences of the UCD Unigene (UG)/Non-redundant (NR) Fiber EST v2.0 (Dec 2002) data set (). Functional categories were assigned where possible using gene ontology annotation (27).

Cotton Oligonucleotide Microarrays

Oligonucleotides (oligoNT [70-mers]) were synthesized by Operon Technologies against 12,227 NR fiber ESTs, excluding consensus sequences <100 nucleotides in length, or showing >85% similarity to other genes. Cotton fiber microarrays were fabricated by spotting oligoNTs (40 μM) in 1X Array-It Spotting Buffer Plus (Telechem International, Sunnyvale, CA) in duplicate on superaldehyde slides (Telechem) formatted in 23 x 23 subarrays using the OmniGrid arrayer (Genomic Solutions) equipped with 16 (4x4) MicroQuill pins (Majer Precision Engineering, Tempe, AZ). Experimental controls (71) included internal, positive and negative controls, transgene and vector controls, calibration spike-in controls, ratio spike-in controls, and blank and buffer controls interspersed among the cotton oligoNTs and replicated 2 times. Additional controls included cotton sequences deposited in GenBank, but not found in our fiber dbEST. Post-printing processing of slides to chemically cross-link oligomers was performed according to the slide manufacturer’s instructions.

Hybridization probes were prepared using the aminoallyl labeling method as described (28) with a few modifications. Total fiber RNA (20 μg) was spiked with 2 μl of test or reference mRNA mix (Lucidea Universal Scorecard, Amersham Pharmacia) prior to being reverse transcribed in the presence of aminoallyl-dUTP (Sigma). Following conjugation of Cy3- or Cy5-NHS esters (Amersham Pharmacia) to reverse-transcribed cDNA, unincorporated dye was removed from probes using QIAquick PCR Purification columns (Qiagen, Valencia, CA). Lyophilized probes were hybridized at 42°C for 16-20 hr in humidified hybridization chambers (Telechem, Sunnyvale, CA) essentially as described elsewhere (29). Slides were scanned (10 μm resolution) using an Affymetrix Array Scanner 428 from a total of 8 hybridizations, including 4 dye-swap treatments, producing 16 replicates for each fiber oligoNT. Self-hybridization controls were also performed. Signal intensities were quantified using ImaGene 4.2 software (BioDiscovery, Los Angeles). Visually flagged spots, and spots with a background-corrected intensity smaller than the average plus 2 standard deviations of the corrected intensity for blank spots (N=274), were filtered out. Normalization and analysis of microarray data were performed using GeneSpring 6.0 (SiliconGenetics, Redwood City, CA). Normalization of the array dataset was based on intensity-dependent Lowess curve fitting (f = 0.2) and the median of background-subtracted intensities from the control RNA set spiked into the query RNA samples at a 1:1 ratio. Statistical analysis of microarray data was performed using the GeneSpring 6.0 cross-gene error model based on replicates for 10 vs. 24 dpa hybridizations and based on deviation from 1 for self-hybridizations. Significantly up- or down-regulated genes were filtered for expression ratios greater than 2 or smaller than 0.5, respectively, and for a t-test p-value < 0.05.
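The spot filter and the significance criteria described above can be expressed compactly. The sketch below is illustrative only: it is not the GeneSpring or ImaGene implementation, the function names are hypothetical, and the expression ratio is assumed to be 24 dpa over 10 dpa so that "up" denotes induction during secondary cell wall synthesis.

```python
# Illustrative sketch of the spot filter and differential-expression call.
# ASSUMPTIONS: ratio = 24 dpa / 10 dpa; the p-value is supplied by an
# upstream t-test; none of these names come from GeneSpring or ImaGene.
from statistics import mean, stdev

def blank_threshold(blank_intensities, n_sd=2.0):
    """Average plus n_sd standard deviations of blank-spot intensities."""
    return mean(blank_intensities) + n_sd * stdev(blank_intensities)

def keep_spot(corrected_intensity, visually_flagged, threshold):
    """Keep a spot unless it was flagged or falls below the blank threshold."""
    return (not visually_flagged) and corrected_intensity >= threshold

def classify(ratio, p_value, alpha=0.05):
    """Label a gene 'up', 'down' or 'unchanged' using the stated cutoffs."""
    if p_value < alpha and ratio > 2.0:
        return "up"
    if p_value < alpha and ratio < 0.5:
        return "down"
    return "unchanged"

# Made-up blank-spot intensities (the study used N=274 blank spots).
threshold = blank_threshold([10, 12, 8, 10])
calls = [classify(3.0, 0.01), classify(0.3, 0.01), classify(3.0, 0.2)]
```

Both criteria must hold at once, so a gene with a large fold change but a p-value at or above 0.05 is left as unchanged.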

Real time PCR

Expression analysis was performed to confirm microarray results using two-step quantitative real time RT-PCR (qPCR). A known amount of DNase-treated total cotton fiber RNA, spiked with non-plant RNA synthesized from a cloned human phosphomannomutase gene as an internal reference, was reverse transcribed using Invitrogen’s Superscript II RTase kit. RT-PCR reactions were tracked on an ABI 7000 instrument (Applied Biosystems) using the Quantitect SYBR Green Master Mix (Qiagen). Each sample was PCR-amplified using the same amount of cDNA template in triplicate reactions in at least two independent experiments. Gene-specific qPCR primer-pairs for the spiked control and 21 fiber genes designed with Primer Express software (Applied Biosystems) are provided in Table 6, published as supporting documentation on the PNAS web site (). Following an initial step in the thermal cycler for 15 min at 95°C, PCR amplification proceeded for 40 cycles of 15 s at 95°C and 1 min at 60°C, and was completed by melting curve analysis to confirm the specificity of PCR products. The baseline and threshold values were adjusted according to manufacturer’s instructions. Similar results were obtained from relative quantification of transcript abundance determined independently by the standard curve method described in Applied Biosystems User Bulletin 2 (web page address).
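For readers unfamiliar with the standard curve method, the following minimal sketch shows the general idea: threshold cycle (Ct) values from a dilution series are fitted to a line against log10 of input quantity, unknowns are interpolated from that line, and the target quantity is divided by that of the spiked reference. The dilution values, Ct values and function names are invented for illustration, and normalizing to the spiked control in this way is an assumption rather than a description of the authors' exact calculation.

```python
# Minimal sketch of relative quantification by the standard curve method.
# ASSUMPTIONS: all data are made up, and dividing by the spiked reference
# quantity is one plausible normalization, not necessarily the one used.
import math

def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantity_from_ct(ct, slope, intercept):
    """Interpolate the input quantity that corresponds to a measured Ct."""
    return 10 ** ((ct - intercept) / slope)

def relative_abundance(target_quantity, reference_quantity):
    """Target quantity normalized to the spiked reference quantity."""
    return target_quantity / reference_quantity

# Ten-fold dilution series with an idealized slope of -3.32 cycles per log10.
dilutions = [1, 10, 100, 1000]
ct_values = [30.0, 26.68, 23.36, 20.04]
slope, intercept = fit_standard_curve(dilutions, ct_values)
target = quantity_from_ct(23.36, slope, intercept)
reference = quantity_from_ct(26.68, slope, intercept)
ratio = relative_abundance(target, reference)
```

In practice a separate curve would usually be fitted for each primer pair, because amplification efficiency (and therefore the slope) can differ between amplicons.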

RESULTS

A genomic approach, based on fiber ESTs and expression profiling, was used to characterize the cotton fiber transcriptome of cells undergoing rapid expansion. Three approaches were employed to maximize gene discovery: 1) construction of a high-quality fiber cDNA library from a cultivated diploid species to minimize redundancy due to polyploidy, 2) deep sampling of the cDNA library, and 3) normalization of the cDNA library to identify rare gene transcripts by removing highly redundant gene sequences.

Cotton (Gossypium arboreum L.) Fiber ESTs

A high-quality cotton fiber cDNA library was constructed from a cultivar (AKA8401) of the diploid species G. arboreum L. (2n=26, A2A2). To obtain a stage-specific fiber library, developing fibers (7-10 dpa) were harvested during rapid elongation, but well before detection of CesA genes, which signal the onset of secondary cell wall synthesis (17). The average insert size was 1.7 kbp as determined by PCR amplification of randomly sampled clones, although cDNAs >3.0 kbp were not uncommon using other detection methods. Random sequencing of the arrayed cDNA library yielded 46,603 G. arboreum (Ga) cotton fiber ESTs in discrete data sets, before and after normalization. The four data sets of the Ga Fiber dbEST are: 1) Ea ESTs sequenced from the 5’-terminus before normalization and therefore suitable for in silico expression analysis; 2) Ec, a subset of Ea cDNAs sequenced from the 3’-terminus; 3) Eb ESTs obtained following one round of normalization to remove redundant Ea sequences; and 4) Ed ESTs sequenced from both 5’- and 3’-termini following a second round of normalization to remove the most redundant Ea and Eb sequences.

To demonstrate that gene discovery in the diploid species was cost-effective because it minimized redundancy, a BLAST search compared Ea fiber ESTs to fiber ESTs of similar developmental age from cultivated allotetraploid cotton (G. hirsutum L. cv. Maxxa; 2n=4x=52, AD genome). G. hirsutum (Gh) ESTs retrieved from GenBank were trimmed, filtered and assembled into clusters for comparative analysis using an equivalent number of Gh and Ga EST gene clusters. The comparative analysis clearly revealed the negative impact of redundancy in the tetraploid due to ploidy. Ga_Ea fiber ESTs produced 61% novel gene sequences, or slightly more than twice the number of novel Gh genes recovered.