Heredity Supplementary Information

Article title: Conserved noncoding sequences conserve biological networks and influence genome evolution

Authors: Jianbo Xie, Kecheng Qian, Jingna Si, Liang Xiao, Dong Ci, and Deqiang Zhang

The following Supporting Information is available for this article:

Fig. S1 The GC content distribution of conserved noncoding sequences and other regions of the Populus genome.

Fig. S2 Distribution of CNSs over distinct annotated genomic regions.

Fig. S3 Density of CNSs across the first 2 kb upstream.

Fig. S4 CNSs are associated with greater tissue expression specificity.

Fig. S5 Forty co-expression modules were determined by applying K-means clustering with Euclidean distance as the distance metric.

Fig. S6 The global pattern of Populus DNA methylomes under water-stressed conditions.

Fig. S7 MAF (minor allele frequencies) are lower for SNPs within CNSs.

Tables S1-S19 are provided in Excel format and were submitted as separate files.

Fig. S1. The GC content distribution of conserved noncoding sequences (CNSs) located in different regions (upstream, 5′ UTRs, introns, 3′ UTRs and downstream) and of other regions (gene regions and intergenic regions) of the Populus genome. CNS positions in Populus were downloaded for this investigation (Velde et al, 2016).

Fig. S2. Distribution of CNSs over distinct annotated genomic regions. CNS positions in ten dicot plants were downloaded for this investigation (Velde et al, 2016).

Fig. S3. Density of CNSs across the first 2 kb upstream. Position 0 corresponds to the transcription start site (TSS).

Fig. S4. CNSs are associated with greater tissue expression specificity. (a) Genes with more CNSs tend to be more highly expressed. Error bars show the 95% CI. (b) Genes with more CNSs tend to have higher tissue specificity. (c) Tissue specificity is positively correlated with the diversity of motifs. The average tissue specificity value in each co-expression module was used.

Fig. S5. Forty co-expression modules were determined by applying K-means clustering with Euclidean distance as the distance metric.
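As an illustration of the clustering named in this legend, the following is a minimal sketch of K-means with Euclidean distance. It is not the pipeline used for the figure: the function names, the random initialization, the iteration cap, and the toy expression profiles are all assumptions made for this example (the analysis itself used 40 modules).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(profiles, k, n_iter=100, seed=0):
    """Assign each profile to one of k clusters (hypothetical helper).

    Centroids are initialized from k randomly chosen profiles; this
    initialization is an assumption, not taken from the article.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: four genes measured in two tissues, forming two obvious groups.
toy_profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
toy_labels, toy_centroids = kmeans(toy_profiles, k=2)
```

With real data, each profile would be a gene's expression vector across tissues and `k` would be set to the number of modules.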

Fig. S6. The global pattern of Populus DNA methylomes under water-stressed conditions. (a) Distribution of CG, CHG, and CHH methylation levels (mC/total C × 100%) in each sequence context of gene-related regions, including upstream, UTR, exon, intron, and downstream. Error bars indicate 95% confidence intervals generated by 1000 bootstrap replicates. The ‘*’ above each column indicates P < 0.001. Statistical inference was conducted with a permutation test on the mean (perm.test in the R package exactRankTests). (b-f) The average methylation level at CNS sites and their flanking regions in distinct sequence contexts. (g) The average methylation level of randomly selected regions. Average relative DNA methylation levels were calculated by splitting each CNS region into four bins and dividing its flanking regions into 12 equally sized 100-bp bins. Purple, green, and red indicate the distribution of CpG, CHG, and CHH methylation levels, respectively. (h) The fold change in methylation level of CNSs surrounding the 910 differentially expressed genes (up: 521, down: 389, P < 0.05) between the two conditions in each sequence context. Error bars indicate 95% confidence intervals generated by 1000 bootstrap replicates. The ‘*’ above each column indicates P < 0.001. The methylation data from a previous study (Liang et al, 2014) were reanalyzed for this experiment.
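The statistics named in this legend can be sketched as follows. This is an illustrative example only: the analysis itself used perm.test from the R package exactRankTests, whereas the permutation test below is a Monte Carlo approximation written for this sketch, and all function names, default parameters, and input values are assumptions.

```python
import random
from statistics import mean


def methylation_level(mc, total_c):
    """Methylation level as a percentage: mC / total C x 100%."""
    return 100.0 * mc / total_c


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each replicate.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_test_mean(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    This approximates, and does not reproduce, an exact permutation test.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up methylation levels (%) for two groups of regions.
group_a = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 14.0]
group_b = [30.0, 32.0, 31.0, 29.0, 33.0, 30.0, 31.0, 32.0]
ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_test_mean(group_a, group_b)
```

The percentile interval is one of several bootstrap interval types; the legend does not state which variant was used.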

Fig. S7. MAF (minor allele frequencies) of SNPs within CNSs. Shown is the distribution of MAF for all polymorphic SNPs in frequency bins of width 0.1. MAF was determined among four naturally occurring populations (Columbia, Tahoe, WA/BC, and Willamette) of P. trichocarpa. The resequencing data from a previous study (Evans et al, 2014) were reanalyzed for this experiment.
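The binning described in this legend can be illustrated with a short sketch. The function names, the allele counts, and the treatment of bin edges (left-closed bins, with MAF = 0.5 placed in the last bin) are assumptions for this example and are not taken from the article.

```python
def minor_allele_frequency(ref_count, alt_count):
    """Frequency of the rarer allele at a biallelic SNP."""
    total = ref_count + alt_count
    return min(ref_count, alt_count) / total


def maf_histogram(mafs, width=0.1):
    """Count polymorphic SNPs per MAF bin over the range (0, 0.5]."""
    n_bins = round(0.5 / width)
    counts = [0] * n_bins
    for maf in mafs:
        if maf <= 0:
            continue  # monomorphic sites are not polymorphic SNPs
        # Rounding guards against floating-point error at bin edges.
        index = min(int(round(maf / width, 9)), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up allele counts (reference, alternate) for five sites.
allele_counts = [(95, 5), (85, 15), (15, 85), (55, 45), (100, 0)]
mafs = [minor_allele_frequency(r, a) for r, a in allele_counts]
histogram = maf_histogram(mafs)
```

Comparing such a histogram for SNPs within CNSs against one for SNPs elsewhere shows whether low-frequency variants are enriched in either class.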