Supplemental Information

H2A.B facilitates transcription elongation at methylated CpG loci

Yibin Chen1, Qiang Chen1, Richard C. McEachin2, James D. Cavalcoli2 and Xiaochun Yu1*

1Division of Molecular Medicine and Genetics, Department of Internal Medicine, 2Department of Computational Medicine and Bioinformatics,

University of Michigan Medical School, 1150 W. Medical Center Drive, 5560 MSRBII, Ann Arbor, Michigan, 48109, USA

*Corresponding author:

Phone: (734)615-4945; FAX: (734)647-7950; Email:

Running title: H2A.B and methylated CpG

Keywords: H2A.B; DNA methylation; Gene transcription


Supplemental Methods:

Data analysis of ChIP Sequencing (ChIP-seq) and MeDIP Sequencing (MeDIP-seq)

For ChIP-seq analysis, DNA libraries were analyzed by Illumina high-throughput sequencing. The read quality of each sample was determined by FastQC software. After pre-filtering the raw data by removing sequenced adapters and low quality reads, the tags were mapped to the mouse genome (assembly mm9) by Bowtie software. Parameter settings were as following: -v, 3 (reported alignments with at most 3 mismatches), -5, 3 and -3, 3 (trim3 bases from 5’ and 3’ end to remove low-quality bases).

For ChIP-seq of H2A.B, we received 12778821 sequence reads, 9266472 mapped reads and 8598508 uniquely mapped reads. For MeDIP of 5mC, we received 17861345 sequence reads, 12503839 mapped reads, and 8909258 uniquely mapped reads. The read length is 50 bp. Peak detection was performed using Model-based Analysis for ChIP-Seq (MACS) software from Galaxy browser (www.galaxy.psu.edu) (Zhang et al. 2008). Sequencing reads from irrelevant IgG in ChIP or MeDIP assays were used as negative control. The unique peaks obtained from ChIP-seq or MeDIP-seq were matched to the annotated reference genome (mouse mm9) using CisGenome 2.0 (Ji et al. 2008). For ChIP-seq and MeDIP-seq analyses, the reads were not limited to one read per chromosomal position, the total length of mapped reads were extended to the 3’ direction with a total length of 350 bases, which is estimate mean fragment length. The parameters used for peak calling in ChIP-seq and MeDIP-seq include tag size as 50 bp and band width as 350 bp; other parameters remained default. Genes not uniquely mapped to the genome were excluded. To avoid redundancy, only the longest transcript variant of each gene was used to define chromosomal locations of intragenic region, upstream region (10 kb upstream of TSS), distal intergenic region and downstream region (10 kb downstream of TES) in Figure 1B and Supplemental Figure S4A.

Because anti-H2A.B antibodies are rabbit poly-clonal antibodies, we used irrelevant rabbit IgG as the negative control for ChIP-seq assay. For the peak calling of H2A.B, MACS was performed using irrelevant rabbit IgG for background normalization. For the peak calling of 5mC, MACS was performed using irrelevant mouse IgG for background normalization because anti-5mC antibody is a mouse monoclonal antibody. To examine the quality of ChIP-seq and MeDIP-seq, we plotted the no. regions/fold enrichment in Supplemental Figure S2. To exclude biases in H2A.B enriched regions due to gene size, we compared and found that H2A.B enriched genes are not associated with gene size because R2 of H2A.B is less than 0.5 (Supplemental Figure S3).

To examine the distribution of H2A.B, the whole genome was partitioned into 4 regions: intragenic region (including 5’UTR, coding exon, intron, and 3’UTR); upstream region (10 kb upstream of TSS, including most transcription promoters and other transcriptional elements), distal intergenic region that does not encoding any genes, and downstream region (10 kb downstream of TES) (Figure 1B). We examined 5mC, methylated CpG and total CpG dinucleotide in these regions of genome, in which the patterns of the localization of 5mC, methylated CpG and total CpG dinucleotide are similar to that of H2A.B (Supplemental Figure S4A). The H2A.B and 5mC enriched regions are obtained from ChIP-seq and MeDIP-seq. The information of methylated CpG in mouse ES cells is obtained from NCBI database (GSM278905). Each CpG with more than 75 % methylation is considered as methylated CpG (Meissner et al. 2008). The localization of total CpG dinucleotide is from mm9 database.

To examine the enrichment of H2A.B in intragenic regions, we show the peaks of H2A.B in a section of Chromosome 2 (Figure 1C). The peaks of H2A.B are called with ChIP-seq of irrelevant IgG as mentioned above.

For gene body plots in Figure 1D, 50 non-overlapping windows with average tag number per base were calculated for each gene. 10 kb upstream of TSS and 10 kb downstream of TES was divided into of windows of 1 kb. For gene body plots in Figure 1F, between TSS and TES, each gene was divided into 50 windows of equal size and the average counts per 200 bp were calculated. We first normalized the enrichment of H2A.B and 5mC with control IgG. For comparing the localization of H2A.B and 5mC in gene body region, we set the relative enrichment of H2A.B and 5mC at -10 kb from TSS as “1”.

In Supplemental Figure S4B and S8A, we examined the overlapped regions between H2A.B and 5mC, H2A.B and 5hmC. The overlapped region was examined by Homer software. The 5hmC enriched region is from Gene Expression Omnibus database under accession number GSE40810 (Tan et al. 2013). In Figure 1E, we examined the overlapped gene body regions of H2A.B and 5mC. We clustered the distribution of H2A.B into 6 groups based on its localization in gene body. In each gene body, we examined whether 5mC localizes close to H2A.B. In Figure 1F, we summarized the results from 6 different cluster analyses in Figure 1E. The relative accumulation of tags around indicated region in Figure 1F was performed using seqMINER software (Ye et al. 2011). H2A.B-enriched regions were used as the reference. Tag densities from each ChIP (MeDIP)-seq were collected within a window of 10 kb around the reference coordinates. The tag density of H2A.B (y axis) in 200 bp window was normalized to “1” at the peak region. The tag density of 5mC in 200 bp window was normalized with the relative tag density of H2A.B.

At each CpG site, methylation sometimes occurs and sometimes does not occur. Thus, there is a methylation level of each CpG site. In Supplemental Figure S4C, we clustered all the gene body methylated CpG into 5 groups based on the methylation level of each CpG site (5-25%, 25-45%, 45-65%, 65-85% or > 85% methylation). The higher methylation level suggests that this CpG is more likely to be methylated. We plotted different groups of methylated CpG in H2A.B bound and unbound gene body regions.

In Supplemental Figure S4D, densities of methylated CpG per 1 kb in H2A.B bound and unbound regions were plotted. Methylated CpG is obtained from NCBI database (GSM278905). Each CpG with more than 75 % methylation is considered as methylated CpG (Meissner et al. 2008). And the regions with more than one read are counted.

H2A.B-bound genes were defined when gene body region overlapped with H2A.B peaks. The H2A.B-bound genes were shown in Supplemental Table S3. To calculate 5mC status for H2A.B bound or H2A.B unbound genes in Supplemental Figure S5B, the mean reads of the bins (5 kb) within gene body of indicated clusters were calculated. All statistics and plotting were performed using the statistical program R (Sakharkar et al. 2004).


Supplemental reference:

Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. 2008. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol 26(11): 1293-1300.

Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB et al. 2008. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454(7205): 766-770.

Sakharkar MK, Chow VT, Kangueane P. 2004. Distributions of exons and introns in the human genome. In Silico Biol 4(4): 387-393.

Tan L, Xiong L, Xu W, Wu F, Huang N, Xu Y, Kong L, Zheng L, Schwartz L, Shi Y et al. 2013. Genome-wide comparison of DNA hydroxymethylation in mouse embryonic stem cells and neural progenitor cells by a new comparative hMeDIP-seq method. Nucleic Acids Res 41(7): e84.

Ye T, Krebs AR, Choukrallah MA, Keime C, Plewniak F, Davidson I, Tora L. 2011. seqMINER: an integrated ChIP-seq data interpretation platform. Nucleic Acids Res 39(6): e35.

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137.


Supplemental Figure S1. H2A.B antibodies specific recognize mouse H2A.B.

(A) Sequence alignment of H2A.B1 and H2A.B2. Highlighted regions are used for the antibodies generation. (B) H2A.B antibodies recognize exogenous H2A.B1 and H2A.B2 from MEFs lysates. Anti-H2A.B1 and anti-H2A.B2 recognize H2A.B1 and H2A.B2 respectively. Anti-H2A.B recognizes both H2A.B1 and H2A.B2. (C) H2A.B antibodies recognize endogenous H2A.B from mouse ES cell lysates. Chromatin fractions are subjected to immunoprecipitation and Western blotting with indicated antibodies. (D) shRNA of H2A.B down-regulates the mRNA (left, by RT-PCR) and protein (right, by Western blot) levels of H2A.B in mouse ES cells. RT-qPCR results are shown as mean +/- SEM (n = 3). The protein level of H2A.B in the chromatin fraction is examined by Western blotting with anti-H2A.B antibody. Histone H4 is used as protein loading control. (E) mRNA levels of H2A.B in different mouse tissues. RT-qPCR results are shown as mean +/- SEM (n = 3). GAPDH is used as control. (F) The protein expression level of H2A.B is shown in mouse ES cell line, male and female brain tissues. Chromatin fractions from mouse cell lines and tissues are subjected to Western blotting with anti-H2A.B1 and H2A.B2 antibodies. Histone H4 is used as protein loading control.


Supplemental Figure S2 Statistical summary of ChIP sequencing of H2A.B and MeDIP sequencing of 5mC

Distribution of peaks of H2A.B (A) and 5mC (B). y axis: log2 (fold change).


Supplemental Figure S3. Distribution of H2A.B is not associated with gene length.

Scatter plots show no correlation of gene length with tag densities of ChIP-seq of H2A.B. X-axis: gene length of H2A.B enriched genes; y-axes: normalized tag density of H2A.B and control IgG (Reads per million mapped reads per gene).


Supplemental Figure S4. H2A.B is associated with 5mC in gene body regions.

(A) Genomic distribution of 5mC, methylated CpG and total CpG dinucleotide are shown in pie charts. (B) Venn diagram shows overlapping regions between H2A.B and 5mC. (C) The pie charts show the distribution of methylated CpG in the H2A.B bound or unbound gene body region. The methylated CpG is classified into 5 groups with different methylation level (5-25%, 25-45%, 45-65%, 65-85% and >85%). (D) Boxplot shows methylated CpG in H2A.B bound and unbound gene body regions. Y-axis: the density of methylated CpG per 1 kb. ** P < 0.01 (t-test)


Supplemental Figure S5. H2A.B regulates gene transcription.

(A) The relative mRNA level of genes. Five H2A.B bound genes are randomly selected from the down-regulated gene group. The expression of these genes was examined by qPCR in the ES cells treated with control or H2A.B shRNA. (B) Boxplot shows the 5mC level in each cluster. The mean reads of 5mC in the bins (5 kb) within gene body are shown in the y-axis. * P < 0.05 (t-test); ** P < 0.01 (t-test).

Supplemental Figure S6. Loss of H2A.B does not affect the expression of genes surround Kcnq1.

(A) shRNA treatment down-regulates the mRNA (left, by RT-PCR) and protein (right, by Western blot) level of H2A.B in hybrid ES cells. RT-PCR results are shown as mean +/- SEM (n = 3). (B) Knockdown of H2A.B does not suppress the transcription of adjacent genes Ascl2 and Cdkn1c. Data are presented as mean +/- SEM (n = 4). (C) Allele-specific expression of gene Ascl2 is determined by different SfcI digestion pattern based on the SNP between mus and cas. (D) Allele-specific expression of gene Cdkn1c is determined by different TaqI digestion pattern based on the SNP between mus and cas. (E) Allele-specific expression of imprinted Kcnq1ot1 is determined by different TaqI digestion pattern based on the SNP between mus and cas.


Supplemental Figure S7. RA treatment induces differentiation in hybrid ES cells.

(A) A RA induced ES cell differentiation model. (B) Relative mRNA levels of Pou5f1 and Fgf5 are examined by qPCR to confirm ES cell differentiation. Following with or without RA treatment, qPCR is performed in ES cells treated with control shRNA or H2A.B shRNA(shH2A.B). Data are presented as mean +/- SEM (n = 4). (C) mRNA of Igf2r and Airn are dramatically increased during ES cell differentiation. qPCR is performed in hybrid ES cells. RT-qPCR results are shown as mean +/- SEM (n = 4). (D) Knockdown of H2A.B does not suppress the transcription of adjacent genes Slc22a2. Data are presented as mean +/- SEM (n = 4).


Supplemental Figure S8. Correlation of H2A.B with 5hmC.

(A) Venn diagram shows the overlapped H2A.B and 5hmC enriched regions. (B) 5hmC is not enriched at the DMR of Kcnq1. MeDIP or hMeDIP are performed to probe the indicated regions. Relative level of 5mC and 5hmC was calculated by comparing with an irrelevant IgG (negative control). Data are presented as mean +/- SEM (n = 4).


Supplemental Figure S9. H2A.B facilitated the transcription of Stau2

(A) Profiles of H2A.B ChIP-seq and MeDIP-seq at Stau2 locus are shown. The x-axes indicate the genomic region. The y-axes represent the fold enrichment of H2A.B and 5mC compared with control IgG. (B) ChIP analyses show that H2A.B is enriched in the Stau2 gene. (C) MeDIP analyses show that 5mC is enriched in the Stau2 gene too. (D) Down-regulation of H2A.B by shRNA does not affect the DNA methylation status at Stau2 locus. (E) Knockdown of H2A.B suppresses the transcription of Stau2. RT-qPCR is performed. (F) Down-regulation of H2A.B induces the arrest of Pol II at the region C of the Stau2 gene. ChIP analyses in the Stau2 gene are performed using anti-RNA Pol II antibody. Data are presented as mean +/- SEM (n=4).