Supporting Information

Figure S1. Chromatin immunoprecipitation of selected CRX-bound regions performed on Crx-/- retinas

Immunoprecipitation of wild-type but not Crx-/- chromatin with an anti-CRX antibody shows enrichment for Rho-CBR3 and Samd7-CBR2. Fold enrichment was calculated relative to IgG controls. Error bars represent the standard deviation for two biological replicates each of wild-type and Crx-/- retinas. *** indicates P ≤ 0.0003, unpaired Student's t-test.
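The legend does not say how fold enrichment over IgG was computed. A common ChIP-qPCR approach is the ΔCt method. With equal template input and roughly 100% amplification efficiency, fold enrichment is 2^(Ct_IgG − Ct_CRX). This method is assumed here and is not confirmed by the source. The minimal Python sketch below applies it to each biological replicate and reports the mean and sample standard deviation. The function names and Ct values are hypothetical.

```python
import statistics


def fold_enrichment(ct_crx: float, ct_igg: float) -> float:
    """Fold enrichment of the anti-CRX ChIP over the IgG control (assumed ΔCt method).

    Assumes equal input and ~100% PCR efficiency, so each cycle of
    difference in threshold cycle (Ct) corresponds to a factor of 2.
    """
    return 2.0 ** (ct_igg - ct_crx)


def summarize(replicates: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and sample standard deviation of fold enrichment across replicates.

    Each replicate is a hypothetical (Ct_CRX, Ct_IgG) pair for one region.
    """
    values = [fold_enrichment(ct_crx, ct_igg) for ct_crx, ct_igg in replicates]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical Ct values for two wild-type biological replicates of one region.
    mean, sd = summarize([(25.0, 30.0), (26.0, 30.0)])
    print(f"fold enrichment: {mean:.1f} ± {sd:.1f}")
```

The sample standard deviation (`statistics.stdev`) fits the small number of biological replicates in this experiment.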

Figure S2. Examples of CRX-bound regions around photoreceptor gene loci

(A-D) Examples of CBRs around genes encoding components of the phototransduction cascade or photoreceptor structural proteins. CBRs are highlighted in light red.

Figure S3. Computationally predicted photoreceptor cis-regulatory elements correspond to CRX-bound regions

(A) Graph of average photoreceptor cis-regulatory element-finding algorithm scores across a 10 kb region centered on the 100 wild-type CBRs most heavily bound by CRX (as indicated by the number of assigned sequence reads). This algorithm was described previously (Hsiau et al. 2007); in that study, the threshold for 'calling' a photoreceptor cis-element was a score ≥ 200. (B) Correlation between CBRs at the Rho locus and the predictions of the photoreceptor cis-regulatory element-finding algorithm. Although the algorithm predictions correlate reasonably well with the locations of CBRs, both false-negative (red arrows) and false-positive (purple arrows) predictions are evident. 'cis-element predictions' indicates the predictions of the photoreceptor cis-regulatory element-finding algorithm. The black line indicates the threshold for 'calling' a cis-regulatory element that was used in our prior study (Hsiau et al. 2007). (C-E) Examples of the correlation between algorithm predictions and CBRs at three additional loci analyzed in the present study.
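Panel A averages the algorithm scores position by position across windows centered on the most heavily bound CBRs. The minimal Python sketch below illustrates that aggregation. It assumes each CBR is already represented by its read count and an equal-length list of scores extracted around its center. The `CBR` record, the function names, and the example values are hypothetical and do not reproduce the authors' code.

```python
from dataclasses import dataclass


@dataclass
class CBR:
    name: str
    reads: int            # number of assigned sequence reads
    scores: list[float]   # algorithm scores across a window centered on the CBR


def top_bound(cbrs: list[CBR], n: int = 100) -> list[CBR]:
    """Return the n CBRs with the most assigned reads."""
    return sorted(cbrs, key=lambda c: c.reads, reverse=True)[:n]


def average_profile(cbrs: list[CBR]) -> list[float]:
    """Mean score at each window position across the given CBRs."""
    if not cbrs:
        raise ValueError("no CBRs given")
    length = len(cbrs[0].scores)
    if any(len(c.scores) != length for c in cbrs):
        raise ValueError("all score windows must have the same length")
    return [sum(c.scores[i] for c in cbrs) / len(cbrs) for i in range(length)]


def called_positions(track: list[float], threshold: float = 200.0) -> list[int]:
    """Positions in a score track meeting the calling threshold (score >= 200)."""
    return [i for i, s in enumerate(track) if s >= threshold]


if __name__ == "__main__":
    cbrs = [
        CBR("a", 50, [0.0, 200.0, 0.0]),
        CBR("b", 10, [100.0, 400.0, 100.0]),
        CBR("c", 5, [0.0, 0.0, 0.0]),
    ]
    profile = average_profile(top_bound(cbrs, n=2))  # averages "a" and "b" only
    print(profile, called_positions(profile))
```

The threshold function can be applied to an individual locus track, as in panels B-E, or to the averaged profile in panel A.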

Figure S4. Additional CRX-bound regions tested for cis-regulatory activity

(A-H) Flatmount and cross-sectional images of retinas electroporated with the indicated constructs at P0 and harvested at P8. Genomic regions tested by electroporation are highlighted in light red.

Figure S5. ChIP-qPCR and luciferase assays on selected CRX-bound regions

(A) ChIP-qPCR results for the indicated CBRs. Values represent the mean ± standard deviation of three replicates. (B, C) Activity of the indicated CBR-luciferase fusion constructs, assayed by transfection into HEK cells along with either a vector expressing Crx (pcDNA-Crx) or a mock control vector (pcDNA-V5).
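Panels B and C compare each CBR-luciferase construct co-transfected with pcDNA-Crx against the same construct co-transfected with the pcDNA-V5 mock vector. The legend does not say how activity was normalized. The minimal Python sketch below therefore assumes one luciferase reading per replicate transfection. It expresses each Crx reading as fold activation over the mean mock reading. Any additional normalization reporter would be a further assumption and is omitted. All readings are hypothetical.

```python
import statistics


def fold_activation(crx: list[float], mock: list[float]) -> tuple[float, float]:
    """Fold activation by Crx over mock, as mean ± sample SD across replicates.

    crx: luciferase readings with pcDNA-Crx (one per hypothetical replicate).
    mock: luciferase readings with the pcDNA-V5 mock vector.
    """
    mock_mean = statistics.mean(mock)
    if mock_mean == 0:
        raise ValueError("mock signal must be nonzero")
    ratios = [x / mock_mean for x in crx]
    return statistics.mean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    mean, sd = fold_activation([300.0, 500.0], [100.0, 100.0])
    print(f"{mean:.1f}-fold ± {sd:.2f}")
```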

Figure S6. CRX sites interact in a spacing- and orientation-dependent manner

(A) The number of occurrences of tandem pairs of CRX sites within replicated CBRs (blue curve) and a set of control regions (red curve) as a function of intersite distance. The blue arrow denotes a peak at intersite distances of 8 to 11 bp. Note that in this and all subsequent panels of this figure, red triangles represent CRX binding sites, with the orientation of the triangle reflecting the orientation of the site. In addition, the height of the triangle reflects the relative affinity of the binding site. (B) The number of occurrences of head-to-head pairs of CRX sites within replicated CBRs (blue curve) and a set of control regions (red curve) as a function of intersite distance. The blue arrow highlights a solitary peak of increased motif frequency at 32 bp (also present in the control dataset) that represents two putative CRX sites within SINE repeats, which are widespread in the mouse genome (Kazazian 2004). Intersite spacing for sites in the opposite orientation is measured from the fourth nucleotide of the CRX binding site: CTAATCCC. (C) The number of occurrences of tail-to-tail pairs of CRX sites within replicated CBRs (blue curve) and a set of control regions (red curve) as a function of intersite distance. The two blue arrows highlight isolated peaks at 11 and 29 bp that were not replicated in the 'singlehit' CBR dataset (see Methods). (D) The minimal 'Rho-basal' promoter fails to drive expression in electroporated retinas. A short segment of the bovine Rho proximal promoter (referred to in this study as 'Rho-basal') was fused to DsRed and co-electroporated into P0 mouse retinas along with CAG-eGFP. This construct contains a single medium-affinity CRX site ('Crx-1'). Retinas were imaged after eight days in explant culture.
(E) A synthetic construct containing a second high-affinity CRX site ('Crx-2') 32 bp (about three helical turns) upstream of 'Crx-1' and in the same orientation also fails to drive expression in photoreceptors when assayed in the same manner. (F) Addition of another high-affinity CRX site ('Crx-3') 10 bp (about one helical turn) upstream of 'Crx-2' and in the same orientation results in robust photoreceptor-specific expression. (G) A similar construct in which a high-affinity NRL site replaces 'Crx-3' can also drive photoreceptor-specific expression. (H) Model of two homeodomains bound to adjacent, tandemly oriented sites spaced 10 bp apart (as measured from the corresponding nucleotide in both binding sites). Note that both molecules are predicted to bind the same face of the double helix. This model is based on the NMR structure of the bicoid homeodomain bound to its target site (Baird-Titus et al. 2006). The bicoid homeodomain is very similar to that of CRX, and the two factors have very similar DNA-binding preferences. It is therefore likely that the CRX homeodomain adopts a similar structure when bound to DNA. (I) Quantitative effects on promoter activity of increasing spacing between tandemly oriented Crx-2 and Crx-3 sites by half-helical turns. The indicated constructs were co-electroporated into P0 mouse retinas along with a CAG-eGFP loading control and imaged after eight days in explant culture. All values are normalized to the activity of the 10 bp construct, which is set equal to 100. Values represent the mean ± standard deviation of three replicate electroporations. (J) Model of two homeodomains bound to adjacent sites spaced 9 bp apart in a head-to-head orientation. This model represents the actual NMR structure of two bicoid homeodomains bound to a fragment of DNA containing two sites spaced 9 bp apart and in a head-to-head configuration (Baird-Titus et al. 2006).
Note that this spacing places the two bound factors on opposite sides of the double helix. The two factors are identical but are colored differently to highlight their opposite orientation. As above, intersite spacing for sites in the opposite orientation is measured from the fourth nucleotide of the CRX binding site. In terms of absolute spacing between sites, 9 bp in the head-to-head orientation is roughly equivalent to 10 bp in the tandem orientation. (K) Quantitative effects on promoter activity of increasing spacing between head-to-head oriented Crx-2 and Crx-3 sites by half-helical turns. Constructs were evaluated as described in (I). All values were normalized to the 10 bp construct in the tandem orientation, which was set equal to 100. Values represent the mean ± standard deviation of three replicate electroporations.
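Panels A-C depend on counting pairs of CRX sites by relative orientation and intersite spacing. The Python sketch below shows one way to do this under several assumptions that are not confirmed to match the analysis in Methods:

- Sites are exact matches to CTAATCCC or its reverse complement, with no affinity weighting (the triangle heights imply graded affinities).
- Spacing is measured between the fourth nucleotides of the two sites, as described above.
- 'Head-to-head' means a plus-strand site followed by a minus-strand site.
- All pairs within a hypothetical `max_spacing` window are counted.
- One helical turn is approximated as 10.5 bp.

```python
from collections import Counter

SITE = "CTAATCCC"      # CRX site as given in the legend
HELICAL_REPEAT = 10.5  # assumed bp per turn of B-DNA


def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


SITE_RC = reverse_complement(SITE)  # "GGGATTAG": a site on the minus strand


def find_sites(seq: str) -> list[tuple[int, str]]:
    """Exact site matches as (anchor, strand), sorted by anchor.

    The anchor is the fourth nucleotide of CTAATCCC read on the site's own
    strand, expressed in plus-strand coordinates.
    """
    seq = seq.upper()
    k = len(SITE)
    sites = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == SITE:
            sites.append((i + 3, "+"))
        elif window == SITE_RC:
            sites.append((i + k - 4, "-"))  # 4th base of the minus-strand read
    return sorted(sites)


def classify(first: str, second: str) -> str:
    """Orientation class for two sites ordered along the plus strand (assumed convention)."""
    if first == second:
        return "tandem"
    return "head-to-head" if first == "+" else "tail-to-tail"


def spacing_histogram(seqs: list[str], max_spacing: int = 50) -> dict[str, Counter]:
    """Count site pairs by orientation class and intersite spacing (bp)."""
    counts = {"tandem": Counter(), "head-to-head": Counter(), "tail-to-tail": Counter()}
    for seq in seqs:
        sites = find_sites(seq)
        for a in range(len(sites)):
            for b in range(a + 1, len(sites)):
                (pos1, s1), (pos2, s2) = sites[a], sites[b]
                spacing = pos2 - pos1
                if spacing <= max_spacing:
                    counts[classify(s1, s2)][spacing] += 1
    return counts


def helical_turns(spacing_bp: int) -> float:
    """Approximate number of helical turns separating two sites."""
    return spacing_bp / HELICAL_REPEAT


if __name__ == "__main__":
    # Two tandem plus-strand sites whose fourth nucleotides are 10 bp apart.
    hist = spacing_histogram([SITE + "AA" + SITE])
    print(hist["tandem"], round(helical_turns(10), 2), round(helical_turns(32), 2))
```

With 10.5 bp per turn, 10 bp corresponds to about 0.95 turns and 32 bp to about 3.05 turns. These values agree with the approximations used in panels E and F.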

Figure S7. CRX is a key regulator of the rod determination pathway

(A) The photoreceptor transcription factor hierarchy regulating the rod determination pathway. (B) CRX directly autoregulates its own expression. A total of six CBRs were identified around the Crx locus. (C) A composite cis-regulatory element containing Crx-CBR6 upstream of Crx-CBR4 drives strong expression in both photoreceptors and a subset of bipolar cells in the INL. The indicated construct was co-electroporated at P0 along with CAG-eGFP and imaged after eight days. (D) The cis-regulatory region containing Nrl-CBRs 1, 2 and 3 drives rod-specific expression. This image was published previously (Hsiau et al. 2007) and is reproduced here for comparison. (E) Multiple CBRs were identified around an alternative transcription start site of Rorb, which encodes an isoform specific to retina and pineal gland (Andre et al. 1998). Rorb-CBR3 is phylogenetically conserved and present in both wild-type and Nrl-/- retinas, whereas Rorb-CBRs 1 and 2 are present only in wild-type. (F, G) Multiple, conserved CBRs were also found around the Nrl (F) and Nr2e3 (G) loci.

Table S1. CRX ChIP-seq data summary for wild-type retinas

This table includes summary data for both wild-type ChIP-seq replicates as well as an IgG control.

Table S2. Chromosomal coordinates for all CRX-bound regions from wild-type and Nrl-/- retinas

This Excel file contains two worksheets, 'wild-type CBRs' and 'Nrl mutant CBRs', which include the chromosomal coordinates of all CBRs derived from CRX ChIP-seq analyses of wild-type and Nrl-/- retinas, respectively. 'Number of sequence reads' indicates the total number of reads which correspond to the indicated region in both replicates. All CBRs in both lists are ranked according to the number of corresponding sequence reads. 'CBR rank identifier' indicates a unique identifier assigned to each of the CBRs.

Table S3. UCSC custom track including CRX-bound regions for both wild-type and Nrl-/- CRX ChIP-seq replicates

This text file includes four custom tracks displaying all CBRs from both wild-type and Nrl-/- CRX ChIP-seq replicates. To view these custom tracks on the UCSC Genome Browser, go to the custom track upload page (http://genome.ucsc.edu/cgi-bin/hgCustom), browse to the Table S3 file and click 'submit'.
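A UCSC custom track file of this kind consists of a `track` line followed by tab-separated BED records. Each record gives the chromosome, a 0-based start, an end, and optionally a name. Several tracks can be concatenated in one file. The Python sketch below builds one such track. The track names, colors, and coordinates are hypothetical and do not reproduce the contents of Table S3.

```python
def bed_track(track_name: str, description: str,
              regions: list[tuple[str, int, int, str]],
              color: str = "255,0,0") -> str:
    """Build one custom track as text: a track line plus BED4 records.

    regions: (chrom, start, end, name) with 0-based, half-open coordinates.
    """
    lines = [f'track name="{track_name}" description="{description}" color={color}']
    for chrom, start, end, name in regions:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical regions for two replicates, written to a single upload file.
    text = bed_track("wt_rep1", "wild-type CRX ChIP-seq replicate 1",
                     [("chr6", 100, 600, "cbr_1")])
    text += bed_track("wt_rep2", "wild-type CRX ChIP-seq replicate 2",
                      [("chr6", 120, 580, "cbr_1")], color="0,0,255")
    print(text, end="")
```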

Table S4. Assignment of CRX-bound regions to mouse genes

This table is a list of 27735 mouse genes along with the wild-type and Nrl-/- CBRs assigned to them. CBRs were assigned to genes in an automated fashion as described in Methods. The chromosomal coordinates and strand of each gene are indicated. 'wild-type CBRs assigned to gene' indicates the CBR rank identifier(s) (if any) assigned to the indicated gene. 'CBR rank identifier' is described in Table S2. 'Number of wild-type sequence reads assigned to gene' indicates the total number of sequence reads corresponding to all wild-type CBRs assigned to that gene. CBR and sequence read assignments derived from Nrl-/- retinas are also given. Whether or not a gene was dysregulated in Crx-/-, Nrl-/- or Nr2e3-/- retinas is indicated. Affymetrix features corresponding to individual genes are also shown.
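The per-gene read column is the sum of reads over all wild-type CBRs assigned to a gene. Genes can then be ordered by that total, as is done for the disease candidates in Table S10. The minimal Python sketch below shows this aggregation and ranking. It takes the CBR-to-gene assignment from Methods as given input and assumes each CBR maps to a single gene. The CBR identifiers, gene names, and read counts are hypothetical.

```python
from collections import defaultdict


def reads_per_gene(assignments: dict[str, str], cbr_reads: dict[str, int]) -> dict[str, int]:
    """Sum sequence reads over all CBRs assigned to each gene.

    assignments: CBR rank identifier -> gene symbol (one gene per CBR assumed).
    cbr_reads: CBR rank identifier -> number of sequence reads.
    """
    totals: dict[str, int] = defaultdict(int)
    for cbr_id, gene in assignments.items():
        totals[gene] += cbr_reads[cbr_id]
    return dict(totals)


def rank_genes(totals: dict[str, int]) -> list[tuple[str, int]]:
    """Genes sorted by total assigned reads, highest first (ties broken by name)."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    assignments = {"wt_1": "GeneA", "wt_2": "GeneA", "wt_3": "GeneB"}
    reads = {"wt_1": 900, "wt_2": 100, "wt_3": 500}
    print(rank_genes(reads_per_gene(assignments, reads)))
```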

Table S5. Annotation of previously characterized photoreceptor-specific cis-regulatory elements

This table includes information on 33 previously characterized photoreceptor cis-regulatory elements, including the location of each cis-element relative to the gene it controls (e.g., '1 kb 5' ' indicates that the cis-regulatory element resides within the first kilobase upstream of the transcription start site of the gene). 'Evidence' indicates the type of experimental evidence provided in support of the cis-regulatory activity of the indicated cis-element. 'CBR in wild-type' and 'CBR in Nrl-/-' indicate whether the identified cis-regulatory region overlaps with a CBR in either wild-type or Nrl-/- retinas and, if so, whether that CBR was a 'doublehit' or 'singlehit' (as defined in Methods).

Table S6. Sequences of CRX-bound regions tested for cis-regulatory activity

This table includes the full DNA sequence of the 27 CBRs evaluated for their ability to drive expression in electroporated mouse retinas along with the sequences of the PCR primers used to obtain them. Restriction enzyme sites included in the PCR primers for subsequent cloning into the indicated cloning vectors are indicated in lower case. 'Expression in electroporated retinas' indicates whether or not there was detectable expression in the retinas eight days after electroporation at P0.

Table S7. Overrepresented pairs of transcription factor binding sites within CRX-bound regions

This table contains the 20 transcription factor binding site pairs with the highest Z-scores as determined using Genomatix RegionMiner (Ho Sui et al. 2005). The first member of each pair is always a 'CRX' binding site (indicated by the module identifier, 'V$BCDF') and the second member is the motif found in association with the CRX site. A module is a family of similar DNA sequence motifs that represent binding sites for a closely related sub-family of transcription factors. The second module identifier within a pair is defined under 'Module definition'. Example sequence logos for each of the twenty modules are given in a column under the table. 'Second motif class' indicates the class to which we have assigned each of the 20 modules. Also included in this table are a variety of statistics related to the motifs identified including their Z-scores (Ho Sui et al. 2005).

Table S8. CRX ChIP-seq data summary for Nrl-/- retinas

This table includes summary data for both Nrl-/- ChIP-seq replicates as well as two IgG controls.

Table S9. Association of CRX-bound regions with mouse orthologs of cloned human retinal disease genes

This table is a list of mouse orthologs of 125 cloned human retinal disease genes. 'CBR-associated' indicates whether a given mouse ortholog has at least one CBR assigned to it. 'Expression' indicates the expression pattern as reported by BioGPS (http://biogps.gnf.org/). 'OMIM' is the identifier for the disease in the Online Mendelian Inheritance in Man database (http://www.ncbi.nlm.nih.gov/omim/). 'Mode' indicates the mode of inheritance: 'ar' = autosomal recessive; 'ad' = autosomal dominant; 'X' = X-linked; '?' = mode of inheritance undetermined. 'Locus' indicates the human chromosomal region where the gene resides.

Table S10. Mapping of CRX-regulated genes within uncloned human retinal disease regions

This Excel file contains two worksheets. The first, a summary table, lists the total number of known human genes in the 31 mapped but uncloned retinal disease intervals. It also shows the number of human disease candidate genes whose mouse orthologs have assigned CBRs and the percentage of all genes in the interval with such mouse orthologs.

The second table lists all gene candidates within the 31 mapped but uncloned retinal disease loci that have mouse orthologs with associated CBRs. The candidates are ranked according to the number of sequence reads within the CBRs assigned to that gene in wild-type retinas. 'Locus' indicates the retinal disease locus corresponding to RetNet abbreviations (Retinal Information Network, http://www.sph.uth.tmc.edu/retnet). The human ENSEMBL ID, the official HUGO gene symbol, the chromosome number and the genomic coordinates based on genome assembly hg19 are shown. For the mouse orthologs, the mouse ENSEMBL ID, the official MGI mouse gene symbol, the chromosome number, the genomic coordinates based on genome assembly mm9, and the total number of CRX ChIP-seq reads from wild-type and Nrl-/- retinas are shown.

Table S11. Peak-calling results of QuEST algorithm

This file contains the peak-calling results of the QuEST algorithm derived from the raw sequence reads of the two wild-type CRX ChIP-seq replicates and the IgG control. The file is in 'bed' format and can be directly uploaded onto the UCSC Genome Browser (see legend for Table S3 for instructions).

Table S12. Primer sequences used in this study

Primer sequences used for a variety of assays in the present study. Primers used to obtain CBRs for testing cis-regulatory activity are given in Table S6.