Supplementary tables, figure legends and materials.

Evolution of general transcription factors

K.V. Gunbin1 and A. Ruvinsky2

1 Institute of Cytology and Genetics of Russian Academy of Sciences, Novosibirsk-90, Russia

2 University of New England, Armidale, NSW 2351, Australia

Table 1S. Aligned exon-intron structures of GTF2I family genes from several vertebrate species (Ensembl data). For each species, exon lengths are shown on the top line and intron phases on the bottom line. The following plausible changes (underlined) were introduced in the exon structures: O. anatinus GTF2IRD1: 109 and GTF2I: 63; X. tropicalis GTF2IRD1: 63 and 205; L. africana GTF2IRD2: 181. The full structure of GTF2IRD1 for M. domestica was not available. Evolutionary changes or incorrect predictions are the likely explanations for the imperfect alignment of some exons. GTF2IRD1 is found in bony fishes and tetrapods; GTF2I was not found in bony fish, and GTF2IRD2 was found only in eutherian mammals. Blue highlights all GTF2I repeats except GTF2IRD1 Ra5 repeats, which are dark blue.

GTF2IRD1

Homo_sapiens 129 142 156 184 311 90 84 184 26 109 38 81 90 48 66 184 50 57 84 184 29 78 193 38 137 226

ENSG00000006704 0 1 1 2 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 2 1 1 2 1 0

Bos_taurus 123 142 156 184 311 90 84 184 26 109 38 81 90 48 66 184 50 57 84 184 29 5exons 78 193 38 140 219

ENSBTAG00000006212 0 1 1 2 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 2 1 1 2 1 0

Loxodonta_africana 123 142 156 184 305 90 84 184 26 109 38 81 90 7 30 10 26 21 5 30 41 146 50 57 84 184 29 78 193 38 57 44 69

ENSLAFT00000006508 0 1 1 2 1 1 1 2 1 2 1 1 1 2 2 0 2 2 1 1 0 2 1 1 1 2 1 1 2 1 1 0

Ornithorhynchus_anatinus 145 147 184 335 81 84 184 26 109 34 88 90 54 66 127 87 80 207 93 193 46

ENSOANG00000006511 1 1 2 1 1 1 2 1 2 0 1 1 1 1 1 2 1 1 1 2

Gallus_gallus 123 146 12 131 184 311 90 84 184 26 121 38 81 90 60 77 173 59 60 84 184 29 78 193 38 137 164

ENSGALG00000001263 0 2 2 1 2 1 1 1 2 1 2 1 1 1 1 0 2 1 1 1 2 1 1 2 1 0

Xenopus_tropicalis 126 142 147 184 311 90 84 184 84 57 38 81 14 70 54 66 184 59 63 84 184 29 78 205 38 +4exons

ENSXETT00000065540 0 1 1 2 1 1 1 2 2 2 1 1 0 1 1 1 2 1 1 1 2 1 1 2 1

Danio_rerio 126 142 183 184 296 96 84 184 26 124 44 81 87 51 63 184 62 66 268 29 48 78 193 38 116 2109

ENSDARG00000022203 0 1 1 2 1 1 1 2 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 1 0

Takifugu_rubripes 45 186 123 19 232 317 87 84 184 26 130 116 126 90 184 62 72 268 29 78 193 38 110 69

ENSTRUG00000010068 0 0 0 1 2 1 1 1 2 1 2 1 1 1 2 1 1 2 1 1 2 1 0

Tetraodon_nigroviridis 46 145 8 129 184 311 90 84 184 26 116 103 102 24 51 21 184 62 89 209 29 78 193 38 107 69

ENSTNIG00000004695 1 2 1 1 2 1 1 1 2 1 0 1 1 1 1 1 2 1 0 2 1 1 2 1 0

GTF2I

Homo_sapiens 104 139 135 184 29 55 44 78 60 57 63 111 66 184 59 72 184 59 72 184 59 75 102 66 184 56 81 84 184 29 42 42 76

ENSG00000077809 0 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1

Bos_taurus 104 139 135 184 29 55 44 78 60 57 63 111 66 184 59 72 184 59 72 184 59 75 102 66 184 59 81 84 184 29 42 42 76

ENSBTAG00000009780 0 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1

Loxodonta_africana 99 139 135 184 29 55 44 78 60 57 63 111 66 184 59 72 184 59 72 184 59 75 102 72 184 59 81 84 184 29 42 42 59

ENSLAFT00000003649 0 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1

Monodelphis_domestica 99 139 135 184 34 26 56 78 60 57 63 99 66 184 59 72 184 59 72 184 59 75 99 66 184 53 78 84 184 29 42 42 56

ENSMODG00000007022 0 1 1 2 0 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1

Ornithorhynchus_anatinus 99 139 135 184 29 55 41 78 63 57 63 99 66 184 59 78 193 59 72 184 59 75 102 66 184 53 78 84 184 31

ENSOANG00000012919 0 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2

Gallus_gallus 106 139 135 184 29 49 44 78 60 63 102 66 184 59 72 184 59 72 184 59 84 96 66 184 53 111 84 184 29 69 15 76

ENSGALG00000021665 0 1 1 2 1 2 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1

Xenopus_tropicalis 101 139 123 184 29 49 12 44 78 59 55 63 96 66 184 56 72 184 59 72 184 59 63 18 93 66 184 53 3 78 84 184 36 40 43 76

ENSXETG00000018435 0 1 1 2 1 2 2 1 1 0 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 2 2 0 1

GTF2IRD2

Homo_sapiens 104 139 120 184 29 55 44 78 57 66 78 84 184 29 2122

ENSG00000196275 0 1 1 2 1 2 1 1 1 1 1 1 2 1

Bos_taurus 104 139 126 184 29 55 44 78 57 63 78 84 184 29 1814

ENSBTAG00000026286 0 1 1 2 1 2 1 1 1 1 1 1 2 1

Loxodonta_africana 182 29 55 44 78 57 66 78 84 181 29 635

ENSLAFG00000013081 2 1 2 1 1 1 1 1 1 2 1
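The relationship between the exon lengths and the intron phases listed in Table 1S can be checked with a short calculation. The sketch below is illustrative only and is not part of the original analysis. It assumes the standard convention that an intron's phase is the number of bases of a split codon lying upstream of the intron, and that internal exons are entirely coding, so that each phase equals the previous phase plus the exon length, modulo 3. The function name propagate_phases is hypothetical. First and last exons are excluded because their lengths may include untranslated sequence.

```python
def propagate_phases(first_phase, exon_lengths):
    """Return the phase of the intron following each exon.

    first_phase  -- phase of the intron preceding the first listed exon
    exon_lengths -- lengths (bp) of consecutive, fully coding exons
    Assumption: phase = number of codon bases upstream of the intron.
    """
    phases = []
    phase = first_phase
    for length in exon_lengths:
        # Each exon shifts the reading-frame position by its length.
        phase = (phase + length) % 3
        phases.append(phase)
    return phases


# Human GTF2I from Table 1S: intron 1 has phase 0; exons 2-8 follow.
human_gtf2i_exons_2_to_8 = [139, 135, 184, 29, 55, 44, 78]
print(propagate_phases(0, human_gtf2i_exons_2_to_8))
# prints [1, 1, 2, 1, 2, 1, 1]
```

The output matches the phases listed for introns 2 to 8 of human GTF2I in Table 1S. A row for which this check fails may indicate an exon that includes untranslated sequence, a prediction error, or a fused entry in the table.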

Table 2S. GTF2I homologous sequences from GTF2I and GTF2IRD1 found in cartilaginous fishes.

GTF2I / GTF2IRD1
GTF2I repeats from higher vertebrates / Presence of homologous sequence in cartilaginous fishes / GTF2I repeats from higher vertebrates / Presence of homologous sequence in cartilaginous fishes
Rb1 / +/-* / Ra1 / +/-*
Rb2 / + / Ra2 / -
Rb3 / + / Ra3 / +
Rb4 / + / Ra4 / +
Rb5 / + / Ra5 / +
Rb6 / +

* Significant similarity to both GTF2I Rb1 and GTF2IRD1 Ra1.

Figure Legends

Figure 1S. Phylogeny of GTF2I, GTF2IRD1 and GTF2IRD2 proteins reconstructed using the program PhyloBayes with different matrices of relative amino acid substitution rates (JTT, LG, WAG) as well as with the CAT model. Light gray branches lead to GTF2IRD1 proteins, gray branches to GTF2IRD2 proteins and black branches to GTF2I proteins.

Figure 2S. Phylogenetic trees for intermediate GTF2I repeats. Inner branches carrying atypical amino acid substitutions with statistical significance p ≤ 0.01 are shown in bold.

Figure 3S. Phylogenetic trees for GTF2I repeats found only in GTF2IRD1. Inner branches carrying atypical amino acid substitutions with statistical significance p ≤ 0.01 are shown in bold.

Figure 4S. Twelve alternative binary tree topologies constructed on the basis of phylogenetic data. These topologies correspond to Figure 5A, with the restriction that the GTF2I Rb6 - GTF2IRD1 Ra4 and GTF2I Rb1 - GTF2IRD1 Ra1 clades are ancestral (in agreement with the data in Figures 3 and 4).

Supplementary materials

List of species used for collecting the data presented in the paper

GTF2IRD1 proteins were collected from numerous species. Eutherian mammals: Loxodonta africana, Bos taurus, Equus caballus, Sus scrofa, Ailuropoda melanoleuca, Canis familiaris, Macaca mulatta, Pan troglodytes, Homo sapiens, Pongo abelii, Mus musculus, Rattus norvegicus, Cavia porcellus. Monotreme mammals: Ornithorhynchus anatinus. Birds: Taeniopygia guttata, Gallus gallus. Amphibians: Xenopus tropicalis. Bony fish: Danio rerio, Oryzias latipes, Gasterosteus aculeatus, Tetraodon nigroviridis, Takifugu rubripes.

Complete sequences of the GTF2I protein are known only for terrestrial vertebrates, including: eutherian mammals - Callithrix jacchus, Nomascus leucogenys, Pongo abelii, Pan troglodytes, Homo sapiens, Macaca mulatta, Oryctolagus cuniculus, Loxodonta africana, Cavia porcellus, Rattus norvegicus, Mus musculus, Vicugna pacos, Pteropus vampyrus, Bos taurus, Canis familiaris, Ailuropoda melanoleuca, Sus scrofa, Equus caballus; marsupials - Monodelphis domestica; monotremes - Ornithorhynchus anatinus; birds - Meleagris gallopavo, Gallus gallus, Taeniopygia guttata; amphibians - Xenopus tropicalis.

GTF2IRD2 proteins are found only in eutherian mammals. Data were collected from the following species: Ailuropoda melanoleuca, Canis familiaris, Bos taurus, Equus caballus, Pteropus vampyrus, Rattus norvegicus, Mus musculus, Macaca mulatta, Pongo abelii, Homo sapiens, Pan troglodytes.

Partial data on proteins from the GTF2I family were collected from the cartilaginous fish Callorhinchus milii, Leucoraja erinacea and Squalus acanthias, and from the lancelet Branchiostoma floridae.