Supplementary Info Box 1. Sequences used for alignments.

Human (Homo sapiens)

The UniProt sequence Q96SN8-1 was used for the alignment. This is the longest form of the protein and represents isoform 1 of four isoforms (see Figure 3 for an alignment of the human isoforms). The in vivo existence of these isoforms is confirmed by EST evidence, i.e., each can be found by a tblastn search of the NCBI GenBank EST library.

Rhesus monkey (Macaca mulatta)

The Ensembl genome browser protein ENSMMUP00000030613 was used for the alignment. Since the part of this protein sequence aligning to exon 11 of the human sequence showed no homology, the human exon 11 was aligned against the Macaca genome using tblastn in the NCBI BLAST program. This alignment revealed a sequence with 100% homology to the human protein (EELNSEIEKLSAAFAKAREALQKAQTQEF), and the corresponding fragment of the Macaca sequence was therefore replaced with it in the sequence used for the alignment.
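The fragment exchange described above can be illustrated with a minimal Python sketch. It is not the procedure actually used by the authors: the function name `exchange_fragment`, the coordinates, and the toy flanking residues are hypothetical, and only the exon 11 peptide is taken from the text.

```python
def exchange_fragment(protein: str, start: int, end: int, replacement: str) -> str:
    """Return `protein` with residues [start, end) replaced by `replacement`.

    Coordinates are 0-based and end-exclusive (Python slice convention).
    """
    if not 0 <= start <= end <= len(protein):
        raise ValueError("fragment coordinates fall outside the sequence")
    return protein[:start] + replacement + protein[end:]


# Peptide reported in the text as recovered by tblastn from the Macaca genome.
EXON11_PEPTIDE = "EELNSEIEKLSAAFAKAREALQKAQTQEF"

# Hypothetical toy sequence: the run of X marks the stretch that did not
# align to human exon 11; the flanking residues are invented for illustration.
toy_protein = "MKTAYLQ" + "XXXXXXXX" + "GHWRPLD"

# Replace residues 7..14 (the X run) with the tblastn-derived peptide.
patched = exchange_fragment(toy_protein, 7, 15, EXON11_PEPTIDE)
print(patched)
```

In practice the coordinates would come from the pairwise alignment against the human sequence rather than being set by hand.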

Chimpanzee (Pan troglodytes)

The GenBank protein sequence ABF83588.1 was used for the alignment.

Orangutan (Pongo abelii)

The Ensembl genome browser protein ENSPPYP00000021910 was used for the alignment.

Horse (Equus caballus)

The Ensembl genome browser protein ENSECAP00000000238 was used for the alignment.

Cow (Bos taurus)

The GenBank protein sequence XP_584826 was used for the alignment.

Dog (Canis lupus familiaris)

The GenBank protein sequence XP_855524.1 was used for the alignment.

Opossum (Monodelphis domestica)

The GenBank protein sequence XP_001369278 was used for the alignment.

Chicken (Gallus gallus)

The chicken protein sequence used for the alignment is a composite of several database sequences. The Ensembl genome browser protein ENSGALP00000011278 was used as the basis. Since this sequence clearly lacks N-terminal residues, it was used to screen the NCBI GenBank EST library using tblastn. This revealed two ESTs, DR426191 and BU286933, both showing homology to the Ensembl protein sequence but extending the N-terminus to a length comparable to that of the other proteins in the alignment. Furthermore, the resulting sequence is equivalent to human isoform 4, which lacks exon 32. A search for Gallus ESTs in the NCBI EST library revealed an EST (CN234342) containing this exon. The exon sequence (DPRCDASEEFRKDQNNPVDLHELLTEIQSLRVQLERSIETNKTLHEKLEEQLSKEKKEEMGSVSAVNINYLFKQESQHYAGMN) was inserted into the protein sequence used for the alignment.
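The two assembly steps described for the chicken sequence (extending the N-terminus from EST data, then inserting the missing exon) might be sketched in Python as follows. This is an illustrative assumption about how such a composite could be built, not the authors' documented workflow: the helper names, the overlap threshold, the insertion position, and all toy sequences are hypothetical. Only the exon peptide is taken from the text.

```python
def extend_n_terminus(est_translation: str, core: str, min_overlap: int = 5) -> str:
    """Prepend EST-derived residues to `core`.

    Finds the longest suffix of `est_translation` that equals a prefix of
    `core` and joins the two sequences at that overlap.
    """
    longest = min(len(est_translation), len(core))
    for k in range(longest, min_overlap - 1, -1):
        if est_translation[-k:] == core[:k]:
            return est_translation[:-k] + core
    raise ValueError("no overlap of the required length was found")


def insert_exon(protein: str, position: int, exon: str) -> str:
    """Insert `exon` before the residue at 0-based index `position`."""
    if not 0 <= position <= len(protein):
        raise ValueError("insertion position falls outside the sequence")
    return protein[:position] + exon + protein[position:]


# Exon peptide reported in the text (from EST CN234342).
EXON32_PEPTIDE = (
    "DPRCDASEEFRKDQNNPVDLHELLTEIQ"
    "SLRVQLERSIETNKTLHEKLEEQLSKEKKEEMGSVSAVNINYLFKQESQHYAGMN"
)

# Hypothetical toy data standing in for the translated EST and the
# N-terminally truncated database protein.
toy_est = "MSTAPLQGKRVDE"
toy_core = "GKRVDELLNPW"

extended = extend_n_terminus(toy_est, toy_core)
# The insertion index (13) is arbitrary here; in a real case it would be
# taken from the alignment against the human isoform 1 sequence.
composite = insert_exon(extended, 13, EXON32_PEPTIDE)
print(composite)
```

An exact-match overlap is the simplest possible joining rule; real EST translations may contain sequencing errors, in which case the junction would have to be chosen from an alignment instead.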

Rat (Rattus norvegicus)

The GenBank protein sequence XP_001059116 was used for the alignment. Similar to the situation in the mouse (see below), this sequence is equivalent to human isoform 4. As in the mouse, no evidence for an expressed isoform 1 could be found in GenBank.

Mouse (Mus musculus)

The Ensembl genome browser protein ENSMUSP00000119891 was used for the alignment. This sequence is shorter than the human reference sequence because it is the equivalent of human isoform 4, which lacks exon 32. An NCBI GenBank search did not reveal an equivalent of human isoform 1: neither the protein library nor tblastn searches of the nucleotide and EST libraries returned a sequence homologous to human exon 32. However, this sequence can be found by a tblastn search of the mouse genomic sequence. Thus, the lack of evidence for an expressed isoform 1 equivalent does not mean that this exon is absent from the mouse genome.

Fruit fly (Drosophila melanogaster)

The GenBank protein sequence NP_725298.1 was used for the alignment.