Split decomposition — Intro

In reconstructing a phylogenetic tree, we are essentially framing a series of hypotheses about the relationships between species. Generating hypotheses is an important part of the scientific process, but it is not the whole process. If we want to have confidence in the tree we have just constructed (or in any other hypothesis), it’s important to test it.

One simple technique for testing phylogenetic hypotheses is split decomposition. This procedure measures the support for evolutionary relationships among a set of four taxa, or quartet. There are three different ways in which members of a quartet can be related, assuming a strictly bifurcating tree:

Topology 1:

((A, B), (C, D))

Topology 2:

((A, C), (B, D))

Topology 3:

((A, D), (B, C))

Note that these are unrooted trees, referred to here as phylogenetic networks: split decomposition doesn’t depend on the location of the evolutionary root. Also note that because we are interested in the system’s topology, the physical position of the taxa is irrelevant. For example, the networks ((A, B), (D, C)), ((C, D), (B, A)), and ((D, C), (A, B)) are all topologically equivalent to topology 1.
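The equivalence described above can be checked mechanically. The following is a minimal sketch, assuming a quartet topology is written as two pairs of taxon labels; the function name canonical_split is hypothetical and chosen only for illustration.

```python
def canonical_split(pair1, pair2):
    """Return an order-independent representation of a quartet topology.

    Each pair is a 2-tuple of taxon labels. The result is a frozenset of
    two frozensets, so neither the order of the pairs nor the order of
    the taxa within a pair affects the comparison.
    """
    return frozenset({frozenset(pair1), frozenset(pair2)})


topology_1 = canonical_split(("A", "B"), ("C", "D"))

# The three rearrangements mentioned in the text.
variants = [
    (("A", "B"), ("D", "C")),
    (("C", "D"), ("B", "A")),
    (("D", "C"), ("A", "B")),
]

# All rearrangements compare equal to topology 1 ...
print(all(canonical_split(p, q) == topology_1 for p, q in variants))
# ... while topology 2 does not.
print(canonical_split(("A", "C"), ("B", "D")) == topology_1)
```

Representing each topology as a set of sets is one convenient choice; any representation that ignores ordering would serve equally well.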

Given a distance matrix for the four taxa, we can calculate the split indices for each of the three topologies shown above. We begin by adding together the distances between taxa at opposite corners of the phylogenetic network. Next, we sum the distances between the two taxa at each end of the internal branch, and subtract this sum from the previous result. Finally, we divide by two to obtain the length of the internal branch. It may be helpful to envision this procedure graphically:

For topology 1, for example, we could calculate the split index as

$$\frac{(d_{AD} + d_{BC}) - (d_{AB} + d_{CD})}{2},$$

where $d_{XY}$ is the distance between taxa X and Y.

Alternatively, since the physical position of taxa doesn’t matter, we could just as well twist topology 1 around the internal branch and then calculate its split index as

$$\frac{(d_{AC} + d_{BD}) - (d_{AB} + d_{CD})}{2}.$$

Thus, each topology has two split indices, which may or may not be equal.
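The procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the worksheet's own formulas: the function name split_indices, the dictionary layout, and the example distances are all hypothetical. The distances were generated from a tree with topology ((A, B), (C, D)), terminal branches of length 1, 2, 3 and 4 leading to A, B, C and D, and an internal branch of length 5.

```python
def split_indices(dist, pair1, pair2):
    """Return the two split indices for the topology (pair1, pair2).

    dist maps frozenset({X, Y}) to the distance between taxa X and Y.
    pair1 and pair2 are the taxa at each end of the internal branch.
    """
    (a, b), (c, d) = pair1, pair2

    def distance(x, y):
        return dist[frozenset((x, y))]

    # Sum of distances between the taxa at each end of the internal branch.
    within = distance(a, b) + distance(c, d)
    # The two ways of pairing taxa across the internal branch.
    first = (distance(a, d) + distance(b, c) - within) / 2
    second = (distance(a, c) + distance(b, d) - within) / 2
    return first, second


# Hypothetical distances (see the lead-in for the tree that produced them).
example = {"AB": 3, "CD": 7, "AC": 9, "AD": 10, "BC": 10, "BD": 11}
dist = {frozenset(pair): value for pair, value in example.items()}

print(split_indices(dist, ("A", "B"), ("C", "D")))  # (5.0, 5.0)
```

Both indices recover the internal branch length of 5 because these example distances fit the tree exactly.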

In practice, only one of the three possible topologies can correctly describe the four taxa’s phylogenetic relationship. When the distances fit the tree exactly, that topology’s two split indices are equal to each other and to the length of its internal branch. We can also find the “internal branch length” for the other two topologies by calculating their split indices; however, since those topologies describe incorrect phylogenetic relationships, these indices have no biological interpretation and need not be identical (or even positive). Split indices therefore provide a way to test the relative support for each possible topology of a phylogenetic network.
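To make the comparison concrete, the sketch below computes both split indices for all three topologies from one distance matrix. The distances are hypothetical and were generated from a tree with true topology ((A, B), (C, D)) and an internal branch of length 5. Ranking topologies by their smaller index is an assumption made here for illustration; the text does not prescribe a particular decision rule.

```python
TOPOLOGIES = {
    "((A, B), (C, D))": (("A", "B"), ("C", "D")),
    "((A, C), (B, D))": (("A", "C"), ("B", "D")),
    "((A, D), (B, C))": (("A", "D"), ("B", "C")),
}


def all_split_indices(dist):
    """Return {topology name: (index, index)} for the three quartet topologies.

    dist maps frozenset({X, Y}) to the distance between taxa X and Y.
    """
    def distance(x, y):
        return dist[frozenset((x, y))]

    results = {}
    for name, ((a, b), (c, d)) in TOPOLOGIES.items():
        within = distance(a, b) + distance(c, d)
        results[name] = (
            (distance(a, d) + distance(b, c) - within) / 2,
            (distance(a, c) + distance(b, d) - within) / 2,
        )
    return results


# Hypothetical distances from a tree with topology ((A, B), (C, D)).
example = {"AB": 3, "CD": 7, "AC": 9, "AD": 10, "BC": 10, "BD": 11}
dist = {frozenset(pair): value for pair, value in example.items()}

results = all_split_indices(dist)
for name, indices in results.items():
    print(name, indices)

# Illustrative rule (an assumption): prefer the topology whose smaller
# split index is largest.
best = max(results, key=lambda name: min(results[name]))
print("best supported:", best)
```

With these distances the true topology yields two equal, positive indices, while each incorrect topology yields one index of zero and one negative index.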

Excel worksheet: “Split decomposition”

Summary: This worksheet generates a pairwise distance matrix for a quartet, using data entered by the user, randomly generated, or a combination of the two. The sheet then calculates split indices for each of the three possible topologies.

In the first row of red-lined cells, enter any branch lengths you want fixed. You may also enter a number into the next red cell: the worksheet will then generate random branch lengths less than or equal to this maximum. Note that user-entered branch lengths override randomly generated ones.

If you want to control the phylogenetic network’s true topology, you may enter the letter corresponding to taxon A’s closest relative into the final red cell. Otherwise, a random closest relative will be chosen. The program will then calculate a distance matrix for the four taxa and split indices for each of the three possible topologies.
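For readers without access to the worksheet, the behaviour described above can be approximated in Python. This is only a sketch of the described inputs and outputs, not a transcription of the worksheet's formulas. The function name make_quartet, its parameters, and the assumption that the quartet has four terminal branches plus one internal branch are all introduced here for illustration.

```python
import random

TAXA = ("A", "B", "C", "D")


def make_quartet(max_length=1.0, fixed=None, sister_of_a=None, rng=None):
    """Return (sister_of_a, branch lengths, pairwise distances) for a quartet.

    fixed:        optional dict of user-chosen branch lengths, keyed by
                  "A", "B", "C", "D" or "internal"; these override random ones.
    sister_of_a:  taxon A's closest relative; chosen at random if omitted.
    """
    rng = rng or random.Random()
    fixed = fixed or {}
    if sister_of_a is None:
        sister_of_a = rng.choice(TAXA[1:])

    lengths = {}
    for name in TAXA + ("internal",):
        random_length = rng.uniform(0, max_length)
        lengths[name] = fixed.get(name, random_length)

    # Taxa on A's side of the internal branch.
    side_of_a = {"A", sister_of_a}
    dist = {}
    for i, x in enumerate(TAXA):
        for y in TAXA[i + 1:]:
            # The path between x and y includes the internal branch only
            # when the two taxa sit on opposite sides of it.
            crosses_internal = (x in side_of_a) != (y in side_of_a)
            dist[frozenset((x, y))] = (
                lengths[x] + lengths[y]
                + (lengths["internal"] if crosses_internal else 0.0)
            )
    return sister_of_a, lengths, dist


sister, lengths, dist = make_quartet(
    max_length=2.0, fixed={"internal": 0.5}, sister_of_a="C",
    rng=random.Random(0),
)
for pair in sorted(dist, key=sorted):
    print("".join(sorted(pair)), round(dist[pair], 3))
```

Passing a seeded random.Random instance makes a run repeatable, which is convenient when comparing results by hand.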

Points to note:

• The two taxa with the shortest distance between them aren’t necessarily closest phylogenetic relatives. For example, consider a network whose true topology is ((A, C), (B, D)) but in which the shortest distance between two taxa is between A and D. Distance-based methods that simply cluster the closest pair of taxa first, such as UPGMA, will therefore tend to yield the incorrect topology ((A, D), (B, C)). It’s not until we perform the split decomposition that the error becomes clear.
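A small numerical case illustrates this point. The distances below are hypothetical, not taken from the worksheet: they come from a tree with true topology ((A, C), (B, D)), short terminal branches of length 1 leading to A and D, long terminal branches of length 5 leading to B and C, and an internal branch of length 1. The helper name split_indices is likewise an assumption for this sketch.

```python
def split_indices(dist, pair1, pair2):
    """Return the two split indices for the topology (pair1, pair2)."""
    (a, b), (c, d) = pair1, pair2

    def distance(x, y):
        return dist[frozenset((x, y))]

    within = distance(a, b) + distance(c, d)
    return (
        (distance(a, d) + distance(b, c) - within) / 2,
        (distance(a, c) + distance(b, d) - within) / 2,
    )


# Hypothetical distances (see the lead-in for the tree that produced them).
example = {"AB": 7, "AC": 6, "AD": 3, "BC": 11, "BD": 6, "CD": 7}
dist = {frozenset(pair): value for pair, value in example.items()}

# The closest pair of taxa is A and D, which are not sister taxa.
closest = min(dist, key=dist.get)
print("closest pair:", sorted(closest))

# Split indices for the three possible topologies.
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for pair1, pair2 in topologies:
    print(pair1, pair2, split_indices(dist, pair1, pair2))
```

Only ((A, C), (B, D)) gives two equal, positive indices (both 1.0, the internal branch length); the topology suggested by the closest pair, ((A, D), (B, C)), gives -1.0 and 0.0.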