
Cladistic Parsimony, Historical Linguistics, and Cultural Phylogenetics[*]

Abstract: Here, I consider the recent application of phylogenetic methods in historical linguistics. After a preliminary survey of one such method, i.e. cladistic parsimony, I respond to two common criticisms of cultural phylogenies: (1) that cultural artifacts cannot be modeled as tree-like because of borrowing across lineages, and (2) that the mechanism of cultural change differs radically from that of biological evolution. I argue that while (1) perhaps remains true for certain cultural artifacts, the nature of language may be such as to side-step this objection. Moreover, I explore the possibility that cladistic parsimony can be justified even if (2) is true by appealing to the inference pattern known among philosophers as ‘Inference to the Best Explanation’ (IBE).

1. Introduction

Recently, within historical linguistics, a number of studies attempting to reconstruct the historical relationships between extant languages have been undertaken using methods normally used by biologists to infer evolutionary history. In biology, evolutionary history is often represented on branching, tree-like diagrams known as ‘phylogenetic trees’, or ‘evolutionary trees’, or just ‘trees’ for short. So too in linguistics, language history is often displayed in a similar fashion. Among the phylogenetic methods used by biologists that now feature in historical linguistics, one that has proved quite popular is cladistic (or maximum) parsimony, a brief overview of which I’ll provide in section 2. The other prominent method that biologists use to infer the topology of phylogenetic trees is called ‘maximum likelihood’. This method has also been increasingly applied to language data, about which I’ll have more to say in section 6. Henceforth, I’ll refer collectively to this nascent research program as ‘linguistic phylogenetics’, or more broadly as ‘cultural phylogenetics’, which includes the use of such methods to study other elements of culture. Despite the prospect of rendering our knowledge of the history of languages more exact, the analysis of language data using phylogenetic methods has not been met with wide acceptance among historical linguists (Nichols and Warnow 2008, p. 760).[1] Two general objections to cultural phylogenetics loom large: (1) the history of cultural artifacts, such as languages, cannot be modeled as tree-like because of borrowing across lineages, and (2) the mechanisms of cultural change, including language change, differ radically from the mechanisms of biological evolution.

In this article, my purpose is twofold. First, I aim to bring these exciting methodological debates to a wider, interdisciplinary audience. Second, I aim to analyze to what extent these general objections undermine linguistic phylogenetics. Before doing that, I begin by explaining how cladistic parsimony works in biology and briefly consider one such parsimony analysis on languages. Here, I focus on cladistic parsimony for a number of reasons: (1) the method is relatively simple and non-technical, and thus serves as an accessible example of phylogenetic inference; (2) the first major objection that I discuss applies equally to all phylogenetic methods, and so a detailed survey of all such methods is not necessary to appreciate this objection; and, most important, (3) the nature of maximum parsimony, but not maximum likelihood, might be such as to allow parsimony to avoid the second major objection. As I discuss in sections 6 and 7, if cladistic parsimony does not depend on its being vindicated by maximum likelihood in order to be justified, as some proponents of parsimony in biology aver, then challenges to the evolutionary models assumed by likelihood methods miss their mark, at least as concerns the use of parsimony. In support of a kind of non-statistical justification of parsimony (a possibility which has gone unappreciated in the methodological reflections of both opponents and proponents of language phylogenies), I propose the novel view that such a defense might naturally find a home in the epistemological framework known among philosophers as ‘Inference to the Best Explanation’.

2. Some Biological and Cladistic Preliminaries

In addition to the theory of natural selection, the other great triumph of Darwin’s On the Origin of Species is the advancement and defense of the theory of common ancestry. This is the idea that any two organisms, including those that belong to different species, will have, if we look far enough back in time, some ancestor in common from which both are descended. What’s more, it is not only morphologically similar organisms, such as coyotes, wolves, and foxes, that are related because of their descent from a common ancestor; rather, Darwin surmised, ‘all organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed’ (1859, p. 484). In the 150 years since Darwin’s book was first published, a fruitful research program has succeeded in amassing an abundance of evidence for the truth of the theory of common ancestry. Moreover, recent statistical analyses support the stronger claim that Darwin himself was cautious in asserting, namely that there is one universal progenitor of all living things (Theobald 2010).

For contemporary biologists, the difficult task that remains is to reconstruct the way in which the tree of life is structured. The theory of common ancestry says that gray wolves, coyotes, and red foxes are all genetically related. But we want to know which two are more closely related: whether, for example, wolves and coyotes share a common ancestor that is not also a common ancestor of foxes. Put differently, we want to know if wolves and coyotes form a ‘monophyletic group’, i.e. a group that includes some ancestral organism and all and only its descendants. In this case, there are three distinct possibilities: (1) wolves belong to a group with coyotes that excludes foxes, (2) wolves belong to a group with foxes that excludes coyotes, and (3) foxes belong to a group with coyotes that excludes wolves.

One method that biologists have employed to tackle the problem of reconstructing the topology of the tree of life is that of cladistic parsimony. Like other principles of parsimony, such as Ockham’s razor, which counsels us not to postulate entities beyond those that are necessary, cladistic parsimony is concerned with the minimization of some quantity. But instead of minimizing entities, cladistic parsimony counsels us, when constructing trees, to minimize the number of ‘homoplasies’, i.e. independent reappearances of a given character trait. Of course, this is not to say that homoplasies never occur in nature. A classic example of a homoplastic trait is the ability to fly in bats and birds. Even though both birds and bats can fly, the most recent common ancestor of bats and birds could not fly. On the other hand, the ability to fly in both sparrows and robins is not homoplastic, but is instead ‘homologous’, as the most recent common ancestor of sparrows and robins did have the ability to fly.

In broad outline, to perform a simple application of cladistic parsimony on the three taxa above, one needs first to choose a set of character traits and then to determine which state each of the taxa is in.[2] These traits may be dichotomous, such as the presence or absence of canine teeth, but they need not be. Suppose that we pick 100 character traits and score wolves, coyotes, and foxes accordingly, where a 1 represents the presence of a trait and a 0 represents its absence. Next, in order to get a parsimony analysis off the ground, it is necessary to determine which character states are ‘plesiomorphic’, i.e. ancestral, and which are ‘apomorphic’, i.e. derived. This can be done in a number of ways, one of which is to look at the character states of some taxon that is thought not to belong to the clade whose genealogy is being reconstructed (an ‘outgroup’) and to treat the character states of the outgroup member as plesiomorphic. Finally, one needs to determine how many homoplasies each of the three tree topologies described above would require in order to accommodate the observed distributions of the 100 character traits in wolves, coyotes, and foxes.
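The counting procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the six character scores are invented for the example (they are not data from any study), the outgroup is assumed to show state 0 for every character, and the names CHARACTERS, TOPOLOGIES, min_changes, and homoplasies are hypothetical. For each tree, the sketch tries every assignment of states to the two internal ancestors, takes the smallest number of changes, and counts any change beyond the first as a homoplasy.

```python
from itertools import product

# Hypothetical scores: 1 = derived (apomorphic), 0 = ancestral (outgroup) state.
CHARACTERS = [
    {"wolf": 1, "coyote": 1, "fox": 0},
    {"wolf": 1, "coyote": 1, "fox": 0},
    {"wolf": 1, "coyote": 0, "fox": 1},
    {"wolf": 0, "coyote": 0, "fox": 1},
    {"wolf": 1, "coyote": 1, "fox": 1},
    {"wolf": 0, "coyote": 0, "fox": 0},
]

# Each topology is written as (sister A, sister B, remaining taxon).
TOPOLOGIES = {
    "((wolf, coyote), fox)": ("wolf", "coyote", "fox"),
    "((wolf, fox), coyote)": ("wolf", "fox", "coyote"),
    "((coyote, fox), wolf)": ("coyote", "fox", "wolf"),
}

def min_changes(char, topology):
    """Fewest state changes needed on the tree, with the outgroup fixed at 0."""
    a, b, c = topology
    best = None
    # x = state of the ingroup ancestor, y = state of the ancestor of a and b.
    for x, y in product((0, 1), repeat=2):
        changes = (
            int(x != 0)          # branch from the outgroup state to the ingroup ancestor
            + int(y != x)        # branch to the ancestor of the two sister taxa
            + int(char[a] != y)
            + int(char[b] != y)
            + int(char[c] != x)
        )
        if best is None or changes < best:
            best = changes
    return best

def homoplasies(char, topology):
    """Changes beyond the single origin that any derived trait requires."""
    needed = 1 if any(char.values()) else 0
    return min_changes(char, topology) - needed

scores = {
    name: sum(homoplasies(char, topo) for char in CHARACTERS)
    for name, topo in TOPOLOGIES.items()
}
most_parsimonious = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, score)
print("most parsimonious:", most_parsimonious)
```

With these invented scores, the tree grouping wolves with coyotes requires one homoplasy, against two and three for the alternatives, so it is the tree that parsimony favors.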

On the one hand, certain tree topologies will require homoplasies for certain distributions of traits, and other tree topologies will not, the latter of which are thus favored by those distributions. On the other hand, certain distributions will be uninformative for the reason that these distributions can be accommodated on any of the trees with only one evolutionary change and no homoplasies. In general, for cladistic parsimony, only matchings of traits that are in the derived state, i.e. ‘synapomorphies’, are evidentially relevant, whereas matchings of traits that are in the ancestral state, i.e. ‘symplesiomorphies’, are evidentially irrelevant. In the case at hand, and in general for three taxa, one simply needs to pick the tree with the smallest number of required homoplasies. Of course, few phylogenetic problems are this simple. Because the number of possible topologies increases to 34,459,425 when considering only 10 taxa (Felsenstein 1978b, p. 31), performing a parsimony analysis on even a tiny fraction of the millions of species identified is computationally intractable. Consequently, for more complicated problems, sophisticated computer algorithms have been developed to search for the most parsimonious tree.[3] Biologists are thus forced to rely on the power of computers in order to continue Darwin’s project.
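The figure for 10 taxa agrees with the standard count of rooted, bifurcating trees on n labeled taxa, namely (2n - 3)!!, the product of the odd numbers from 1 up to 2n - 3. The short Python sketch below is offered only as an illustration of how quickly that count grows; the function name rooted_tree_count is hypothetical, and the assumption that this is the count intended in the cited figure rests on the numerical match.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, bifurcating trees on n labeled taxa: (2n - 3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (an empty product for n <= 2).
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count

for n in (3, 4, 5, 10):
    print(n, rooted_tree_count(n))
```

For three taxa the function returns 3, matching the three possibilities for wolves, coyotes, and foxes, and for ten taxa it returns 34,459,425.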

3. Phylogenetic Methods in Historical Linguistics

Most of the attempts to apply phylogenetic methods to language data have been concerned with reconstructing the history of major language families, such as the Indo-European (Rexova et al. 2003), Austronesian (Gray and Jordan 2000), Bantu (Holden 2002; Rexova et al. 2006), and Papuan (Dunn et al. 2005) language families.[4] While the method of maximum parsimony has been popular in biology since the late 20th century, in recent years, in light of concerns over how and whether parsimony is justified, new, more complicated methods, such as maximum likelihood and Bayesian approaches, have come to rival parsimony (Steel and Penny 2000, p. 839), especially in studies that use DNA sequences as character traits. In addition to parsimony, these other phylogenetic methods have also been applied to language data (e.g. Gray and Atkinson 2003; Gray, Drummond, and Greenhill 2009; Dunn et al. 2011; Bouckaert et al. 2012) to infer the structure of major language families.

As discussed in the previous section, a parsimony analysis, or any phylogenetic analysis for that matter, requires that one has available a set of character traits on the basis of which one can score the different taxa that are being analyzed. In the phylogenetic analyses that have been done on languages, the characters used have consisted of a variety of linguistic properties. These characters take the form of lexical, morphological, phonological, or syntactical features, or some combination thereof.

An example of a lexical character could be membership in a cognate set associated with the meaning hand. To code for this character trait in the Indo-European language family, for instance, one considers the various sets of cognates which mean hand in the language family, and then one assigns a language a 1 if it belongs to a given cognate set and a 0 if it does not. There may be, and often is, more than one cognate set associated with any given meaning in a language family, and thus more than one character trait associated with that meaning. So, for instance, since hand in German is ‘Hand’ and in English is ‘hand’, both of which derive from the Proto-Germanic form ‘*handuz’ (Skeat 2005, p. 259), English and German belong to the same cognate set (call it C1) and so receive a 1 for membership in the cognate set C1.[5] In Russian, hand is ‘ruká’, which derives from the Proto-Slavic form ‘*rǫka’ (Barford 2001, p. 18), and so Russian is not a member of C1, since ‘ruká’ is not a cognate of ‘Hand’ or ‘hand’. Thus, Russian receives a 0 for this character trait. Furthermore, since there is more than one set of cognates for hand in the Indo-European language family, one adds another character trait to the data set to account for that fact. In Italian, Spanish, and French, hand is ‘mano’, ‘mano’, and ‘main’, respectively, all of which derive from the Proto-Italic form ‘*manus’ (de Vaan 2008, pp. 363-4), so all three languages belong to a different cognate set (call it C2) and receive a 1 for that character trait. But there is no word for hand in Italian, French, and Spanish that is a cognate of ‘hand’ and ‘Hand’. So, unlike English and German, Italian, French, and Spanish do not belong to C1, and thus these three languages receive a 0 for that trait. Likewise, German and English receive a 0 for membership in the set C2, as there are no cognates of ‘mano’ or ‘main’ in English and German. So too, Russian receives a 0 for membership in C2, as ‘ruká’ is not a cognate of ‘mano’ or ‘main’.[6]
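The coding scheme for the meaning hand can be written out as a small Python sketch. It uses only the two cognate sets named in the example, C1 and C2; the variable and function names (COGNATE_SETS, LANGUAGES, code_characters) are hypothetical, and a fuller data set would presumably add further characters for other cognate sets and other meanings.

```python
# Cognate-set membership for the meaning "hand", as given in the example.
COGNATE_SETS = {
    "C1": {"English", "German"},             # reflexes of Proto-Germanic *handuz
    "C2": {"Italian", "Spanish", "French"},  # reflexes of Proto-Italic *manus
}

LANGUAGES = ["English", "German", "Russian", "Italian", "Spanish", "French"]

def code_characters(languages, cognate_sets):
    """Score each language 1 or 0 for membership in each cognate set."""
    return {
        language: {
            name: int(language in members)
            for name, members in cognate_sets.items()
        }
        for language in languages
    }

matrix = code_characters(LANGUAGES, COGNATE_SETS)

for language in LANGUAGES:
    print(language, matrix[language]["C1"], matrix[language]["C2"])
```

Each row of the resulting matrix is one language and each column one binary character, which is the form of input that a parsimony analysis expects. Russian is scored 0 in both columns, as in the text.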

Similarly, a morphological character trait, such as the presence of a conjugated future tense, can also be coded. For such a trait, Italian, Spanish, and French receive a 1 because they all have a conjugated future tense, which derives from their Latinate origin. On the other hand, German, English, and Russian receive a 0 because they lack a conjugated future tense; all three form the future construction by means of auxiliary verbs. In addition, phonological characters, such as particular sound changes, can be coded, and other syntactical or structural features, such as the presence of prepositions, can also be coded as dichotomous traits. Normally, the coding of linguistic character traits proceeds in this fashion.[7]

As an example of an application of phylogenetic methods to language data, consider the attempt in Holden (2002) to reconstruct the phylogeny of the Bantu language family, a group of 450 languages spoken across Africa south of the fifth parallel.[8] In this study, a parsimony analysis was run on 73 languages of the Bantu language family, in accordance with available lexical data. In addition, two closely related languages were selected as outgroups based on the likely location of the ancestral language. The data on the basis of which the Bantu tree was constructed include 92 items of basic vocabulary, such as man, woman, tongue, fire, etc., where different cognate sets were treated as different character traits in the manner described above. Search algorithms were used in an attempt to find the shortest tree, and the results consisted of an unweighted tree with a consistency index of .65, and a weighted tree (weighted on the basis of words thought more likely to change) with a consistency index of .72. These values are comparable to those of biological trees with similar numbers of taxa, suggesting that the language family is largely tree-like.[9]

Another crucial feature of the Bantu study is that, according to Holden, it sheds light on controversial questions regarding population and cultural history. In particular, it is consistent with a hypothesis concerning the spread of farming across modern Bantu-speaking Africa. Many researchers who perform phylogenetic analyses on language data attempt to argue for some archaeological or anthropological hypothesis on the basis of their tree constructions. For instance, Gray and Jordan (2000), using maximum parsimony, argue that the reconstructed tree, even with a consistency index of only .25, is evidence that colonization of Polynesia by pre-historic residents of Taiwan must have been relatively rapid. Rexova et al. (2006) suggest an ‘unorthodox scenario of Bantu expansion’ (p. 189) on the basis of a new parsimony analysis performed on more languages and with more characters in addition to those used in the analysis done by Holden (2002).

4. A Presumptive Argument in Favor of Linguistic Phylogenetics

Before addressing the controversy surrounding attempts to use phylogenetic methods to infer language trees, it is necessary to consider the motivations for appealing to these methods from biology in the first place. To fully appreciate these motivations, it is necessary to consider first the procedure normally used by historical linguists for establishing language families. This procedure is called the ‘comparative method’.[10] The comparative method is perhaps best illustrated by example, but can be described abstractly as a sequence of steps. In broad outline, in using the comparative method one infers language families by way of the reconstruction of an ancestral proto-language. First, one begins with a set of languages already suspected to be related. Second, one collects a cognate set, i.e. a collection of words or morphemes in the languages being investigated which are thought to be related because they descend from an ancestral language.[11] Third, one determines sound correspondences, i.e. the sounds found in related words of the cognate set which correspond among the related languages. Fourth, one reconstructs the proto-phonology, i.e. the sounds that featured in the proto-language, on the basis of the phonetic properties of the daughter languages and ‘conventional wisdom regarding the directions of sound changes’ (Durie and Ross 1996, p. 7). Fifth, one uses the reconstructed proto-phonology to reconstruct proto-morphemes. Sixth, one establishes the shared innovations (e.g. phonological, lexical, etc.) of groups of languages relative to the proto-language in order to construct the family tree. Finally, the completion of the project lies in constructing an etymological dictionary for the various languages in the language family, tracing the origin of the words in their respective lexicons.

The motivations for appealing to phylogenetic methods in historical linguistics are many, and I will mention only some of the most important ones here. On the one hand, a number of researchers have noted apparent parallels between linguistic and biological evolution, which make linguistics amenable to biological methods.[12] On its surface, it seems that, just as in the case of organisms, languages descend in a Darwinian fashion from common ancestors. For example, many English speakers are at some point confronted with the historical fact that, unlike French, Spanish, and Italian, which are Romance languages descended from Latin, English is more closely related to modern German. It is thus natural to use the language of ancestry and descent when it comes to describing the history of languages. And when we look closely at languages, say Dutch, English, and German, we notice sufficient overall similarities to suggest relatedness, such as the sound correspondence of the [t] in English and Dutch (‘tongue’ and ‘tong’; ‘twelve’ and ‘twaalf’) and the [ts] in German (‘Zunge’; ‘zwölf’). This regularity is robust enough to suggest common ancestry (Lass 2003, pp. 52-53). Darwin himself noticed similar linguistic correspondences, and in fact used analogies of cultural evolution in general to elucidate his own proposal of biological evolution (1871, pp. 78-79).