Words by Default: the Persian Complex Predicate Construction[1]

2003. Words by default: the Persian Complex Predicate Construction. Elaine Francis and Laura Michaelis (eds.) Mismatch: Form-Function Incongruity and the Architecture of Grammar. CSLI Publications. 83-112.


Adele E. Goldberg

University of Illinois

1. Introduction

Persian (Farsi) has a large and open-ended set of complex predicates that consist of a non-verbal element, the host, followed by a light verb. Complex predicates (CPs) are of interest in the context of the present volume because they display a mismatch of lexical and phrasal properties: they act in some ways as a single word, and in other ways like more than one word. They form a central part of the grammar of Persian and many other languages, including Hindi, Japanese and Hungarian.[2]

This paper offers an account in which the Persian CP is treated as a construction represented in the lexicon[3]. Constructions are pairings of form and meaning that are learned and stored as pieces of linguistic knowledge. The existence of a construction can be established by demonstrating that some aspect of a usage pattern is not strictly predictable from its component parts or from other facts about the language. Productive lexical or phrasal patterns, semi-productive lexical or phrasal patterns, fixed idioms and morphemes are all potential constructions as long as some aspect of their form or function is not strictly predictable.

I use the somewhat cumbersome “not strictly predictable” circumlocution instead of saying that the forms are “unpredictable” or “arbitrary.” This is because most forms that are not strictly predictable are neither arbitrary nor totally unpredictable. As Bolinger (1965) reminds us, “what is 95% old is not 100% new.” That is, a given construction often shares a great deal with other constructions that exist in the language; only certain aspects of its form or function are unaccounted for by other constructions.

It is clear that not strictly predictable knowledge must be learned and stored as such since it is not predictable from other facts of language. Thus evidence that a word or pattern is not strictly predictable provides sufficient evidence that the form must be listed as a construction in this expanded version of the lexicon, or what is sometimes called the ‘constructicon.’ At the same time, unpredictability is not a necessary condition for positing a stored construction. There is evidence from psycholinguistic processing that patterns are also stored if they are sufficiently frequent, even when they are fully regular instances of other constructions and thus predictable (e.g., Losiewicz 1992; Bybee 1995). We assume patterns are stored as constructions even when they are fully compositional under these circumstances. The inclusion of these more frequent items brings the present approach in line with usage-based models of grammar (Langacker 1988; Barlow and Kemmer 2000; Bybee 1995; Goldberg 1999). On this view, item-specific knowledge exists alongside generalizations.

Thus, morphological stems and productive lexical and phrasal constructions are all treated as the same basic type of entity. This idea is the cornerstone of theories such as Construction Grammar, Cognitive Grammar and HPSG, in which grammar consists of CONSTRUCTIONS: form-meaning patterns, whether morphological or phrasal, that are not strictly predictable (e.g., Fillmore, Kay, & O'Connor 1988; Pullum & Zwicky 1991; Fillmore & Kay 1993; Goldberg 1992, 1995; Jurafsky 1992; Lakoff 1987; Michaelis and Lambrecht 1996; Langacker 1987, 1991; Croft 2002; Pollard & Sag 1987). The fact that the repository of stored entities (the “lexicon”) does not coincide with a list of words in a language is a point that has been made by many others as well (e.g., DiSciullo and Williams 1996; Williams 1994; Marantz 1997; Culicover 1999; Jackendoff 1996, 2002).

At the same time, traditional behavioral differences between zero level categories and phrasal categories are recognized on the present account. For example, zero level categories can appear in derivational constructions and cannot be separated syntactically.[4] This fact is important to keep in mind: both zero level words and phrasal patterns are stored together, but the classic distinctions still retain their force.

Below it is argued that the categorial status of the CP is a simple verb (V0) by default. Its expression as a verb or as a phrasal entity is determined by independently motivated constructions. Default V0 status accounts for the CP’s zero level properties, including its resistance to separation and its appearance in derivational constructions. V0 status is a default in the sense that it can be overridden if and only if there is another construction in the grammar that specifically overrides it. This proposal is implemented via a default inheritance hierarchy.

Hudson (1984; 1990; this volume) motivates the role that default inheritance hierarchies can serve in simultaneously capturing broad generalizations, partial generalizations, and exceptions (see also Flickinger 1987 and references therein). Broad generalizations exist in the highest levels of the inheritance hierarchy; partial generalizations are captured by lower level representations, and exceptions are specified with their own peculiar properties below one or more of the generalizations. Default inheritance ensures that all non-conflicting information is shared between mother and daughter nodes. Conflicting (exceptional) information in the daughter node overrides the inheritance; it is in this sense that the inheritance is default.

As noted by Hudson and others, non-linguistic domain knowledge operates on the basis of a default logic. To take an example, consider our understanding of plane-boarding procedures. Almost all airlines have assigned seats and paper boarding passes with the seat assignment on them. When boarding, passengers are boarded from the back of the plane first. This is a broad generalization, and it determines what we know and how we expect to board most familiar or new carriers. Southwest Airlines, on the other hand, does not offer assigned seating, but instead distributes colored plastic boarding passes with ascending numbers, handed out in the order in which passengers check in. Passengers are boarded in groups of thirty and may take any available seat once inside the plane. The more specific knowledge we have about Southwest Airlines’ boarding practices overrides the more general knowledge, and determines our expectations about boarding that particular airline.

As is the case with linguistic knowledge, exceptions are, to varying degrees, regular as well. The Southwest Airlines boarding procedures share with other airlines many things: all involve some type of boarding pass, all allow pre-boarding of families with young children, and all board passengers in groups. By allowing whatever information is non-conflicting to be inherited, regular aspects of exceptional elements are captured.
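The logic of default inheritance can be made concrete with a small sketch. The following Python fragment is only an illustration of the airline example, not a formalization proposed in this paper; the class name `Node`, the attribute names, and the lookup method `get` are all assumptions introduced for the illustration. A daughter node stores only the information that conflicts with (or adds to) its mother, and everything else is found by walking up the hierarchy.

```python
class Node:
    """A node in a default inheritance hierarchy (illustrative sketch)."""

    def __init__(self, name, parent=None, **local):
        self.name = name
        self.parent = parent      # mother node, or None for the root
        self.local = dict(local)  # information specified on this node only

    def get(self, attribute):
        """Return the most specific value available for `attribute`.

        Locally specified (possibly exceptional) information wins;
        otherwise the value is inherited from the nearest ancestor.
        """
        node = self
        while node is not None:
            if attribute in node.local:
                return node.local[attribute]
            node = node.parent
        raise KeyError(attribute)


# Broad generalization: what we expect of most carriers.
airline = Node(
    "Airline",
    assigned_seats=True,
    pass_material="paper",
    boarding_pass=True,
    preboard_families=True,
    board_in_groups=True,
)

# Exception: only the conflicting or additional facts are listed.
southwest = Node(
    "Southwest",
    parent=airline,
    assigned_seats=False,
    pass_material="plastic",
    group_size=30,
)

print(southwest.get("assigned_seats"))     # overridden locally
print(southwest.get("preboard_families"))  # inherited from the mother node
```

Because the Southwest node lists only what is exceptional, its regular properties (some type of boarding pass, pre-boarding of families, boarding in groups) come for free from the more general node.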

On the usage-based approach adopted here, more specific knowledge always preempts general knowledge in production, as long as either would satisfy the functional demands of the context equally well. In particular we assume that items lower in the inheritance hierarchy (i.e., the more specific) are preferentially produced over items above them in the hierarchy, when the items share the same semantic and pragmatic constraints. Note that this idea does not predict that speakers must always opt for a word that is maximally specific, universally selecting beagle over dog, for example. This is because beagle and dog are not semantically equivalent: there are contexts where the more general term is more felicitous either because it is more accurate or because the specific information is not relevant (see Murphy and Brownell 1985 on a relevant Gricean explanation for why basic level terms are often preferred over subordinate or superordinate terms). That more specific information should override more general information when the two are functionally equivalent is not a necessary consequence of adopting an inheritance hierarchy, but is one with much precedent (cf. the Elsewhere Condition of Kiparsky 1968, who attributes the generalization to Panini).
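The preemption principle just described can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: the names `Item` and `produce` are hypothetical, depth in the hierarchy stands in for specificity, and an item's semantic and pragmatic constraints are collapsed into a single label. The went/goed pair is a familiar textbook case of blocking added for illustration; it is not drawn from the Persian data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    form: str
    depth: int     # distance from the root; larger means more specific
    function: str  # semantic/pragmatic constraints, collapsed to one label


def produce(candidates, required_function):
    """Pick the most specific item among those that fit the context.

    Specificity only breaks ties between functionally equivalent items;
    an item with a different function is never a competitor.
    """
    matching = [c for c in candidates if c.function == required_function]
    if not matching:
        return None
    return max(matching, key=lambda c: c.depth).form


# Functionally equivalent competitors: the more specific item wins.
past_of_go = [Item("goed", 1, "PAST(go)"), Item("went", 2, "PAST(go)")]
print(produce(past_of_go, "PAST(go)"))

# Not equivalent: "beagle" does not preempt "dog" when "dog" is what is meant.
animals = [Item("dog", 1, "DOG"), Item("beagle", 2, "BEAGLE")]
print(produce(animals, "DOG"))
```

The filter on `function` is what keeps the principle from forcing maximally specific word choice: beagle and dog never compete, because they do not satisfy the same functional demands.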

A hierarchical network of constructions clearly enables the theory to be in principle fully descriptively adequate. Generalizations are captured by higher level constructions in the hierarchy. Moreover, any sort of language-particular idiosyncratic factoid about a language can be captured by a specific enough construction.

What imbues a constructional approach with explanatory adequacy is a further desideratum that each construction must be motivated.[5] Motivation aims to explain why it is at least possible and at best natural that this particular form-meaning correspondence should exist in a given language.[6] Motivation can be provided by factors outside of the language-particular grammar, for example, by appeal to constraints on acquisition, principles of grammaticalization, discourse demands, iconic principles or general principles of processing or categorization. Alternatively, motivation may come from within the grammar. For example, the motivation for one construction having the form it does may come from the inheritance hierarchy itself, insofar as the form is inherited from a construction higher in the hierarchy. Motivation is distinct from prediction: recognizing the motivation for a construction does not entail that the construction must exist in that language or in any language. It simply explains why the construction “makes sense” or is natural (cf. Haiman 1985; Lakoff 1987; Goldberg 1995).

To return to the airline example, the general boarding procedures are motivated by the need to get passengers on board in an orderly fashion while respecting passengers’ desire to sit in particular seats. The boarding practices of Southwest Airlines are also motivated; Southwest is a low-budget airline specializing in short flights. Priority is given to boarding passengers as quickly as possible. Less priority goes to ensuring that each passenger receives his preferred seat. At the same time that both boarding practices are motivated, neither boarding practice had to be exactly the way it is. The facts are not strictly predictable. Many other regional airlines operate like national carriers and not like Southwest. And it is conceivable that the national carriers could have all operated like Southwest instead of issuing seat assignments. Still, the existing facts are clearly motivated by their function; understanding the function “makes sense” of the procedures, or explains why they are natural.

The constructions posited to account for the Persian data are each motivated independently, and are claimed to be typologically natural; however, identical constructions clearly do not exist in every language: it is not claimed that they are universal or that they are innate. Instead it is assumed that they are learned from the positive input learners receive.[7]

The present paper proposes an account of Persian complex predicates based on a default inheritance (DI) hierarchy and compares it with alternative accounts, including Goldberg (1996), which proposed a ranked constraint analysis of a subset of the data discussed here using the formalism of Optimality Theory (OT). It is argued that the DI analysis is preferable on empirical and theoretical grounds.

2. Identifying CPs in Persian

Complex predicate is used here to refer to host+light verb combinations in which the host appears in bare form, without plural or definite marking. In finite sentences with simple verbs, primary stress is placed on the main verb. But in finite sentences with CPs, primary stress falls on the host instead.

(1) Ali mard-râ ZAD (simple verb)

Ali man-acc hit.3.sg

Ali hit the man.

(2) Ali bâ Babak HARF zad (complex predicate)

Ali with Babak word hit

Ali talked with Babak.

Thus the stress facts treat the CP as a single zero level verb (see Lambrecht & Michaelis 1998 for discussion of principles of sentence accent placement). Additional evidence argues that the CPs act as simple lexical items: they may differ from their simple verb counterparts in argument structure properties, they undergo derivational processes that are typically restricted to applying to zero level categories, and they resist separation, for example, by adverbs and by arguments.

The present discussion focuses on combinations that have been classified as “inseparable complex predicates” in that the host cannot appear with a determiner (Karimi-Doostan 1997).[8] These so-called inseparable complex predicates are in fact separable under certain conditions. The ability to separate the pieces, although limited, would seem to argue against an analysis that treats the CP simply as a zero level verb. The present analysis offers an explicit account of these CPs that captures both their lexical and their phrasal properties.

3. Additional Zero Level Properties

3.1 Changes in Argument Structure

The complex predicate often differs in its argument taking properties from the corresponding simple verb. For example, in simple sentences, gereftan, “to take,” may occur with an explicit source argument:

(3) ketâb râ az man gereft

book ACC from me took

S/He took the book from me.

When used as a light verb in the CP arusi gereftan, “to throw a wedding,” the benefactive barâye phrase appears:

(4) a. barâye u arusi gereftam

for her/him wedding took

I threw a wedding for her/him.

In this case, the CP as a whole does not allow a source argument:

(5) b. * az u arusi gereftam

from her/him wedding took

3.2 The Existence of Transitive CPs

On a phrasal account of Persian CPs, the nominal host would presumably be treated as a direct object argument of the verb, since it often has the semantics of a direct object and does not occur with a preposition. However, several of the Persian CPs are transitive, taking a(nother) direct object, as the examples in (6) and (7) illustrate:

(6) Ali-râ setâyeS kardam

Ali-acc adoration did.1.sg

I adored Ali.

(7) Ali Babak-râ nejât dâd

Ali Babak-acc rescue gave.3sg

Ali rescued Babak.

On a phrasal account, then, the light verbs involved would have to be analyzed as double object verbs. But no verbs in Persian other than those in CPs take two objects, so the double object analysis would be an ad hoc way of accounting for transitive CPs.

3.3 Nominalizations

Another piece of evidence for lexical status is the ability to form nominalizations, since nominalization is a process that applies to zero level items. Persian CPs can form nominalizations by attaching the present stem of the light verb to the host:

(8) V: bâzi kardan Lit., “game + do” (“play”)

N: bâzikon “player” (as in soccer player)

(9) V: negah dâStan Lit., “HOST + have” (“to keep”)

N: negahdâri “maintenance”

(10) V: ruznâme neveStan “newspaper + write” (“to write newspapers”)

N: ruznâmenevis “journalist”
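The nominalization pattern in (8)-(10) can be stated as a simple procedure. The Python sketch below is an illustration only: the function name `nominalize`, the table `PRESENT_STEM`, and the optional `suffix` parameter are assumptions introduced here, and the stem table covers just the three light verbs in the examples. The present stem of the light verb is attached directly to the host; the form in (9) carries an additional -i, which is passed in explicitly rather than derived.

```python
# Present stems of the light verbs used in examples (8)-(10).
# The transliteration follows the text (S corresponds to "sh").
PRESENT_STEM = {
    "kardan": "kon",     # "do"
    "dâStan": "dâr",     # "have"
    "neveStan": "nevis", # "write"
}


def nominalize(host, light_verb, suffix=""):
    """Form a nominal from a complex predicate: host + present stem (+ suffix).

    The two parts are concatenated with no intervening morpheme.
    """
    return host + PRESENT_STEM[light_verb] + suffix


print(nominalize("bâzi", "kardan"))
print(nominalize("ruznâme", "neveStan"))
print(nominalize("negah", "dâStan", suffix="i"))
```

The procedure treats the host and light verb as a single base for word formation, which is the property at issue in this section.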

Complex predicates can also serve as input to gerundive nominalizations and adjectival past participles, both of which apply to simple verbs in an identical manner (Karimi-Doostan 1997: 61). One might wonder whether these formations are really compounds, since compounds are known to take as input two independent words, not a single zero level category. Compound formation is readily identifiable in Persian, however, because compounds are formed by inserting an ezafe morpheme (/e/) between the two zero level items. The ezafe is not found between the two elements of the CP in these lexicalization patterns, demonstrating that the CP is indeed treated as a single zero level item.