HPSG without PS?

Richard Hudson, UCL

draft: August 1995

1. Introduction

There are two ways of thinking about the structure of a simple example such as Small babies cry: in terms of phrases and the relations among their parts or in terms of the words and their relationships to one another. At a very elementary level, the two approaches could be diagrammed as in Fig. 1.

Fig. 1

The arrows will be explained shortly, but their main significance is to represent some kind of `horizontal' word-word dependency, in contrast with the vertical relationships (following the diagonal lines) in the first diagram. The approaches are based respectively on phrase-structure (PS) and dependency-structure (DS). Both diagrams show that small babies combine to form a phrase, but they show it in different ways. In the PS analysis the phrase itself is explicit, and the word-word relationship is implicit: node 4 stands for the phrase, and the lines connect 1 and 2 to 4, but not to each other. For the DS analysis this balance is reversed: the arrow shows the word-word relationship explicitly, but the resulting phrase is left implicit.
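The contrast can be made concrete with a small sketch (the encoding is my own, purely illustrative, and belongs to neither formalism): the PS analysis stores the phrase as an explicit node, while the DS analysis stores only word-word links, from which the phrase can be recovered.

```python
# Illustrative encoding (not from the paper): "Small babies cry",
# first as a phrase-structure tree, then as a dependency structure.

# PS: the phrase "small babies" is an explicit node; the word-word
# relation between "small" and "babies" is only implicit in their shared parent.
ps_tree = ("S",
           ("NP", ("A", "small"), ("N", "babies")),
           ("V", "cry"))

# DS: each word points directly at the word on which it depends; the phrase
# "small babies" is implicit (babies plus everything depending on it).
ds_links = {
    "small": "babies",   # small depends on babies
    "babies": "cry",     # babies depends on cry
    "cry": None,         # cry is the root
}

def phrase(head, links):
    """All words depending (directly or indirectly) on head, plus head itself."""
    words = {head}
    changed = True
    while changed:
        changed = False
        for word, anchor in links.items():
            if anchor in words and word not in words:
                words.add(word)
                changed = True
    return words

print(sorted(phrase("babies", ds_links)))  # ['babies', 'small']
```

The `phrase` function makes the point of the paragraph above explicit: nothing is lost by leaving phrases implicit, since they are fully recoverable from the dependencies.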

The purpose of this paper is to argue that syntactic analysis which includes DS should not use PS as well; more precisely, I shall argue that this is true of most of syntax, in which constructions are headed (and involve subordination), though I shall not try to argue the same for coordination. (Indeed, I have argued elsewhere that coordination is precisely the one area of syntax where PS is appropriate; see Hudson 1990: chapter 14.)

Virtually everyone accepts DS as part of syntax, even if not by name - the notion `long-distance dependency' makes it explicit, but government, agreement, valency (alias subcategorization), and selection are all horizontal dependency relationships, and all word order rules seem to be expressible as dependencies. Similarly most theories now recognise `grammatical relations' such as head, complement, adjunct and subject; although usually expressed in terms of a function in the larger phrase, these can all be translated easily into types of word-word dependency. As PS was originally defined by Chomsky, none of these notions was available; so there really was no alternative to PS as a basis for syntactic analysis. But now that so many dependency notions are available in most syntactic theories, it is time to ask whether we still need PS as well.

The question applies particularly urgently to Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1994) as can be seen from the simplified version of Pollard and Sag's analysis of Kim gives Sandy Fido (p. 33) in Fig. 2.

Fig. 2

The most interesting thing about this diagram is the way the verb's structure cross-refers directly to the nouns by means of the numbers [1], [2] and [3]. These cross-references are pure DS and could be displayed equally well by means of dependency arcs. Almost equally interesting is the way in which the verb shares its class-membership, indexed as [4], with the VP and S nodes. An even simpler way to show this identity would be to collapse the nodes themselves into one. The only contribution that the phrase nodes make is to record the word-word dependencies via their `SUBCAT' slots: the top node records that the verb's subcategorization requirements have all been met (hence the empty list for SUBCAT), while the VP node shows that it still lacks one dependent, the subject. This separation of the subject from other dependents is the sole independent contribution that PS makes in this diagram; but why is it needed? Pollard and Sag argue persuasively (Chapter 6) against using the VP node in binding theory, they allow languages with free constituent order to have flat, VP-less structures (40), and in any case HPSG recognises a separate functional slot for subjects (345). It is therefore important to compare the HPSG diagram in Fig. 2 with its pure-DG equivalent in Fig. 3.

Fig. 3

What empirical difference is there between these two diagrams? What does Fig. 3 lose, if anything, by not having a separate node for the sentence? Could an analysis like Fig. 3 even have positive advantages over Fig. 2? Questions like these are hardly ever raised, still less taken seriously. Pollard and Sag go further in this respect than most syntacticians by at least recognising the need to justify PS:

But for all that a theory that successfully dispenses with a notion of surface constituent structure is to be preferred (other things being equal, of course), the explanatory power of such a notion is too great for many syntacticians to be willing to relinquish it. (p. 10)

Unfortunately they do not take the discussion further; for them the `explanatory power' of PS is self-evident, as it no doubt is for most syntacticians. The evidence may be robust and overwhelming, but it should be presented and debated. A reading of the rest of Pollard and Sag's book yields very few examples of potential evidence. PS seems to play an essential role only in the following areas of syntax:

 in adjunct recursion (55-6),

 in some kinds of subcategorization where S and VP have to be distinguished (125),

 in coordination (203),

 in the analysis of internally-headed relative clauses, for which they suggest a non-headed structure with N' dominating S (233).

Apart from coordination (where, as mentioned earlier, I agree that PS is needed) the PS-based analysis is at least open to dispute, though the dispute may of course turn out in Pollard and Sag's favour.

The question, then, is whether a theory such as HPSG which is so well-endowed with machinery for handling dependencies really needs PS as well. My personal view is that this can now be thrown away, having served its purpose as a kind of crutch in the development of sophisticated and explicit theories of syntax; but whether or not this conclusion is correct, our discipline will be all the stronger for having debated the question. The rest of the paper is a contribution to this debate in which I present, as strongly as I can, the case for doing away with PS. The basis for my case will not be simply that PS is redundant, but that it is positively harmful because it prevents us from capturing valid generalisations. My main case will rest on the solutions to two specific syntactic problems: the interaction of ordinary wh-fronting with adverb-fronting as in (1), and the phenomenon in German and Dutch called `partial-VP fronting', illustrated by (2).

(1) Tomorrow what shall we do?

(2) Blumen geben wird er seiner Frau.

Flowers give will he to-his wife. `He'll give his wife flowers.'

First, however, I must explain how a PS-free analysis might work.

2. Word Grammar

My aim as stated above is `to argue that syntactic analysis [of non-coordinate structures] which includes DS should not use PS as well'. Clearly it is impossible to prove that one PS-free analysis is better than all possible analyses that include PS, so the immediate goal is to compare two specific published theories, one with PS and the other without it, in the hope of being able to isolate this particular difference from other differences.

Fortunately there are two such theories: HPSG and Word Grammar (WG; see Hudson 1984, 1990, 1992, 1993, 1994, forthcoming; Fraser and Hudson 1992; Rosta 1994). Apart from the presence of PS in HPSG and its absence from WG, the two theories are very similar:

 both are `head-driven' in the sense that constructions are sanctioned by information on the head word;

 both include a rich semantic structure in parallel with the syntactic structure;

 both are monostratal;

 both are declarative;

 both make use of inheritance in generating structures;

 neither relies on tree geometry to distinguish grammatical functions;

 both include contextual information about the utterance event (e.g. the identities of speaker and hearer) in the linguistic structure; and perhaps most important of all for present purposes,

 both allow `structure sharing', in which a single element fills more than one structural role.

Admittedly there are some theoretical differences as well:

 HPSG allows phonologically empty elements,

 HPSG distinguishes complements from one another by means of the ordered SUBCAT list rather than by explicit labels such as `object'.

And not surprisingly there are disagreements in published accounts over the vocabulary of analytical categories (e.g. Pollard and Sag's `specifier' and `marker') and over the analysis of particular constructions (e.g. Hudson's analysis of determiners as pronouns and total rejection of case for English; see Hudson 1990: 268ff, 230ff; 1995a). However, these differences, both theoretical and descriptive, are only indirectly related to the question about the status of PS, so we can ignore them for present purposes.

One problem in comparing theories is to find a notation which does justice to both. The standard notation for HPSG uses either attribute-value boxes-within-boxes or trees, both of which are specific to PS, whereas DS structures are usually shown in WG by means of arrows between words whose class-membership is shown separately. To help comparison we can start by using a compromise notation which combines WG arrows with the HPSG unification-based notation, so that the information supplied by the grammar (including the lexicon) will be a partial description of the structures in which the word concerned may be used. For example, a noun normally depends on another word, to which it is connected by an arrow, and (obviously) can be used only at a node labelled `noun'; so Mary emerges from the grammar with the point of an arrow (whose shaft will eventually connect it to the word on which it depends), and also with the label `N' for `noun' (as well as `nN' for `naming noun', alias `proper noun'). In terms of morphosyntactic features it is singular, i.e. `[sg]'. A simple way of showing this information is by an entry like (3):

(3)

Mary

N, nN

[sg]

For some words the grammar supplies a little more information. For example, deeply must be an adjunct (abbreviated to `a') of whatever word it depends on, and he must be subject (`s') of a tensed (`[td]') verb (which typically follows it).

(4) -adeeply

Av

(5)he s- V

N, pN [td]

For a slightly more interesting word consider loves. As in HPSG this is supplied with a valency consisting of a singular subject and any kind of noun as object. These requirements will eventually be instantiated by dependencies to some other word, which we show by a labelled dependency arrow. The English word-order rules fix their (default) positions, hence their positions to the left and right of loves in the entry. Unlike the previous examples, loves can be the `root' of the whole sentence, so it does not need to depend on another word (though it may depend on one, in which case it is the root of a subordinate clause); this is shown by the brackets above the top arrow.

(6) ( )

N s- loves -o N

[sg] V

Putting these four entries together generates the structure for He loves Mary deeply in Fig. 4.

Fig. 4
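How entries like (3)-(6) might combine can be sketched as follows. The encoding is entirely my own (neither WG's arrow notation nor HPSG's attribute-value matrices): each word contributes constraints on its anchor and its valency, and a candidate dependency structure for He loves Mary deeply is checked against them.

```python
# Hypothetical sketch of how the entries (3)-(6) combine; all names
# and the representation are illustrative assumptions, not the paper's.

entries = {
    "he":     {"cat": "N", "needs_anchor": ("s", "V")},    # subject of a verb
    "loves":  {"cat": "V", "valency": [("s", "N"), ("o", "N")], "root_ok": True},
    "Mary":   {"cat": "N", "needs_anchor": (None, None)},  # depends on some word
    "deeply": {"cat": "Av", "needs_anchor": ("a", None)},  # adjunct of its anchor
}

# dependent -> (label, anchor); the sentence root has anchor None
structure = {
    "he":     ("s", "loves"),
    "Mary":   ("o", "loves"),
    "deeply": ("a", "loves"),
    "loves":  (None, None),
}

def well_formed(structure, entries):
    # every word's anchor (or lack of one) must match its entry
    for word, (label, anchor) in structure.items():
        entry = entries[word]
        if anchor is None:
            if not entry.get("root_ok"):
                return False          # only loves may be the root here
        else:
            want_label, want_cat = entry.get("needs_anchor", (None, None))
            if want_label is not None and label != want_label:
                return False
            if want_cat is not None and entries[anchor]["cat"] != want_cat:
                return False
    # every valency slot must be filled by a dependent of the right category
    for word, entry in entries.items():
        for slot, cat in entry.get("valency", []):
            if not any(anchor == word and lab == slot
                       and entries[dep]["cat"] == cat
                       for dep, (lab, anchor) in structure.items()):
                return False
    return True

print(well_formed(structure, entries))  # True
```

Swapping he's label from `s` to `o`, or removing a valency filler, makes the check fail, mirroring the way the grammar's partial descriptions license only some structures.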

Word order is handled, as in HPSG, by means of separate `linear precedence' statements. Some apply to very general patterns, the most general of all being the combination of any word with its `anchor' (the word on which it depends[1]). By default the anchor comes first in English, though this default may be overridden by increasingly specific patterns. At the extreme of specificity are combinations of specific lexical items, which we can illustrate with the adverb deeply. Normally this follows a verbal anchor as in (7)-(9), but it can precede the verb resent as in (10):

(7) I love her deeply.

*I deeply love her.

(8) I slept deeply.

*I deeply slept.

(9) We looked deeply into each other's eyes.

*We deeply looked into each other's eyes.

(10) I resent the suggestion deeply.

I deeply resent the suggestion.

This idiosyncrasy can be shown by a special lexical entry for the sequence deeply resent, supplementing the normal entry:

(11) ( )

deeply a- resent -o N

Av V

This example illustrates an important general characteristic of WG, which it again shares with HPSG. Default inheritance and unification allow complex and specific entries to be composed on the basis of much more general entries, with the consequence that there is no clear division between `the lexicon' and `the grammar'. In WG the basic units of grammatical description are in fact elementary propositions such as those in (12).

(12) resent is a verb.

resent has an object.

The object of a verb is a noun.

A word's anchor is obligatory.

A verb's anchor is optional.

A word follows its anchor.

deeply is an adverb.

An adverb may depend on a verb.

deeply may both depend on resent and precede it.

Only the last of these propositions is specific to the sequence deeply resent; but all the rest are available in the grammar, and when applied to this sequence they generate the more complex structure in (11).
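The default-inheritance reasoning behind propositions like those in (12) can be sketched in a toy form (the encoding is mine; WG's actual inheritance machinery is richer): values stated on more specific categories override those inherited from more general ones.

```python
# Toy default inheritance over propositions like those in (12).
# The isa hierarchy and attribute names are illustrative assumptions.

isa = {"resent": "verb", "verb": "word",
       "deeply": "adverb", "adverb": "word"}

facts = {
    "word": {"anchor": "obligatory"},   # "A word's anchor is obligatory."
    "verb": {"anchor": "optional"},     # "A verb's anchor is optional." (override)
}

def inherit(item, attr):
    """Walk up the isa hierarchy; the most specific stated value wins."""
    while item is not None:
        if attr in facts.get(item, {}):
            return facts[item][attr]
        item = isa.get(item)
    return None

print(inherit("resent", "anchor"))  # 'optional'   (verb overrides word)
print(inherit("deeply", "anchor"))  # 'obligatory' (inherited from word)
```

This is why no clear division between `the lexicon' and `the grammar' is needed: a lexical item is just the most specific node in the same hierarchy.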

Even at this elementary stage of explanation we can illustrate one of the positive disadvantages of PS. In a PS-based analysis, deeply is the head of an adverb-phrase and resent is the head of a verb-phrase, so the combination is in fact not a combination of words, but of phrases. In a tree-structure, the two words are likely to be separated by at least one AP and two VP nodes. These cannot simply be ignored in a lexical entry, because they are part of the definition of the relationship between the words. In contrast, a pure-DS analysis shows the words themselves as directly related to each other.

A similar problem arises with lexically selected prepositions such as the with required by cope or the on after depend, which can be handled very simply and directly in WG (with `c' standing for `complement').

(13) ( )

cope -c with

V P

In contrast, PS imposes a PP node between the verb and the preposition, so the only sister node available for subcategorization by the verb is labelled simply PP; but cope does not simply take a PP, it takes one headed by with. Pollard and Sag discuss the fact that regard lexically selects a PP headed by as, but their suggestion that the relevant phrases could be identified by the feature [+/- AS] (ibid: 110) is hard to take seriously. Similarly, stop requires a subordinate verb to be a present participle, which can easily be expressed in a DS entry as a restriction imposed directly by one word on the other:

(14) ( )

stop -c V

V [prpt]

But if PS is available the first verb's sister is the VP headed by the participle, so the latter's inflectional features have to be projected up to the phrasal node.

The problem in each case is that the phrase node is a positive hindrance to satisfactory analysis. The problems created can of course be solved by projecting the head word's features onto the phrase; but the PS is part of the problem, not of the solution.

3. Structure sharing and (dis)continuity

As mentioned earlier, perhaps the most important characteristic of HPSG and WG is the notion of `structure sharing'. (`It is not going too far to say that in HPSG structure sharing is the central explanatory mechanism ...', Pollard and Sag 1994:19) In both theories virtually all the major complexities of syntactic structure require some kind of structure sharing. We start with a simple example of subject-to-subject raising, which I shall assume needs no justification. The entry for (intransitive) stop shows the raising as structure sharing whereby the two verbs share the same subject:

(15) ( )

N s- stop -c V

\___s___/ [prpt]

The sharing appears in sentence structure as two dependencies converging on the same word as in the two diagrams of Fig. 5. The second diagram shows the recursive application of raising where stop is combined with another raising verb, the auxiliary have.

Fig. 5
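Structure sharing of this kind can be sketched in a dependency encoding (my own, not HPSG's attribute-value notation): once a word may carry more than one incoming dependency, links are naturally stored as (anchor, label, dependent) triples rather than as a one-anchor-per-word function.

```python
# Illustrative encoding of structure sharing in "It stopped raining":
# "it" is the subject of both verbs, as licensed by the raising entry (15).

links = {
    ("stopped", "s", "it"),       # surface subject of stopped
    ("stopped", "c", "raining"),  # participial complement of stopped
    ("raining", "s", "it"),       # shared subject of raining
}

def anchors_of(word):
    """All words on which the given word depends."""
    return {head for (head, label, dep) in links if dep == word}

print(sorted(anchors_of("it")))  # ['raining', 'stopped']
```

The two dependencies converging on it are exactly the converging arrows of Fig. 5.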

Structures such as this are a major innovation in dependency theory (as they once were in PS theory) because they disrupt the normally simple relationship among dependencies, phrases and linear order. If we assume that all the words which depend (directly or indirectly) on a single word constitute a phrase, it remains true that phrases are normally continuous, i.e. not interrupted.

Normally an infringement of this generalization leads to ungrammaticality. For example, what happens if we combine the following entries?

(16)-aafter -c N

P

(17) 

parties

N

[pl]

(18)biga- N

A

The ordering restrictions on after and big require parties to follow both these words, but nothing in the entries restricts the relative order of after and big; and yet after big parties is fine while *big after parties (with big depending on parties) is totally ungrammatical. The obvious explanation is that big parties is a phrase, so it must be continuous - i.e. it must not be split by a word, such as after, which is not part of the phrase.
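The continuity constraint just invoked can be made concrete (a sketch in my own terms, not a claim about either theory's formal machinery): a phrase, i.e. a word together with all its direct and indirect dependents, must occupy an unbroken stretch of the sentence.

```python
# Sketch of the continuity (projectivity) constraint: every phrase must
# occupy a contiguous span of word positions. Encoding is illustrative.

def descendants(i, heads):
    """Indices of word i plus all its direct and indirect dependents."""
    out = {i}
    changed = True
    while changed:
        changed = False
        for dep, head in enumerate(heads):
            if head in out and dep not in out:
                out.add(dep)
                changed = True
    return out

def continuous(heads):
    """heads[i] is the index of word i's anchor (None for the root)."""
    for i in range(len(heads)):
        span = descendants(i, heads)
        if max(span) - min(span) + 1 != len(span):
            return False      # the phrase is interrupted by an outside word
    return True

# "after big parties": after is root, big -> parties, parties -> after
print(continuous([None, 2, 0]))  # True: every phrase is unbroken

# "*big after parties": big -> parties, after is root, parties -> after
print(continuous([2, None, 1]))  # False: big ... parties is split by after
```

The check rules out *big after parties for exactly the reason given above: the phrase headed by parties occupies positions 1 and 3 but not 2.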

Traditionally PS-based theories have assumed that phrases should be equivalent to a bracketing of the string of words, which means that discontinuous phrases are ruled out a priori as a fundamental impossibility - a discontinuous phrase simply cannot be separated by brackets from the rest of the sentence, because any brackets round the whole phrase must also include the interruption. Admittedly alternative views of PS have been proposed in which discontinuity is possible (McCawley 1982), and indeed the same is true of any PS-based theory (such as HPSG) which admits structure sharing. But if discontinuous structures are admitted in principle, what then excludes strings like *big after parties? The question seems trivially easy to answer from an HPSG perspective: discontinuity is only permitted when it is caused by structure sharing, and structure sharing is only permitted when it is explicitly sanctioned in the grammar. Structure sharing removes a single element from one constituent and makes it act as part of another (higher) constituent. For example, subject-raising causes discontinuity by locating the lower clause's subject in a named place (subject) in the higher clause; but since the grammar says this structure sharing is ok, it is ok. In contrast, there is no structure sharing pattern which allows big to move away from parties, so discontinuity is not ok.

What will turn out to be a weakness in this answer is that the grammar has to stipulate each structure sharing possibility. In most cases this is no problem, but the problem data that we consider below will show that more flexibility is needed. There is an alternative which is much more obviously compatible with DS than with PS, and which will play a crucial part in the later discussion; so to the extent that the explanation works it will count as evidence against PS. The alternative is to ask what the essential difference is between the structures of It stopped raining, where the discontinuous phrase it ... raining is grammatical, and of *big after parties, with the illegal *big ... parties - apart from the fact that one is allowed and the other isn't. The difference is that all the words in It stopped raining are held together by one continuous phrase (the whole sentence). It has a legitimate place in the sentence thanks to its dependency on stopped, whereas the only link between big and the rest of the sentence lies through the discontinuous phrase. In other words, the grammatical sentence has a substructure which is free of discontinuities but holds all the words together.