Journal of Language and Linguistics Vol. 3 No. 1 2004 ISSN 1475 - 8989
Agreement Mapping System Approach to Language
László Drienkó
ELTE University, Budapest
0. Abstract
The most fundamental idea behind generative linguistics, evident from the very beginning (see Chomsky (1957)), is closely connected with the basic property of natural languages that longer sequences of linguistic elements can behave like shorter sequences, or, in effect, like a single element. This is trivially expressed in the form of production rules of generative grammars. Rule ‘A → B C’ reflects that the linguistic behaviour of a sequence of elements (words) representable as ‘B C’ is similar to the behaviour of a single element represented as ‘A’. In other words, ‘B C’ can be substituted for ‘A’. Thus the key idea behind generative linguistics is SUBSTITUTION.
Nevertheless, there is another property of natural languages (NLs) which, we propose, is equally important. It is related to the capacity of NLs to express correlations of linguistic elements and to encode information through such correlations. We refer to this capacity as AGREEMENT, and to such linguistic correlations as agreement relations. The notion of agreement lends itself naturally to conjugation and inflection paradigms, but we believe that many other phenomena in syntax, phonology, and morphology can also be interpreted in terms of it. Our approach does not hinge on ‘classical’ notions of generative grammars, so non-regular or non-context-free cases such as cross-serial dependencies (Shieber, 1985), Bambara vocabulary (Culy, 1985), vowel harmonies, or phenomena in Arabic morphology can be accounted for within the same framework. Our model incorporates a reconstruction mechanism to handle unidentified input.
In this paper we give a formal definition of our mapping system and relate it to such notions as perception-production, competence-performance, learning, implementation, parsing, grammaticality judgements, openness of language. To demonstrate the usability of the model, we apply it to the problematic cases named above.
1. Introduction
The most fundamental idea behind generative linguistics, evident from the very beginning (see Chomsky (1957)), is closely connected with the basic property of natural languages that longer sequences of linguistic elements can behave like shorter sequences, or, in effect, like a single element. This is trivially expressed in the form of production rules of generative grammars. Rule ‘A → B C’ reflects that the linguistic behaviour of a sequence of elements (words) representable as ‘B C’ is quite similar to the behaviour of a single element represented as ‘A’. In other words, ‘B C’ can be substituted for ‘A’. Thus the key idea behind generative linguistics is SUBSTITUTION.
Nevertheless, there is another property of natural languages (NLs) which, we propose, is equally important. It is related to the capacity of NLs to express correlations of linguistic elements and to encode information through such correlations. We refer to this capacity as AGREEMENT, and to such linguistic correlations as agreement relations.
In the remainder of Section 1 we outline our approach. In Section 2 the model is applied to ‘problematic’ issues such as cross-serial dependency, Bambara vocabulary, Arabic morphology, and vowel harmony.
1.1 Agreement relations
Most agreement relations are expressed morphologically. Consider e.g. the Russian sentence in (1.1):
(1.1)  Krasivaya          d’evushka     rabotayet
       beautiful-FEM-NOM  girl-FEM-NOM  work-PRES-3rd-sing
       ‘The/a beautiful girl works/is working’
The linguistic correlation of the adjective ‘krasivaya’ and the noun ‘d’evushka’ is expressed through the shared feminine-nominative feature.
However, there may be correlation between elements which does not seem to have explicit manifestation. In (1.2),
(1.2)The beautiful girl works,
no morphologically expressed correlation can be detected between the adjective and the noun. Then, we may either say that the adjective and the noun share a nominative feature represented by zero morphemes, or we may posit that the adjective-noun correlation is built into the sequence configurationally, i.e., the environment for the adjective determines that it must be the modifier of the noun. Thus it seems reasonable to infer that configurationality and explicit agreement conditions should both be incorporated into our agreement model.
Note how the notions ‘substitution’ and ‘agreement’ relate to each other. The right-hand side of any production rule can be thought of as being a configurational agreement relation.
Intuitively, the left-hand side gives non-configurational information on the agreement relation on the right. More specifically, consider rules (1.3) and (1.4):
(1.3) A → B C
(1.4) A → B E D
We can say that there is a relationship between B and C in (1.3) characterised, on the one hand, by the very fact that B and C can belong together (in the given order), and, on the other hand, by the fact that, linked to each other, they can be substituted for element A. The difference between (1.3) and (1.4) is that different elements are linked together on the right-hand side, i.e. they differ in configurational properties. The similarity is that both right-hand sides can substitute for A, i.e. (1.3) and (1.4) share a feature related to substitution properties.
Using agreement-relation terminology, we will say that elements B, C, D, and E share a feature, or attribute, – say, S_FOR – whose value is the same for these elements, i.e.
B: S_FOR = A     C: S_FOR = A     D: S_FOR = A     E: S_FOR = A.
Thus, a production rule can be viewed as an agreement relation: the right side of the rule specifies which elements have to agree on a certain feature, the left side determines the value for that feature.
The foregoing lines serve only demonstrative purposes. In our agreement approach we need not make any reference to production rules.
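Purely as an illustration, the reading of a production rule as an agreement relation on S_FOR can be sketched in Python. The dictionary layout, the function name `substitutes_for`, and the extra element F are assumptions of this sketch and not part of the model; configurational information (which sequences are permitted at all) is deliberately left out.

```python
# Each element carries an S_FOR attribute naming what it can help substitute for.
ELEMENTS = {
    "B": {"S_FOR": "A"},
    "C": {"S_FOR": "A"},
    "D": {"S_FOR": "A"},
    "E": {"S_FOR": "A"},
    "F": {"S_FOR": "G"},  # hypothetical element that does not agree with the others
}

def substitutes_for(sequence):
    """Return the shared S_FOR value of the sequence, or None if the elements disagree."""
    values = {ELEMENTS[name]["S_FOR"] for name in sequence}
    return values.pop() if len(values) == 1 else None

print(substitutes_for(["B", "C"]))  # A
print(substitutes_for(["B", "F"]))  # None
```

Here the right-hand side of a rule corresponds to the sequence passed in, and the left-hand side to the value on which its elements agree.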
1.2 Phrase, sentence: Pattern
Now we can say that any given phrase is representable as an agreement relation, i.e., as a sequence of elements which satisfies the agreement conditions/constraints for that relation.
We shall call a sequence (or, more generally, a set) of elements, together with its agreement constraints, a PATTERN. Graphically, we represent a pattern as in (1.5)
(1.5)
B ---ATTR--- C     D
where B, C, and D represent the elements which are configurationally linked together, and the line connecting B and C indicates the explicit agreement constraint that B and C should have the same values for attribute ATTR. For instance, the pattern for (1.3) could look like (1.6).
(1.6)
B ---S_FOR--- C
The agreement constraint in (1.6) is satisfied if elements B and C have the necessary feature, i.e.:
B: S_FOR = A     C: S_FOR = A
The pattern for ‘krasivaya d’evushka’ may be given as (1.7).
(1.7)
A ---S_FOR--- N
and it is satisfied by elements
krasivaya: CAT = A        d’evushka: CAT = N
           S_FOR = N                 S_FOR = N
(1.7) corresponds to the production-rule interpretation of (1.3), namely, that the adjective-noun sequence can be substituted for a noun. However, we may choose to abandon that interpretation and refer to agreement features which are more convenient or useful. E.g., (1.7) does not express the requirement that the adjective and the noun must agree in gender and case. Pattern (1.8) straightforwardly incorporates such an agreement relation, and it is still possible to view the Adjective-Noun sequence as a single grammatical unit, a gender-case complex.
(1.8)
A ---GENDER, CASE[1]--- N
Insofar as a phrase can be viewed as made up of (configurational and explicit) agreement relations, the grammaticality of a sentence will be determined by the agreement relations holding for its ‘phrasal’ subparts. In (1.1), for instance, besides the A-N gender-case correlation, person and number agreement is also required between the noun and the verb. Consequently, a pattern for (1.1) should incorporate both agreement relations. Cf. (1.9).
(1.9)
A ---GENDER, CASE--- N ---PERS, NUM--- V
The agreement-approach counterpart of the generative sentence as made out of phrasal constituents is the sentence pattern incorporating subpatterns.
1.3 The mapping system
Having introduced the basic notions informally, we now set out our model more formally.
Basically, we consider the operation of the language faculty to be a mapping from input elements to patterns.
Patterns are ordered finite sets (sequences) of symbolic or representational elements. Each pattern is associated with a finite (possibly empty) set of agreement conditions/constraints. The set PAT of all patterns should be finite.
The set ALLREP of all possible representational elements is not finite. The finite number of patterns is built out of elements in REPEL, a finite subset of ALLREP. The actual contents of REPEL must be specified by linguistic research.
Input elements are simple attribute-value structures (AVS) consisting of a finite number of attribute-value pairs. (1.10) exemplifies such an AVS.
(1.10)
PHFORM  dog
CAT     n
PERS    3rd
NUM     sing
Known input elements are a finite set KEL. Elements in KEL are AVSs whose attribute values are fully specified, i.e. each attribute has a value. (1.10), for instance, is a known element.
Unknown input elements, i.e. AVSs with unspecified attribute values, constitute a non-finite set UEL.
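As a minimal sketch, an AVS can be pictured as a Python dictionary. The use of `None` for an unspecified value, the helper `is_known`, and the invented word ‘blick’ are assumptions made for illustration only.

```python
# An attribute-value structure (AVS) as a plain dict; None marks an unspecified value.
dog = {"PHFORM": "dog", "CAT": "n", "PERS": "3rd", "NUM": "sing"}
blick = {"PHFORM": "blick", "CAT": None, "PERS": None, "NUM": None}  # invented unknown word

def is_known(avs):
    """An element can belong to KEL only if every attribute has a value."""
    return all(value is not None for value in avs.values())

print(is_known(dog))    # True
print(is_known(blick))  # False
```

On this picture, ‘dog’ could be stored in KEL, whereas ‘blick’ falls into UEL until values are supposed for its attributes.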
Attributes constitute a non-finite set ALLATTR. The actual contents of ATTR, a subset of ALLATTR to be used in an agreement model, must be specified by linguistic research. The values for the attributes can also be arbitrarily chosen so that they best satisfy linguistic needs. The set ALLVAL of all attribute values need not be finite. Representational elements in REPEL refer to elements of ATTR and VAL. VAL is a finite subset of ALLVAL. [2],[3]
The connection between attributes and their values is established by the mapping process. Formally, algorithm VALA(a(X)) returns a value[4] for attribute X of input element ‘a’ by
(1.11)
- looking up the value for elements in KEL (known elements)
- selecting values that yield successful mapping for elements in UEL (unknown elements)
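The two cases of VALA might be sketched as follows. The toy contents of `KEL`, the function signature, and in particular the `required` parameter (the value a pattern would need for a successful mapping) are assumptions of this sketch; the model itself does not fix how values for unknown elements are selected.

```python
KEL = {"dog": {"CAT": "n", "PERS": "3rd", "NUM": "sing"}}  # toy set of known elements

def vala(phform, attribute, required=None):
    """Return a value for `attribute` of the input element with form `phform`.

    Known elements: look the value up in KEL.
    Unknown elements: 'suppose' the value that the pattern requires (may be None).
    """
    if phform in KEL:
        return KEL[phform].get(attribute)
    return required

print(vala("dog", "CAT"))                  # n
print(vala("blick", "CAT", required="n"))  # n (supposed, not looked up)
```

Note that for a known element the stored value wins even if the pattern requires something else, so a mismatch surfaces as a failed mapping.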
Agreement constraints are finite sets of symbolic representations of requirements for agreement relations. Basically, they have three parts. Cf. (1.12).
(1.12)
1 3 4   PERS   FIRST-TO-FIRST
The first part specifies the elements that take part in the agreement relation. The second part specifies which attribute must have the same value for all the elements taking part in the agreement relation.[5] The third part, the recursive agreement strategy, is relevant only for recursive cases (see below). Thus an agreement constraint is fulfilled iff the elements listed in the first part of the constraint have the attribute in the second part specified, and its value is the same for all the elements (while using the necessary agreement strategy in the recursive case). CONSTR, the set of constraints, is a finite subset of ALLCONSTR, the set of all possible constraints. (Cf. Footnote 2).
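For the non-recursive case, checking a three-part constraint might look like the sketch below. The `Constraint` type, the function `satisfied`, and the feature values given to the Russian words are assumptions made for illustration; the strategy field is carried along but not used here.

```python
from typing import NamedTuple

class Constraint(NamedTuple):
    positions: tuple                  # 1-based positions of the elements that must agree
    attribute: str                    # attribute whose value must be shared
    strategy: str = "FIRST-TO-FIRST"  # relevant only for recursive elements; unused here

def satisfied(constraint, mapped):
    """`mapped` lists the input elements (dicts) in pattern order, one per position."""
    values = [mapped[p - 1].get(constraint.attribute) for p in constraint.positions]
    # Every participating element must specify the attribute, with one shared value.
    return None not in values and len(set(values)) == 1

krasivaya = {"CAT": "A", "GENDER": "fem", "CASE": "nom"}
devushka = {"CAT": "N", "GENDER": "fem", "CASE": "nom", "PERS": "3rd", "NUM": "sing"}
rabotayet = {"CAT": "V", "PERS": "3rd", "NUM": "sing"}
sentence = [krasivaya, devushka, rabotayet]

print(satisfied(Constraint((1, 2), "GENDER"), sentence))  # True
print(satisfied(Constraint((2, 3), "PERS"), sentence))    # True
print(satisfied(Constraint((1, 2), "PERS"), sentence))    # False: the adjective has no PERS
```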
Agreement constraints may be (finite-length) conjunctions or disjunctions of other constraints. Requiring that all ‘subconstraints’ of a constraint be satisfied can be interpreted as a logical AND relationship. Our default interpretation of the relationship between constraints is conjunction, since we require that all constraints of a pattern should be fulfilled. In this respect, (1.13) and (1.14), for instance, are equivalent, so graphically we will use only one line for several (AND) agreement relations between the same elements.
(1.13)
B ---PERS--- C     D
  ---NUM----
C1 : 1 2 PERS FIRST-TO-FIRST
C2 : 1 2 NUM FIRST-TO-FIRST
(1.14)
B ---PERS, NUM--- C     D
C: C1 ∧ C2 = (1 2 PERS FIRST-TO-FIRST) ∧ (1 2 NUM FIRST-TO-FIRST)
Agreement constraints can be applied disjunctively as well. This means that, given e.g. constraint C: C1 ∨ C2, either C1 or C2 should be satisfied. In the examples below our OR relations will not be exclusive. Disjunctive agreement relations will be indicated with dotted lines. In (1.15), for instance, either the person or the number value must be the same for B and C.
(1.15)
B ...PERS, NUM... C     D
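The difference between the conjunctive and the (inclusive) disjunctive reading can be made concrete with a small sketch. The helper names `agree`, `conj`, and `disj` and the feature values of the two elements are assumptions chosen for the example.

```python
def agree(a, b, attribute):
    """Atomic constraint: both elements specify `attribute` with the same value."""
    return attribute in a and attribute in b and a[attribute] == b[attribute]

def conj(*checks):
    """AND: every subconstraint must hold."""
    return lambda a, b: all(check(a, b) for check in checks)

def disj(*checks):
    """Inclusive OR: at least one subconstraint must hold."""
    return lambda a, b: any(check(a, b) for check in checks)

pers = lambda a, b: agree(a, b, "PERS")
num = lambda a, b: agree(a, b, "NUM")

b_elem = {"PERS": "3rd", "NUM": "sing"}
c_elem = {"PERS": "3rd", "NUM": "plur"}

print(conj(pers, num)(b_elem, c_elem))  # False: NUM differs
print(disj(pers, num)(b_elem, c_elem))  # True: PERS agrees
```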
Non-graphically, we may refer to patterns as sets of attribute-value specifications. For instance, (CAT = noun) or (X = x) represent such attribute-value specifications: they are atomic or non-recursive elements of ALLREP (and of REPEL, if we choose to use them in our model).
We will say that a representational element is saturated if an input element is mapped on it. That is, representational element Re: (X = x) is saturated if there is an input element i( ..., X, ... ) such that VALA(i(X)) = x. Re can also be saturated by an unknown input element iu(?), since our VALA function may ‘suppose’ X = x for iu(?).
Patterns can contain recursive elements. Recursive representational elements are sequences of non-recursive elements. While atomic representational elements can be saturated only once in a pattern, recursive ones can be saturated any number of times. ((Xi = xi) (Yi = yi)) may symbolise a recursive representational element Rei consisting of two atomic elements.
A recursive element is n-saturated if all of its atomic elements are saturated n times, n ≥ 0.
The mapping algorithm can be sketched in the following way.
(1.16)
Given:
- a sequence S of n input elements:
  $S = \{ i_1(I_{11}, \dots, I_{1m(1)}), \dots, i_n(I_{n1}, \dots, I_{nm(n)}) \}$,
  where $0 < n$; $m(i)$ is a function that returns the number of attributes input element $i$ has; $I_{11}, \dots, I_{nm(n)} \in \mathrm{ATTR}$;
  $i_1(I_{11}, \dots, I_{1m(1)}) \in \mathrm{KEL} \cup \mathrm{UEL}, \dots, i_n(I_{n1}, \dots, I_{nm(n)}) \in \mathrm{KEL} \cup \mathrm{UEL}$
- pattern $P = \{ ((X_1 = x_1) \dots) \dots ((X_{L(P)} = x_{L(P)}) \dots) \} = \{ Re_1 \dots Re_{L(P)} \}$,
  where $L(P)$ is a function that returns the length of a pattern, i.e. how many representational elements the pattern consists of;
  $x_1, \dots, x_{L(P)} \in \mathrm{VAL}$; $X_1, \dots, X_{L(P)} \in \mathrm{ATTR}$; $Re_1, \dots, Re_{L(P)} \in \mathrm{REPEL}$;
  $P \in \mathrm{PAT}$
- C(P), the set of constraints for P:
  $C(P) = \{ c_1, c_2, \dots, c_h \}$, $\infty > h \ge 0$, $c_1, c_2, \dots, c_h \in \mathrm{CONSTR}$
- iorder(S): a function that returns a list containing input elements in the ‘original’ order, i.e. in the order they are given to the mapping system
- morder(S,P): a function that returns a list containing input elements in the order they are mapped onto representational elements of P
For each input element $i \in S$ find a representational element $Re_j \in P$ such that $i$ saturates $Re_j$.
S is successfully mapped on P (equivalently: the input is licensed by the pattern, or the input is ‘grammatical’) iff
(1.17)
1. a.) random-access/non-linear case:
   all elements of P are n-saturated and N, the number of input elements (the cardinality of S), is
   \[ N = \sum_{j=1}^{L(P)} \mathrm{len}(Re_j) \cdot \mathrm{st}(Re_j) \]
   where
   - L(P) is the function that returns the length of a pattern,
   - len(Re_j) is a function that returns the length of representational element Re_j of P: its value is 1 for an atomic element, and for a recursive element it is the number of atomic elements the recursive element consists of,
   - st(Re_j) is a function that computes how many times a recursive element has been saturated. Its value is 1 for atomic elements, since they can be saturated only once. We say that representational element Re_j is n-saturated, where n = st(Re_j),
   - $\cdot$ represents algebraic multiplication.
   b.) linear case:
   saturation is monotonic, i.e.
   - all representational elements are n-saturated, where n is specified by st(Re_j),
   - iorder(S) = morder(S, P), i.e. input elements are mapped in the order they are presented to the mapping system,
   - representational element Re_j of P is saturated by input elements $i_f$, where
     \[ f(j, k, \mathrm{pos}(j)) = \mathrm{start}(j) + k \cdot \mathrm{len}(Re_j) + \mathrm{pos}(j), \]
     \[ \mathrm{start}(j) = \sum_{g=0}^{j-1} \mathrm{len}(Re_g) \cdot \mathrm{st}(Re_g), \]
     $0 < \mathrm{pos}(j) \le \mathrm{len}(Re_j)$, $0 \le k < \mathrm{st}(Re_j)$, $\mathrm{len}(Re_0) = 0$, $\mathrm{st}(Re_0) = 0$.
2. Constraints c_1, c_2, ..., c_h are satisfied.
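To make the arithmetic of (1.17) concrete, the sketch below computes the expected number of input elements and the linear index function, reading the count condition as N = Σ len(Re_j) · st(Re_j) and f as start(j) + k · len(Re_j) + pos(j). The list-based encoding of len and st and the sample pattern (a two-atom recursive element saturated twice, followed by two atomic elements) are assumptions of this sketch.

```python
def expected_input_count(lens, sts):
    """N = sum over j of len(Re_j) * st(Re_j)."""
    return sum(l * s for l, s in zip(lens, sts))

def start(j, lens, sts):
    """Number of input elements consumed by Re_1 .. Re_(j-1); j is 1-based."""
    return sum(lens[g] * sts[g] for g in range(j - 1))

def f(j, k, pos, lens, sts):
    """1-based input index mapped onto atomic position `pos` of Re_j in repetition k."""
    assert 0 < pos <= lens[j - 1] and 0 <= k < sts[j - 1]
    return start(j, lens, sts) + k * lens[j - 1] + pos

# Re_1 = recursive (Adv A) saturated twice, Re_2 = N, Re_3 = V
lens = [2, 1, 1]
sts = [2, 1, 1]

print(expected_input_count(lens, sts))  # 6
print([f(1, 0, 1, lens, sts), f(1, 0, 2, lens, sts),
       f(1, 1, 1, lens, sts), f(1, 1, 2, lens, sts),
       f(2, 0, 1, lens, sts), f(3, 0, 1, lens, sts)])  # [1, 2, 3, 4, 5, 6]
```

In the linear case the indices produced by f enumerate the input positions 1 to N in order, which is what the requirement iorder(S) = morder(S, P) amounts to.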
The graphical representation in (1.9), repeated here as (1.18) is a shorthand notation for the more detailed (1.19).
(1.18)
A ---GENDER, CASE--- N ---PERS, NUM--- V
(1.19)
(CAT=Adj) ---GENDER, CASE--- (CAT=Noun) ---PERS, NUM--- (CAT=Verb)
That is, we omit the attribute specifications when they are self-evident.
Elements can be mapped onto patterns recursively. It may be specified for a pattern where recursion is allowed to occur. Such specification may be incorporated in the actual patterns. We will use arrows to show recursion.
(1.20)
A ---GENDER, CASE--- N ---PERS, NUM--- V
(arrow looping from A back to A)
Pattern (1.20) licenses sentences like (1.21).
(1.21)
Krasivaya mal’en’kaya russkaya ... d’evushka rabotayet
Beautiful little Russian ... girl works/is working.
In pattern (1.20) recursion is restricted to a single element represented by (CAT = A). It is a recursive representational element, as the arrow indicates. Generally, we will use arrows to mark recursive representational elements. The starting point of the arrow is the last atomic representational element of a recursive element. The arrow points to the first atomic element of the recursive element. In (1.22), e.g., the arrow indicates that the first element of the pattern is the recursive representational element ((CAT = Adv) (CAT = A)) consisting of the two atomic elements (CAT = Adv) and (CAT = A).
(1.22)
Adv A ---GENDER, CASE--- N ---PERS--- V
(arrow from A back to Adv)
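How a linear mapping could treat recursive elements is sketched below for category sequences only, ignoring agreement constraints. The encoding of a pattern as (atoms, recursive) pairs, the greedy matching, and the decision to let a recursive element be saturated zero times are assumptions of this sketch.

```python
def map_linear(cats, pattern):
    """Map a category sequence onto a pattern in input order.

    `pattern` is a list of (atoms, recursive) pairs. Returns the saturation
    count st(Re_j) of each representational element, or None if mapping fails.
    Matching is greedy, which suffices for these toy patterns.
    """
    i, counts = 0, []
    for atoms, recursive in pattern:
        n, times = len(atoms), 0
        # Atomic elements are saturated at most once, recursive ones repeatedly.
        while tuple(cats[i:i + n]) == tuple(atoms) and (recursive or times == 0):
            i += n
            times += 1
        if times == 0 and not recursive:
            return None
        counts.append(times)
    return counts if i == len(cats) else None

pattern_120 = [(("A",), True), (("N",), False), (("V",), False)]
pattern_122 = [(("Adv", "A"), True), (("N",), False), (("V",), False)]

print(map_linear(["A", "A", "A", "N", "V"], pattern_120))         # [3, 1, 1]
print(map_linear(["Adv", "A", "Adv", "A", "N", "V"], pattern_122))  # [2, 1, 1]
print(map_linear(["A", "V"], pattern_120))                         # None
```

The first call corresponds to a sentence such as (1.21) with three adjectives mapped on the single recursive element (CAT = A).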
However, in recursive cases it may not be straightforward how to check agreement, since several input elements can be mapped on a single representational element. Given, e.g., pattern (1.23)
(1.23)
A ---GENDER--- N
and input elements
i1: PHFORM = krasivaya     i2: PHFORM = malen’kiy
    CAT = A                    CAT = A
    GENDER = fem               GENDER = masc
i3: PHFORM = devushka      i4: PHFORM = malchik
    CAT = N                    CAT = N
    GENDER = fem               GENDER = masc
and morder(S,P) = (i1, i3, i2, i4), i.e. krasivaya devushka malen’kiy malchik, it is desirable that i1 agree with i3 and i2 with i4. This type of agreement strategy will be called FIRST-TO-FIRST, since agreement checking observes the order in which elements are mapped.
Another possibility (absurd for our current example, but useful in other cases) is checking agreement in semi-reverse order: the checking process observes mapping order for one atomic element, but input elements mapped on the other atomic element are taken in reverse order. Such a strategy may be called LAST-TO-FIRST. Suppose the mapping order is morder(S,P) = (i1, i2, i4, i3), i.e. krasivaya malen’kiy malchik devushka. Then strategy LAST-TO-FIRST guarantees that malen’kiy malchik and krasivaya devushka are checked.
Of course, there are many more strategies to check agreement for recursive cases. It is for linguistic research to determine which strategy is optimal for which case.
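The two strategies named above can be illustrated with the adjective and noun elements of the example. The function names `agreement_pairs` and `check`, and the representation of the inputs mapped on each atomic element as two lists in mapping order, are assumptions of this sketch.

```python
def agreement_pairs(firsts, seconds, strategy):
    """Pair the inputs mapped on two atomic elements of a recursive element."""
    if strategy == "FIRST-TO-FIRST":
        return list(zip(firsts, seconds))
    if strategy == "LAST-TO-FIRST":
        # Keep mapping order for one element, reverse it for the other.
        return list(zip(firsts, reversed(seconds)))
    raise ValueError(strategy)

def check(firsts, seconds, attribute, strategy):
    return all(a[attribute] == b[attribute]
               for a, b in agreement_pairs(firsts, seconds, strategy))

i1 = {"PHFORM": "krasivaya", "CAT": "A", "GENDER": "fem"}
i2 = {"PHFORM": "malen'kiy", "CAT": "A", "GENDER": "masc"}
i3 = {"PHFORM": "devushka", "CAT": "N", "GENDER": "fem"}
i4 = {"PHFORM": "malchik", "CAT": "N", "GENDER": "masc"}

# Mapping order (i1, i3, i2, i4): adjectives [i1, i2], nouns [i3, i4]
print(check([i1, i2], [i3, i4], "GENDER", "FIRST-TO-FIRST"))  # True
# Mapping order (i1, i2, i4, i3): adjectives [i1, i2], nouns [i4, i3]
print(check([i1, i2], [i4, i3], "GENDER", "LAST-TO-FIRST"))   # True
print(check([i1, i2], [i4, i3], "GENDER", "FIRST-TO-FIRST"))  # False
```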
Recall that we view the language faculty as the operation of a mapping mechanism. We assume that the same, or similar, mechanisms may be involved in both language production (‘generation’) and perception (recognition). The difference lies in the selection of input elements for the mapping system. In the case of recognition, linguistic input is provided by the environment more directly, i.e. linguistic stimuli from the speaker’s environment activate representations of input elements in the speaker’s internal linguistic system (elements of KEL). In the case of production, input elements for the linguistic mapping system are activated by other internal cognitive modules.
For both cases, production and perception, the mapping system is equipped with a reconstruction mechanism to handle unknown[6] elements (elements of UEL, see (1.11)). This reconstruction mechanism can make guesses as to the status of unknown input elements. Basically, it does so by finding a pattern into which the unknown input element fits together with the other (known) elements of the input, and by supposing that the unknown input element has the features required by that pattern.
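A toy version of such a reconstruction step, restricted to the CAT attribute and to non-recursive patterns, might look as follows. The lexicon `KEL`, the list `PATTERNS`, the function `reconstruct`, and the invented word ‘blick’ are all assumptions made for illustration.

```python
KEL = {
    "the": {"CAT": "Det"},
    "girl": {"CAT": "N"},
    "works": {"CAT": "V"},
}
PATTERNS = [["Det", "N", "V"], ["Det", "A", "N", "V"]]  # toy patterns over CAT only

def reconstruct(words):
    """Return (pattern, guesses) for the first pattern that licenses the input,
    where `guesses` holds the CAT values supposed for unknown words.
    Return None if no pattern fits."""
    for pattern in PATTERNS:
        if len(pattern) != len(words):
            continue
        guesses = {}
        for word, cat in zip(words, pattern):
            if word in KEL:
                if KEL[word]["CAT"] != cat:
                    break  # a known element contradicts this pattern
            else:
                guesses[word] = cat  # suppose the feature the pattern requires
        else:
            return pattern, guesses
    return None

print(reconstruct(["the", "blick", "girl", "works"]))
# (['Det', 'A', 'N', 'V'], {'blick': 'A'})
```

The unknown word is treated as an adjective only because that supposition lets the whole input be mapped onto some pattern.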
We distinguish between linear and random-access ways of mapping. Arguably, the choice of the concrete mapping mode may depend on the particular linguistic task, i.e. on the particular circumstances under which linguistic input is available. The basic difference between linear and random-access mapping is analogous to the difference between ‘on-line’ and ‘off-line’ processing. Our terminology, however, attempts to emphasise that in the random-access case all elements of the input set are available to the mapping system at the same time (input is considered an unordered set), while in the linear case only one input element, the next one, is available at a time (input is regarded as a sequence). Accordingly, linear mapping may play an important role in syntactic processing (parsing), where the temporal order of words is relevant, while random-access mapping should be supposed to be at work in cases where the temporal order of input elements is irrelevant or cannot be detected. The latter case is exemplified by phonetic processing, where the temporal aspect of setting phonetic feature values seems to be irrelevant or unmanageable. This suggests that the phonetic features of an individual phoneme be considered collectively, i.e. that the order of input elements (phonetic features) should not be taken into consideration when mapping onto phonetic patterns.