August 26, 2008

A Foundation of Generative Grammar as an Empirical Science[1]

Hajime Hoji

1. The essentials of the proposal

Our goal is as stated in (1).

(1) Our Goal:

To discover the properties of the Computational System (CS) that is hypothesized to be at the center of the language faculty.

Our main concern with (2) has led us to consider a fundamental question in (3).

(2) A fundamental methodological concern:

How could we proceed to make progress toward achieving the goal in (1)?

(3) A fundamental empirical question:

What should count as evidence for or against a hypothesis about the CS?

The major goal of this book is to provide answers to (2) and (3), and to illustrate its proposals with concrete empirical materials. Our answer to (3) is repeatable phenomena.[2] And our primary answer to (2) is the heuristic in (4).

(4) Auxiliary hypotheses should not be used for making further theoretical deduction and deriving further empirical consequences if they have been shown not to be backed up by a repeatable phenomenon.

The essential aspects of the view presented and defended in the book can be summarized as follows, by borrowing the notions of Lakatos' 1970/1978 'scientific research programmes'.[3]

(5) Three components:

a. Hard core: the hypotheses that are adopted without direct empirical evidence and are not subject to refutation or modification

b. Auxiliary hypotheses: hypotheses that are subject to modification and abandonment

c. Heuristics: research guidelines to follow

The hypotheses in the hard core are of the following three types.[4]

(6) Three types of hypotheses in the hard core:

a. hypotheses of the most general nature

b. hypotheses about the properties of the CS

c. hypotheses about the properties of the Lexicon in general[5]

(7) is an instance of (6a), and it is perhaps the single most important hypothesis in the hard core.

(7) Hypothesis in the hard core (I):

The CS is embedded in the model of judgment making by the informant, as roughly schematized in (8).

(8) The Model of Judgment Making by the Informant: (To be elaborated below; see (20).)

Presented Sentence ≈≈> CS ≈≈> Judgment

Presupposed in (7) are (9a) and (9b), both of which are of the (6a) type.

(9) Hypotheses in the hard core (II):

a. The CS exists.

b. The input to the CS is a numeration (a set of items taken from the mental Lexicon) and its outputs are PF and LF representations, as indicated in (10).

(10) The Model of the CS:

Numeration μ => CS => LF(μ)
                 ⇓
                PF(μ)

Numeration μ: a set of items taken from the mental Lexicon

LF(μ): an LF representation based on μ

PF(μ): a PF representation based on μ

The arrows in (10) represent the 'is the input of' relation and the 'yields as an output' relation, respectively. The curvy arrows "≈≈>" in (8), on the other hand, do not represent an input/output relation, as will be discussed below.

The hypothesis in (11) is of the (6b) type.

(11) Hypothesis in the hard core (III):

A hypothesis about the CS:

There is an operation Merge, internal and external, and that is the only structure-building operation in the course of a derivation starting from a numeration.[6]

Presupposed in (9b) is the existence of the mental Lexicon; (12) is thus another hypothesis of the type (6a).[7]

(12) Hypothesis in the hard core (IV):

The mental Lexicon exists.

The auxiliary hypotheses are of the three types in (13).

(13) Three types of auxiliary hypotheses:

a. hypotheses about the CS

b. hypotheses about (items of) a specific Lexicon

c. hypotheses about the relation between properties mentioned in (13a) (and those mentioned in (13b)), on the one hand, and the informant's intuitions, on the other

Hypotheses of the (13a) type do not make reference to any language-specific property, and they are meant to be universal claims. An example of (13a) is given in (14).

(14) A hypothesis about the properties of the CS:[8]

FD(A, B) is possible only if A c-commands B.

The hypotheses in (15) are examples of (13b); cf. the discussion in chapter 2: xx, chapter 4: xx.

(15) Hypotheses about a specific Lexicon:[9]

a. Zibunzisin is [+A].

b. A so-NP is I-indexed or not indexed.

c. An a-NP is D-indexed.

The hypotheses of the (13c) type are called bridging statements, and they state what concepts/relations necessarily underlie certain linguistic intuitions of the informants. The form of the bridging statement we have considered is given in (16), and we take this to be the basic form of the bridging statement.

(16) The basic form of a bridging statement:

A certain linguistic intuition involving two linguistic expressions a and b (i.e., γ(a, b)) obtains only if there is a relation R(LF(a), LF(b)) between LF(a) and LF(b), where LF(a) and LF(b) stand for what correspond to a and b at LF, respectively.

It is possible, as will be pointed out in footnote 33, that the hard core includes hypotheses about the general properties of the bridging statement.

Other than the 'negative heuristic' in (17), the most general research heuristic I would like to adopt is (18).

(17) Lakatos' general negative heuristic: (See Lakatos 1970: 133 (reproduced in 1978: 48).)

Do not direct modus tollens at the 'hard core'.

(18) A general research heuristic:[10]

We should maximize our chances of learning something about the properties of the CS from the disconfirmation of our predictions.

From (18), we obtain a more specific heuristic in (19).

(19) Auxiliary hypotheses should not be used for making further theoretical deduction and deriving further empirical consequences if they have been shown not to be supported empirically.

We take the content of 'being supported empirically' to be 'being backed up by a repeatable phenomenon'. We thus obtain (4), repeated here.

(4) Auxiliary hypotheses should not be used for making further theoretical deduction and deriving further empirical consequences if they have been shown not to be backed up by a repeatable phenomenon.

A repeatable phenomenon obtains if and only if a *Schema-based prediction has survived a rigorous test of disconfirmation and its corresponding okSchema-based predictions have been confirmed. In relation to the above discussion, a *Schema-based prediction is that no examples conforming to a *Schema are judged acceptable under γ(a, b), and the okSchema-based prediction is that there are some examples conforming to the corresponding okSchema that are judged to be not totally unacceptable under γ(a, b).[11] See chapter 3: section xx for more details.
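The biconditional just stated can be made concrete with a small sketch. It is only an illustration under assumptions that are not part of the text at this point: judgments are coded as numbers between 0 and 1, with 0 standing for total unacceptability (the scale introduced in (26) below), and the function names are hypothetical. I also assume that an empty set of judgments does not count as a test at all.

```python
from typing import Sequence


def star_prediction_survives(judgments: Sequence[float]) -> bool:
    """*Schema-based prediction: no conforming example is judged acceptable.

    Assumption: 'not acceptable' is coded as a judgment of exactly 0.
    An empty list is treated as 'no test was run', hence not a survival.
    """
    return len(judgments) > 0 and all(j == 0 for j in judgments)


def ok_prediction_confirmed(judgments: Sequence[float]) -> bool:
    """okSchema-based prediction: some conforming example is judged
    not totally unacceptable (a judgment greater than 0)."""
    return any(j > 0 for j in judgments)


def repeatable_phenomenon(star_judgments: Sequence[float],
                          ok_judgment_sets: Sequence[Sequence[float]]) -> bool:
    """A repeatable phenomenon obtains iff the *Schema-based prediction has
    survived and every corresponding okSchema-based prediction is confirmed."""
    return (star_prediction_survives(star_judgments)
            and len(ok_judgment_sets) > 0
            and all(ok_prediction_confirmed(s) for s in ok_judgment_sets))
```

For instance, `repeatable_phenomenon([0, 0, 0], [[0, 0.5], [1.0]])` holds, whereas a single acceptable *Schema example, as in `repeatable_phenomenon([0, 0.25], [[1.0]])`, is enough to make it fail. This asymmetry mirrors the universal character of the *Schema-based prediction and the existential character of the okSchema-based prediction.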

In the next section (section 2) I will elaborate on the model of judgment making and address the question of how we could draw a definitive implication from the informant's judgment for properties of the CS. The considerations there point to the merit of focusing on the informant's judgments on the acceptability of sentence α under interpretation γ(a, b) (i.e., with an interpretation involving two elements a and b), at least at the initial stage of our research. This is a consequence of adopting the heuristic in (18), as will be pointed out in section 3.1, where it is also noted that (18) helps us grapple with Duhem's problem: it is in principle impossible to determine which hypothesis in the theory is responsible for the failure of a prediction (to be borne out), because a prediction is made not by a single hypothesis but by the whole theory. I will also introduce another heuristic in section 3.3, based on a proposal made in Reinhart 1983. A summary of the chapter is provided in section 5 and brief concluding remarks will be given in section 6, along with some promissory notes for what I plan to add later.

2. The model of judgment making

2.1. Judgments on the acceptability of sentence α with interpretation γ(a, b)

Consider the model of judgment making in (8), repeated here.

(8) The Model of Judgment Making by the Informant:

Presented Sentence ==> Parser ≈≈> Numeration ==> CS ==> LF ==> SR ≈≈> Judgment

We can elaborate on (8) by considering the informant's judgment on the acceptability of sentence α with interpretation γ(a, b).[12]

(20) The Model of Judgment Making by the Informant on the acceptability of sentence α with interpretation γ(a, b):

Lexicon / a, b)
 / 
 / = / Parser / ≈≈ /  / = / CS / => / LF() / = / SR() / ≈≈> / 
 / 
 / PF()
 / 
 /  /  /  /  / pf()

(21)a.: presented sentence

b.: numeration

c.(a, b): the interpretation intended to be included in the 'meaning' of involving expressions a and b

d.LF(): the LF representation that obtains on the basis of 

e.SR(): the information that obtains on the basis of LF()[13]

f.PF(): the PF representation that obtains on the basis of 

g.pf(): the surface phonetic string that obtains on the basis of PF()

h.the informant judgment on the acceptability of under(a, b)

The arrows to and from CS in (20) represent the 'is the input of' relation and the 'yields as an output' relation; cf. (9b). Similarly, what is meant by the arrow between LF and SR in (20) is that SR obtains on the basis of LF. The other instances of the arrows in (8) are used more loosely, as indicated in (22).[14]

(22) a. α => Parser: ... is part of the input to ...

b. Parser ≈≈> μ: ... contributes to the formation of ...

c. SR(μ) ≈≈> β: ... serves as a basis for ...

The informant's judgment must be based on the comparison, so to speak, between α and pf(μ) and between γ(a, b) and SR(μ); see the shaded parts in (20). If pf(μ) is distinct from α, that would mean that the informant is judging something other than α. We have suggested in chapter 3: xx that we could eliminate this possibility by having recourse to the informant's string sensitivity, by conducting some training sessions, for example. Let us assume that we can ensure the informant's string sensitivity and that the numeration μ that the informant 'comes up with' necessarily results in pf(μ) non-distinct from α, as long as pf(μ) indeed obtains. This has the consequence of making it unnecessary to address (23b), and the sole factor that contributes to the informant's judgment on the acceptability of α under γ(a, b) is (23a), provided that pf(μ) indeed obtains.

(23) a. the γ(a, b)-SR(μ) compatibility

b. the α-pf(μ) compatibility

The value of β is affected not only by the choice in each of (24) but also by the degree in each of (25).

(24) The informant's judgment on the acceptability of sentence α with interpretation γ(a, b) is affected by all of:

a. whether or not μ obtains, corresponding to α

b. whether or not pf(μ) obtains

c. whether or not SR(μ) obtains

d. whether or not SR(μ) compatible with γ(a, b) obtains

(25) a. [P]: the degree of difficulty associated with obtaining μ corresponding to α

b. [I]: the degree of unnaturalness associated with SR(μ)

This is reflected in (26).

(26) The informant's judgment β on the acceptability of sentence α under interpretation γ(a, b):

β ranges between 0 and 1, with the former corresponding to 'complete unacceptability' and the latter corresponding to 'full acceptability'.

β = [G] − [P] − [I], where

[G] is 1 if and only if (i) pf(μ) obtains and (ii) SR(μ) compatible with γ(a, b) obtains; otherwise, [G] is 0.

[P] is some value (0 ≤ [P] ≤ 1) which represents the difficulty the informant 'feels' in 'obtaining' μ and 'reports' as such.

[I] is some value (0 ≤ [I] ≤ 1) which represents the unnaturalness the informant 'feels' about SR(μ) compatible with γ(a, b) and 'reports' as such.
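The arithmetic in (26) can be sketched as a short function. This is an illustration only, and it rests on one assumption of mine that (26) does not state explicitly: since the judgment is said to range between 0 and 1, a difference [G] − [P] − [I] that falls below 0 is read as 0. The function and parameter names are hypothetical.

```python
def judgment(pf_obtains: bool,
             compatible_sr_obtains: bool,
             p: float = 0.0,
             i: float = 0.0) -> float:
    """Return the informant judgment [G] - [P] - [I] as described in (26).

    pf_obtains:            whether the surface phonetic string obtains
    compatible_sr_obtains: whether an SR compatible with the intended
                           interpretation obtains
    p, i:                  the values of [P] and [I], each between 0 and 1

    Assumption (not stated in (26)): a result below 0 is clamped to 0,
    so that the judgment stays within the 0-to-1 range.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= i <= 1.0):
        raise ValueError("[P] and [I] must each lie between 0 and 1")
    g = 1.0 if (pf_obtains and compatible_sr_obtains) else 0.0
    return max(0.0, g - p - i)
```

With [G] = 1 and both penalties at zero the result is full acceptability (`judgment(True, True)` gives 1.0); with [G] = 0 the result is 0 whatever the values of [P] and [I], which is why those values play no role once [G] fails.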

The chart in (27) below shows how the various factors in (24) and (25) would affect β. As indicated in (26), 'full acceptability' and 'total unacceptability' are represented as β=1 and β=0, respectively.

(27) How the various factors in (24) and (25) would affect β in (20):

 / (i) / (ii) / (iii) / (iv) / (v) / (vi) / (vii)
(24a) μ? / yes / yes / yes / yes / yes / yes / no
(24b) pf(μ)? / yes / yes / yes / yes / no / no / no
(24c) SR(μ)? / yes / yes / yes / no / no / yes / no
(24d) SR(μ) compatible with γ(a, b)? / yes / yes / no / no / no / no / no
(25a) [P] / zero / not zero /  /  /  /  / N/A
(25b) [I] / zero / not zero / N/A / N/A / N/A / N/A / N/A
β = / 1 / 0≤β<1 / 0 / 0 / 0 / 0 / 0

Each blank cell in (27) indicates that the 'value' there would not affect β.[15] Shown in (28) is how the value of β in (27) obtains for each of (i)-(vii) in the terms of (26). N/A indicates that the factor in question could not be addressed.

(28) a. (27-i):

μ obtains corresponding to α.

pf(μ) non-distinct from α obtains.

SR(μ) compatible with γ(a, b) obtains.

Hence [G]=1.

[P]=0 and [I]=0.

Hence β = [G] − [P] − [I] = 1 − 0 − 0 = 1.

b. (27-ii):

μ obtains corresponding to α.

pf(μ) non-distinct from α obtains.

SR(μ) is compatible with γ(a, b).

Hence [G]=1.

[P] ≠ 0 and [I] ≠ 0.

Hence 0<[P]≤1 and 0<[I]≤1.

Hence β = [G] − [P] − [I]: 0≤β<1.

c. (27-iii):

μ obtains corresponding to α.

pf(μ) non-distinct from α obtains.

SR(μ) obtains but SR(μ) compatible with γ(a, b) does not obtain.

Hence [G]=0.

Hence β = [G] − [P] − [I] = 0.

d. (27-iv):

μ obtains corresponding to α.

pf(μ) non-distinct from α obtains.

SR(μ) does not obtain and hence SR(μ) compatible with γ(a, b) does not obtain.

Hence [G]=0.

Hence β = [G] − [P] − [I] = 0.

e. (27-v):

μ obtains corresponding to α.

pf(μ) does not obtain and hence pf(μ) non-distinct from α does not obtain.

SR(μ) does not obtain and hence SR(μ) compatible with γ(a, b) does not obtain.

Hence [G]=0.

Hence β = [G] − [P] − [I] = 0.

f. (27-vi):

μ obtains corresponding to α.

pf(μ) does not obtain and hence pf(μ) non-distinct from α does not obtain.

SR(μ) obtains but SR(μ) compatible with γ(a, b) does not obtain.

Hence [G]=0.

Hence β = [G] − [P] − [I] = 0.

g. (27-vii):

μ does not obtain corresponding to α.

Hence neither PF(μ) nor LF(μ) obtains, hence neither pf(μ) nor SR(μ) obtains, and hence neither pf(μ) non-distinct from α nor SR(μ) compatible with γ(a, b) obtains.

Hence [G]=0.

Hence β = [G] − [P] − [I] = 0.
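The case distinctions in (27) and (28) amount to a decision procedure over the four yes/no factors in (24) and the two degrees in (25). The sketch below spells that procedure out; it is an illustration rather than part of the model. The function name is hypothetical, and I assume that case (ii) covers any situation in which at least one of [P] and [I] is not zero (the chart lists both as 'not zero').

```python
def classify_case(numeration: bool,
                  pf: bool,
                  sr: bool,
                  compatible_sr: bool,
                  p: float = 0.0,
                  i: float = 0.0) -> str:
    """Return which column of (27), "i" through "vii", a situation falls under.

    numeration:    (24a) whether a numeration obtains for the sentence
    pf:            (24b) whether the surface phonetic string obtains
    sr:            (24c) whether an SR obtains
    compatible_sr: (24d) whether an SR compatible with the intended
                   interpretation obtains
    p, i:          (25a) [P] and (25b) [I]
    """
    if compatible_sr and not sr:
        # A compatible SR cannot obtain unless some SR obtains.
        raise ValueError("inconsistent input: compatible SR without SR")
    if not numeration:
        return "vii"                      # nothing downstream can obtain
    if not pf:
        return "vi" if sr else "v"        # pf fails, with or without SR
    if not sr:
        return "iv"                       # pf obtains, SR does not
    if not compatible_sr:
        return "iii"                      # SR obtains, but no compatible SR
    # Assumption: any non-zero [P] or [I] puts the situation in column (ii).
    return "i" if (p == 0 and i == 0) else "ii"
```

Running the function over the seven columns reproduces the chart: only (i) and (ii) have [G] = 1, and among the columns with [G] = 0, only (iii) owes its value to the absence of a compatible SR while everything else obtains.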

The purpose of checking the informant judgment on the acceptability of α under γ(a, b) is to evaluate a hypothesis or hypotheses about properties of the CS, and more specifically about properties of LF. A bridging statement, as in (29), puts forth a hypothesis about what formal object/relation at LF necessarily underlies the intuition/interpretation γ(a, b).

(29) Bridging statement:

γ(a, b) arises only if there is R(LF(a), LF(b)) at LF, where LF(a) and LF(b) are what correspond to a and b at LF.

Because SR( obtains on the basis of LF(), the compatibility between SR( and(a, b) would mean that there could be numeration  corresponding to  that would result in LF()containing the formal object/relation necessary for the intuition/interpretation(a, b). Conversely, we infer from the compatibility or incompatibility between (a, b) and SR( corresponding to  and hence whether or not  can correspond to an LF representation that contains the formal object/relation necessary for the intuition/interpretation (a, b). As we start accumulating the information of this sort, we are (finally) in a position to start building a lab, so to speak, for generative grammar as an empirical science. Recall that the fundamental problem with research concerned with the properties of the CS is that we can only have indirect access to the relevant data, as discussed before and as schematically illustrated in (8) and (20) above.[16] It is by obtaining clearly established correlations between surface arrangements of elements and the abstract structural relations among them that our lab becomes effective in the pursuit of discovering the properties of the CS.[17] After all, without such correlations, the informant's judgment on sentence  under (a, b) would not reveal anything about the properties of the LF and hence of the CS. Therefore, it is of upmost importance that we be able to ensure, as much as possible, that the informant's judgment  is revealing about the properties of the LF and hence of the CS.

The bridging statement[18] relates γ(a, b) to some LF object/relation that necessarily underlies it. Our hypothesis about the CS states that such an LF object/relation obtains or gets established only if LF(a) and LF(b) stand in a particular structural relation. We need to be able to determine when we can regard β as a reflection of, and hence as being revealing about, the existence of SR(μ) compatible with γ(a, b) or the lack thereof. Let us thus return to the informant's judgment in (20). Notice that the judgment 0<β≤1 can arise only if there is numeration μ corresponding to α such that SR(μ) compatible with γ(a, b) would obtain; see (i) and (ii) in (27). In that sense, the judgment 0<β≤1 on the acceptability of α under γ(a, b) in (20) is revealing about the structural relation at LF between LF(a) and LF(b). Notice also that the failure to obtain β=1, on the other hand, is not revealing about the existence or the absence of SR(μ) compatible with γ(a, b), because [P] and [I] may make β lower than 1 even if the informant 'comes up with' numeration μ corresponding to α such that SR(μ) compatible with γ(a, b) would obtain; see (ii) in (27).

What about =0? It can arise in a variety of ways; see (ii)-(vii) in (27). Can it be revealing about the existence of SR() compatible with (a, b) corresponding to  and hence about the structural relation at LF between LF(a) and LF(b)? It can be, but only if we can manage to ensure that =0 arises as an instance of (iii) in (27). While =0 in (iii) in (27) can be reasonably attributed to the absence of SR() compatible with (a, b) corresponding to , and more in particular to the failure of the structural condition for (a, b) to be satisfied at LF, =0 in (ii), (iv), (v), (vi) and (vii) in (27) cannot. In (ii), for example, =0 arises not because of the absence of SR() compatible with (a, b) corresponding to  but because of [P] and/or [I].  in (iv) arises because the CS fails to give rise to an SRcorresponding to . In (v), not only SR() but also pf() fail to obtain. In (iv) and (v), therefore, the absence of SR() corresponding to  makes it impossible to address the structural relation at LF between LF(a) and LF(b).[19] In (vi), SR() obtains but pf() fails to obtain. Since the absence of pf() corresponding to  necessarily makes  unacceptable (see (26)), =0 results regardless of whether SR() compatible with (a, b) obtains corresponding to . Hence =0 in (vi) is not revealing about the structural relation at LF between LF(a) and LF(b), either. Finally, =0 in (vii) is due to the failure of the informant to 'come up with' numeration  corresponding to , and =0 in (vii) is not revealing about the structural relation at LF between LF(a) and LF(b), either.

These considerations are summarized in (30).

(30) What gives rise to β=0 in (ii)-(vii) in (27); cf. (28):

a. (ii): [P] and/or [I]

b. (iii): the absence of SR(μ) compatible with γ(a, b) corresponding to α

(and additionally [P]?)

c. (iv): the absence of SR(μ) corresponding to α

d. (v): the absence of pf(μ) and SR(μ) corresponding to α

e. (vi): the absence of pf(μ) corresponding to α

f. (vii): the failure of the informant to 'come up with' numeration μ corresponding to α

The relevance of 01 and =0 on the acceptability of  under (a, b) in (20) is summarized in (31).

(31)Regarding the acceptability of  under (a, b) in (20):

a.The judgment 01 is revealing about the structural relation at LF between LF(a) and LF(b).

b.The judgment =0 is revealing about the structural relation at LF between LF(a) and LF(b) only if we can ensure that we are dealing with (iii) in (27).

I would like to suggest that the minimal paradigm when working under (20) must consist of examples of the three types as indicated in (32).

(32) a. a *Example such that at least one of the conditions (structural and possibly lexical as well) for γ(a, b) is not satisfied in any of the LF representations that could correspond to it

b. an okExample1 such that it minimally differs from (32a) and the structural and lexical condition(s) for γ(a, b) is/are satisfied in an LF representation that could correspond to it

c. an okExample2 such that it is identical to (32a) in terms of the surface string but with an interpretation that does not include γ(a, b)

I further suggest that the informant judgments must obtain as indicated in (33) in order for a repeatable phenomenon to obtain involving the bridging statement that has given rise to the *Example and okExamples in (32).

(33) The judgments necessary for a repeatable phenomenon to obtain:

(32a) *Example / β=0
(32b) okExample1 / 0<β≤1
(32c) okExample2 / 0<β≤1

(32a) is intended to be an instance of (iii) in (27), namely, the case where β=0 is due to the absence of SR(μ) compatible with γ(a, b). As noted above, however, β=0 may arise in a number of ways other than (iii) in (27), as in (ii), (iv), (v), (vi) and (vii) in (27). It is by obtaining the informant judgment 0<β≤1 on (32c) that we eliminate the possibility of β=0 on (32a) being due to (iv), (v), (vi) or (vii) in (27). Notice that β=0 in (iv), (v), (vi) or (vii) in (27) is due to the failure of pf(μ) and/or SR(μ) to obtain and is independent of the compatibility between SR(μ) and γ(a, b). If a given informant's β=0 on (32a) were due to (iv), (v), (vi) or (vii) in (27), it should be independent of the compatibility between SR(μ) and γ(a, b), and the same informant's judgment on (32c) should also be β=0 because (32a) and (32c) are identical in terms of the surface string; see (32c) above. That is to say, pf(μ) and/or SR(μ) should in that case fail to obtain corresponding to (32c) (and hence (32a)), resulting in β=0 rather than 0<β≤1 on (32c) (as well as on (32a)). Obtaining the informant judgment 0<β≤1 on (32c) therefore ensures that β=0 on (32a) is not an instance of (iv), (v), (vi) or (vii) in (27). β=0 on (32a) in that case must therefore be an instance of (ii) or (iii) in (27).

Now, =0 can also arisedue to [P] and/or [I], despite there being SR() compatible with (a, b) corresponding to sentence , as in (ii) in (27). That is to say, if SR() compatible with (a, b) obtains and if the effects of [P] and [I] are not zero but not as large as to result in =0, we obtain 01. If the effects of [P] and [I] are zero when SR() compatible with (a, b) obtains corresponding to , we get the case in (i), i.e., =1. If a given informant's judgment is 01, the effects of [P] and [I] are not as large as to result in =0. Notice that (32a) differs minimally from (32b), i.e., only in regard to the condition(s) (structural and possibly lexical as well) for (a, b); see (32a) and (32b). It is therefore likely, and we in fact assume, that,for a given informant whose judgment on (32b) is 01, the effects of [P] and [I] are not as large as to result in =0 in relation to not only (32b) but also (32a). That would then eliminate the possibility of =0 on (32a) being due to [P] and/or [I]. We have earlier concluded that by obtaining the informant judgment 01 on (32c) we can ensure that =0 on (32a) is not an instance of (iv), (v), (vi) or (vii) in (27). We have now seen that by obtaining the informant judgment 01 on (32b) we can (virtually) eliminate the possibilityof=0 on (32a) being an instance of (ii) in (27).