Abstract

Building the valency lexicon of Arabic verbs

The proposed contribution will describe the building of a valency lexicon of Arabic verbs using a linguistically annotated corpus, the Prague Arabic Dependency Treebank (PADT), as its primary source. The valency of a verb is the set of its obligatory and/or optional arguments that are potentially or actually realized in an utterance. Valency cannot be predicted automatically. Valency information is useful in restoring the syntactic structure of an utterance and has consequences for the study of meaning. Valency lexicons can be applied in automatic parsing as well as in language generation.

The primary goal of the study is to prepare the theoretical and methodological background for creating a valency lexicon of the most frequent Arabic verbs based on the theoretical framework of Functional Generative Description (FGD). This Arabic lexicon, inspired by the Valency Lexicon of Czech Verbs VALLEX 2.0 ([http://ufal.mff.cuni.cz/vallex/2.0; Lopatková et al., 2006; Žabokrtský, 2005]), may be used in particular for the tectogrammatical annotation of the PADT as well as for the proposed second edition of the corpus-based Arabic-Czech Dictionary ([Zemánek et al., 2006]).

PADT ([Hajič et al., 2004]), a multi-level linguistically annotated corpus of Modern Standard Arabic, consists of three levels of annotation: the morphological level, the analytical level of surface syntax, and the tectogrammatical level describing the linguistic meaning of the sentence. Verb valency is studied on the second and especially on the third level of annotation, whose dependency trees provide relevant information about all arguments that syntactically depend on a particular verb or verbonominal derivative.

According to the valency theory of FGD, also applied in the Czech lexicon VALLEX 2.0, each verb (lexeme) has at least one valency frame. The exact number of valency frames depends on the number of meanings (lexical units) of the particular verb. The valency frame consists of both obligatory and optional inner participants and obligatory free modifications ([Panevová, 1994]). To express the relations between a verb and its complementation, FGD uses different functors. These functors are divided into actants (inner participants) and free modifications. There are five actants in total (Table 1) and many different free modifications, which denote various types of adverbial complementation.

Table 1. Types of actants (inner participants of the valency frame) and their examples illustrated on English sentences ([Lopatková et al., 2006: p. xvi]).

Actant / Meaning / Examples
ACT / actor / Peter read a letter.
ADDR / addressee / Peter gave Mary a book.
PAT / patient / I saw him.
EFF / effect / We made her the secretary.
ORIG / origin / She made a cake from apples.
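
The frame structure described above could be represented in a lexicon roughly as follows. This is a minimal Python sketch, not the VALLEX or PADT data format: the class names `Slot` and `ValencyFrame` are hypothetical, and the obligatoriness assigned to the slots of the English example verb is an assumption made only for illustration.

```python
from dataclasses import dataclass, field

# The five actants (inner participants) listed in Table 1.
ACTANTS = ("ACT", "ADDR", "PAT", "EFF", "ORIG")


@dataclass(frozen=True)
class Slot:
    """One position in a valency frame (hypothetical structure)."""
    functor: str             # an actant or a free-modification functor
    obligatory: bool = True

    @property
    def is_actant(self) -> bool:
        return self.functor in ACTANTS


@dataclass
class ValencyFrame:
    """A valency frame for one lexical unit (one meaning) of a lexeme."""
    lemma: str
    gloss: str
    slots: list = field(default_factory=list)

    def obligatory_functors(self):
        return [s.functor for s in self.slots if s.obligatory]

    def optional_functors(self):
        return [s.functor for s in self.slots if not s.obligatory]


# Illustrative frame modelled on "Peter gave Mary a book." (Table 1).
# Treating all three slots as obligatory is an assumption.
give = ValencyFrame(
    lemma="give",
    gloss="to give",
    slots=[Slot("ACT"), Slot("ADDR"), Slot("PAT")],
)
```

A lexeme with several meanings would then simply map to a list of such frames, one per lexical unit.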

We believe that all the valency frames created for Czech and gathered in VALLEX 2.0 (about 6,500 valency frames for about 2,700 lexeme entries) can serve as a useful source of information for describing valency in other languages, taking into consideration their natural word order. It is possible to compare these valency frames with the relevant data in Arabic and to preserve them if they match, or to modify them if they differ. As an example, the valency frame of one particular meaning of the very frequent Arabic verb qāla (“to say”) can be mentioned (Table 2). Its valency frame corresponds to that of the Czech verb říci with the same meaning.

Table 2. An example of the valency frame of the Arabic verb qāla (“to say”).

qāla / subject / li- (preposition) / ʿan (preposition) / object/’inna (conjunction)
(functor) / ACT / ADDR (optional) / PAT (optional) / EFF
to say / someone (subject) / to somebody / about something / something/that
  • Example from the corpus (shortened):

ʿan al-ʿalāqāti qāla al-wazīru ’inna...

about the-relations [PAT] he-said [PRED] the-minister [ACT] that [EFF]

about the relations the minister said that...

Comment: Optional actant ADDR (addressee) was not realized.
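
The comparison between a frame and a corpus occurrence can be sketched in a few lines of Python. The snippet encodes the frame from Table 2 and the functor labels of the shortened example; the function name `check_realization` is hypothetical, and treating ACT and EFF as obligatory (because Table 2 does not mark them as optional) is an assumption.

```python
# Frame of qāla ("to say") as given in Table 2: functor -> obligatory?
# ACT and EFF are assumed obligatory because they are not marked optional.
QALA_FRAME = {"ACT": True, "ADDR": False, "PAT": False, "EFF": True}

# Functor labels in the shortened corpus example.
# PRED marks the verb itself and is not a frame slot.
example_labels = ["PAT", "PRED", "ACT", "EFF"]


def check_realization(frame, labels):
    """Return (missing obligatory slots, unrealized optional slots)."""
    realized = set(labels)
    missing = [f for f, oblig in frame.items() if oblig and f not in realized]
    unrealized = [f for f, oblig in frame.items() if not oblig and f not in realized]
    return missing, unrealized


missing, unrealized = check_realization(QALA_FRAME, example_labels)
print(missing)     # []
print(unrealized)  # ['ADDR']
```

An empty first list means the occurrence is compatible with the frame; the second list reproduces the comment above that ADDR was not realized.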

In the case of Arabic, verb valency should be studied in close connection with the verbonominal derivatives of the verb: the participle (active and passive) and the verbal noun (masdar in traditional Arabic linguistic terminology). Not only can these fulfil a similar syntactic function to the verb (e.g. a participle as a predicate in nominal sentences), but in many cases they preserve the same or almost the same valency frame as the verb they are derived from. As an example of a preserved valency frame (Table 3), we can mention the verb tālaba (“to demand”), its active participle mutālib (“demanding”), and its verbal noun mutālaba (“demanding, demand”).

Table 3. Valency frame of the Arabic verb tālaba (“to demand”).

tālaba / subject / object / bi- (preposition)/’an (conjunction)
ACT / ORIG (optional) / PAT
to demand / someone (subject) / of somebody / something/that
  • Examples from the corpus (shortened)
  • verb tālaba

tālabatal-wikālatual-ittihādāti

she-demanded[PRED]the-agency [ACT]of-the-unions [ORIG]

bi-tahlīli...

an-analysis [PAT]

the agency demanded an analysis of the unions...

  • active participle mutālib

...mutālibanqijādata’l-ğajšbi-’iqāmati

(he) demandingof-a-leadership [ORIG]the-army establishing [PAT]

hiwārin...

a-dialog

(he) demanding of the army leadership to enter into a dialog

  • verbal noun mutālaba

...tasilu ’ilā haddi mutālabati’l-zawğati

reachesan-extenta-demandof-the-wife [ACT]

bi-’l-infisāli

the-separation [PAT]

...to the extent that the wife demands to separate

Comment: Optional actant ORIG (origin) was not realized.

The paper will also present some of the tools and methods for querying the syntactic structures of the treebank. Based on our previous experience, the search tools (used in the “project”) will include TrEd, Netgraph, and Xaira. TrEd and Netgraph allow structural queries over the trees. Xaira searches linear text, but its underlying data can include, for example, annotations of functors and surface-syntactic (analytical) functions.
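
To illustrate what a structural query over dependency trees amounts to, the following Python sketch searches a toy tree for predicates that govern a given set of functors. It does not use or imitate the query languages of TrEd, Netgraph, or Xaira; the `Node` class and the helper names are hypothetical, and the toy tree is built by hand from the shortened tālaba example of Table 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a toy dependency tree (hypothetical structure)."""
    form: str
    functor: str
    children: list = field(default_factory=list)


def find(node, predicate):
    """Yield every node in the tree (depth-first) that satisfies predicate."""
    if predicate(node):
        yield node
    for child in node.children:
        yield from find(child, predicate)


def has_dependents(*functors):
    """Build a predicate: a PRED node with a child for each given functor."""
    wanted = set(functors)

    def predicate(node):
        labels = {c.functor for c in node.children}
        return node.functor == "PRED" and wanted <= labels

    return predicate


# Toy tree for the shortened tālaba example from Table 3.
tree = Node("tālabat", "PRED", [
    Node("al-wikālatu", "ACT"),
    Node("al-ittihādāti", "ORIG"),
    Node("bi-tahlīli", "PAT"),
])

hits = [n.form for n in find(tree, has_dependents("ORIG", "PAT"))]
print(hits)  # ['tālabat']
```

A linear text search, by contrast, would see only the sequence of forms and their labels, not which node governs which.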

As the primary lexical resources, we will use the Czech VALLEX 2.0, Hans Wehr’s Dictionary of Modern Written Arabic, the Czech-Arabic Dictionary, and the ElixirFM lexicon derived from the Buckwalter lexicon (which allows transformations and derivations between verbs, participles, and verbal nouns).

References

BADAWI, Elsaid, CARTER, M.G. and GULLY, Adrian. Modern Written Arabic : A Comprehensive Grammar. London : Routledge, 2004.

HAJIČ, Jan, SMRŽ, Otakar, ZEMÁNEK, Petr, PAJAS, Petr, ŠNAIDAUF, Jan, BEŠKA, Emmanuel, KRÁČMAR, Jakub and HASSANOVÁ, Kamila. Prague Arabic Dependency Treebank 1.0. LDC catalog number LDC2004T23, 2004.

LOPATKOVÁ, Markéta, ŽABOKRTSKÝ, Zdeněk and BENEŠOVÁ, Václava. Valency Lexicon of Czech Verbs VALLEX 2.0. Technical Report TR-2006-34, Praha : ÚFAL MFF UK, 2006.

PANEVOVÁ, J. Valency Frames and the Meaning of the Sentence. In Luelsdorff, P.A. (ed.). The Prague School of Functional and Structural Linguistics. Amsterdam – Philadelphia : Benjamins Publ. Comp., 1994, p. 223-243.

RYDING, Karin C. A Reference Grammar of Modern Standard Arabic. Cambridge : Cambridge University Press, 2005.

WEHR, Hans. A Dictionary of Modern Written Arabic (Arabic-English). 4th ed. Urbana : Spoken Language Services, Inc., 1994.

ZEMÁNEK, Petr, MOUSTAFA, Andrea, OBADALOVÁ, Naděžda and ONDRÁŠ, František. Arabsko-český slovník [Arabic-Czech Dictionary]. Praha : Set Out, 2006.

ŽABOKRTSKÝ, Zdeněk. Valency Lexicon of Czech Verbs. PhD thesis. Charles University, 2005.