Faculty of Teacher Education, University of Zagreb

Katarina Aladrović Slovaček, MSc, research assistant

Melita Ivanković, teacher

Savska cesta 77, Zagreb

Analysis of a Croatian corpus of child language (ages 7 to 12) and its use in teaching

Language, as an abstract system of signs, has two realizations: oral and written. A child first utters syllables, then words, then phrases and finally sentences. The process of sound articulation, and with it one of the phases of language learning, is completed by the age of six. At this stage a child whose language development is typical is able to compose a sentence of five, six or seven words (Pavličević-Franić, 2005). The vocabulary of a seven-year-old contains around 10,000 words (Pavličević-Franić, Gazdić-Alerić, 2010). Institutional learning of Croatian as a mother tongue starts in kindergarten, where it takes place through various communicative language games. On starting school (at the age of six or seven), however, a child begins to learn the language systematically. Because Croatian, as a Slavic language, is morphologically rich, language learning often presents a real problem for children: they are afraid of learning the language and do not like it (Pavličević-Franić, Aladrović, 2009). This shows the need to change language teaching, especially in primary school, which lasts eight years in the Croatian educational system. Such changes require the use of a communicative-functional approach in language teaching (Miljević-Riđički et al., 2004).

The Croatian National Educational Corpus has over a hundred million basic words (June 2010). A database of Croatian child language covering the period up to the age of six has been created within the CHILDES database. However, the corpus of Croatian child language during primary school is not recorded in these databases, so the purpose of this research is to analyze a corpus of child language from the age of seven to twelve, the period when children are in the concrete operational stage (Piaget, 1969). The corpus consists of around 1,500 written works collected during research in 30 primary schools in all regions of Croatia, and it contains around 30,000 words. The corpus will be coded and then analyzed at the morphological, syntactic and lexical levels. The research will also try to answer the question of how the corpus can be used in mother-tongue learning in primary school to help children come to love their language and enjoy learning it.

I. Language acquisition and learning

Language distinguishes humans from all other creatures on Earth, and language acquisition is therefore, on the one hand, an entirely ordinary occurrence and, on the other, something very special and fascinating. Language acquisition shows general features common to all children in the world: all of them manage to acquire a language successfully, regardless of its particular features, of which language they are exposed to in natural situations and of the teaching method, and they master very different language stages in a very short period. A child's language development is connected to their physical, cognitive, emotional, social and communicative development (Owens, 1984). In the first several years, children whose language development is typical gain full control of their language. By the age of five, children's vocabulary comprises 1,000 words, and they have acquired most of the phonological and grammatical system of their language as well as the basics of word meaning and use and of how language is used in particular situations (McGregor, 2009: 203). Language acquisition also depends on the habits of spoken language in the child's surroundings: the speech of their parents and of other members of the community, including other children who interact with them. Although all children are able to learn the language to which they are exposed in early childhood, there are nevertheless individual differences among them, arising from the features of their mother tongue and the different circumstances in which the language is learnt, since these influence the speed at which the language is learnt. To explain language acquisition, researchers have developed several theories, which can be divided into three main groups: behaviourist, generative and cognitive. Behaviourists (the most significant of them being B. F. 
Skinner) consider language acquisition to be learnt behaviour and therefore explain it by the creation of associative links between stimulus and response. They believe that language and speech are learnt by imitating the speech of adults, which could be called learning according to the model: auditory/visual stimulus - response to stimulus - reinforcement. The child listens to the model and imitates what they have perceived by ear. By imitating adult speakers, through trial and error, stimulation and repetition, the child acquires language structures, which advances their language development (Pavličević-Franić, 2005). Generative theory emerged in the 1950s. In linguistics, this period was marked by the development of generative grammar, which views language as the knowledge possessed by the people for whom that language is a mother tongue, that is, by its native speakers. The aim of this linguistic theory is to arrive at the grammar in the mind of the speaker (Palmović, 2005). N. Chomsky, the creator of generative grammar, distinguishes the competence of the native speaker, that is, unconscious innate language knowledge, from performance, the actual use of language in an actual situation. He believes that language acquisition is actually grammar acquisition, because children are born with language abilities and general knowledge about the form of human language (Vilke, 1991). For this knowledge to develop, children must be exposed to the language of their environment. In this way, the children's innate grammar is stimulated and appropriately reinforced. This is how generativists explain the fact that almost all children manage to acquire their mother tongue regardless of their other differences and of the differences among the languages to which they are exposed (Jelaska, 2007). 
Chomsky is the main representative of the nativist theory, which explains the ease and speed with which children acquire language by the fact that a large part of their language knowledge is innate (Palmović, 2005). The innateness of the language model, according to Chomsky (1965), explains the similarity of the process of language acquisition across different languages and cultures. Chomsky calls the content of this language model the LAD, the language acquisition device. According to this model of language acquisition, the child is exposed to language data from which they discover the parameters specific to the particular language (Kuvač and Palmović, 2007: 52). This means that all children go through the same stages of language acquisition, use similar structures and make similar deviations from the language to which they are exposed, regardless of which language they are acquiring. They only have to be exposed to a human language, and their innate grammars will be stimulated and reinforced in a certain way. On this basis one can conclude that speakers have adopted production rules applicable to new linguistic occurrences, and that language acquisition is therefore actually the acquisition of grammar and of the cognitive system that enables people to understand and use language. Grammar is not learnt; it is acquired, adopted, spoken (Jelaska, 2007: 68). Although the grammars of natural languages differ, they also have many similarities, which are called universals. These universals are considered important evidence of innateness because they could not have appeared by accident. The theory of innate ability is also supported by the fact that children master language much better than could be expected on the basis of the language data to which they have been exposed. The representative of cognitive theories, J. 
Piaget (1967), believes that cognitive abilities enable learning in general, including the learning of language, which means that developed cognitive abilities are a necessary precondition for successful language development (according to Pavličević-Franić, 2005). Piaget considered language to be a means of thinking, or of thinking about reality, so that the appearance of language depends on the structure of reality itself. Accordingly, he believed that the appearance of language is conditioned by the level of sensorimotor intelligence reached during the first eighteen months of the child's life. Piaget (1947) claims that language acquisition and learning take place in four stages: sensorimotor (from birth to the age of two), preoperational (from the age of two to the age of seven), concrete operations (from the age of seven to the age of eleven or twelve) and formal operations. After discussing Piaget's theory, L. Vygotsky (1962) concluded that the child becomes a sensible being at the moment speech appears, and that the development of cognitive abilities and the child's development as a whole depend on and are conditioned by language (according to Kovačević, 1996). Language acquisition does not end when the child enters school (in the Croatian educational system, at about the age of seven) but continues until the age of twelve, when language automation occurs, meaning that children have mastered morphology and syntax at the level of automatic use. This period is called the early language learning period, and in the Croatian educational system it lasts from the age of seven to the age of twelve. 
This is the period when language should be learnt by developing and stimulating communicative competence. Since language learning is very often associated with pupils' negative attitude towards the mother tongue, caused by the quantity and difficulty of the content, the aim of this paper was to find out whether this attitude can be changed and what improvements can be made if a corpus is introduced as one of the methods of language learning. Research carried out in 2004 (Miljević-Riđički et al.) confirmed that children do not like Croatian as a mother tongue and that it is at the bottom of the scale of favourite subjects. Research carried out in 2009 (Pavličević-Franić and Aladrović) shows a somewhat better attitude of pupils towards Croatian as a school subject, though it still carries many negative connotations. The extensiveness of the content, inappropriate ways of treating it and inadequate content can cause problems in learning the standard form of the Croatian language and a "fear of language", which can in turn cause pupils long-term problems with expression and literacy. By way of illustration, Croatian is the most extensive subject in primary school, taught for five lessons per week during the period of early language learning (until the age of 12). In addition, communication in the mother tongue is the first and crucial competence for lifelong learning, since a child who has learnt their own language well will more easily learn other languages as well as other subjects (European Commission, 2005). 
With the aim of improving the quality of Croatian language teaching and learning, the intention was to investigate a small corpus of pupils' written papers in order to identify the language problems that pupils encounter at a certain age, and to adapt language teaching and learning methods accordingly so as to change pupils' attitude towards Croatian as a school subject.

II. Corpus of Children's Language

The Croatian National Corpus includes 101.3 million tokens and consists of a systematic collection of selected texts of contemporary Croatian covering different media, genres, styles, areas and themes. Apart from the Croatian National Corpus, there are other corpora of the Croatian language, such as the Croatian Language Treasury of the Institute of the Croatian Language and Linguistics (December 2010).

Research on children's language in Croatia did not start as early as in the United Kingdom, the United States of America or Germany. The first description of children's language and its lexical development was provided by Ivan Furlan in his dissertation "Diversity of vocabulary and speech structure" (1961). Many studies of language were carried out in the 1960s and 1970s; however, their titles contained the word "speech" instead of the word "language". At the end of the 1970s, Ante Fulgosi published his paper "Recent Research in Psycholinguistics" (1979), in which he presented recent psycholinguistic research, inspired primarily by Noam Chomsky's generative theory of language acquisition. More systematic research on children's language did not start until the 1980s, with the works of Stjepko Težak (Grammar in Primary School, 1980), and the 1990s, with the works of M. Ljubešić, M. Kovačević, Z. Babić and D. Pavličević-Franić, while a larger step forward was made with the opening of the Laboratory for Psycholinguistic Research (POLIN) in 1999. Through their research, the Laboratory's members have contributed to the understanding of children's language acquisition within the Croatian language corpus. The first Croatian corpus of children's language was made by POLIN and is included in the CHILDES world database. It consists of the spontaneous speech of three monolingual children and a corpus of the story-telling abilities of preschool children. The corpus of school children's language (lexical level) has been collected and presented in the First School Dictionary of the Croatian Language, which is at the same time the only e-dictionary, with 2,500 explained words with recorded correct pronunciation, 2,000 drawings and 185 cartoons. It was published by the Institute of the Croatian Language and Linguistics in 2009. 
The corpus of school children's language also includes textbook language, an analysis of which was carried out in 2008 (Pavličević-Franić and Gazdić-Alerić, 2010). Within the textbook corpus, the words that appear most often in textbooks were counted and then sorted into four categories: polysyllabic words, affective words, professional terminology and other.
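The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `word_frequencies` and the sample sentences are hypothetical, word forms are counted as written (no lemmatization), and the sorting of words into the four categories is not attempted because the criteria used in the original analysis are not given here.

```python
from collections import Counter
import re


def word_frequencies(texts):
    """Count how often each word form occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        # In Python 3, \w+ also matches letters such as č, ć, š, ž and đ.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical sample sentences, not taken from the textbook corpus.
sample = ["Mačka spava. Mačka jede.", "Pas spava."]
freq = word_frequencies(sample)

# The most frequent word forms, as (word, count) pairs.
top_words = freq.most_common(2)
```

Because inflected forms are counted separately, a morphologically rich language such as Croatian would normally need a lemmatization step before frequencies are compared; that step is left out of this sketch.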

III. Research

3.1. Methodology of the research

The research was conducted in primary schools in the Republic of Croatia, from the second to the sixth grade. The research instruments were written papers on set topics, given to the examinees according to their grade (including topics from the area of linguistic expression). Out of the large sample of 1,500 pupils' written papers, 100 papers (of approximately equal length) were selected, 20 from each grade. The papers were selected by random sampling. They were analyzed by the content analysis method, and the data were processed in the SPSS statistics software using the t-test, analysis of variance and the chi-square test.
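A selection of this kind, the same number of papers drawn at random from each grade, could be sketched as follows. The paper identifiers, the even split of 300 papers per grade and the fixed seed are assumptions made only for illustration; the source gives just the totals (1,500 papers, 100 selected, 20 per grade) and does not say how the draw was carried out.

```python
import random


def sample_per_grade(papers_by_grade, per_grade=20, seed=0):
    """Draw the same number of papers at random from every grade."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return {
        grade: rng.sample(papers, per_grade)  # sampling without replacement
        for grade, papers in papers_by_grade.items()
    }


# Hypothetical identifiers: 300 papers in each of grades 2 to 6 (1,500 in total).
papers_by_grade = {
    grade: [f"g{grade}-p{i:03d}" for i in range(300)]
    for grade in range(2, 7)
}

selected = sample_per_grade(papers_by_grade)
total_selected = sum(len(papers) for papers in selected.values())
```

Sampling within each grade separately guarantees equal group sizes, which keeps the later comparisons between grades balanced.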

3.2. Aims of the research

The aims of the research are:

To investigate the syntactic form of the pupils' written works.
To investigate how much the pupils deviate from the grammatical and orthographic standards in their written works.
To investigate whether there is a statistically significant difference in the results with regard to the pupils' age, sex and final grade.

3.3. Hypotheses of the research

Hypotheses of the research are:

H1. – In their writing, pupils mostly use simple sentences; where they use multiple (multi-clause) sentences, most of these are compound.

H2. – Pupils show the largest deviations in their knowledge of the orthographic standard: the writing of the sounds č and ć, the writing of the reflex of Proto-Slavic yat, and the rules on capital and small letters.

H3. – There is no statistically significant difference in the results with regard to age.

3.4. Results of the research

Interestingly, essays of only one sentence can be found in the 2nd and 4th grades, while essays of up to ten sentences can be found in the 3rd and 6th grades. On average, the essays consist of three to five sentences. The largest number of sentences is found in the 3rd and 5th grades, which is probably connected with the topic on which the pupils of these grades wrote and which inspired them the most. The smallest number of sentences was recorded in the 2nd grade, statistically significantly smaller than in the other grades. This was expected, since these pupils have only just started to write essays, so their essays are less extensive and mostly contain simple sentences.

Graph 1. Average number of sentences per essay
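An average of this kind could be computed along the following lines. This is a rough sketch, not the procedure used in the study: it assumes that a sentence is any non-empty stretch of text ending in a full stop, exclamation mark or question mark, and the essays shown are made up for illustration.

```python
import re
from statistics import mean


def count_sentences(essay):
    """Count sentences as non-empty chunks ending in '.', '!' or '?'."""
    chunks = re.split(r"[.!?]+", essay)
    return sum(1 for chunk in chunks if chunk.strip())


def mean_sentences_by_grade(essays):
    """Map each grade to the mean sentence count of its essays.

    `essays` is a list of (grade, text) pairs.
    """
    counts_by_grade = {}
    for grade, text in essays:
        counts_by_grade.setdefault(grade, []).append(count_sentences(text))
    return {grade: mean(counts) for grade, counts in counts_by_grade.items()}


# Made-up essays for illustration only, not taken from the corpus.
essays = [
    (2, "Volim more."),
    (2, "Idem u školu. Učim čitati! Volim crtati."),
    (3, "Ljeto je toplo. Kupamo se. Igramo se? Da! Lijepo je."),
]
averages = mean_sentences_by_grade(essays)
```

Pupils' punctuation is itself a frequent source of error, so counts from a punctuation-based rule like this one would need manual checking before being reported.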

The essays mostly consist of simple sentences. However, a certain number of multiple sentences can be found, mostly compound sentences connected by the conjunctions and, or, but and so. In the majority of essays there is only one multiple sentence, but in the 3rd and 5th grades there are up to four multiple sentences per essay. In some grades, there are two to three multiple sentences per essay. There is no statistically significant difference in the use of multiple sentences with regard to age.

Graph 2. Number of multiple sentences per essay
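A chi-square test of independence is one way a comparison of this kind between grades could be checked. The sketch below computes the Pearson statistic by hand and compares it with the tabulated critical value. The counts are made up and are NOT the study's data, the split into essays with and without a multiple sentence is an assumption, and the study itself ran its tests in SPSS, so this is not the authors' procedure.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if grade and sentence use were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts, NOT the study's data. Rows are grades 2 to 6; columns are
# essays with at least one multiple sentence and essays with none.
table = [[11, 9], [13, 7], [12, 8], [14, 6], [12, 8]]

statistic = chi_square_statistic(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)

# Tabulated chi-square critical value for 4 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE_DF4_ALPHA_05 = 9.488
significant = statistic > CRITICAL_VALUE_DF4_ALPHA_05
```

For these invented counts the statistic is about 1.10, well below 9.488, so the sketch would report no significant association between grade and the use of multiple sentences.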

Orthographic errors occur least frequently in the 2nd grade and most frequently in the 3rd and 6th grades, probably because these grades have the largest number of sentences and therefore also the largest number of errors. Most orthographic errors concern punctuation marks, especially commas, exclamation marks and ellipses. Apart from punctuation errors, a large number of orthographic errors concern the capital or small initial letter, the writing of the sounds č and ć, and the reflex of Proto-Slavic yat.