Evaluating learning multimedia

Jaan Mikk & Piret Luik

Paper presented at the European Conference on Educational Research, University of Crete, 22-25 September 2004

Jaan Mikk

Doctor of Education

Professor Emeritus at the Department of Education

University of Tartu

Ülikooli Street 18

50090 Tartu

Estonia, Europe

Tel: +3727 424565

Fax: +3727 375156

E-mail:

Piret Luik

PhD of Education

Lecturer at the Department of Education

University of Tartu

Ülikooli Street 18

50090 Tartu

Estonia, Europe

Tel: +3727 375612

Fax: +3727 375156

E-mail:

Abstract

The paper describes the development of a formula for quick evaluation of the quality of learning multimedia. Fifty-four sixteen-year-old students from Estonian high schools participated in a quasi-experimental study. All the students studied 35 units of text-based learning multimedia for different subjects: mathematics, chemistry, history, geography, and the Estonian language. The average post-test score of the students was modelled using regression analysis. The predictor variables in the resulting efficiency formula were terms in the sub-menus, symbols, and examples in the text. The formula can be used to predict the level of acquisition of other units of text-based learning multimedia once the values of the predictor variables have been measured. Applying the formula facilitates the composition of high-quality learning multimedia.

Keywords: The efficiency of learning multimedia, CAL, Secondary education, Readability


Introduction

In Estonia, schools have introduced many computers as well as learning multimedia. However, the computers have not been used to the expected extent across different subjects. One possible reason for this is that the learning multimedia is too difficult for the students to use. The navigation in the program may be too complicated or the content of the subject may be difficult to understand. The suitability of learning multimedia should be evaluated before applying the programs in schools.

Literature Review

In the case of textbooks, an analogous problem is solved by readability formulae. The formulae have been used for more than seventy years to assess the quality of textbooks, newspapers, instructions, etc. (Singer, 1988). In the United States, the readability of a textbook is one of the most important characteristics considered in the process of selecting textbooks for students. Rudolph Flesch measured the readability of newspaper manuscripts and rewrote the complicated ones; the readership of his newspaper increased by approximately 50 percent (Chall, 1958, 103-107).

The readability formulae are based on the formal characteristics of the text complicacy. The syntactic complicacy of the text is usually characterised by sentence length. Long sentences are difficult to understand because they may exceed the limits of the reader’s working memory. In addition, long sentences usually contain many complicated syntactic constructions. The semantic complicacy of the text is often characterised by word length. Most long words are seldom used in everyday language; therefore they are unfamiliar to many readers and difficult to understand. Besides the word length, the percentage of terms and the percentage of rare words or abstract words are also used to characterise the complicacy of the content of the text. The text characteristics are dependent on the level of familiarity of the content of the text (Mikk, 2001).

More than 100 readability formulae have been developed all over the world (Klare, 1995; for reviews see Chall, 1958; Klare, 1963; 1984; Mikk, 2000). Most of the formulae are for English texts, but in recent years formulae for other languages have also been published (Bamberger & Vanecek, 1984; Henry, 1975; Matskovskii, 1976; Tuldava, 1993; Vanecek, 1995).

One of the most popular readability formulae is the Flesch Reading Ease (RE) formula. The formula is as follows.

RE = 206.835 – 0.846wl – 1.015sl

where: wl — the number of syllables per 100 words,

sl — the mean sentence length in words.

It can be seen that the readability of a text is evaluated by two characteristics: the average word length and sentence length in the text. To apply the formula to a text, the number of syllables per 100 words and the mean sentence length in words should be measured and the numbers put into the formula.
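
As a minimal sketch of how the formula is applied, the Python function below evaluates the Reading Ease index from the two measured quantities. The function name and the sample values are illustrative assumptions and do not come from the study.

```python
def flesch_reading_ease(wl: float, sl: float) -> float:
    """Return the Flesch Reading Ease index.

    wl: number of syllables per 100 words
    sl: mean sentence length in words
    """
    return 206.835 - 0.846 * wl - 1.015 * sl


# Hypothetical passage: 150 syllables per 100 words, 20 words per sentence.
example_index = flesch_reading_ease(150, 20)
print(round(example_index, 1))
```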

The Reading Ease index varies from 0 (a very complicated text) to 100 (a very comprehensible text). Texts with a Reading Ease index of 75 are optimal for readers with the reading ability of an average seventh-grade pupil; texts with a Reading Ease index of 55 are optimal for readers with the reading ability of a twelfth-grade pupil, etc. (Klare, 1988, 21). Usually the application of a readability formula results in a number that indicates the grade level of the reading ability needed to comprehend the text.

The sentence length and the word length are the most often used text characteristics in the formulae. However, other characteristics have also been used: the percentage of the words of the text included in a list of frequent (well-known) words, the percentage of words with abstract suffixes, the percentage of terms in the text and so on. The overall list of text characteristics related to the text readability is a good base for rules of clear writing.

Readability formulae should be used to assess the quality of Computer Assisted Learning (CAL). However, the formulae for texts are not sufficient for learning multimedia because the difficulty of acquiring the content of CAL materials depends not only on the text studied, but also on the navigation in the program, the appearance of the screen, the students' self-control offered by the program and so forth. The characteristics important in evaluating the quality of learning multimedia can be found in many investigations and handbooks on composing educational software (Alessi & Trollip, 2001; Berson, 1996; Caftori, 1994; Crozier, 1999; How to evaluate… 2000; Hughes, 1998; Liao, 1992; Mayer & Gallini, 1990; McCoy, 1996; Phillips, 1997; Van Dusen & Worthen, 1995; Wang & Sleeman, 1993).

Purpose of the Study and Research Hypotheses

The aim of this research was to elaborate a readability formula for text-based learning multimedia.

It was hypothesised that the formula would include some characteristics of the text in the multimedia and some computer-specific characteristics. It was hypothesised that the important text characteristics would be sentence length, word length, word familiarity, and text abstractness. It was also hypothesised that the important computer-specific characteristics would be related to navigation in the program, self-control, illustrations, and attractiveness of the presentation.

Methods

Readability formulae are usually elaborated as follows. More than 30 passages of text are taken from the area for which the formula is to be elaborated. Students study the texts and the level of acquiring the content of the passages is measured. The texts are also analysed in regard to word familiarity, level of abstractness, sentence length, etc. Finally, regression analysis is used to combine the most important text characteristics into a formula. We used the same idea to elaborate the readability formula for text-based learning multimedia.

The subjects were 54 students (21 boys and 33 girls) who studied in the 10th grade (sixteen years old). The experiment was carried out in three schools in Estonia. These schools had contemporary computer-labs and the students had good experience of computer learning. The students were of mixed ability.

There were six text-based learning multimedia programs in Estonian, the content of which was in accordance with the curricula of the 10th grade students. The sample of materials for our experiment was selected from this multimedia. There were 5 units on mathematics, 6 units on chemistry, 12 units on history, 6 units on geography, and 6 units on the Estonian language; a total of 35 units.

To measure the level of the unit content acquired, experienced teachers compiled tests in two forms covering the content of every unit. The teachers were asked to compile criterion-referenced tests, the items of which are in accordance with the content of the unit. An expert checked all the tests paying special attention to the availability of the answers in the study units. Both forms of test were in paper format.

The procedure was composed of two parts: the experimental assessment of the difficulty of the study units and the analysis of the units.

The students independently studied the content of the units with computers when they reached the topics in their curricula. After completing each text-based learning multimedia unit, a post test was given to each student to measure the level of their acquired knowledge. The students were not allowed to use the study unit whilst completing the tests. The time for studying the units and for filling in the tests was unlimited. The students needed up to 45 minutes for studying one unit and they did not study more than one unit a day.

The aim of the experiment was to find out the relative difficulty of the units. However, the students’ post-test score depends not only on the unit's difficulty but on the capabilities of the students as well. To eliminate the influence of students' capabilities, every student studied all 35 units of text-based learning multimedia. This organisation of the experiment enables us to compare the difficulties of different multimedia units.

The influence of teachers was eliminated by not allowing them to explain the material to students. There was an instructor of computer sciences in the computer class, who provided technical help when needed. Students were not permitted to communicate with peers to ensure that the results were dependent on the individual students and the characteristics of the text-based learning multimedia units.

The test items differed in their level of complicacy. A greater number of points was given for correct answers to more complicated items. Some tests contained multiple-choice items. In scoring these items, we applied a correction for guessing. As the tests differed in the number of items, we calculated the student's score for every program unit as a percentage.
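
The scoring steps can be sketched as follows. The paper does not state which correction for guessing was applied; the sketch assumes the conventional formula, in which the number of wrong answers divided by the number of distractors is subtracted from the number of right answers. Item weights are left out for brevity, and the function names and sample numbers are hypothetical.

```python
def guess_corrected(right: float, wrong: float, n_options: int) -> float:
    """Conventional correction for guessing (an assumption, not the paper's
    documented procedure): right - wrong / (n_options - 1)."""
    if n_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return right - wrong / (n_options - 1)


def percent_score(points_earned: float, points_possible: float) -> float:
    """Express a raw score as a percentage of the maximum for that test."""
    return 100.0 * points_earned / points_possible


# Hypothetical test: 10 four-option items, 8 right, 3 wrong is impossible,
# so use 7 right and 3 wrong.
corrected = guess_corrected(right=7, wrong=3, n_options=4)
print(percent_score(corrected, 10))
```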

Concurrently, the text-based learning multimedia units were analysed to determine which characteristics should be included in the readability formula. The values of the characteristics for every unit were mostly found using strictly fixed rules. In some cases expert opinions on a five-point Likert scale or a yes-no scale were also used.

The complicacy of the program’s navigation was characterised by the number of navigational possibilities (keys, mouse, buttons and menus) (characteristic No 162). A large number of possibilities may cause difficulties in navigation (Alessi & Trollip, 2001, 173). The number of hyperlinks (No 154) was used for the same reason. We calculated the percentage of terms among the words of the sub-menus (No 141), because terms usually hinder comprehension. We considered as terms the nouns and verbs that denote scientific concepts and are not used in everyday speech, for example the term redox reactions in a sub-menu of the chemistry program and the verb demobilise in the history program. Familiar commands and buttons are likely to facilitate understanding (Boling et al., 1998); therefore, we calculated their percentage (characteristics No 149 and 1491). Commands and buttons similar to those used in MS Office and Internet Explorer, both taught in Estonian schools, were considered familiar.

Van Dusen & Worthen (1995) state that animated characters, voice capabilities, and full-colour graphics motivate students. Relying on this assertion, the number of modes of presentation in every unit (No 244) was counted. The utilisation of computer capabilities (No 174) was also assessed by experts. The experts’ opinions were on a scale of –2 to +2. The experts assessed the attractiveness of the computer program as well (No 167).

The text used in the learning multimedia units was characterised thoroughly: the percentage of symbols among the letters (No 204), the percentage of long words (nine or more letters) (No 208), the average sentence length in letter spaces (No 213), the percentage of concepts in the unit (No 210), the mean terminological index (No 211) and the mean abstractness of nouns (No 212). The last characteristics were found according to three-stage scales (Mikk, 2000, 84 - 89). The content of the texts was also characterised by the presentation of examples (No 241). We used the following five-stage scale:

-2: no example in the program unit,

-1: some single examples not brought forth in the text,

0: several examples not brought forth in the text,

1: many examples and some of them were brought forth,

2: many examples and all of them were brought forth.
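
Two of the text characteristics listed above, the percentage of long words (nine or more letters) and the average sentence length in letter spaces, lend themselves to automatic counting. The sketch below is one possible way to do this, not the procedure used in the study. It assumes that a word is any run of letters, that sentences end with a full stop, question mark or exclamation mark, and that sentence length counts every character including the spaces between words. The function names and the sample text are hypothetical.

```python
import re


def long_word_percentage(text: str, min_letters: int = 9) -> float:
    """Percentage of words with at least min_letters letters."""
    # [^\W\d_]+ matches runs of Unicode letters, so Estonian letters count too.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    long_words = [w for w in words if len(w) >= min_letters]
    return 100.0 * len(long_words) / len(words)


def mean_sentence_length_chars(text: str) -> float:
    """Mean sentence length in characters (spaces included, end marks excluded)."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


sample = "Cats sleep. Photosynthesis happens."
print(long_word_percentage(sample))
print(mean_sentence_length_chars(sample))
```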

Graphics in the units were characterised by the number of three-dimensional graphics: photos, three-dimensional illustrations (for example, a cube), and diagrams with three axes (No 2511).

Alessi & Trollip (2001, 63) and Berry (2000, 46) recommend that the text should not be squeezed onto half of the screen. When the window of the program does not fill the whole screen, the students' attention is drawn away by the information on the rest of the screen. Therefore we calculated the percentage of the screen area used for the presentation of content information (excluding the table of contents, commands, menus, etc.) (No 203).

An important part of learning programs is self-control. We characterised this by corrective feedback (available or not) (No 337), the announcement of the percentage of correct answers (available or not) (No 338), and questions in self-control (the questions on the studied unit or the questions on the whole topic) (No 300).

For data analysis, we used the average post-test score for every study unit as the dependent variable. The characteristics of the study units were used as independent variables. Regression analysis of the data was carried out. We calculated linear and rank correlation coefficients between the variables and descriptive statistics of the characteristics.
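
The first steps of such an analysis can be sketched in a few lines of Python. The sketch averages the students' percentage scores for every unit, computes the linear correlation between the unit means and one unit characteristic, and fits a least-squares line with that single predictor. This is a simplified stand-in for the multiple regression described above, and all numbers are made up for illustration; they are not data from the study.

```python
from statistics import mean


def unit_means(scores):
    """scores[s][u] is the percentage score of student s on unit u."""
    n_units = len(scores[0])
    return [mean([row[u] for row in scores]) for u in range(n_units)]


def pearson(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x with one predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


# Made-up scores: two students (rows) by three units (columns).
scores = [
    [60, 40, 50],
    [80, 60, 70],
]
# Made-up characteristic: percentage of terms in the sub-menus of each unit.
terms_pct = [10, 30, 20]

difficulty = unit_means(scores)
r = pearson(terms_pct, difficulty)
intercept, slope = fit_line(terms_pct, difficulty)
print(difficulty, r, intercept, slope)
```

With real data, one row per student and one column per unit would be used, and every candidate characteristic would be examined before a multiple regression selects the predictors.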

Results

We calculated the average post-test score for every unit. The mean of these averages was 49% of the maximum possible number of points, which indicates that the units were difficult for independent learning. For successful learning the optimal post-test score is approximately 70-80% of correct answers in a criterion-referenced test (Mikk, 2000, 70). The difficulty of the programs may be one of the reasons why the programs are not widely used in schools.

The standard deviation of the average post-test scores for program units was 13% of the points. This variation in the difficulty of the program units is sufficient for developing a readability formula.