European Union Training Application (EU Tap)
The GCSE training program for geography students interested in the European Union
By Rosaleen Hegarty (ID: 99606887)
Sonya Conlon (ID: 41615103)
Elijah Blyth (ID: 41754903)
Abstract: The field of Intelligent Multimedia integrates multimodal input to aid human-computer interaction and to address the correspondence problem. Educational systems are designed to help with learning, making the process more effective and fun. Traditional computer-based approaches stemmed from "textbook style learning", where the information was presented to the user on screen rather than in book format. Later attempts included simple pictorial information and sound. This report outlines the design and implementation of a CSLU Toolkit based European Union training and testing program, designed with GCSE students in mind, that uses as its inputs and outputs gestural (point and click), image-based, textual (through Natural Language Processing), and verbal modalities. The aim of the report is to design and implement a system that helps make the learning process both interactive and fun.
Key words: Natural Language Processing, E-Learning, multimedia, Natural Language Generation, Fun
Table of Contents
1. INTRODUCTION
1.1 Introduction
1.2 Aims and Objectives
2. BACKGROUND
2.1 Introduction
2.2 What is Intelligent Multimedia?
2.3 Why Multimodal Systems?
2.4 The CSLU Toolkit
2.4.1 Tool Command Language (TCL)
2.4.2 Speech Recognition, Generation, and Facial Animation
2.4.3 Rapid Application Development Environment
2.5 E-Learning
2.5.1 Benefits of E-Learning
2.6 Related Papers
3. ANALYSIS
3.1 Introduction
3.2 Analysis of Similar Systems
3.3 System Requirements
3.3.1 Non-Functional
3.3.2 Functional
3.4 User Requirements
3.4.1 Non-Functional
3.4.2 Functional
3.5 Hardware Requirements
3.6 Software Requirements
4. DESIGN
4.1 Introduction
4.2 Application Architecture
4.3 Human Computer Interaction Guidelines
4.3.1 Consistency
4.3.2 Compatibility with User Expectations
4.3.3 Flexibility and Control
4.3.4 Error Prevention and Correction
4.3.5 Continuous and Informative Feedback
4.3.6 Visual Clarity
4.3.7 Relevance of Information
4.4 Unified Modelling Language
5. IMPLEMENTATION
5.1 Introduction
5.2 European Union EU Tap Demo
5.3 Implementation Stage
6. TESTING
6.1 Introduction
6.2 White, Black and Grey Box Testing
6.2.1 White Box Testing
6.2.2 Black Box Testing
6.2.3 Grey Box Testing
6.3 Format
6.4 Forms
6.5 Debugging
6.6 Conclusion
7. CONCLUSION
7.1 Introduction
7.2 Results
7.3 Critical Analysis
7.4 Future Developments
8. REFERENCES
9. APPENDICES
Transcripts of Test Runs of System
Bug Tracking Form for the EU Tap System
Program Scripts
Table of Figures
Figure 1: Computational Model for integrating linguistic and pictorial information
Figure 2: The CSLU Toolkit Overview
Figure 3: Example of the CSLU RAD interface
Figure 4: Bug Tracking System (BTS) used in this project
Figure 5: Use Cases used to develop the static and dynamic object models
Chapter 1
Introduction
“Until comparatively recently, Artificial Intelligence was seen as the concern of only the most advanced researchers in computer science. Now, largely due to the falling costs of computers and silicon chips, the ‘Intelligence’ of information technology equipment is continually being increased in practical terms.”
(Aleksander, I. 1984)
1.1 Introduction
Many attempts have been made to create electronic educational training programs that are both natural and easy to use. In the past these attempts ranged from simple textual systems that presented the information on screen, with the lecturer then handing out written or typed exams on the subject (this will be referred to as the traditional approach). Further attempts incorporated point-and-click information retrieval and pictorial references, while still using text as the main information display technique; Microsoft's Encarta Encyclopaedia, while not directly an educational tool, is one such example. Some educational tools that teach a language or range of languages have used audio output extensively; however, none to date have incorporated spoken language input. Intelligent Multimedia gives the developers of these systems a new approach to improve usability, and possibly the effectiveness of the system, through the incorporation of more than one or two modalities in these "E-Learning Systems". It is now understood that for learning to progress efficiently in the classroom, students should be presented not only with textual information but also with pictures, sound, and even movement. For this reason this paper shall attempt to show how these important modalities can be incorporated, more easily than in a classroom, into the European Union E-Learning system outlined here, and thus improve not only the user experience but also the effectiveness of the system and of learning.
The type of deixis used in this paper is known as demonstratio ad oculos, because the objects on display are visually observable (i.e., they have already been introduced) and the user and the system share a common visual field (i.e., the map of Europe). Complex visual descriptions and overly long audio options are therefore avoided, due to the commonalities shared by the user and the system.
This report shall contain background information regarding E-Learning systems, Intelligent Multimedia, and previously published papers on the subject. The Background chapter shall also cover an introduction to the CSLU Toolkit and its components. The Analysis chapter shall address the user and system requirements, both functional and non-functional, as well as the hardware and software requirements necessary to ensure optimum efficiency and a low error count within the system. The Design chapter shall introduce the application architecture and the Human Computer Interaction guidelines that affect the design of the system, along with any other factors that were taken into account during design. The Implementation and Testing chapters shall cover how the CSLU Toolkit was utilised to satisfy the aims and objectives of the system based on the design, and how the system was tested and debugged, including the methodologies used to ensure a satisfactory level of testing was carried out. These chapters shall also include a basic description of the different testing methodologies available, the format of the testing, and the forms used to track errors, bugs, and other issues (the Bug Tracking System used). The final chapter will cover our conclusion and critical analysis, and finally the future changes and additions to the project.
1.2 Aims and Objectives
“Multimodal speech and gesture interfaces are promising alternatives to desktop input devices and WIMP (windows, icons, menus, pointer) metaphors. They provide a wide spectrum of appropriate interaction ranging from gesture-based direct manipulation to distant multimodal instruction or even discourse based communication with artificial humans”
(Marc Erich Latoschik 2005)
The aim of this system is to include not only the traditional visual outputs (text and pictorial references) but also the audio outputs used in language training tools, giving the student a personal feel for the information being imparted. Another aspect of the system is the ability for the student to learn at their own pace, choosing when to repeat information and when to continue to the next section, with a "pick and choose" methodology whereby the student can choose to hear only the information appropriate to their requirements (for example geographical, economic, etc.). The final aspect of the system is the optional "Test" section, where the student is asked if they wish to answer a series of questions based on the information they have just been shown or heard, and is given the opportunity to go back to any area they feel they do not know enough about. The system's user interface is similar in layout to a map system: the user is presented with a map of Europe and asked to select a country, and is then shown a map of the country they have selected. At every point in the program the user has the option of returning up a level, and is also given a "Quit" word with which they can exit the program at any point. The system should also include a "Pause" word so that a student can take a break and pick up exactly where they left off, giving the student full control over their learning process. Traditional approaches tend to force the student to go at a pace that can be either too quick or too slow for their needs; the aim of this system is to give the student the power to control the rate of learning.
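The sketch below illustrates, in plain Tcl, how such globally available control words ("Quit", "Pause", and returning up a level) might be mapped to actions before normal dialogue handling continues. It is a minimal sketch only: the procedure name handleNavigationWord and the action strings it returns are assumptions made for illustration, not taken from the EU Tap program scripts.

# Hypothetical helper, not part of the EU Tap scripts: maps a recognised
# word to a navigation action for the dialogue controller.
proc handleNavigationWord {word} {
    set w [string tolower $word]
    switch -- $w {
        quit    { return "exit_program" }
        pause   { return "hold_position" }
        back    { return "go_up_one_level" }
        default { return "continue_lesson" }
    }
}

# Example call: a spoken "Pause" yields the hold_position action.
puts [handleNavigationWord "Pause"]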
The main aim of any learning system is to appear intelligent and reactive to the individual while remaining accessible to the group, in this case students. Therefore, the underlying aim of this paper is to show that the incorporation of the discussed modalities, and the method with which they are incorporated, can give such systems advantages over similar systems developed according to the "traditional approach" of simply displaying text on the screen. The ultimate aim of this paper is a system that can be used not only by individuals but also by groups, or even entire classes, where one or more speakers interact with the system while other users contribute information and ask questions. The system will ultimately be able to answer direct questions based on its domain knowledge, as well as ask questions pertinent to the domain.
Chapter 2
Background
“One of the Main reasons why it is so difficult for computers to understand natural language (and indeed visual representations) is that understanding requires many sources of knowledge, including knowledge about the context of the communication, and general ‘Common Sense’ knowledge shared by speaker and hearer”
(Nilsson 1998)
2.1 Introduction
This chapter covers the background information behind the decision to develop the EU Tap GCSE training program. It also gives a brief introduction to the field of Intelligent Multimedia, the reasons behind the choice of a multimodal system, the CSLU Toolkit, and the field of E-Learning, along with some related research in Intelligent Multimedia and some previous E-Learning systems that use multimodal interfaces.
What is being conveyed in this section is the reasoning behind the design of the system outlined in the paper, and the background to the ideas used, as well as an explanation of the category this paper falls into.
2.2 What is Intelligent Multimedia?
“Multimedia systems are those which integrate data from various media (e.g., paper, electronic, audio, video) as well as various modalities (e.g., text, diagrams, photographs) in order to present information more effectively to the user”
(Maybury 1993)
The field of Intelligent Multimedia concerns itself with combining multiple modalities (vision, audio, text, gestures, etc.) in order for a system to exhibit intelligent behaviour (Winograd 1973, Waltz 1981, Srihari 1995). The field also attempts to form methodologies and theories for the consolidation of these modalities in ways that lead not only to better understanding for the users of such systems, but also to better understanding between the user and the systems themselves. A typical computational model, showing the different modules needed by a system that handles both text and images, is shown below (Srihari 1995):
Figure 1 Computational Model for integrating linguistic and pictorial information
One of the main problems Intelligent Multimedia attempts to solve is the correspondence issue: the combining of textual, pictorial, and other modalities into one combined meaning. The issue is that pictorial, textual, and gestural information can be ambiguous, and even after combination with other modalities the overall meaning could be lost without a background knowledge base.
Wittgenstein (1889-1951) describes this problem in terms of "family resemblances": not the resemblances between family members such as sisters and brothers, but the resemblances between the knowledge bases of individuals in similar situations or professions. The most famous example is that of the builders. When Builder A says "Go and fetch me a slab", Builder B knows exactly the type and colour of slab that Builder A is asking for: the builders have been working on the same building site, the second has seen the first working with a certain type of slab, and he does not need to ask where to get the slab either. However, should a mortician come onto the building site and Builder A ask the same question, the mortician would have a different family resemblance to the builder and would have to ask questions such as "What kind of slab: a slab of toffee, a concrete slab, a mortician's slab, or some other form of slab?" and "What type of slab: a round white one, a square blue slab, a brown heavy slab, or a light pink slab?" The mortician would also have to ask where to get the slab from, since he would not have the background knowledge to know that the slabs are all kept in one place. The task of Intelligent Multimedia is to translate the family resemblances of the user and the system being developed in such a way as to allow natural and comfortable use of the system.
2.3 Why Multimodal Systems?
“Current computing systems do not support human work effectively. They restrict human-computer interaction to one mode at a time and are designed with an assumption that use will be by individuals (rather than groups), directing (rather than interacting with) the system. To support the ways in which humans work and interact, a new paradigm for computing is required that is multimodal, rather than unimodal, collaborative, rather than personal, and dialogue-enabled, rather than unidirectional.”
(MacEachren et al. 2004)
The above quote describes a problem also addressed by this paper. Whereas this paper concerns itself with E-Learning as opposed to collaborative geoinformation access, the problems outlined are similar in both. Traditional E-Learning systems concerned themselves with a design for an individual rather than a group, and with the use of a single modality for presenting information.
In classrooms, however, it has become accepted that simply giving a student a textbook, or simply reading the information out, will not teach the student in the best way; in fact several modalities have been introduced into classrooms over the last fifty or so years. Why, then, should computer systems use only one or two modalities? This is one of the main reasons behind this system: the idea that several modalities can be used to better portray the information to the student. Classrooms also rarely consist of a single student and a lecturer; they often have several students all talking and interacting with the lecturer (the group's personal experience). This system therefore attempts to allow several students to interact with it, and encourages students to talk with one another about the subject area, since often the best person to explain information to a student is another student. For this reason, at key points within the program there are "Discussion Points", where a student or teacher can turn to another student and get their take on the information being presented.
A multimodal system gives students a range of inputs, and since human learning and memory work through association, these multimodal systems should facilitate learning far more successfully than a system that relies on a single modality such as text or audio alone (Sowa 1991).
2.4 The CSLU Toolkit
“The CSLU Toolkit was created to provide the basic framework and tools for people to build, investigate and use interactive language systems. These systems incorporate leading-edge speech recognition, natural language understanding, speech synthesis and facial animation technologies. The toolkit provides a comprehensive, powerful and flexible environment for building interactive language systems that use these technologies and for conducting research to improve them”
( 2006)
Figure 2 The CSLU Toolkit Overview
The CSLU Toolkit is an open source project undertaken by the Oregon Graduate Institute of Science and Technology that brings together several technologies into a package allowing the rapid development of applications capable of voice recognition, voice generation, and multimodal input and output. The package also includes an open source, easy to use programming language (TCL). It enables the user to create and run fully working applications built within the package, and allows the fast prototyping of such applications at a presentation level. In the following sections the main components of the toolkit are described, with brief explanations of their use in this project.
The CSLU Toolkit also supports a RAD (Rapid Application Development) style of prototyping, and even of full application creation. This is done by placing icons representing the different dialogue modules on the screen, like so:
Figure 3 Example of the CSLU RAD interface
2.4.1 Tool Command Language (TCL)
“TCL (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. Open source and business-friendly, TCL is a mature yet evolving language that is truly cross platform, easily deployed and highly extensible”
( 2006)
TCL is the command line programming language used in the CSLU Toolkit. It is an open source language similar in design to Pop-11 (the core language of the Poplog environment, partially designed by Aaron Sloman of the University of Birmingham's AI department, 1996) and offers much of the same functionality, although not as wide ranging. As the main command line language of the CSLU Toolkit, it interfaces directly with the toolkit, allowing far more flexibility in the design of software built on it. The TCL and associated TK languages are designed for a wide range of applications such as web and desktop applications, network programming, general purpose programming, and system administration. TCL is particularly well suited to RAD development, since it is very user friendly and close to human language in structure (an example being an if/elseif/else statement that reads almost as plain English).
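As an illustration of this readability, the following short Tcl fragment is a minimal sketch of the kind of conditional logic such scripts use. It is not taken from the EU Tap program scripts; the procedure name describeCountry and the example facts are purely illustrative.

# Illustrative only: a simple procedure demonstrating Tcl's if/elseif/else syntax.
proc describeCountry {name} {
    if {$name eq "France"} {
        return "France is a founding member of the European Union."
    } elseif {$name eq "Ireland"} {
        return "Ireland joined the European Community in 1973."
    } else {
        return "That country is not in the current knowledge base."
    }
}

# Example call: prints the line about Ireland.
puts [describeCountry "Ireland"]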