Intelligent Multimedia Assignment 2

CSLU-Based Driving Theory Test System

Abstract

This report details the development of DriveTheory, a driving theory test system based on the CSLU Toolkit. The aim of the project was to design and implement an interactive spoken language system. A driving theory test application was chosen in order to demonstrate the benefits of spoken language applications and the usefulness of such programs in everyday life.

The system that has been developed allows users to attempt a driving theory test. It makes use of a number of features of the CSLU Toolkit. The system deals with general questions relating to the rules of the road. The user attempts a number of questions and at the end receives the number of correct and incorrect answers given.

Contents

Chapter 1 Introduction
1.1 Introduction
1.2 Aims of the Project
1.3 System Overview

Chapter 2 Background
2.1 Introduction
2.2 Background to DriveTheory
2.3 Literature Examined

Chapter 3 Analysis
3.1 Introduction
3.2 Analysis of Similar Systems
3.3 Functional Requirements
3.4 Non-Functional Requirements
3.5 Software Analysis
3.6 Hardware Analysis

Chapter 4 Design
4.1 Introduction
4.2 Application Architecture
4.3 Storyboard of the System
4.4 HCI Guidelines
4.4.1 Consistency
4.4.2 Compatibility with user expectations
4.4.3 Flexibility and control
4.4.4 Continuous and informative feedback
4.4.5 Error prevention and correction
4.4.6 Visual Clarity
4.4.7 Relevance of Information
4.4.8 Human Processing Capacity
4.4.9 User Attitude and Anxiety
4.4.10 A Poor Interface Can Cause

Chapter 5 Implementation
5.1 Introduction
5.2 Text-To-Speech
5.3 Speech Recognition
5.4 Response Feature
5.5 Media Feature
5.6 Listbuilder
5.7 Use of Variables
5.8 Testing Overview
5.9 Choice of Testing Techniques
5.10 Static Verification
5.11 Dynamic Verification
5.12 Debugging
5.13 Use of Testing Techniques
5.14 Use of Static Verification
5.15 Use of Dynamic Verification

Chapter 6 Conclusions
6.1 Summary
6.2 Critical Analysis
6.3 Future Work

References

Appendix A Sample Transcript 1 of System Execution
Appendix B Sample Transcript 2 of System Execution

Table of Figures

Figure 4.1 Application User Use Case Diagram
Figure 4.2 Driving Test System Use Case Diagram
Figure 4.3 Storyboard of the system
Figure 5.1 RAD Prompt
Figure 5.2 Enter words to be recognised
Figure 5.3 Response Feature
Figure 5.4 Media Feature
Figure 5.5 Listbuilder and special RAD Prompt
Figure 5.6 Example of Tcl Code
Figure 5.7 Static Verification in early stage of development

Chapter 1 Introduction

1.1 Introduction

Dialogue systems make it possible to communicate with computers through spoken language. This has become feasible in recent years through advances in speech science and a number of related technological developments. It involves adapting computers so that they can ‘understand’ human speech. This is a complex process that requires the integration of the various components of spoken language technology, including speech recognition, dialogue modelling, natural language processing, and speech synthesis. The CSLU Toolkit was developed to help with this process and to support basic research, development and education activities related to spoken language systems and human-computer interfaces. When developing a dialogue system, there is a range of strategies to choose from for each stage of the system. In order to choose the correct strategies, the following must be considered:

  • The inherent properties and limitations of the strategy.
  • How these properties relate to the actual communication that will take place between human and machine.

1.2 Aims of the Project

The aim of this project is to develop and implement an interactive speech-based application that provides a driving theory test. Its target user is anyone preparing for their test. The system is an example of the power of speech-based systems and the usefulness of such applications in a real-world environment.

The objective is to develop an effective and efficient communication system between the computer and the user. The interface will be developed using the CSLU Toolkit. The interface should be user-friendly and easy to understand. Questions should be clearly audible and should be stated in a simple, direct manner. The system should offer easy-to-follow directions to the user at all times.

1.3 System Overview

The user will be able to register their name with the system, which allows a personalised interaction. They will then have the choice to proceed to the test or listen to the instructions. All interaction is conducted through speech: the user listens to what the system says through the speakers and replies through the microphone. The user can choose to answer ten, twenty, thirty, or forty questions. The questions are asked in a random order each time. After the user has completed the chosen question set, they are given their test result. They may then attempt the questions again or exit the system. At any time during the test, the user can return to the help menu or close the application. Certain images are displayed depending on the current context, and these help to reinforce the overall structure of the system. The captioning feature of the CSLU Toolkit is used so that the user can read each question as well as listen to it being read out. All speech is accompanied by an animated face, through the use of the CUAnimate feature of the Toolkit.

Chapter 2 Background

2.1 Introduction

Speech can be an efficient, natural and easy means of communication between humans and computers. The Center for Spoken Language Understanding (CSLU) has developed a multi-purpose toolkit that is designed to enable research into the development of interactive language technology. This toolkit includes a development tool, the Rapid Application Developer (RAD). RAD is a widely used tool for developing speech-based systems. The CSLU Toolkit represents an effort to make the core technology and fundamental infrastructure of speech recognition development accessible and easy to use. Speech technology is quickly becoming a reality in many kinds of computerised systems. Among its most useful applications are learning and testing applications, for example a driving theory test.

2.2 Background to DriveTheory

DriveTheory is a speech-driven interactive system. As such, it builds on similar systems which have been developed in the past. Although visual aids are displayed while it is running, interaction is completely voice-based. There are, for example, no objects that need to be clicked on or images that need to be selected. This decision was made because it encourages the user to focus on what they are hearing, without having to worry about using the mouse to navigate through the system. Interaction through a combination of speech and mouse is undoubtedly vital in some applications. For this particular one, however, we felt that speech alone was most appropriate.

The decision to build a driving theory test application was based on a number of factors. These included the fact that the system would be useful in a real-world context. It builds on some of the papers that have been discussed in class. As the system is speech-driven, the background focuses on papers which have discussed speech interaction in some way.

2.3 Literature Examined

The paper “Computational models for integrating linguistic and visual information” (Srihari, 1995) was considered. It discusses the relationship between speech and visual information, and their combined ability to convey information. This aided the development of DriveTheory, as it gave us a better understanding of how the different speech-based sections should be combined with the accompanying graphical displays. Images are used within DriveTheory at appropriate junctures in order to reinforce the user’s understanding of the system structure.

The paper discussing SmartKom (Wahlster et al., 2001) was also examined. It was of interest because of its use of an interface agent, through which the user interacts with the system. The idea of an agent was suited to DriveTheory, as it offered a viable means of providing a speech-based interface.

Chapter 3 Analysis

3.1 Introduction

The aim of the analysis process is to establish the requirements of the system, that is, what the finished piece of software must actually do. The underlying structure of the application is then based on these requirements. This chapter examines the user requirements of DriveTheory by analysing what a user will typically need from such a system. It looks at user expectations, user experience and the approach taken by similar systems. In order to gauge the requirements accurately, the target user type was identified. The system should be suitable for use by users who are not computer-literate. As it is designed as an aid to people preparing for their driving theory test, it would have a broad user base.

3.2 Analysis of Similar Systems

A number of similar systems were evaluated during the Analysis stage of development. This was done in order to gain an understanding of users’ past experiences. This helped us to understand the features that DriveTheory would require.

The Driving Standards Agency (DSA) offers an online theory test (DSA, 2004). The questions cover a variety of topics relating to road safety. As in the real test, these tests use a mix of questions drawn from all sections of the question bank to test the user’s knowledge.

Another online driving theory test is provided by ‘2 Pass’ (2 Pass, 2004). It again offers a bank of theory questions. Each question is read by the user, and an answer is selected from a list of possibilities. The system is navigated through a combination of mouse and keyboard.

It was not possible to find any other speech-based driving test systems. However, a number of systems which were keyboard and mouse driven were examined. The results of the analysis of these systems were combined with the experience gained from studying a range of different speech-based systems, in order to come up with a set of requirements for DriveTheory.

3.3 Functional Requirements

These are the requirements that relate to the system’s actual functionality. The following have been identified:

  • The system presents the user with a choice of ten, twenty, thirty, or forty questions to answer
  • Questions are asked in a random order
  • Questions must be spoken and, for extra clarity, be displayed in text form
  • The user is presented with four possible answers and must respond with ‘A’, ‘B’, ‘C’, or ‘D’
  • The user must be informed if they have answered correctly or incorrectly
  • If an incorrect answer has been given, the system must respond with the correct answer
  • After all questions have been answered, the total number of correct and incorrect answers must be announced
  • The user must then have the choice of either trying the test again or exiting the program
  • The user must be able to exit the program at any time when they are answering questions
  • The application instructions must be available to the user at all times
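
The core of these requirements can be illustrated with a short sketch. DriveTheory itself was built with RAD and Tcl, so the Python below is only an illustration of the logic, not the code used in the system. The names Question, select_questions, mark_answer and run_test are hypothetical, as is the shape of the question bank.

```python
import random
from dataclasses import dataclass

# Test lengths offered to the user (from the functional requirements).
ALLOWED_LENGTHS = (10, 20, 30, 40)


@dataclass
class Question:
    """Hypothetical question record: text, four options and the correct letter."""
    text: str
    options: dict   # maps 'A', 'B', 'C', 'D' to the answer text
    correct: str    # one of 'A', 'B', 'C', 'D'


def select_questions(bank, count, rng=None):
    """Pick `count` distinct questions from the bank in a random order."""
    if count not in ALLOWED_LENGTHS:
        raise ValueError("test length must be 10, 20, 30 or 40")
    if count > len(bank):
        raise ValueError("question bank is too small for this test length")
    rng = rng or random.Random()
    return rng.sample(bank, count)


def mark_answer(question, answer):
    """Return (is_correct, feedback); wrong answers report the correct one."""
    if answer == question.correct:
        return True, "Correct."
    right = question.correct
    return False, f"Incorrect. The correct answer is {right}: {question.options[right]}."


def run_test(questions, answers):
    """Mark each answer and return the totals (correct, incorrect)."""
    correct = 0
    for question, answer in zip(questions, answers):
        is_correct, _feedback = mark_answer(question, answer)
        if is_correct:
            correct += 1
    return correct, len(questions) - correct
```

In a spoken system, the feedback string returned by mark_answer would be passed to the text-to-speech component, and the totals from run_test would be announced at the end of the test.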

3.4 Non-Functional Requirements

These are requirements which relate to important areas in which the application must perform, but that are not directly linked to the system’s functionality.

  • The system must be user-friendly
  • It must appear personalised to each user
  • It should be easy to use
  • It should present all questions and instructions to the user in a simple, clear fashion

3.5 Software Analysis

As the proposed application was to be primarily speech-driven, the decision was made to build it using the CSLU Toolkit (CSLU Toolkit, 2004). This is a widely used development package which provides a broad range of useful tools and features. It was started in 1992 and has since developed to meet the needs of its growing user base. It is freely available and is widely used in research projects. Its popularity stems from the fact that it is easy to use while remaining very powerful. A number of commercial systems have also been developed using the Toolkit.

The CSLU Toolkit is composed of audio tools, display tools, speech recognition facilities, speech generation (text-to-speech) services, and animated faces (CUAnimate).

At a high level, the Toolkit is made up of the Tutors, RAD, SpeechView and BaldiSync. Below this are the Tcl (Tool Command Language) scripting language, text-to-speech, and file processing. There is then an interface which manages the communication between the Tcl code and the underlying C code. At the lowest level, the whole system is implemented in the C programming language.

DriveTheory was developed using the Rapid Application Developer (RAD) portion of the CSLU Toolkit. This offers a powerful drag-and-drop environment which is easy to use and reduces development time. It allows for flexibility through the use of Tcl code. Tcl is a language that is relatively easy to learn and use, and it is needed to handle some of the functionality required for DriveTheory.

3.6 Hardware Analysis

The system had to be available to as many people as possible. Therefore, we decided to implement it on a typical desktop PC. A standard microphone and speaker are needed in order to use the system. The system runs on Windows 95/98, Windows NT, Windows 2000 and Windows XP.

Chapter 4 Design

4.1 Introduction

This chapter provides an outline of the proposed design of the system. It involves an examination of the overall application architecture, which enables an understanding of how each part of the application works together. This was achieved using Unified Modelling Language (UML) diagrams. In order to communicate the steps involved when a participant runs the program, a storyboard was designed. The Human-Computer Interaction (HCI) guidelines that are to be followed in the development of the system are also discussed.

4.2 Application Architecture

The logical application architecture can best be expressed through Use Case diagrams. In the first instance, the view of the system from the user’s perspective is given.

Fig. 4.1 – Application User Use Case Diagram

In Figure 4.2 below, an overview of the system is expressed from the system’s perspective.

Fig. 4.2 – Driving Test System Use Case Diagram

4.3 Storyboard of the System

The storyboard presents, in sequential fashion, the steps that occur during a typical execution of the application. This is useful in the design process as it helps to anticipate problems that could occur, allowing them to be solved before the actual implementation.

Fig. 4.3 – Storyboard of the system

4.4 HCI Guidelines

“HCI is concerned with the design of computer systems that are safe, efficient, easy and enjoyable to use as well as functional.” (DTI/OU, 1990). These aspects have been taken into account in the design of DriveTheory. The aim of using HCI guidelines is to optimise the performance of human and computer together as a system. The approach to using the system will be user-centred. Users should not have to adapt to the interface; the interface should be intuitive and natural for them to learn and to use. It is very important to fully understand the nature of the users’ tasks, in this case attempting to answer a set of driving theory questions. The user interface of a computer system is the medium through which a user communicates with the computer. The interface has a strong influence on how a user views and understands the functionality of a system.

When developing the interface design we followed a number of steps that include established HCI techniques:

4.4.1 Consistency

  1. The internal consistency of the system is maintained throughout to give the user a comfortable feeling. This is reflected in the consistent style of images used and also the style and tone of the voice used.
  2. The external consistency of the system allows any user that has access to a standard PC and the CSLU Toolkit to use the application.

4.4.2 Compatibility with user expectations

  1. We used a clear and friendly style for the dialogue so that the user is comfortable with it.
  2. In terms of answering the questions which the system asks, the same guidelines apply as in the case of a manual system, i.e. a question is read out and four possible answers are given. The user must then say ‘A’, ‘B’, ‘C’, or ‘D’.
  3. The overall structure of the application is in line with what the user would expect, as it is based on the structure of other such systems.
  4. The system needs to be accessible to people who are not computer-literate. Therefore it is important to keep the application relatively simple.

4.4.3 Flexibility and control

  1. The user can choose to answer a set of 10, 20, 30 or 40 questions.
  2. The user can view the instructions at the beginning, or if they have experience of the system, they can continue to the test section.
  3. At any time while answering questions, the user can choose to view the instructions or exit the application.
  4. The user can perform the same action by giving different commands, e.g. to begin the test, they can say ‘test’, or ‘begin test’.
  5. When the results are read out at the end, the user has the option of retrying the test, or exiting.
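
The idea of accepting several spoken commands for the same action (point 4 above) can be sketched as a simple lookup. This is an illustrative Python sketch rather than the RAD configuration used in DriveTheory. The phrases ‘test’, ‘begin test’ and ‘exit’ come from this report, while ‘instructions’ and ‘help’, the action names and the function normalise_command are assumptions made for the example.

```python
# Each action maps to the set of phrases that trigger it.
# 'instructions' and 'help' are assumed phrases, not confirmed system vocabulary.
COMMAND_SYNONYMS = {
    "begin_test": {"test", "begin test"},
    "instructions": {"instructions", "help"},
    "exit": {"exit"},
}


def normalise_command(utterance):
    """Map a recognised utterance to an action name, or None if unknown."""
    # Lower-case the text and collapse repeated whitespace before matching.
    spoken = " ".join(utterance.lower().split())
    for command, phrases in COMMAND_SYNONYMS.items():
        if spoken in phrases:
            return command
    return None
```

Keeping the synonyms in one table means a new phrase can be added for an action without changing the dialogue logic that acts on it.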

4.4.4 Continuous and informative feedback

  1. The user constantly receives feedback in the form of a response from the animated face.
  2. When answering questions, feedback regarding whether the answer was correct/incorrect is given. If a wrong answer has been given, the system will respond with the correct one.
  3. If the system does not understand a user’s response then it will ask the question again.
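
The behaviour in point 3, asking the question again when a response is not understood, can be sketched as a small loop. This is an illustrative Python sketch, not the RAD dialogue itself. The function ask_until_understood, its recognise and speak callbacks, and the max_attempts limit are all assumptions made for the example; the report does not state whether the system limits the number of repeats.

```python
# The only responses accepted while a question is being answered.
VALID_ANSWERS = ("A", "B", "C", "D")


def ask_until_understood(prompt, recognise, speak, max_attempts=3):
    """Repeat the prompt until a valid answer is heard.

    `speak` outputs the prompt and `recognise` returns the recognised
    text, or None when nothing was understood. Returns the answer
    letter, or None once `max_attempts` prompts have failed.
    """
    for _ in range(max_attempts):
        speak(prompt)
        heard = recognise()
        if heard is not None:
            answer = heard.strip().upper()
            if answer in VALID_ANSWERS:
                return answer
    return None
```

Passing the recogniser and the speech output in as callbacks keeps the loop independent of any particular speech engine, which also makes it straightforward to test with scripted responses.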

4.4.5 Error prevention and correction

  1. Error prevention is evident in that the amount of speech required from the user has been deliberately limited. When a question is asked, the only required response is ‘A’, ‘B’, ‘C’, or ‘D’. Compared to a system where a whole-sentence response is required, this decreases the scope for error in user input.
  2. It is clearly stated in the instructions that, to close the application, the word ‘exit’ must be spoken. This helps to prevent errors by providing the user with a clearly marked exit.

4.4.6 Visual Clarity

The speech-based element contains the central functionality of the system. Therefore, the visual portion is made up of a series of screens which reinforce the structure of the application and help the user track their place in the system.

  1. The visual elements provide a logical sequence to the system.
  2. There is consistent use of specific areas of the screen. The animated face appears in the top-left corner, while the images appear in the centre of the screen.
  3. A title screen is included, which is displayed after the user has entered their name.

4.4.7 Relevance of Information