7 Developing dialogues using the CSLU toolkit

RAD (Rapid Application Developer) is a graphically-based authoring environment for designing and implementing spoken dialogue systems. RAD supports directed dialogues using a finite state-based dialogue model. The developer selects various dialogue objects and links them together to make a dialogue graph. These objects perform functions such as speaking a prompt, recognising the user’s input, and executing actions related to the dialogue. RAD also includes animated characters and facilities for presenting media objects such as pictures and sounds. Once the dialogue graph has been constructed, it can be tested using speech input and output.

To show how to construct spoken dialogue systems in RAD, a series of linked tutorials will be presented that progressively introduce the main features of the toolkit. In this chapter a speech-only system will be developed for accessing student information. Chapter 8 will present a multi-modal educational system for young children that makes use of the animated characters as well as media objects such as pictures and sounds.

This chapter consists of the following tutorials:

Tutorial 1: Basic functions in RAD: the Pizza application.

Tutorial 2: Developing basic functions for a student information system.

Tutorial 3: The default repair sub-dialogue.

Tutorial 4: Sub-dialogues.

Tutorial 5: Digit recognition.

Tutorial 6: DTMF input.

Tutorial 7: Alpha-digit recognition.

Tutorial 8: Recognition grammars.

Tutorial 9: Speech output in RAD.

Tutorial 10: Using Tcl in RAD.

Tutorial 11: Improving recognition and creating a dynamic recogniser.

Tutorial 12: Linking a RAD application to a database using TclODBC.

Tutorial 1: Basic functions in RAD: the Pizza application

This tutorial involves running the Pizza example and making some minor changes to the example. The Pizza application, which is provided with the CSLU toolkit, allows you to order a pizza using spoken input. The application prompts you to select a size and topping and also asks you whether you want salad with your order. The application does not connect to a database and the dialogue terminates once your order has been confirmed.

Starting RAD and loading the Pizza application

To start RAD, select:

Start -> Speech Toolkit -> RAD

This starts the RAD system, which displays a palette of dialogue objects and an empty canvas on which the dialogue graph will be developed, as shown in Figure 7.1.

Figure 7.1: The RAD canvas

Note: For the tutorials in this chapter you should disable the animated face. Click on File -> Preferences, then uncheck the box entitled ‘Animated Face’.

If this is the first time that RAD has been started on your system, a dialogue box will pop up informing you that a directory called ‘.rad’ is about to be created for you. Note the location in which that directory is being created (because the programs and data you create during these tutorials will be stored in sub-directories of that directory by default), and then click OK.

Also, when starting RAD for the first time, you may see another small dialogue box asking you to calibrate your microphone. Click OK and follow the instructions in the ensuing dialogue boxes. You may also choose to do this calibration before running an application. To do this, click on File -> Preferences -> Audio and then on the Calibrate tab, and follow the instructions that appear on the screen.

Click on File -> Open Rad Examples -> pizza to load the Pizza application, part of which is shown in Figure 7.2.

Figure 7.2: The pizza application

The dialogue is modelled as a graph, beginning at the ‘start’ state and proceeding through states entitled ‘size’, ‘topping’ and so on, until the ‘Goodbye’ state is reached. There is also a loop to repeat the dialogue. In this application all the states are represented using the GENERIC icon, shown in Figure 7.3.

Figure 7.3: The generic icon

Compiling and running an application

An application has to be compiled before it can be run.

Click on build (either on the tab at the bottom left corner of the canvas, or within the menu item System -> build). If there are any errors in the code, these will be reported at this point. Once the application has been compiled, the tab at the bottom left corner displays the word ‘Run’.

Click Run to run the application.

Note: you will need a headset or speakers to hear the system prompts and a microphone to speak your responses. Wait for the beep before speaking.

Recognition problems

There are a number of reasons why the system may fail to recognise your responses correctly and each of these should be checked in the case of poor recognition:

The microphone is not connected properly. This can be tested using the Windows Sound Recorder application.
The microphone is poor quality. You will need to obtain a microphone that is suitable for speech recognition applications.
You began to speak before the beep, so that the system did not capture some of your response.
There was too great a pause between the beep and the beginning of your response, so that the system may have stopped recording your input before you made your response.

Prompts

The application speaks a series of prompts using the built-in TTS system. To see how the prompts are specified, click on the ‘size’ state. This will open the window shown in Figure 7.4.

Figure 7.4: The prompt dialogue box

This window is used for a number of different functions. For the moment we will only examine the TTS function for prompts. By default the window displays this function on opening. The prompt that will be spoken by the TTS component is shown in the Prompt box.

Exercise 1: Changing the prompt

a) Edit the prompt so that the system will say: “Would you like a small, medium, large or gigantic pizza?”

b) Edit some of the other prompts, including the prompt in the ‘Goodbye’ state.

Recognition vocabulary

If a response is required from the user, the recognition vocabulary for that response has to be specified. The ‘greeting’ state does not require a response, but responses are required at several other states, including the ‘size’ state. You can see what responses are required by moving the mouse over the icon shaped as a red arrowhead under each state. This icon is known as the ‘recognition port’. You will notice that there is no vocabulary specified for the ‘greeting’ state, but there is a word for each of the recognition ports associated with the ‘size’ state.

To examine the recognition vocabulary in more detail, click on the leftmost port under the ‘size’ state. This brings up a window showing the recognition vocabulary for this state, as shown in Figure 7.5.

Figure 7.5: Recognition vocabulary

The words to be recognised at this state are typed into the box entitled ‘Words’. The box entitled ‘Pronunciation’ gives the pronunciation of each word using the Worldbet representation. (To see a list of these symbols, click on Help -> Worldbet symbols.)

Exercise 2: Adding a pronunciation for a new word

You can add a new word to the vocabulary by entering it on a new line in the Words box (as will be shown below). For the moment, we will create a new port for the new word.

Add a new port by right-clicking on the ‘size’ state and selecting ‘Add port’.

Double left-click on the new port and add the word ‘gigantic’ to the vocabulary in the Words window.

Click on ‘Update All’ to generate the pronunciation, then on OK to close the Vocabulary window.

Exercise 3: Adding and linking a new state

You will need a new state in case the user responds with the word ‘gigantic’.

Move the cursor over the GENERIC recognition icon on the icon palette.

Left click and drag the icon on to the canvas until it is aligned with the state entitled ‘large’.

The new state will have a name with a number, such as ‘state6’. To rename the state to ‘gigantic’ right-click on the new state and select the ‘rename’ option.

Incorporate the state into the dialogue graph: place the cursor over the recognition port for the state ‘gigantic’, hold down the left button to create an arrowed arc, and drag the arc to the state entitled ‘topping’. Release the left button when the arc is within the ‘topping’ state.

You will now have an arc linking the ‘gigantic’ and ‘topping’ states.

Draw a similar arc to link the port entitled ‘gigantic’ on the ‘size’ state to the state entitled ‘gigantic’.

If you wish to delete a state, right-click on the state and select ‘delete’.

Build and run your dialogue, asking for a ‘gigantic’ pizza when prompted for size.

If you want to save your amended dialogue, choose File -> Save, naming your file something other than ‘pizza’, for example ‘mypizza’, so as not to overwrite the tutorial application.

Branching

This application illustrates branching within the dialogue graph after the states ‘size’, ‘topping’, ‘salad’ and ‘confirm’. However, branching only has relevance in this example following the ‘confirm’ state. At this point the system can enter the ‘Goodbye’ state and terminate with a prompt. Alternatively, it can enter the ‘again’ state from which it loops back to the ‘size’ state to begin the dialogue again, but this time without the initial greeting.

Branching after the other states is not required in this application, although this might be required if selection of a particular size or topping were to lead the dialogue down a separate dialogue path. In this simple version the recognition vocabulary for states such as ‘size’ and ‘topping’ could have been listed (with each word on a separate line) within the one recognition port, as shown in Figures 7.6 and 7.7.

Figure 7.6: The ‘size’ dialogue state

Figure 7.7: Recognition vocabulary for the ‘size’ dialogue state
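The branching described above can be pictured as a table that maps a state and a recognised word to the next state. The following is a minimal sketch in plain Tcl, not RAD's internal representation: the array `next`, the procedure `nextState`, and the convention that an empty word marks a state that moves on without waiting for a response are all assumptions made for illustration. The transitions out of ‘confirm’ (‘yes’ to ‘Goodbye’, ‘no’ to ‘again’) are inferred from the description of the application.

```tcl
# Hypothetical transition table: next(<state>,<recognised word>) -> next state.
# An empty word marks a state that moves on without waiting for a response.
array set next {
    size,small    small
    size,medium   medium
    size,large    large
    small,        topping
    medium,       topping
    large,        topping
    confirm,yes   Goodbye
    confirm,no    again
    again,        size
}

# Return the state that follows $state once $word has been recognised.
proc nextState {state word} {
    global next
    if {[info exists next($state,$word)]} {
        return $next($state,$word)
    }
    # Fall back to the unconditional transition, if the state has one.
    if {[info exists next($state,)]} {
        return $next($state,)
    }
    error "no port on state '$state' accepts '$word'"
}

puts [nextState size large]    ;# large
puts [nextState large ""]      ;# topping
puts [nextState confirm no]    ;# again
puts [nextState again ""]      ;# size
```

In this picture, listing all the sizes within one recognition port corresponds to replacing the three `size,...` entries and the three intermediate states with entries that lead directly to ‘topping’. Only the ‘confirm’ state needs genuinely different successors.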

Verification

In this application the system verifies the order after the values for size, topping and salad have been selected. If the user responds ‘no’, then the dialogue loops back and all the values are collected again. This is not necessarily the most efficient way to handle verification, particularly if only one of the values was incorrect. Some variations on handling verification using RAD will be presented later.

Using variables to store recognised values

RAD has a built-in mechanism for storing the values recognised by the system at each state in the dialogue. The value for a state can be referenced using the form $state(recog), where ‘state’ is the name of the state in question. An example of this can be seen if you examine the ‘confirm’ state.

Double left-click on the ‘confirm’ state.

Click on the ‘OnEnter’ tab.

You will see that the prompt consists of a conditional statement (in the scripting language Tcl):

if [string match yes $salad(recog)] {

tts "So you want a $size(recog) $topping(recog) pizza with salad, right?"

} else {

tts "So you want a $size(recog) $topping(recog) pizza, right?"

}

The use of Tcl within RAD will be discussed later. For the present the key point to note is that the words recognised by the system at the size and topping states are represented by the variables $size(recog) and $topping(recog) respectively. Note that what is returned is the word recognised by the system at the state in question. This may not be the same as the word that was actually spoken by the user.
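In Tcl terms, `$size(recog)` reads the element `recog` of an array named `size`. The sketch below reproduces the conditional from the ‘confirm’ state so that it can be tried in an ordinary Tcl shell outside RAD. It rests on several assumptions: the `tts` procedure is a stand-in that returns the text instead of speaking it, `confirmPrompt` is a hypothetical wrapper, and the recognised values (including the topping ‘vegetarian’) are set by hand, whereas RAD fills them in as each state is visited.

```tcl
# Stand-in for RAD's tts command so the sketch runs in a plain Tcl shell;
# inside RAD, tts speaks the text instead of returning it.
proc tts {text} {
    return $text
}

# Build the confirmation prompt from the recognised values, mirroring the
# conditional statement in the 'confirm' state.
proc confirmPrompt {} {
    global size topping salad
    if {[string match yes $salad(recog)]} {
        return [tts "So you want a $size(recog) $topping(recog) pizza with salad, right?"]
    } else {
        return [tts "So you want a $size(recog) $topping(recog) pizza, right?"]
    }
}

# Hypothetical recognition results, set by hand for this sketch.
set size(recog) large
set topping(recog) vegetarian
set salad(recog) yes
puts [confirmPrompt]   ;# So you want a large vegetarian pizza with salad, right?

set salad(recog) no
puts [confirmPrompt]   ;# So you want a large vegetarian pizza, right?
```

Because each state has its own array, the value recognised at one state remains available to any later state in the dialogue, which is what allows ‘confirm’ to refer back to ‘size’ and ‘topping’.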

Exercise 4: Adding a confirmation

A simpler way to provide a confirmation, provided it does not depend on some condition as in this example, would be to use the standard Prompt window, with a text such as:

So you want a $size(recog) $topping(recog) pizza?

Insert this prompt into the standard Prompt window and delete the text in the OnEnter window. This will allow the system to confirm using the variables specified in your new prompt, although of course whether or not you choose salad will not be confirmed.

Additional features: captioning and the animated face

You will have noticed that a small caption window displays the system’s prompts. You can de-select this feature by clicking on File -> Preferences and clicking on the tick against the captioning feature, as shown in Figure 7.8. Other features can also be selected and de-selected in this way. For example, while the canvas is important for the developer, you may want to have a different visual display (or even no display at all) for the end-user. This can be achieved by de-selecting Canvas. Other selections within the Preferences window will be discussed in later tutorials.

Figure 7.8: Global preference box

Tutorial 2: Developing basic functions for a student information system

In this tutorial the basic functions illustrated in Tutorial 1 will be applied in the development of the student information system presented in Chapter 6.

Functional description of the application

The Student Information System allows a user to log in using a four-digit PIN and then ask for information about students or courses, or request reports. The system looks up the requested information and speaks it back to the user. The following is an example of a dialogue that the system will be able to conduct:

1 System: Welcome to the Student System Main Menu.

2 Please say your four digit pin.

3 User: 6 5 6 6

4 System: Was that 6 5 6 6?

5 User: Yes

6 System: The system provides details on students, courses and reports.

7 For students say students or press 1, for courses say courses or press 2, for reports say reports

8 User: Students

9 System: This is Student Details.

10 Say 'View details' to view existing student details

11 Say 'Add details' to add new student details

12 User: View details

13 System: Viewing student details

14 What is the student id

15 User: 96050918

16 System: I have the following details: student John Scott, course code DK003, at stage 1.

17 Would you like any more information?

18 User: Courses

This dialogue includes the following functionalities:

Prompt and response (e.g. lines 2-3), also with verification (lines 4-5)
Digit recognition (lines 3-4, 14-15)
DTMF input (line 7)
Retrieval of information from a database (line 16)
Global navigation (line 18)

These and some additional functionalities (such as dealing with repair, alpha-digit recognition, and customising speech output) will be developed gradually during the course of this chapter.

Exercise 5: Create a dialogue graph

Open RAD with an empty canvas. If you already have an application on the canvas and wish to develop a new application, then select File -> New.

Create a dialogue graph as shown in Figure 7.9.

Note: You should save your application regularly. Call it ‘studentsystem1.rad’ and save it in the default location provided by RAD, i.e. c:\.rad\saved\

Figure 7.9: studentsystem1.rad

Add prompts and recognition vocabulary as follows:

GENERIC: welcome

TTS:

Welcome to the Student System Main Menu.

The system provides details on students, courses and reports.

To begin say one of the following: students, courses, reports.

Recognition:

Left output port: students

Middle output port: courses

Right output port: reports

GENERIC: students

TTS:

This is Student Details.

Say 'View details' to view existing student details

Say 'Add details' to add new student details

Recognition:

Left output port: view details

Right output port: add details

GENERIC: view_details

TTS:

Viewing student details

Please say the student name

Recognition:

John, David, Rosemary, Jennifer

GENERIC: confirm

TTS: was that $view_details(recog)?

Recognition:

Left output port: yes

Right output port: no

GENERIC: add_details

TTS: Adding student details

Save and test your application.

Tutorial 3: The default repair sub-dialogue

You may have noticed that the system was not always accurate in recognising what you said. Two situations are covered by the default repair sub-dialogue:

The system does not detect any speech.
The speech that is detected does not match the recognition vocabulary.

These situations will be tested in the following exercise in order to introduce and explain the default repair sub-dialogue.

Exercise 6: Examining the default repair sub-dialogue

To examine the operation of the default repair sub-dialogue, do the following:

Do not speak following a prompt – in this case, the system should say: ‘Please speak after the tone’ and repeat the prompt.
Say something that is not in the recognition vocabulary for that state – in this case, if what you say closely matches a word in the recognition vocabulary, then that matched word will be recognised. Otherwise, the system will say ‘Sorry’ and repeat the prompt.

The default repair sub-dialogue can be viewed by clicking View -> Repair default (see Figure 7.10). If you open the various states and view the prompts, you will see how the behaviours that you have tested were programmed. You will also note that the system makes two attempts to match the input and closes the dialogue if it has been unsuccessful. Note that the sub-dialogue terminates either by reaching a state entitled ‘return: repeat’ or by reaching the ‘Goodbye’ state. In the case of ‘return: repeat’, the dialogue returns to the state from which the repair sub-dialogue was launched and repeats the actions specified in that state, such as the prompt. Later we will see some different ways of returning from a sub-dialogue.
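The behaviour of the default repair sub-dialogue can be approximated in a few lines of plain Tcl. This is only a sketch under stated assumptions, not the code that RAD runs. The procedure `askWithRepair` and its parameters are hypothetical, user input is supplied as a scripted list in which an empty string stands for silence, and matching is exact, whereas the real recogniser will also accept input that closely resembles a vocabulary word. The sketch also reads ‘two attempts’ as two repairs after the first failure, which is an interpretation.

```tcl
# Simulate one dialogue state with the default repair behaviour.
# 'responses' is a scripted list of user inputs; "" stands for silence.
# Returns a two-element list: the outcome (the recognised word, or Goodbye
# if repair failed) and the list of messages the system would have spoken.
proc askWithRepair {prompt vocab responses {maxRepairs 2}} {
    set said [list $prompt]
    set repairs 0
    foreach heard $responses {
        if {[lsearch -exact $vocab $heard] >= 0} {
            return [list $heard $said]
        }
        if {$repairs >= $maxRepairs} {
            break
        }
        incr repairs
        if {[string length $heard] == 0} {
            lappend said "Please speak after the tone"
        } else {
            lappend said "Sorry"
        }
        lappend said $prompt   ;# 'return: repeat' replays the state's prompt
    }
    return [list Goodbye $said]
}

set vocab {students courses reports}
set prompt "To begin say one of the following: students, courses, reports."

# Silence, then an out-of-vocabulary word, then a valid response.
puts [lindex [askWithRepair $prompt $vocab {{} pizza students}] 0]   ;# students

# Three silences in a row: repair gives up and the dialogue closes.
puts [lindex [askWithRepair $prompt $vocab {{} {} {}}] 0]            ;# Goodbye
```

The second element of the returned list records the repair messages and repeated prompts in order, which makes it easy to compare the simulated exchange with what you hear when you carry out Exercise 6.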