Telecommunications Program, SIS, University of Pittsburgh

Voice Application Development with VoiceXML

Part I: Objective

The goal of this lab is to introduce you to the development of voice applications for mobile wireless devices using the Voice Extensible Markup Language (VoiceXML). You will get an overview of voice applications, learn the VoiceXML system architecture, and use basic VoiceXML elements to develop voice applications.

Part II: Equipment List

  1. A personal computer with Internet access
  2. A telephone
  3. A web server
  4. Any text editor of your choice

Part III: Introduction

A voice application for mobile wireless devices is an application that communicates with its user through a voice interface. With a voice interface, the user interacts with the application through audio (i.e., speaking and listening) rather than through a keypad and screen as in a visual interface (e.g., web browsing). Examples include calling from a mobile phone to get driving directions, weather forecasts, stock quotes, movie and restaurant listings, daily news, and voice mail. From the user’s perspective, voice applications have advantages over visual applications in some situations, for example when a driver needs traffic information and a visual interface is simply impractical. A voice application is also unaffected by the limited capabilities of mobile devices, such as small screens and small keypads.

VoiceXML is an XML-based language for creating voice applications. Just as HTML or WML describes the visual interface of a web or WAP application, VoiceXML describes the voice interface of a voice application, i.e., its audio inputs and outputs. Figure 1 compares a voice application written in VoiceXML with a visual application (e.g., web or WAP browsing) written in HTML.

VoiceXML (voice application):

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      Have a nice day
    </block>
  </form>
</vxml>

HTML (visual application):

<html>
  <head>
    <title>
      My page
    </title>
  </head>
  <body>
    Have a nice day.
  </body>
</html>

Figure 1. Comparison between a voice application with VoiceXML and a visual application with HTML

The system architecture of VoiceXML applications, shown in Figure 2, is very similar to that of web applications. The VoiceXML content (i.e., .vxml files and possibly audio resources) is stored on a web server on the Internet. However, instead of running a browser on the phone, VoiceXML applications use a browser on a VoiceXML gateway, called a voice browser. The voice browser interprets the VoiceXML code and renders it as audio output, which is sent over the telephone network to the user. The VoiceXML gateway also incorporates several important voice technologies, including Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and prerecorded audio playback. Note that VoiceXML applications require no special software on the client device.

Figure 2. VoiceXML system architecture.

To use a voice application, a user dials the application’s phone number. This is equivalent to an Internet user entering a URL to access a web page. The call is received by the VoiceXML gateway, as shown in step (1) in Figure 2. Based on the dialed number, the gateway issues an HTTP request over the Internet to the appropriate web server, as shown in step (2). The web server responds to the VoiceXML gateway with the corresponding VoiceXML document and possibly some audio files, as shown in step (3). The voice browser on the gateway then interprets the VoiceXML code and produces audio output that is played to the user over the telephone network. At any point, the user may be asked to provide input to the application. This sequence of interactions between the user and the voice application continues until the program ends.

In VoiceXML, there are two forms of audio output: Text-To-Speech (TTS) and prerecorded audio files.

  • TTS converts the text in a VoiceXML document to speech in a digital audio format. The resulting synthesized speech can sound robotic and is occasionally difficult to understand.
  • A prerecorded audio file can be played back to the user as audio output. This form of output usually sounds more natural than TTS.

There are also two forms of audio input: Automatic Speech Recognition (ASR) and Dual-Tone Multi-Frequency (DTMF).

  • ASR enables an application to recognize speech from a user. The user simply speaks into the phone, and the application interprets the speech and acts on it.
  • DTMF allows users to enter input to the VoiceXML application with the telephone keypad. The digits entered by the user are collected and interpreted by the application.
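
As a rough illustration of how these output and input forms might appear together in one document, the sketch below plays a prerecorded file with TTS fallback text and then collects a number that the caller may either speak or key in. The file name welcome.wav, the form id, and the field name zipcode are hypothetical, and the sketch assumes the gateway supports the built-in digits field type defined in VoiceXML 2.0.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="demo">
    <block>
      <!-- Output, prerecorded: welcome.wav is a hypothetical file stored
           next to this document. The enclosed text is spoken by TTS only
           if the file cannot be played. -->
      <audio src="welcome.wav">Welcome to the demo.</audio>
    </block>
    <!-- Input: with the built-in "digits" type, the caller may either
         speak the digits (ASR) or key them in (DTMF). -->
    <field name="zipcode" type="digits">
      <!-- Output, TTS: plain prompt text is synthesized. -->
      <prompt>Please say or enter your zip code.</prompt>
      <filled>
        <prompt>You entered <value expr="zipcode"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```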

Part IV: Your First VoiceXML Application

In this section, you will create your first VoiceXML application, then run and test it.

1. Hello World Application

The purpose of this application is to say “Hello World” to a caller. The XML declaration (<?xml ...?>) on the first line indicates that the document is written in XML 1.0 syntax. The <vxml> element on the second line specifies which version of VoiceXML the document uses. Every VoiceXML document must start with these two lines. The third line shows an example of a comment in VoiceXML. The <form> and <block> elements are explained in Part V.

<?xml version="1.0"?>
<vxml version="2.0">
  <!-- This is a comment in VoiceXML -->
  <form>
    <block>
      Hello World
    </block>
  </form>
</vxml>

1.1 Use any text editor to input the above code and save the document as “helloworld.vxml”.

2. Upload your VoiceXML document to a Web server

Upload your VoiceXML document (helloworld.vxml) to a web server. You can use our school web server (paradox.sis.pitt.edu) or any free web server that supports the .vxml file type. Here, suppose that we use our school web server.

2.1 Create a directory named “public_html” under your home directory.

2.2 Use a file transfer program (e.g., the SSH Secure File Transfer Client installed on the lab PC) to upload your helloworld.vxml file to the public_html directory that you have just created.

3. Set up a Free Developer Account with a Voice Service Provider

Several voice service providers offer a developer program through which developers can build and test voice applications on the provider’s platform (i.e., VoiceXML gateway) for free. Examples of these providers include Tellme Networks, BeVocal, VoiceGenie, and Voxeo. Here we will use the free service provided by Tellme Networks.

3.1 Go to the Tellme Studio website and click the Join Studio button at the top left of the page to join the Tellme Developer program. Then fill out the registration form. A PIN and password will be sent to you by email after registration.

3.2 Sign in to Tellme Studio with your email address and password. (Your Developer ID and PIN are used to log in when you call Tellme Studio, i.e., to test the voice application.)

4. Link the VoiceXML application’s URL to the phone number

4.1 After signing in, you will see the MyStudio page as shown in Figure 3 (if the page is showing a scratchpad, click the Application URL tab).

Figure 3. MyStudio page

4.2 Before you can run your VoiceXML application, you need to link your application’s URL to the telephone number (and your Developer ID) provided by the voice service provider. To do this, enter your VoiceXML application’s URL (e.g., username/helloworld.vxml) as shown in Figure 3, and then click the Update button.

4.3 If the message “Unable to retrieve your URL for syntax checking” appears after you click the Update button, you may have to change the access permissions of your file so that the web server can read it. To do this, at the command prompt ($) on the web server, type chmod 644 filename.vxml. If the message “Some errors were found in your VoiceXML” appears, correct your VoiceXML code accordingly and click the Update button again.

Note that you can change your PIN, password, and other settings by clicking the edit my preference box on the MyStudio page.

5. Run your VoiceXML application

5.1 You can run your VoiceXML application by calling the phone number 1-800-555-VXML. If you use the black telephone provided in the wireless lab, you have to first press 9 before dialing the phone number.

5.2 When prompted, enter your Developer ID and PIN using the telephone keypad. You will then hear “Hello World”. (If you are a first-time user, you may be asked to set up the phone autologin feature. You may say yes or no as you wish. This setting can be changed later by clicking the edit my preference box on the MyStudio page.)

Part V: Basic VoiceXML Commands I

In this section, you will learn some basic VoiceXML elements for developing a simple voice application. The elements used in this part are listed with their descriptions in Table 1, and Figure 4 shows the call flow of the application. First, the user places a call to the application. The application prompts the user with the available choices and waits for a response. If the response is recognized according to the grammar rule defined in the <grammar> element, the variable “choice” declared by the <field> element is assigned the corresponding value. For example, if the user says “weather”, the variable “choice” is assigned the value “weather”. The actions in the <filled> element are then performed, based on this variable’s value and the conditional logic. If there is no response from the user, the <noinput> element outputs “I didn’t hear you” and then repeats the original prompt. Similarly, if the user’s response is not recognized, the <nomatch> element outputs “I didn’t quite understand you” and then repeats the original prompt.

Table 1. VoiceXML elements used in Part V

Element Name / Description
block / A container of (non-interactive) executable code
else / Used in an <if> element
field / Declares an input field in a form. The “name” attribute of a field element specifies the dialog-scope variable that will hold the result.
filled / An action executed when fields are filled
form / A dialog for presenting information and collecting data. The “id” attribute of a form element specifies the name of the form. If specified, the form can be referenced within the document or from another document.
goto / Go to another item in the same dialog, or another dialog in the same or different document
grammar / Defines the set of valid expressions that a user can say or type when interacting with a voice application
if / Simple conditional logic
noinput / Catches a noinput event
nomatch / Catches a nomatch event
prompt / Outputs synthesized speech and prerecorded audio files
reprompt / Returns processing to the original prompt that is trying to elicit a response from the user

Figure 4. A call flow of voice application in part V

<?xml version="1.0"?>
<vxml version="2.0">
  <form id="Choices">
    <field name="choice">
      <prompt>
        Please choose from the following:
        to check the weather, say weather.
        to check the stock quotes, say stock.
        to get the direction information, say direction.
      </prompt>
      <grammar>
        <![CDATA[
          [
            [weather]{<choice "weather">}
            [stock]{<choice "stock">}
            [direction]{<choice "direction">}
          ]
        ]]>
      </grammar>
      <noinput>
        I didn't hear you. <reprompt/>
      </noinput>
      <nomatch>
        I didn't quite understand you. <reprompt/>
      </nomatch>
      <filled>
        <if cond="choice=='weather'">
          OK let's check the weather.
        <elseif cond="choice=='stock'"/>
          OK let's check the stock quotes.
        <else/>
          OK let's get the direction information.
        </if>
      </filled>
    </field>
  </form>
</vxml>

1. Use any text editor to input the above code and save the document as “form.vxml”.

2. Follow the procedure in Part IV to test the application:
  • Upload the form.vxml file to the web server.
  • Link the new application’s URL to the telephone number.
  • Run the application by calling the phone number 1-800-555-VXML.
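
The inline grammar in form.vxml is written in a compact bracket notation (Nuance GSL) that the Tellme platform accepts. Other gateways may expect the XML form of the W3C Speech Recognition Grammar Specification (SRGS) instead. The fragment below is a minimal sketch of what an equivalent <grammar> element might look like in that form. The rule name choice_rule is hypothetical, and the sketch assumes that, with no semantic tags in the grammar, the platform fills the field variable with the recognized word itself, so the cond tests in the <filled> element would still apply. Check your platform’s documentation before relying on this.

```xml
<!-- Hypothetical SRGS XML replacement for the <grammar> element
     inside <field name="choice"> in form.vxml. -->
<grammar type="application/srgs+xml" version="1.0" root="choice_rule"
         xml:lang="en-US" mode="voice">
  <rule id="choice_rule" scope="public">
    <!-- The caller must say exactly one of these words. -->
    <one-of>
      <item>weather</item>
      <item>stock</item>
      <item>direction</item>
    </one-of>
  </rule>
</grammar>
```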

Part VI: Basic VoiceXML Commands II

In Part V, the voice application is implemented using a single dialog (i.e., a form) in a single VoiceXML document (i.e., a .vxml file). Alternatively, the same application can be implemented using multiple form dialogs and/or multiple documents. In this section, you will learn to develop a voice application that consists of multiple dialogs and multiple documents. The <goto> element is introduced in this section as the mechanism for navigating among items, dialogs, and documents within an application (similar to a hyperlink in HTML).

Table 2. New VoiceXML element used in Part VI

Element Name / Description
goto / Go to another item in the same dialog, or go to another dialog in the same or different document.

The <goto> element is used to:

  • Transition to another item in the current form, e.g., <goto nextitem="checkweather"/> to transition to the item named checkweather,
  • Transition to another dialog (i.e., form or menu) in the current document, e.g., <goto next="#checkstock"/> to transition to the dialog named checkstock, or
  • Transition to another document (i.e., .vxml file), e.g., <goto next="getdirection.vxml"/> to transition to the VoiceXML document named getdirection.vxml.

goto.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form id="Choices">
    <field name="choice">
      <prompt>
        Please choose from the following:
        to check the weather, say weather.
        to check the stock quotes, say stock.
        to get the direction information, say direction.
      </prompt>
      <grammar>
        <![CDATA[
          [
            [weather]{<choice "weather">}
            [stock]{<choice "stock">}
            [direction]{<choice "direction">}
          ]
        ]]>
      </grammar>
      <noinput>
        I didn't hear you. <reprompt/>
      </noinput>
      <nomatch>
        I didn't quite understand you. <reprompt/>
      </nomatch>
      <filled>
        <if cond="choice=='weather'">
          <!--Transitions to another item, named "checkweather", in this form-->
          <goto nextitem="checkweather"/>
        <elseif cond="choice=='stock'"/>
          <!--Transitions to another form dialog, named "checkstock"-->
          <goto next="#checkstock"/>
        <else/>
          <!--Transitions to another voicexml document, named "getdirection.vxml", in the current directory-->
          <goto next="getdirection.vxml"/>
        </if>
      </filled>
    </field>
    <block name="checkweather">
      OK let's check the weather.
    </block>
  </form>
  <form id="checkstock">
    <block>
      OK let's check the stock quotes.
    </block>
  </form>
</vxml>

getdirection.vxml:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      OK let's get the direction information
    </block>
  </form>
</vxml>

1. Use any text editor to input the above two VoiceXML documents and save them as “goto.vxml” and “getdirection.vxml” respectively.

2. Follow the procedure in Part IV to test the application:
  • Upload the “goto.vxml” file and the “getdirection.vxml” file to the public_html directory on the web server.
  • Link the URL of the “goto.vxml” file to the telephone number.
  • Run the application by calling the phone number 1-800-555-VXML.
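
Table 2 notes that <goto> can also reach a particular dialog in a different document. One way to do this is to append the dialog’s id to the document URL after a “#”. The sketch below is a hypothetical variant of getdirection.vxml with two named forms; the ids driving and walking and their prompt texts are invented for illustration and are not part of this lab’s application.

```xml
<?xml version="1.0"?>
<!-- Hypothetical variant of getdirection.vxml with two named forms.
     From goto.vxml, <goto next="getdirection.vxml#walking"/> would start
     at the second form, while <goto next="getdirection.vxml"/> (no
     fragment) starts at the first dialog in the document. -->
<vxml version="2.0">
  <form id="driving">
    <block>
      OK let's get the driving directions.
    </block>
  </form>
  <form id="walking">
    <block>
      OK let's get the walking directions.
    </block>
  </form>
</vxml>
```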

Part VII: Basic VoiceXML Commands III

In this part, you will build a voice application that produces output similar to that of the application in Part V. However, this application also accepts DTMF (i.e., touchtone) input in addition to the ASR input used in Part V. Moreover, it uses a menu dialog instead of a form dialog (VoiceXML has two types of dialogs for interacting with users: forms and menus). A menu dialog adds no new capabilities compared to a form dialog; it is simply a convenient shorthand for offering a series of options to the user. The new elements used in this application are listed in Table 3.

In this application, the <menu> element contains a <prompt> element that outlines the user’s choices and <choice> elements that specify the DTMF and speech grammars and control the program flow. The next attribute of each <choice> element names the form to transition to when that choice is selected. For example, if the user presses the 1 key, the application transitions to the “weather” form and performs the actions in that form, here outputting “OK let’s check the weather”.

Table 3. New VoiceXML elements used in Part VII

Element Name / Description
<choice> / Defines an alternative in a menu dialog. A <menu> element can enclose many <choice> elements, but the user’s input must match exactly one of them.
<menu> / A dialog for choosing amongst alternative destinations [1]

<?xml version="1.0"?>
<vxml version="2.0">
  <menu>
    <prompt>
      To check the weather, press 1 or say weather,
      To check the stock quotes, press 2 or say stock,
      To get the direction information, press 3 or say direction.
    </prompt>
    <choice dtmf="1" next="#weather"> weather </choice>
    <choice dtmf="2" next="#stock"> stock </choice>
    <choice dtmf="3" next="#direction"> direction </choice>
    <noinput>
      I didn't hear you. <reprompt/>
    </noinput>
    <nomatch>
      I didn't quite understand you. <reprompt/>
    </nomatch>
  </menu>
  <form id="weather">
    <block>
      <prompt> OK let's check the weather </prompt>
    </block>
  </form>
  <form id="stock">
    <block>
      <prompt> OK let's check the stock quotes </prompt>
    </block>
  </form>
  <form id="direction">
    <block>
      <prompt> OK let's get the direction information </prompt>
    </block>
  </form>
</vxml>

1. Use any text editor to input the above code and save the document as “menu.vxml”.

2. Follow the procedure in Part IV to test the application:
  • Upload the menu.vxml file to the web server.
  • Link the new application’s URL to the telephone number.
  • Run the application by calling the phone number 1-800-555-VXML.
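
To see why a menu is described above as shorthand for a form, it may help to compare menu.vxml with a form-based version of the same dialog. The sketch below is one possible equivalent, assuming the gateway supports the VoiceXML 2.0 <option> element, which attaches a spoken phrase, a DTMF key, and a stored value to a field. The form id Choices and the field name choice are reused from Part V; the rest mirrors menu.vxml.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="Choices">
    <field name="choice">
      <prompt>
        To check the weather, press 1 or say weather,
        To check the stock quotes, press 2 or say stock,
        To get the direction information, press 3 or say direction.
      </prompt>
      <!-- Each option contributes a spoken phrase, a DTMF key, and the
           value stored in "choice" when that option is selected. -->
      <option dtmf="1" value="weather"> weather </option>
      <option dtmf="2" value="stock"> stock </option>
      <option dtmf="3" value="direction"> direction </option>
      <noinput>
        I didn't hear you. <reprompt/>
      </noinput>
      <nomatch>
        I didn't quite understand you. <reprompt/>
      </nomatch>
      <!-- The menu's next attributes become explicit transitions here. -->
      <filled>
        <if cond="choice=='weather'">
          <goto next="#weather"/>
        <elseif cond="choice=='stock'"/>
          <goto next="#stock"/>
        <else/>
          <goto next="#direction"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="weather">
    <block>
      <prompt> OK let's check the weather </prompt>
    </block>
  </form>
  <form id="stock">
    <block>
      <prompt> OK let's check the stock quotes </prompt>
    </block>
  </form>
  <form id="direction">
    <block>
      <prompt> OK let's get the direction information </prompt>
    </block>
  </form>
</vxml>
```

The menu version expresses the same behavior in fewer lines because each <choice> element combines the recognition phrase, the DTMF key, and the transition target.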

Part VIII: Pre-recorded Audio File Output Method

As discussed in Part III, VoiceXML provides two methods of outputting information to the user: TTS and prerecorded audio files. All applications developed so far in this lab use only TTS. This part introduces the prerecorded audio file output method.