VoiceXML Review - Columns

VoiceXML: Where Speech Meets the Web

By Rob Marchand

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy this first lesson.

VoiceXML brings a number of useful capabilities together into the Web development space. These capabilities can be used to build rich user experiences that let callers access information and transaction services over the telephone. VoiceXML ties these capabilities together with an XML-based markup language.

Some of the capabilities of VoiceXML include the following:

Telephony dialog control;
Automatic Speech Recognition (ASR);
DTMF (touch-tone) keypad recognition;
Text-to-speech (TTS) playback; and
Pre-recorded audio playback.

Although VoiceXML is often demonstrated by providing access to Web-based information content, its most powerful tie to Web technology is that dynamic content generation technologies such as CGI, ASP, and JSP can construct and deliver personalized VoiceXML pages from a regular Web or application server.

This article will focus on static VoiceXML pages in order to demonstrate some of the concepts, but readers should keep in mind that they can also use their favorite server-side technologies to build compelling dynamic applications.

Your First VoiceXML Application

If you develop software, you're probably familiar with the venerable 'Hello World' demonstration. Example 1 is a 'Hello World' program that we shamelessly borrowed from the VoiceXML Specification.

Example 1: Hello World
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!--Example 1 for VoiceXML Review -->
      <form>
            <block>
                  Hello, World!
            </block>
      </form>
</vxml>

Example 1 will answer your telephone call, use text-to-speech to say "Hello, World!" to you, and then hang up. Not terribly exciting, but there are a few interesting things to note:

The document is obviously well-formed, using opening and closing XML-type tags;
We didn't have to do any extra work to play our message via TTS; and
Comments are wrapped with "<!--" and "-->".

If you would rather provide pre-recorded audio to the user, then Example 1 would change to something resembling Example 2:

Example 2: Hello World with Pre-Recorded Audio
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!--Example 2 for VoiceXML Review -->
      <form>
            <block>
                  <audio src="http://www.voicexml.org/audio/helloworld.wav">
                        Hello, World!
                  </audio>
            </block>
      </form>
</vxml>

In Example 2, the user will hear the pre-recorded message contained in the file referred to by the URI in the <audio> tag. The text contained between the paired <audio> tags is rendered with TTS only if the audio file cannot be retrieved or played.

The basic element in a VoiceXML page is a dialog. Dialogs gather information from the user, and take prescribed actions with that information. There are two types of dialogs: forms and menus. The examples above have a single form on the VoiceXML page. Forms are used to provide information (as in the examples found here) and to gather input. Menus allow the user to select from a menu of options.
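
The menu side of this distinction can be sketched as follows. This is a minimal illustration rather than an example taken from the specification: the dialog ids (main, weather, sports) and the prompt wording are assumptions invented here. It relies only on the <menu>, <prompt>, <enumerate>, <choice> and <noinput> elements defined in VoiceXML 1.0.

Sketch: A Simple Menu
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Illustrative sketch; the ids and prompts are hypothetical -->
      <menu id="main">
            <prompt>
                  Welcome. Say one of: <enumerate/>
            </prompt>
            <choice next="#weather">Weather</choice>
            <choice next="#sports">Sports</choice>
            <noinput>
                  Please say one of: <enumerate/>
            </noinput>
      </menu>
      <form id="weather">
            <block>
                  You chose weather.
            </block>
      </form>
      <form id="sports">
            <block>
                  You chose sports.
            </block>
      </form>
</vxml>

In this sketch the caller hears the list of choices, and speaking one of them transfers control to the form with the matching id. The two forms only speak a confirmation, much like Example 1.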

VoiceXML uses the form-filling, form-submission model that is frequently used on the Web. After input is gathered from the user, it is often submitted to a server-side script or program for further processing, which generates the next VoiceXML page in the conversation.
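
A form that gathers input and submits it might look like the following sketch. The field name (account), the prompts and the submission URL are hypothetical, chosen only for illustration; the built-in digits field type and the <submit> element come from VoiceXML 1.0.

Sketch: Gathering Input and Submitting It
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Illustrative sketch; the field name and URL are hypothetical -->
      <form id="lookup">
            <field name="account" type="digits">
                  <prompt>
                        Please say or key in your account number.
                  </prompt>
                  <nomatch>
                        Sorry, I did not understand. Please try again.
                  </nomatch>
            </field>
            <block>
                  <submit next="http://www.example.com/cgi-bin/lookup" namelist="account"/>
            </block>
      </form>
</vxml>

Once the field has been filled, the block that follows it runs and <submit> sends the value of account to the server. The server-side program at that address, which is assumed here, would be expected to reply with the next VoiceXML page.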

