Welcome to First Words, VoiceXML Review's column that teaches
you about VoiceXML and how you can use it. We hope you
enjoy this first lesson.

VoiceXML is a technology that brings a number of useful capabilities
together into the Web development space. These capabilities
can be used to build rich user experiences that allow
callers to access information and transaction services
through a telephone. VoiceXML ties these capabilities
together with a markup language that is an XML derivative.
Some of the capabilities of VoiceXML include the following:
Telephony dialog control;
Automatic Speech Recognition (ASR);
DTMF (touch-tone) keypad recognition;
Text-to-speech (TTS) playback; and
Pre-recorded audio playback.
Although VoiceXML is often demonstrated by providing
access to Web-based information content, the most powerful
relationship to Web technology is that dynamic content
generation technologies such as CGI, ASP, JSP and others
can be used to construct and deliver personalized VoiceXML
pages from a regular Web or Application server.
This article will focus on static VoiceXML pages in
order to demonstrate some of the concepts, but readers
should keep in mind that they can also use their favorite
server-side technologies to build compelling dynamic
applications.
Your First VoiceXML Application
If you develop software, you're probably familiar with
the venerable 'Hello World' demonstration. Example 1
is a 'Hello World' program that we shamelessly borrowed
from the VoiceXML Specification.
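
A minimal sketch of what such a 'Hello World' document looks like, modeled on the one in the VoiceXML specification. The version attribute value and the comment text are assumptions made for illustration; VoiceXML 2.0 interpreters also expect an xmlns attribute on the <vxml> element.

```xml
<?xml version="1.0"?>
<!-- Assumed version: VoiceXML 1.0 -->
<vxml version="1.0">
  <!-- A form is one kind of dialog; this one only speaks a message -->
  <form>
    <!-- Text inside a block is rendered with text-to-speech -->
    <block>Hello World!</block>
  </form>
</vxml>
```

Once the block has been played there is nothing left to execute, so the document ends and the call is disconnected.
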
Example 1 will answer your telephone call, use text-to-speech
to say "Hello world" to you, and then hang
up. Not terribly exciting, but there are a few interesting
things to note:
The document is obviously well-formed, using opening
and closing XML-type tags;
We didn't have to do any extra work to play our message
via TTS;
Comments are wrapped with "<!--" and "-->".
If you would rather provide pre-recorded audio to the
user, then Example 1 would change to something resembling
Example 2:
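
A hedged sketch of a pre-recorded audio version. The file name hello.wav is a hypothetical placeholder, and the version attribute is again an assumption.

```xml
<?xml version="1.0"?>
<!-- Assumed version: VoiceXML 1.0 -->
<vxml version="1.0">
  <form>
    <block>
      <!-- hello.wav is a hypothetical file name; src takes any URI -->
      <audio src="hello.wav">Hello World!</audio>
    </block>
  </form>
</vxml>
```
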
In Example 2, the user will hear the pre-recorded message
contained in the file referred to by the URI in the
<audio> tag. The text contained between the paired <audio>
tags is spoken via TTS if the audio file cannot be
retrieved or played.

The basic element in a VoiceXML page is a dialog.
Dialogs gather information from the user, and take
prescribed actions with that information. There are
two types of dialogs: forms and menus.
The examples above have a single form on the VoiceXML
page. Forms are used to provide information (as in
the examples found here) and to gather input. Menus
let the user choose from a list of options.
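
To make the menu dialog concrete, here is a small sketch. The target documents weather.vxml and news.vxml are hypothetical, and the version attribute is an assumption.

```xml
<?xml version="1.0"?>
<!-- Assumed version: VoiceXML 1.0 -->
<vxml version="1.0">
  <!-- dtmf="true" also assigns keypad digits 1, 2, ... to the choices -->
  <menu dtmf="true">
    <!-- enumerate reads the choices back to the caller -->
    <prompt>Say one of: <enumerate/></prompt>
    <!-- weather.vxml and news.vxml are hypothetical target documents -->
    <choice next="weather.vxml">weather</choice>
    <choice next="news.vxml">news</choice>
    <!-- Played if the caller stays silent -->
    <noinput>Please say one of <enumerate/></noinput>
  </menu>
</vxml>
```

Each choice pairs the words the caller can say with the document to visit next, so the menu both collects the selection and acts on it.
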
VoiceXML uses the form-filling, form-submission model
that is frequently used on the Web. After input is gathered
from the user, it is often submitted
to a server-side script or program for further processing,
resulting in the generation of the next VoiceXML page
in the conversation.
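
The fill-then-submit cycle described above can be sketched as follows. The field name, the prompt wording and the URL http://www.example.com/lookup.cgi are all hypothetical, and the version attribute is an assumption.

```xml
<?xml version="1.0"?>
<!-- Assumed version: VoiceXML 1.0 -->
<vxml version="1.0">
  <form id="lookup">
    <!-- type="digits" uses the built-in grammar for spoken or keyed digits -->
    <field name="account" type="digits">
      <prompt>Please say or enter your account number.</prompt>
    </field>
    <!-- Runs once the field is filled; the URL is a hypothetical script -->
    <block>
      <submit next="http://www.example.com/lookup.cgi" namelist="account"/>
    </block>
  </form>
</vxml>
```

The script named in the submit element receives the account value as a request parameter and is expected to return the next VoiceXML page, just as a Web form handler returns the next HTML page.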