Structurally Speaking
Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.
Last month we had a quick look at the kinds of things you can do with VoiceXML. As you may recall, VoiceXML lets you pull various technologies together to build a speech application:
- Telephony dialog control;
- Automatic Speech Recognition (ASR);
- DTMF (touch-tone) keypad recognition;
- Text-to-speech (TTS) playback; and
- Pre-recorded audio playback.
It would be great to jump into some code right out of the gate, but let's not get off on the wrong foot. First we're going to take a look at the fundamental structure of a VoiceXML document, so you can avoid many of the first-time problems associated with writing pages in VoiceXML.
The behavior of VoiceXML is defined by the VoiceXML specification, while the syntax and structure of VoiceXML are defined by an Extensible Markup Language (XML) Document Type Definition (DTD). You have to be a little more careful when building a VoiceXML document than, say, an HTML page, as the DTD precisely defines where and when elements can be used within the document.
The DTD for VoiceXML was developed by the VoiceXML Forum to define the syntax and structure of valid VoiceXML documents. One of the benefits is that almost any reasonable XML parser or editor can be configured to process VoiceXML markup simply by accessing the DTD.
The Structure of VoiceXML Documents
The basic element of a VoiceXML application is the document. A document is the equivalent of an HTML page, and encapsulates one or more dialogs. Execution of a dialog typically involves presentation of information to the caller, along with collection of input from that caller. Transition from one dialog to another is controlled by the currently executing dialog.
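To sketch how one dialog hands control to another, here is a hypothetical two-form document that uses the <goto> element; the form names "greeting" and "farewell" are invented for illustration:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- First dialog: speaks a greeting, then transfers control -->
  <form id="greeting">
    <block>
      Welcome to the example application.
      <!-- Transition to the dialog named "farewell" below -->
      <goto next="#farewell"/>
    </block>
  </form>

  <!-- Second dialog: runs only when another dialog transfers here -->
  <form id="farewell">
    <block>
      Goodbye!
    </block>
  </form>
</vxml>
```

Without the <goto>, execution would simply end after the first form; the currently executing dialog decides where control goes next.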
You can think of a VoiceXML document as a collection of containers and elements. The DTD for VoiceXML defines which elements can appear in which containers, as well as which containers can appear in other containers. Containers are delineated by start and end tags, such as <vxml> and </vxml>.
The <vxml> tag defines the top-level container. The <vxml> container can hold various elements, including:
- <meta> - information about the document itself (for example, the author);
- <var> - variable declarations;
- <script> - ECMAScript fragments;
- <property> - a mechanism to control values that affect platform or interpreter behavior;
- <catch> - event handler definitions;
- <link> - global grammar declarations.
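As a sketch of how these document-level elements sit inside the <vxml> container, consider the skeleton below. The author name, variable name, and event text are illustrative; the "timeout" property controls how long the platform waits for caller input:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Document metadata -->
  <meta name="author" content="Jane Doe"/>
  <!-- A document-scoped variable declaration -->
  <var name="callCount" expr="0"/>
  <!-- A property affecting interpreter behavior -->
  <property name="timeout" value="5s"/>
  <!-- A document-level event handler -->
  <catch event="nomatch">
    Sorry, I didn't understand.
  </catch>
  <form>
    <block>Hello!</block>
  </form>
</vxml>
```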
You'll see some of these elements in almost every VoiceXML page you write, and these and others provide access to the powerful features of VoiceXML (which we'll be exploring in future columns).
Of course, these elements wouldn't be very interesting by themselves. So the <vxml> container can also contain dialogs: <form> and <menu>. A form provides the framework for collecting pieces of information from the caller, while a menu provides a convenient way of specifying a list of options available to the caller. Menus make it straightforward to build applications similar to traditional Interactive Voice Response (IVR) systems.
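A minimal sketch of a <menu> dialog might look like the following; the destination documents sales.vxml and support.vxml are hypothetical:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Say sales, or support.</prompt>
    <!-- Each choice pairs a spoken phrase with a target document -->
    <choice next="sales.vxml">sales</choice>
    <choice next="support.vxml">support</choice>
  </menu>
</vxml>
```

When the caller says one of the listed phrases, the interpreter transitions to the corresponding document, much like pressing a key in a traditional IVR menu.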
You may recall our first example from last month:

Example 1: Hello World

<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Example 1 for VoiceXML Review -->
  <form>
    <block>
      Hello, World!
    </block>
  </form>
</vxml>
To better see the structure of this VoiceXML document, let's have a look at a stylized version of this page:

[Stylized tree view of Example 1, generated by a VoiceXML authoring tool; not reproduced here.]

This gives you a bit of the flavor of how a <block> fits within a <form> (in this example, the extra text is a summary of attributes generated by the tool that produces this view of VoiceXML). In this case, if we had just placed the text inside the form (and not in a block), the document would be improperly structured.
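To make that last point concrete, here is a sketch of the two structures side by side; a validating XML parser armed with the VoiceXML DTD would reject the first:

```xml
<!-- Improperly structured: bare text directly inside <form> -->
<form>
  Hello, World!
</form>

<!-- Properly structured: the text is wrapped in a <block> -->
<form>
  <block>
    Hello, World!
  </block>
</form>
```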
Copyright © 2001 VoiceXML Forum. All rights reserved. The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).