Volume 1, Issue 2 - Feb. 2001
   
   
 

Structurally Speaking

By Rob Marchand

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Last month we had a quick look at the kinds of things you can do with VoiceXML. As you may recall, VoiceXML lets you pull various technologies together to build a speech application:

  • Telephony dialog control;
  • Automatic Speech Recognition (ASR);
  • DTMF (touch-tone) keypad recognition;
  • Text-to-speech (TTS) playback; and
  • Pre-recorded audio playback.

It's great to see some code right out of the gate, but let's not get off on the wrong foot. We're going to take a look at the fundamental structure of a VoiceXML document, so you can hopefully avoid many of the first-time problems associated with writing pages in VoiceXML.

The behavior of VoiceXML is defined by the VoiceXML Specification, while its syntax and structure are defined by an Extensible Markup Language (XML) Document Type Definition (DTD). You have to be a little more careful when building a VoiceXML document than when writing, say, HTML, because the DTD precisely defines where elements can be used within the document.

The DTD for VoiceXML was developed as part of the VoiceXML Specification to define the syntax and structure of valid VoiceXML documents. One benefit is that almost any reasonable XML parser or editor can be configured to process VoiceXML markup simply by accessing the DTD.
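
One common way to point a validating XML parser or editor at a DTD is a document type declaration at the top of the page. The sketch below assumes a local copy of the VoiceXML 1.0 DTD saved under the hypothetical file name voicexml1-0.dtd; your platform or tool may expect a different location or identifier.

```xml
<?xml version="1.0"?>
<!-- Assumption: "voicexml1-0.dtd" is a hypothetical local copy of the DTD;
     substitute whatever path or identifier your tools require. -->
<!DOCTYPE vxml SYSTEM "voicexml1-0.dtd">
<vxml version="1.0">
      <form>
            <block>
                  Hello, World!
            </block>
      </form>
</vxml>
```

With the declaration in place, a validating parser can report elements that appear where the DTD does not allow them.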

The Structure of VoiceXML Documents

The basic element of a VoiceXML application is the document. A document is the equivalent of an HTML page, and encapsulates one or more dialogs. Execution of a dialog typically involves presentation of information to the caller, along with collection of input from that caller. Transition from one dialog to another is controlled by the currently executing dialog.
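
As a rough sketch of one dialog handing control to another, the document below holds two dialogs. The ids welcome and goodbye and the prompt text are made up for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- First dialog: plays a greeting, then transitions to the next dialog -->
      <form id="welcome">
            <block>
                  Welcome to the demo.
                  <goto next="#goodbye"/>
            </block>
      </form>
      <!-- Second dialog: runs because "welcome" explicitly transfers control to it -->
      <form id="goodbye">
            <block>
                  Thanks for calling. Goodbye.
            </block>
      </form>
</vxml>
```

Without the <goto>, the first dialog would simply finish and the second would never be reached; the executing dialog decides what happens next.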

You can think of a VoiceXML document as a collection of containers and elements. The DTD for VoiceXML defines which elements can appear in which containers, as well as which containers can appear in other containers. Containers are delineated by start and end tags, such as <vxml> and </vxml>.

The <vxml> tag defines the top-level container. The <vxml> container can hold various elements, including:

  • <meta> - information about the document itself (for example, the author);
  • <var> - variable declarations;
  • <script> - ECMAScript fragments;
  • <property> - a mechanism to control values that affect platform or interpreter behavior;
  • <catch> - event handler definitions; and
  • <link> - global grammar declarations.

You'll see some of these elements in almost every VoiceXML page you write. These and others provide access to the powerful features of VoiceXML (which we'll be exploring in future columns).
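
To make the list above concrete, here is a minimal sketch with each of those elements placed directly inside <vxml>. All names and values (the author, the greeting variable, the shout function, the timeout value, and the "main menu" phrase) are invented for illustration, and the grammar format shown is an assumption; the formats accepted depend on your platform.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Information about the document itself -->
      <meta name="author" content="A. N. Author"/>
      <!-- A document-level variable, visible to every dialog in this document -->
      <var name="greeting" expr="'Hello'"/>
      <!-- An ECMAScript fragment defining a helper function -->
      <script>
            <![CDATA[
                  function shout(text) { return text.toUpperCase(); }
            ]]>
      </script>
      <!-- How long the platform waits for input before signaling no input -->
      <property name="timeout" value="5s"/>
      <!-- Event handler shared by all dialogs in this document -->
      <catch event="nomatch">
            Sorry, I did not understand that.
            <reprompt/>
      </catch>
      <!-- Global grammar: saying "main menu" jumps to the dialog with id "main" -->
      <link next="#main">
            <grammar type="application/x-jsgf">main menu</grammar>
      </link>
      <form id="main">
            <block>
                  <value expr="shout(greeting)"/>, World!
            </block>
      </form>
</vxml>
```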

Of course, these elements wouldn't be very interesting by themselves. So the <vxml> container can also contain dialogs: <form> and <menu>. A form provides the framework for collecting pieces of information from the caller, while a menu provides a convenient way of specifying a list of options available to the user. Menus allow the building of applications similar to traditional Interactive Voice Response (IVR) systems in a straightforward manner.
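
The following sketch puts a menu and two forms in one document. The dialog ids, prompts, and choices are hypothetical. The menu offers a list of options, while the feedback form uses a field to collect a yes-or-no answer from the caller.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Menu: each choice names the dialog to go to when it is selected -->
      <menu id="main">
            <prompt>Welcome. Say one of: <enumerate/></prompt>
            <choice next="#weather">weather</choice>
            <choice next="#feedback">feedback</choice>
      </menu>

      <!-- Form that only presents information -->
      <form id="weather">
            <block>It is sunny today.</block>
      </form>

      <!-- Form that collects one piece of information from the caller -->
      <form id="feedback">
            <field name="liked" type="boolean">
                  <prompt>Did you find this service useful?</prompt>
                  <filled>
                        <if cond="liked">
                              Glad to hear it.
                        <else/>
                              Sorry to hear that.
                        </if>
                  </filled>
            </field>
      </form>
</vxml>
```

The menu comes first in the document, so it is the dialog the caller hears on entry.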

You may recall our first example from last month:

Example 1: Hello World
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!--Example 1 for VoiceXML Review -->
      <form>
            <block>
                  Hello, World!
            </block>
      </form>
</vxml>

To better see the structure of this VoiceXML document, let's have a look at a stylized version of this page:

[Figure: stylized view of the Hello World document]

This gives you a flavor of how a <block> fits within a <form> in this example. (The extra text is a summary of attributes generated by the tool that produces this view of VoiceXML.) In this case, if we had placed the text directly inside the form rather than in a block, the document would be improperly structured.
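
For comparison, here is a sketch of that improperly structured variant. It is still well-formed XML, so a non-validating parser will accept it, but it does not follow the structure the DTD requires.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Not valid VoiceXML: the text sits directly inside <form>,
           with no <block> around it -->
      <form>
            Hello, World!
      </form>
</vxml>
```

Wrapping the text in a <block>, as in Example 1, restores a properly structured document.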

Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).