VoiceXML Review - Columns

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Last month we had a quick look at the kinds of things you can do with VoiceXML. As you may recall, VoiceXML lets you pull various technologies together to build a speech application:

It's great to see some code right out of the gate, but let's not get off on the wrong foot. We're going to take a look at the fundamental structure of a VoiceXML document, so you can hopefully avoid many of the first-time problems associated with writing pages in VoiceXML.

The behavior of VoiceXML is defined by the VoiceXML Specification, while the syntax and structure of VoiceXML are defined by an Extensible Markup Language (XML) Document Type Definition (DTD). You have to be a little more careful when building a VoiceXML document than when building, say, an HTML page, as the DTD precisely defines where and when elements can be used within the document.

The DTD for VoiceXML was developed as part of the VoiceXML Specification to define the syntax and structure of valid VoiceXML documents. One benefit is that almost any reasonable XML parser or editor can be configured to process VoiceXML markup simply by accessing the DTD.
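One common way to point an XML parser or editor at a DTD is a document type declaration at the top of the page. The sketch below assumes a local copy of the DTD saved as voicexml1-0.dtd; that file name is hypothetical, so check your platform's or tool's documentation for the identifier it actually expects.

```xml
<?xml version="1.0"?>
<!-- Hypothetical local copy of the DTD; substitute the identifier your tools document -->
<!DOCTYPE vxml SYSTEM "voicexml1-0.dtd">
<vxml version="1.0">
      <form>
            <block>
                  Hello, World!
            </block>
      </form>
</vxml>
```

The name after DOCTYPE must match the top-level element of the page, which is why it reads vxml here.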

The Structure of VoiceXML Documents

The basic element of a VoiceXML application is the document. A document is the equivalent of an HTML page, and encapsulates one or more dialogs. Execution of a dialog typically involves presentation of information to the caller, along with collection of input from that caller. Transition from one dialog to another is controlled by the currently executing dialog.
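To make the idea of dialogs and transitions a little more concrete, here is a minimal sketch of a document holding two dialogs. The form names greeting and goodbye are made up for illustration. The first dialog speaks to the caller and then hands control to the second one itself.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- First dialog: runs first because it appears first in the document -->
      <form id="greeting">
            <block>
                  Welcome to the demo.
                  <!-- The executing dialog decides where to go next -->
                  <goto next="#goodbye"/>
            </block>
      </form>
      <!-- Second dialog: reached only through the transition above -->
      <form id="goodbye">
            <block>
                  Thanks for calling.
            </block>
      </form>
</vxml>
```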

You can think of a VoiceXML document as a collection of containers and elements. The DTD for VoiceXML defines which elements can appear in which containers, as well as which containers can appear in other containers. Containers are delineated by start and end tags, such as <vxml> and </vxml>.

The <vxml> tag defines the top-level container. The <vxml> container can hold various elements, including:

You'll see some of these elements in almost every VoiceXML page you write, and these and others provide access to the powerful features of VoiceXML (which we'll be exploring in future columns).
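As a minimal sketch of how such elements sit alongside a dialog, the page below assumes <meta> and <var> as two representative top-level elements. The author name and the variable name greeting are made up for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Document-level information about the page (illustrative values) -->
      <meta name="author" content="A. Developer"/>
      <!-- A document-level variable, visible to the dialogs below -->
      <var name="greeting" expr="'Hello'"/>
      <form>
            <block>
                  <!-- Speak the variable's value, followed by literal text -->
                  <value expr="greeting"/>, World!
            </block>
      </form>
</vxml>
```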

Of course, these elements wouldn't be very interesting by themselves, so the <vxml> container can also contain dialogs: <form> and <menu>. A form provides the framework for collecting pieces of information from the caller, while a menu provides a convenient way of specifying a list of options available to the caller. Menus make it straightforward to build applications similar to traditional Interactive Voice Response (IVR) systems.
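Here is a rough sketch of a menu in the IVR style just described. The choices and the form names sports and weather are assumptions made for illustration. Each choice sends the caller to a different dialog in the same document.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <menu>
            <!-- <enumerate/> reads the list of choices back to the caller -->
            <prompt>Say one of: <enumerate/></prompt>
            <choice next="#sports">sports</choice>
            <choice next="#weather">weather</choice>
      </menu>
      <form id="sports">
            <block>
                  Here are the sports headlines.
            </block>
      </form>
      <form id="weather">
            <block>
                  Here is the weather report.
            </block>
      </form>
</vxml>
```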

Example 1: Hello World
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!--Example 1 for VoiceXML Review -->
      <form>
            <block>
                  Hello, World!
            </block>
      </form>
</vxml>

To better see the structure of this VoiceXML document, let's have a look at a stylized version of this page:

This gives you a bit of the flavor of how a <block> fits within a <form> in this example (the extra text is a summary of attributes generated by the tool that produces this view of VoiceXML). In this case, if we had just placed the text inside the form (and not in a block), the document would be improperly structured.
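For contrast, here is a sketch of the improperly structured variant. It is still well-formed XML, but it does not follow the DTD because the text sits directly inside the <form>. How a particular platform or editor reports the problem will vary.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Improperly structured: bare text is not allowed directly inside <form> -->
      <form>
            Hello, World!
      </form>
</vxml>
```

Wrapping the text in a <block>, as in Example 1, brings the page back in line with the DTD.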

Structurally Speaking

By Rob Marchand
