VoiceXML Review - Feature Articles

Volume 1, Issue 8 - August/September 2001

Object-Oriented VoiceXML

By John Hicks

(Continued from Part 1)

Standard VoiceXML Tags

In the first layer, our Java classes correspond directly to the standard VoiceXML tags. For example, VField and VBlock stand for <field> … </field> and <block> … </block>, and VProperty and VMeta stand for <property … /> and <meta … />. We use these simple classes inside the more complex classes defined in the other layers.

VProperty and VMeta, as "unpaired" tags, without other tags inside them, derive from a base class we call VoiceXMLTag. VoiceXMLTag carries functionality that all tags share, whether paired or unpaired.

VField and VBlock, as tags used in pairs, with other tags in between, derive from a base class we call VoiceXMLTagPair. VoiceXMLTagPair derives from VoiceXMLTag, picking up functionality common to all VoiceXML tags, and adds functionality for enclosing other VoiceXML tags inside, recursively.
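The two base classes described above might be sketched as follows. This is only an illustration of the inheritance pattern, not the SpeechBrowser source. The attr(), add(), and asString() methods on tags, and the constructor arguments, are assumptions made for the example. Attribute values are not XML-escaped here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Base class: behaviour shared by every tag, paired or unpaired.
abstract class VoiceXMLTag {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    protected VoiceXMLTag(String name) {
        this.name = name;
    }

    // Hypothetical helper: records one attribute, keeping insertion order.
    public VoiceXMLTag attr(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    protected String name() {
        return name;
    }

    protected String attributesAsString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    // Unpaired form: <name attr="value"/>
    public String asString() {
        return "<" + name + attributesAsString() + "/>";
    }
}

// Paired tags add the ability to enclose other tags, recursively.
abstract class VoiceXMLTagPair extends VoiceXMLTag {
    private final List<VoiceXMLTag> children = new ArrayList<>();

    protected VoiceXMLTagPair(String name) {
        super(name);
    }

    public VoiceXMLTagPair add(VoiceXMLTag child) {
        children.add(child);
        return this;
    }

    // Paired form: <name attr="value">children</name>
    @Override
    public String asString() {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(name()).append(attributesAsString()).append('>');
        for (VoiceXMLTag child : children) {
            sb.append(child.asString());
        }
        sb.append("</").append(name()).append('>');
        return sb.toString();
    }
}

class VProperty extends VoiceXMLTag {
    VProperty(String propertyName, String value) {
        super("property");
        attr("name", propertyName);
        attr("value", value);
    }
}

class VField extends VoiceXMLTagPair {
    VField(String fieldName) {
        super("field");
        attr("name", fieldName);
    }
}

class VBlock extends VoiceXMLTagPair {
    VBlock() {
        super("block");
    }
}
```

With these definitions, new VField("city").add(new VProperty("timeout", "5s")).asString() yields a <field> element enclosing one <property/> element.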

Finally, in this first layer we specialize the tags for their most commonly reused purposes. From VBlock, for example, we derive VTransferBlock, a block used to transfer the caller to another number. Such a block uses a <transfer> … </transfer> tag in one form or another, so VTransferBlock carries an instance of VTransfer. That VTransfer instance, in turn, can be configured many different ways.
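A specialized block that carries its own configurable VTransfer could look roughly like the sketch below. The class bodies, the say(), dest(), bridge(), and transfer() methods, and the prompt text are all assumptions for illustration. The article does not show the generated markup, so the placement of the <transfer> element is also an assumption: since standard VoiceXML treats <transfer> as a form item, this sketch emits it right after the block rather than inside it.

```java
// Hypothetical stand-in for the article's VTransfer: a configurable <transfer> tag.
class VTransfer {
    private String dest = "";
    private boolean bridge = false;

    VTransfer dest(String dest) {
        this.dest = dest;
        return this;
    }

    VTransfer bridge(boolean bridge) {
        this.bridge = bridge;
        return this;
    }

    String asString() {
        return "<transfer dest=\"" + dest + "\" bridge=\"" + bridge + "\"/>";
    }
}

// Simplified stand-in for VBlock: a <block> holding plain prompt text.
class VBlock {
    private final StringBuilder body = new StringBuilder();

    VBlock say(String text) {
        body.append(text);
        return this;
    }

    String asString() {
        return "<block>" + body + "</block>";
    }
}

// Specialization: a block that announces the transfer and owns a VTransfer.
class VTransferBlock extends VBlock {
    private final VTransfer transfer = new VTransfer();

    VTransferBlock(String dest) {
        transfer.dest(dest);
        say("Please hold while I transfer your call.");
    }

    // Exposes the carried VTransfer so callers can configure it further.
    VTransfer transfer() {
        return transfer;
    }

    @Override
    String asString() {
        return super.asString() + transfer.asString();
    }
}
```

A caller would write new VTransferBlock("tel:+15555550100") and then adjust the carried instance, for example with transfer().bridge(true), before generating the markup.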

Vendor-Specific Data Types

A second layer of the Java hierarchy uses the standard VoiceXML tags of the first layer to invoke data types provided by vendors and their third-party users, such as Nuance Speech Objects and Speechworks Dialog Modules.

Data Types of Our Own Devising

We get the biggest payoff in the third layer of the hierarchy, where we derive our own complex data types. The resulting VoiceXML runs anywhere.

One example is VLayerList: a navigable list, or tree of lists, that we use again and again in turn-key products, whether for lists of Frequently Asked Questions, lists of ingredients and steps for cooking (as first suggested by the blind who still love the kitchen), or lists of nearly any kind.

A VoiceXML Equivalent of a Class or Object

Finally, we wondered, how do you reuse VoiceXML without cut and paste? When you build in VoiceXML, where do you accumulate your team's expertise? Where do you build a library of your own best practices and proven techniques? Your own data types for rapid (drop-in, plug-in) reuse? How do you package your own involved exchanges with the caller, and pull them off the shelf the next time you need them?

How do you reuse an advanced VoiceXML construct as you would a Java or C++ class or object? What could qualify as the VoiceXML equivalent? A reusable VoiceXML part, from which you could derive further parts, in a hierarchy?

One approach is, of course, the VoiceXML <subdialog> tag. A subdialog can invoke and enclose (if not encapsulate) others, in something of a hierarchy. But this takes us only so far.

VoiceXMLComposite Classes

At SpeechBrowser we devised another answer we call VoiceXMLComposite. We might have called it VoiceXMLObject or VoiceXMLClass instead. We derive new VoiceXML classes from a base class in Java called VoiceXMLComposite.

Two examples are our classes VRetryRecourse and VDBCapture. Most of the VoiceXML we generate (75% or more of the lines) either handles exceptions during the call (confirm, retry, live operator, and other recourses) or provides database access during and after a successful call. VRetryRecourse supplies any given VoiceXML application with the former, VDBCapture with the latter. VDBCapture captures the results of a phone call to a backend database table, whatever the sequence of questions we've asked the caller, and whatever data types we've collected.

How do VoiceXMLComposite classes differ from the classes we discussed earlier, such as VField and VBlock? VoiceXMLComposite classes reuse many of those earlier classes. But classes such as VField and VBlock generate contiguous lines in a VoiceXML application. VoiceXMLComposite classes do not.

You can mark off the first and last tag generated by VField or VBlock inside the VoiceXML application. An instance of VDBCapture or VRetryRecourse generates lines in and for many separate parts of the VoiceXML application it serves, including the root document and the several documents (pages) that share the same root.

Functionality that appears together in a VoiceXMLComposite class, as data elements and methods of that class, will be distributed throughout the generated VoiceXML. Some of its data elements generate <var> tags in the root document or elsewhere. Some of its methods generate JavaScript and <block> and <field> tags across the application.

Subclasses of VoiceXMLComposite deliver their VoiceXML to various parts of the VoiceXML application using a VoiceXMLComposite.asString() method with parameters such as:

protected static final int kForRootDoc = 3;
protected static final int kForRootDocForm = 2;
protected static final int kForDoc = 1;
protected static final int kForDocForm = 0;

Subclasses further extend this list of parameters as well, for delivering their generated VoiceXML to larger and more complex applications. For tracing and debugging, VoiceXMLComposite classes can be asked to mark (with comments) all the lines and tags they generate across a large VoiceXML application.
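The delivery mechanism described above can be sketched as follows. Only the four constants and the asString() name come from the article. The linesFor() hook, the setTracing() switch, the retryCount variable, and the particular tags VRetryRecourse emits are assumptions chosen to show how one class can contribute lines to several parts of an application.

```java
import java.util.ArrayList;
import java.util.List;

abstract class VoiceXMLComposite {
    // Delivery targets, as listed in the article.
    protected static final int kForRootDoc = 3;
    protected static final int kForRootDocForm = 2;
    protected static final int kForDoc = 1;
    protected static final int kForDocForm = 0;

    private boolean tracing = false;

    // Hypothetical switch: when on, every generated line is preceded by a marker comment.
    public void setTracing(boolean on) {
        tracing = on;
    }

    // Hypothetical hook: subclasses return the lines they contribute to one target.
    protected abstract List<String> linesFor(int target);

    public String asString(int target) {
        StringBuilder sb = new StringBuilder();
        for (String line : linesFor(target)) {
            if (tracing) {
                sb.append("<!-- ").append(getClass().getSimpleName()).append(" -->\n");
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}

// Illustrative subclass: a retry counter declared in the root document
// and incremented from a form in an ordinary document.
class VRetryRecourse extends VoiceXMLComposite {
    @Override
    protected List<String> linesFor(int target) {
        List<String> lines = new ArrayList<>();
        switch (target) {
            case kForRootDoc:
                lines.add("<var name=\"retryCount\" expr=\"0\"/>");
                break;
            case kForDocForm:
                lines.add("<block><assign name=\"retryCount\" expr=\"retryCount + 1\"/></block>");
                break;
            default:
                // This sketch contributes nothing to the other targets.
                break;
        }
        return lines;
    }
}
```

A generator assembling the application would call asString() once per target and splice each result into the matching place, so the <var> lands in the root document while the <block> lands in a form elsewhere.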

Using VoiceXMLComposite, in a single place we work on some of the most demanding professional features of any VoiceXML application: its exception handling and its database backend. From that one place, all the applications we generate, and all the many sections in each application, pick up the improvements.

So What's It All For?

At SpeechBrowser we want to bring Automatic Speech Recognition to a mass market. We want to reach offices of any size and budget: agencies, foundations, non-profits, small businesses. We want to reach beyond the trading giants that dominate today. We want development prices to start at $10,000 or less.

We want to reach the blind and vision-impaired and disabled communities. We want to see the day when more of us reach the internet by phone than by keyboard, mouse, or computer screen.

For that we need all the best that the software industry has learned in four decades. We need XML and VoiceXML for their universal availability, and we need object-oriented languages. For us, that means VoiceXML generated from hierarchies of reusable parts, with assembly-line efficiency.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).