VoiceXML Review - Feature Articles

Volume 1, Issue 8 - August/September 2001

Integrating VoiceXML and an Application Server: A Case Study

By Lionel Lavallee

As part of its overall device support strategy, Hewlett-Packard's middleware division, HP Bluestone, is committed to supporting voice-related content. It is also becoming increasingly apparent that the standard for expressing this content will be VoiceXML. HP Bluestone offers a state-of-the-art J2EE platform in its Application Server and Core Services Framework.

With these two facts in mind, HP Bluestone decided to implement reference applications and provide "Trail Maps" showing integration between HP Bluestone's Universal Business Server, the upcoming Core Services Framework and VoiceXML rendering systems. The project was carried out using a VoiceXML Gateway hosted by Voxeo (see Footnote 1).

In this article, we will give background as to why HP Bluestone chose this strategy, and provide the reader with all the information necessary to download and explore the functionality of these technologies.

Voice as the "Killer App"

In the United States, market penetration of small hand-held devices, and especially phone-based Internet connectivity, has lagged behind both the Pacific Rim and European market sectors. This is due in part to the universal acceptance and ubiquity of personal computers in the U.S., and to differences in usage of handheld devices.

However, just as VisiCalc transitioned the personal computer from "geek toy" into the mainstream, hands-free voice browsing will be the force that significantly moves handsets into the mainstream in the U.S. In short, voice is the next "killer app".

The Gateway

Integrators are faced with two choices when voice-enabling an existing Internet application or creating one from the ground up:

Hosted Voice Gateways: Connectivity to data sources and legacy systems is achieved via a voice gateway that is hosted remotely. Hosted solutions provide hardware and software for call management and speech processing, and include the necessary telephone lines from telcos. Underlying technologies integrated into the gateway include Automated Speech Recognition (ASR), Text-to-Speech (TTS), and a VoiceXML browser and interpreter. The gateway handles call initiation, session management and VoiceXML rendering. The underlying voice technology is integrated by the provider, so potential users can focus on the underlying software strategies.
Locally Hosted Gateway: In this instance, the telephony equipment and software are maintained by the integrator. This solution offers the greatest flexibility. Most of the better gateway platforms offer "plug and play" ASR and TTS engines, so customers are free to use best-of-breed components as they see fit. Users should be aware that they will be at the mercy of telcos for phone trunk lines to the hosting site.

The difference between these solutions is one of scale. Hosted gateways relieve the implementer of the up-front cost, knowledge and onus of maintaining telephony front-end hardware and software. These components can be quite costly, and the requisite knowledge is by no means trivial. Hosting a gateway locally, while not for the faint of heart (see Footnote 2), allows the most flexibility and control. As an entry-level solution, hosted gateways provide a relatively quick path to integration. In either case, the integration strategy remains the same: the gateway generates an HTTP request over TCP/IP, which is handled by the application server. The application server is responsible for accessing the data source tier and serving dynamic VoiceXML.
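
The request/response contract described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than on the J2EE platform the project used, and it is not the reference application's code. The `airline` parameter name, the placeholder directory entries, and the function names are all assumptions made for illustration.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Placeholder data standing in for the data source tier (hypothetical values).
AIRLINE_NUMBERS = {
    "example air": "800-555-0100",
    "sample airways": "800-555-0101",
}


def render_vxml(prompt_text):
    """Wrap a prompt in a minimal VoiceXML document for the gateway to speak."""
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        f'      <prompt>{escape(prompt_text)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def handle_request(query_string):
    """Handle the gateway's HTTP request; return (status, content_type, body)."""
    params = parse_qs(query_string)
    # Assumed parameter name: the gateway submits the recognized airline.
    airline = params.get("airline", [""])[0].strip().lower()
    number = AIRLINE_NUMBERS.get(airline)
    if number is None:
        body = render_vxml("Sorry, I could not find that airline.")
    else:
        body = render_vxml(f"The number for {airline} is {number}.")
    return 200, "application/voicexml+xml", body
```

Calling `handle_request("airline=Example+Air")` returns a VoiceXML document whose prompt contains the placeholder number. Whether the gateway is hosted or local, it only ever sees the HTTP response, which is why the integration strategy is the same in both cases.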

Practical Application

To demonstrate the integration of VoiceXML with an application server in a straightforward manner, we implemented a simple Airline Directory Listings application. The gateway initiates the session when an incoming call is answered. If it hasn't already done so, it gathers and caches all the dialogs through an HTTP fetch (see Footnote 3). When the submit is invoked, an HTTP request is made to the application server, which serves up the content. The architectural flow is diagrammed below:

Figure 1 - Architecture

The VoiceXML flow is as follows:

The user is welcomed to the application.
The user is prompted to select one of five airlines, to say "next" to hear more choices, or to say "exit" to terminate the application.
The user's utterance is collected and compared with an inline static grammar.
If "next" is selected, the final five choices are presented with a "back" option. The user may also choose to "exit" the application.
Errors and unrecognized phrases are routed to an error page, which will further instruct the user and return to the last dialog. State is managed by the application.
Successful utterances are routed to a processing page, which requests the 800 number for the selected airline. The gateway generates an HTTP request through the <submit> element; the application server retrieves the number and generates dynamic content, which is served back as the response. The gateway then renders the text through its TTS engine.
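
The steps above can be sketched as a small server-side routine that renders each menu page and decides the next dialog. This is an illustrative sketch, not the reference application. The airline names, the `/select` and `/error` URLs, and the `choice` and `page` variable names are hypothetical, and VoiceXML `<option>` elements stand in for the article's inline static grammar.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical airline names; the article does not list the ten it used.
AIRLINES = [f"Airline {n}" for n in range(1, 11)]
PAGE_SIZE = 5


def page_choices(page):
    """Return (airlines, navigation words) offered on the given menu page."""
    airlines = AIRLINES[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
    nav = ["next", "exit"] if page == 0 else ["back", "exit"]
    return airlines, nav


def render_menu(page):
    """Render one menu page; the <option> elements act as the inline grammar."""
    airlines, nav = page_choices(page)
    words = airlines + nav
    options = "".join(
        f"      <option value={quoteattr(w.lower())}>{escape(w)}</option>\n"
        for w in words
    )
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form id="menu">\n'
        f'    <var name="page" expr="{page}"/>\n'
        '    <field name="choice">\n'
        f'      <prompt>Please say one of: {escape(", ".join(words))}.</prompt>\n'
        f'{options}'
        '      <nomatch><submit next="/error" namelist="page"/></nomatch>\n'
        '      <filled><submit next="/select" namelist="choice page"/></filled>\n'
        '    </field>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def route(choice, page):
    """Pick the next dialog; the application, not the gateway, tracks the page."""
    airlines, nav = page_choices(page)
    choice = choice.strip().lower()
    if choice == "exit":
        return ("exit", page)
    if choice == "next" and "next" in nav:
        return ("menu", page + 1)
    if choice == "back" and "back" in nav:
        return ("menu", page - 1)
    if choice in (a.lower() for a in airlines):
        return ("lookup", page)
    # The error page instructs the user, then returns to this same page.
    return ("error", page)
```

Because the page number travels with every submit, the error branch can return the caller to the dialog they were last in, which matches the statement that state is managed by the application.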

Figure 2 below diagrams the VoiceXML flow.

Figure 2 - VoiceXML Flow

Continued...

Footnote 1: See http://www.voxeo.com/

Footnote 2: To get an idea of the scale of bandwidth compared to regular IP traffic, picture an Internet T1 connection, which is capable of serving an entire building of Internet users. Used for voice, that same T1 connection is 100% utilized by only 23 concurrent sessions, plus 1 channel of signaling (24 channels in total).

Footnote 3: During subsequent sessions, cached information is checked for changes and re-fetched as necessary.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).