Integrating VoiceXML and an Application Server: A Case Study
As part of its overall device support strategy, Hewlett-Packard's
middleware division, HP Bluestone, is committed to supporting
voice-related content. It is also becoming increasingly
apparent that the standard for expressing this content
will be VoiceXML. HP Bluestone offers a state-of-the-art
J2EE platform in its Application Server and Core
Services Framework.
With these two facts in mind, HP Bluestone decided to implement
reference applications and provide "Trail Maps"
showing integration between HP Bluestone's Universal
Business Server, the upcoming Core Services Framework, and
VoiceXML rendering systems. This project was accomplished
using a VoiceXML gateway hosted by Voxeo (see
Footnote 1).
In this article, we explain why HP Bluestone
chose this strategy and provide the reader with all
the information necessary to download and explore the
functionality of these technologies.
Voice as the "Killer App"
In
the United States, market penetration of small hand-held
devices, and especially phone-based Internet connectivity,
has lagged behind both the Pacific Rim and European
market sectors. This is due in part to the universal
acceptance and ubiquity of personal computers in the
U.S., and to differences in usage of handheld devices.
However,
just as VisiCalc transitioned the personal computer
from "geek toy" into the mainstream, hands-free
voice browsing will be the force that significantly
moves handsets into the mainstream in the U.S. In short,
voice is the next "killer app".
The Gateway
Integrators are faced with two choices when voice-enabling an existing
internet application or creating one from the ground
up:
- Hosted Voice Gateways: Connectivity to data sources
and legacy systems is achieved via a voice gateway
that is hosted remotely. Hosted solutions provide
hardware and software for call management and speech
processing, and include the necessary telephone lines
from telcos. Underlying technologies integrated into
the gateway include Automated Speech Recognition (ASR),
Text-to-Speech (TTS), and a VoiceXML browser and interpreter.
The gateway handles call initiation, session management,
and VoiceXML rendering. The voice technology
is integrated by the provider, so potential users
can focus on their own software strategies.
- Locally Hosted Gateway:
In this instance, the telephony equipment and software
are maintained by the integrator. This solution offers
the greatest flexibility. Most of the better gateway
platforms offer "plug and play" ASR and
TTS engines, so customers are free to use best-of-breed
components as they see fit. Users should be
aware that they will be at the mercy of telcos for
phone trunk lines to the hosting site.
The difference between these solutions is one of scale.
Hosted gateways relieve the implementer of the up-front
cost, knowledge, and onus of maintaining telephony front-end
hardware and software. These components can be quite
costly, and the requisite knowledge is by no means trivial.
Hosting a gateway locally, while not for the faint of
heart (see Footnote 2), allows the
most flexibility and control. As an entry-level solution,
hosted gateways provide a relatively quick path to integration.
In any case, the integration strategy remains the same:
the gateway generates an HTTP request over TCP/IP,
which is handled by the application server.
The application server is responsible for accessing
the data source tier and serving dynamic VoiceXML.
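As a rough illustration of this request/response cycle, the following Python sketch shows an application-server endpoint that answers a gateway request with dynamically generated VoiceXML. It is not the HP Bluestone implementation, which ran on a J2EE server. The `airline` query parameter, the function names, and the directory entries (placeholder 555 numbers) are all assumptions made for the example.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical stand-in for the data source tier; names and numbers are placeholders.
AIRLINE_NUMBERS = {
    "example air": "800 555 0100",
    "sample airways": "800 555 0101",
}


def render_vxml(airline):
    """Build a small VoiceXML 1.0 document that speaks one directory entry."""
    number = AIRLINE_NUMBERS.get(airline.lower())
    if number is None:
        prompt = "Sorry, no listing was found for %s." % airline
    else:
        prompt = "The number for %s is %s." % (airline, number)
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        '  <form>\n'
        '    <block>\n'
        '      <prompt>' + escape(prompt) + '</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def application(environ, start_response):
    """WSGI entry point: answer the gateway's HTTP request with VoiceXML."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    airline = query.get("airline", [""])[0]  # assumed parameter name
    body = render_vxml(airline).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because `application` follows the standard WSGI calling convention, it could be served for local experiments with the standard library's `wsgiref.simple_server.make_server`. The gateway would then be pointed at a URL such as `/lookup?airline=example+air`, where the path is again only an assumption.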
Practical Application
To demonstrate the integration of VoiceXML with an application
server in a straightforward manner, we implemented a
simple Airline Directory Listings application. The gateway
initiates the session when an incoming call is answered.
If it hasn't already done so, it gathers and caches
all the dialogs through an HTTP fetch (see
Footnote 3). When the <submit> element is invoked, an HTTP
request is made to the application server, which serves
up the content. The architectural flow is diagrammed
below:
Figure 1 - Architecture
The VoiceXML flow is as follows:
- The user is welcomed to the application.
- The user is prompted to select one of five airlines,
or to say "next" to hear more choices or "exit"
to terminate the application.
- The user's utterance is collected and compared with an
inline static grammar.
- If "next" is selected, the final five choices
are presented with a "back" option. The
user may also choose to "exit" the application.
- Errors and unrecognized phrases are routed to an error page,
which further instructs the user and returns to
the last dialog. State is managed by the application.
- Successful utterances are routed to a processing page, which
requests the 800 number for the selected airline.
The gateway generates an HTTP request through the <submit>
element; the application server retrieves the number
and generates dynamic content, which is served back
as a response. The gateway then renders the text through
its TTS engine.
Figure 2 below diagrams the VoiceXML flow.
Figure 2 - VoiceXML Flow
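The routing rules in this flow can also be written down as a small state function. The Python sketch below is only a model of the dialog logic, under stated assumptions: in the application described here, utterances are matched against the inline grammar inside the gateway, and the article does not list the ten airlines, so the names are placeholders. The function names and the dialog labels ("menu", "error", "lookup", "exit") are likewise hypothetical.

```python
# Placeholder names; the article does not list the ten airlines.
AIRLINES = ["Airline %d" % n for n in range(1, 11)]
PAGE_SIZE = 5


def choices_for(page):
    """Return the phrases the inline grammar would accept on a menu page."""
    start = page * PAGE_SIZE
    names = AIRLINES[start:start + PAGE_SIZE]
    # The first menu offers "next"; the second offers "back".
    nav = ["next"] if page == 0 else ["back"]
    return [name.lower() for name in names] + nav + ["exit"]


def route(page, utterance):
    """Map an utterance to a (next dialog, page, selected airline) tuple."""
    said = utterance.strip().lower()
    if said not in choices_for(page):
        # The error page instructs the user, then returns to the same menu.
        return ("error", page, None)
    if said == "exit":
        return ("exit", page, None)
    if said == "next":
        return ("menu", 1, None)
    if said == "back":
        return ("menu", 0, None)
    # A successful selection goes to the processing page for the number lookup.
    return ("lookup", page, said)
```

Keeping the current page in the returned tuple mirrors the point that state is managed by the application: after an error, the caller is sent back to the same menu rather than to the start of the call.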
Footnote
1: See http://www.voxeo.com/
Footnote
2: To get an idea of the scale of bandwidth compared
to regular IP traffic, picture an internet T1 connection,
which is capable of servicing an entire building of
internet users. For voice, that same T1 connection is 100%
utilized by only 23 concurrent sessions, with 1 channel
used for signaling (a total of 24 channels).
Footnote
3: During subsequent sessions, cached information
is checked for changes and re-fetched as necessary.
Copyright
© 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization
(IEEE-ISTO).