Ten Steps to a Commercial-grade VoiceXML Application
The VoiceXML revolution is just beginning. Only within the
past few months have VoiceXML interpreters become robust,
feature-rich and capable of supporting a carrier-grade
commercial application. Work still remains to further
improve scalability, efficiency and density, and even
to agree on a common interpretation for every tag and
attribute in the specification. This article assumes
that carrier-grade VoiceXML interpreters will continue
to mature and evolve, and focuses instead on the challenges
of developing a commercial-grade VoiceXML application
to run on the interpreter.
Superficially, it might seem that developing a commercial-grade application
in VoiceXML would save months of development time. However,
developing a commercial-grade VoiceXML application for
the first time requires nearly as much effort as developing
an application hard-coded to a particular speech recognition
API. And, as discussed below, the two processes involve
most of the same steps. Nevertheless, the resulting
VoiceXML application is well worth the effort, possessing
a number of advantages over an equivalent hard-coded
application. VoiceXML applications are portable across
platforms, and somewhat portable across ASR vendors;
they may be interoperable with other VoiceXML applications;
and they benefit from the advantages of a distributed
HTTP-based architecture.
Experience on the Leading Edge
Indicast is a premier provider of private-label voice portal
services to the telecommunications, Web, and enterprise
industries. Indicast has amassed the largest database
of professionally produced audio content from leading
brands, such as ABCNEWS.com, The Wall Street Journal,
and Associated Press, covering more than 1,000 topics.
Indicast also offers voice-activated dialing, unified
messaging services, business finder services, driving
directions, and other voice-activated telephone services.
This comprehensive voice portal content and services suite,
combined with Indicast's innovative "playlist"
user interface design, provides a compelling voice portal
solution available on a private-label basis.
Indicast decided early in its history to develop in 100% VoiceXML
with no proprietary extensions. The Indicast voice portal
service has been launched in the USA by Centennial Wireless
and is now commercially available, demonstrating that
it is possible today to deploy real voice
applications written in 100% VoiceXML. Along the way,
Indicast has gained valuable expertise in VoiceXML development
and deployment. The following 10-step program is based
on our pioneering experiences in this area, and will
help minimize the risks associated with designing a
carrier-grade system based on an emerging standard like
VoiceXML.
The 10-Step Program
The tasks required to design, build, and deploy a
commercial-grade VoiceXML application are listed below. All of
the steps are challenging, but the two most demanding
are requirements and design, and speech recognition
tuning.
1. Attend a course on speech recognition. A
number of speech recognition companies offer high-quality
classes on designing and developing speech-based applications,
building and tuning grammars, and managing speech-based
software projects. For developers lacking ASR experience,
this training can save months of trial and error. Attending
such a course can be an eye-opener for those uninitiated
in the subtleties of speech recognition. A list of speech
recognition companies can be found at the VoiceXML Forum's
web site (www.voicexml.org).
2. Design your application and voice interface.
Probably the most challenging phase of developing
a commercial-grade speech application, VoiceXML or
any other, is arriving at a good, usable voice interface
design. Begin by enumerating all of the application
requirements for the first few versions. Next, hire
a linguist with a background in voice interface design
to create an overall voice interaction philosophy and
the interface designs for all components of the application.
The voice interaction philosophy should be dictated
by the type of data being accessed. For example, a unified
messaging virtual assistant like Webley should
have a strong persona, while Indicast's voice portal,
which delivers primarily personalized audio content, should
provide a more passive or deferential/reactive voice
interface. This data-driven philosophy results in "content-specific
voice interfaces." Once the voice interface is
designed, the next step is to conduct user studies,
observations of naïve users, and "Wizard of Oz"
experiments to determine which components are
understandable and which interactions are problematic.
Some of these studies can be done orally or on paper,
while others may require a rough prototype. The results
of these studies should be used to redesign, tune, refine,
and then redesign again. Any effort expended at this
early stage will pay off ten-fold later. If you cannot
find a linguist to help with this step, contact professional
services at one of the ASR companies and ask to purchase
some time with one of their linguists.
3. Make use of available VoiceXML tools. A number
of VoiceXML development environments exist and can be
found via the VoiceXML Forum's web site (www.voicexml.org).
These development environments are a good way to get
started with static VoiceXML, but they will probably
not yield the full-featured dynamic VoiceXML application
specified. Some features will almost certainly require
hand-coding and external calls. Some VoiceXML developer
options are described below.
- VoiceXML URL registration on a web site.
The URL of a prototype static VoiceXML file or a VoiceXML
generator can be registered at a web site, and the
developer can call a phone number to interact with
the VoiceXML file. A number of these web sites are
available, and they may or may not have logging and/or debugging
facilities.
- Web-based development environments.
These include the features described above, and also
add various capabilities that enable developers
to build more efficiently. These tools include
VoiceXML debuggers, editors, and grammar modules.
- VoiceXML development environments.
Several full-featured visual VoiceXML development
environments have appeared recently. These work in
much the same way that Visual Java works, except
that they lack a compiler. These visual development
environments include a visual editor, a debugger,
grammar builders, grammar modules, and many other
features. Some of these packages include modules for
dynamically generating VoiceXML.
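To make the distinction between static and dynamic VoiceXML concrete, the following is a minimal sketch of a generator that builds a VoiceXML menu from application data at request time. It is an illustration under stated assumptions, not how any particular tool or the Indicast service works: the function name `build_menu_vxml`, the topic list, and the example.com URLs are all hypothetical, and a production generator would also emit an XML declaration, grammars, and error handling.

```python
import xml.etree.ElementTree as ET


def build_menu_vxml(prompt_text, choices):
    """Return a VoiceXML document (as a string) containing one <menu>.

    `choices` is a list of (spoken_phrase, target_url) pairs. Each pair
    becomes a <choice> whose `next` attribute points at the target URL.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, url in choices:
        choice = ET.SubElement(menu, "choice", {"next": url})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical topics and URLs, for illustration only. In a dynamic
# application this list would come from a database or user profile.
topics = [
    ("news", "http://example.com/vxml/news"),
    ("weather", "http://example.com/vxml/weather"),
]
document = build_menu_vxml("Say news or weather.", topics)
print(document)
```

Because the menu is produced from a list rather than written by hand, the same code can serve a different set of choices to each caller, which is the kind of behavior a static VoiceXML file cannot provide.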
4. Tune end points. An utterance is a
single instance of a spoken command by a particular
user, or "talker". End points are parameters
that describe the speech recognition listening window.
Examples of these parameters include the expected length
range of an utterance, the expected amount of silence on
either end of the utterance, and other parameters that
describe how the utterance is processed. Tuning the
end point parameters can have a dramatic effect on the
ability of the speech recognizer to determine what voice
command was spoken. By recording a few dozen utterances
from each of a few hundred talkers (in varying environments
from quiet to noisy), and then listening
to and/or transcribing the utterances, a developer can
determine an optimal listening window. Developers may
want to enlist tools and professional services from
their ASR provider to complete this step of the project,
at least the first time.
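As a rough illustration of how recorded utterances can inform a listening window, the sketch below summarizes hand-measured timings into candidate end point values. Everything in it is an assumption made for the example: the function names, the dictionary keys, the sample timings, and the choice of a 95% coverage target. Actual end point parameters and their names differ from one ASR vendor to another, so the output would still need to be mapped onto the recognizer in use.

```python
import math


def percentile(values, fraction):
    """Nearest-rank percentile of `values` for a fraction in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def suggest_listening_window(utterances, coverage=0.95):
    """Suggest end point values from hand-checked utterance timings.

    Each utterance is a dict giving, in seconds, the `leading_silence`
    before the talker starts and the length of the `speech` itself.
    The upper bounds are chosen to cover `coverage` of the recordings.
    """
    lead = [u["leading_silence"] for u in utterances]
    speech = [u["speech"] for u in utterances]
    return {
        "max_leading_silence": percentile(lead, coverage),
        "min_speech": min(speech),
        "max_speech": percentile(speech, coverage),
    }


# Hypothetical measurements; a real study would use thousands of
# utterances collected in both quiet and noisy environments.
recorded = [
    {"leading_silence": 0.2, "speech": 0.6},
    {"leading_silence": 0.4, "speech": 0.9},
    {"leading_silence": 0.3, "speech": 1.2},
    {"leading_silence": 1.5, "speech": 3.0},
]
print(suggest_listening_window(recorded))
```

Using a percentile rather than the maximum keeps a single unusually slow or long-winded talker from stretching the window for everyone, at the cost of occasionally cutting such talkers off; the right trade-off depends on the application.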
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards
and Technology Organization (IEEE-ISTO).