Ten Steps to a Commercial-grade VoiceXML Application
(Continued from Part 1)
5.
Tune grammars. Simply entering a list of
words for the recognizer to discern will yield some
level of speech recognition. However, achieving
high-accuracy ASR requires the developer to invest much additional
effort to tune the grammars. Tuning tasks include: (1)
selecting multi-syllabic commands and phrases, and
their synonyms, (2) specifying alternate pronunciations
of each, (3) ensuring that the voice commands are
sufficiently dissimilar and easily discernible by the
recognizer, (4) allowing for extraneous words in the
utterances, such as "um," "please,"
"eh," etc., and (5) adding probabilities for
each word or phrase in the grammar.
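Tuning task (3) can be roughly screened before any audio is collected. The following is a minimal sketch, not a vendor tool: it flags command pairs whose spellings are very similar, using Python's standard difflib as a crude stand-in for the phonetic comparison a real recognizer toolkit would perform. The command list, the function name confusable_pairs, and the 0.75 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(commands, threshold=0.75):
    """Return (a, b, ratio) for command pairs whose spelling similarity
    is at or above the threshold, most similar first.

    Spelling similarity is only a proxy for acoustic similarity; it is
    used here because it needs no pronunciation lexicon.
    """
    flagged = []
    for a, b in combinations(commands, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return sorted(flagged, key=lambda item: -item[2])


# Hypothetical command set for a banking application.
commands = ["check balance", "check balances", "transfer funds", "operator"]

for a, b, ratio in confusable_pairs(commands):
    print(f"Possible confusion: '{a}' vs. '{b}' (similarity {ratio})")
```

Pairs flagged this way are candidates for merging into one grammar entry or for replacement with a more distinct phrase.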
6.
Tune ASR parameters. The next step starts
with collecting 10,000 or more utterances from several
hundred individual talkers. Then transcribe the utterances
so that the actual words spoken are known. (*)Pass the
recorded utterances through the ASR (recognizer) in
batch mode, including transcriptions so the recognizer
knows what spoken command was actually issued. Use ASR
tools to further adjust the recognizer parameters. There
are dozens of recognizer parameters that can be adjusted,
for example, "confidence threshold" and
"signal-to-noise threshold." The goal of this step is to
minimize the false negatives and false positives generated
by the recognizer. Repeat from (*) above until the ASR
accuracy cannot be further increased. This step may
also reveal some problems with the words and phrases
in the grammar. If the grammar is modified in any way,
the developer must start over and collect another 10,000
utterances for tuning the recognizer parameters. When
tuning ASR parameters for the first time, it is wise
to enlist the speech recognition vendor's professional
services.
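The batch loop described in this step can be illustrated with a small sketch. It assumes the recognizer's batch output has already been reduced to (transcription, recognized text, confidence) triples; that format, the sample values, and the function names are invented for illustration and do not correspond to any particular vendor's tools. A false positive here means the recognizer accepted a wrong result, and a false negative means it rejected a correct one.

```python
def error_counts(results, threshold):
    """Count (false positives, false negatives) at a confidence threshold.

    results: iterable of (transcript, hypothesis, confidence) triples.
    """
    false_pos = sum(1 for t, h, c in results if c >= threshold and h != t)
    false_neg = sum(1 for t, h, c in results if c < threshold and h == t)
    return false_pos, false_neg


def best_threshold(results, candidates):
    """Return the candidate threshold with the fewest total errors."""
    return min(candidates, key=lambda th: sum(error_counts(results, th)))


# Invented sample; a real tuning run would use thousands of utterances.
results = [
    ("yes", "yes", 0.92),
    ("no", "no", 0.81),
    ("yes", "yes", 0.62),
    ("no", "yes", 0.55),
    ("operator", "operator", 0.66),
    ("no", "operator", 0.30),
]
candidates = [0.3, 0.4, 0.5, 0.6, 0.7]

for th in candidates:
    fp, fn = error_counts(results, th)
    print(f"threshold {th:.1f}: {fp} false positives, {fn} false negatives")
print("best threshold:", best_threshold(results, candidates))
```

Weighting the two error types equally is itself a design choice; an application where a wrong action is costly might penalize false positives more heavily.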
7.
Create a VoiceXML generator. While it is
useful to write the first version of an application in static
VoiceXML (for debugging and usability testing), static
VoiceXML alone is unlikely to yield a very interesting application.
Making database queries from within static VoiceXML
code yields a more dynamic application, but also has
a number of disadvantages. For these reasons, a program
that generates VoiceXML just-in-time may be the smartest
course. A VoiceXML generator can be modeled after an
HTML-generating middleware package, and some of the
same modules can be repurposed. The Apache webserver
with the appropriate plug-in modules is a good starting
point.
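As a minimal sketch of just-in-time generation, the function below builds a VoiceXML menu from data that could come from a database query. It uses only Python's standard library; the function name build_menu, the prompt text, and the target URLs are hypothetical, and a production generator would sit behind a web server and add grammars, error handling, and reprompts.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt_text, choices):
    """Return a VoiceXML document (as a string) containing one menu.

    choices: list of (spoken phrase, URL of the next document) pairs.
    """
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, url in choices:
        choice = ET.SubElement(menu, "choice", next=url)
        choice.text = phrase
    # ElementTree escapes special characters in text and attributes.
    return '<?xml version="1.0"?>\n' + ET.tostring(vxml, encoding="unicode")


# Hypothetical data; in practice this would be the result of a query.
document = build_menu(
    "Say check balance or transfer funds.",
    [
        ("check balance", "http://example.com/balance.vxml"),
        ("transfer funds", "http://example.com/transfer.vxml"),
    ],
)
print(document)
```

Building the document as an element tree rather than by string concatenation keeps the output well-formed even when the data contains characters such as "&" or "<".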
8.
Obtain a VoiceXML interpreter and platform.
High-quality VoiceXML interpreters are available today
from multiple vendors. Writing a VoiceXML interpreter
is a large project and should be undertaken only by
companies with significant resources. When selecting
a VoiceXML interpreter, carefully evaluate the interpreter
and platform's scalability and compliance with the VoiceXML
specification. To "stress test" the VoiceXML
interpreter and platform, VoiceXML application developers
may want to consider renting or buying a Hammer (a test
tool that can inject large numbers of calls into the
voice platform and interact with the application using
test scripts). Once a VoiceXML interpreter and platform
have been selected, they should be installed and maintained
at a carrier-grade co-location facility. To make the
total package carrier-grade, failover procedures must
exist for the VoiceXML generator, database, and telephony
gateway (VoiceXML interpreter and platform). Unlike
a web server, a commercial-grade voice application must
be available 24 hours a day, 7 days a week. There are
a number of companies that provide VoiceXML ASP services
for VoiceXML application developers who want to avoid
owning and operating a VoiceXML platform themselves.
The names of these companies can be found at the VoiceXML
Forum's web site (www.voicexml.org).
9.
Ensure that your grammar compiler is adding alternate
pronunciations.
The speech recognition grammar compiler is a powerful
tool and, if performing correctly, will add alternate
pronunciations to the words in the grammar being compiled.
The grammars are sometimes kept in a dynamic grammar
database for convenient access by the recognizer.
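One way to spot-check the compiler's output is to list the words that ended up with only a single pronunciation. The sketch below assumes the compiled pronunciations can be exported as a mapping from each word to a list of phoneme strings; that export format, the phoneme notation, and the sample entries are assumptions for illustration, since each vendor's grammar compiler exposes this differently.

```python
def words_missing_alternates(lexicon, min_variants=2):
    """Return, in alphabetical order, the words that have fewer than
    min_variants distinct pronunciations."""
    return sorted(
        word
        for word, pronunciations in lexicon.items()
        if len(set(pronunciations)) < min_variants
    )


# Illustrative entries only; not taken from any real compiler output.
lexicon = {
    "either": ["IY DH ER", "AY DH ER"],
    "tomato": ["T AH M EY T OW", "T AH M AA T OW"],
    "balance": ["B AE L AH N S"],
}

print(words_missing_alternates(lexicon))
```

Not every word has a legitimate alternate pronunciation, so the result is a list to review rather than a list of errors.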
10.
Establish a rigorous test suite. Develop
an exhaustive written test suite that exercises all
the major and minor paths through the voice application.
Run this test suite every day, or whenever a code change
is made. Many inexplicable events occur on phone lines,
so be sure to allow extra time to fix unexpected problems.
Dedicate several staff members as full-time testers.
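To help make the written test suite exhaustive, the paths through the application can be enumerated mechanically. The sketch below assumes the call flow has been written down as a mapping from each dialog state to the states reachable from it; the state names and the function name all_paths are hypothetical. Each path visits a state at most once, so loops such as returning to the main menu are cut short and need separate test cases.

```python
def all_paths(dialog, start, end_states):
    """Return every loop-free path from start to any state in end_states."""
    paths = []

    def walk(state, path):
        path = path + [state]
        if state in end_states:
            paths.append(path)
            return
        for next_state in dialog.get(state, []):
            if next_state not in path:  # do not revisit a state
                walk(next_state, path)

    walk(start, [])
    return paths


# Hypothetical call flow for a small banking application.
dialog = {
    "welcome": ["main_menu"],
    "main_menu": ["balance", "transfer", "operator"],
    "balance": ["main_menu", "goodbye"],
    "transfer": ["confirm"],
    "confirm": ["goodbye", "main_menu"],
}

for number, path in enumerate(all_paths(dialog, "welcome", {"goodbye", "operator"}), 1):
    print(f"Test {number}: " + " -> ".join(path))
```

Each printed path becomes one scripted call in the test suite, which also gives a simple count of how many calls a full daily run requires.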
Conclusion
Developing a carrier-grade VoiceXML application is not an undertaking
to be taken lightly. Many resource-intensive
tasks are required to make the application usable and
accurate. After completing the 10 steps above, the application
is only nearing a maturity point where it may be deployed
for "friendly users." Comments from these
users will instigate further tuning and refining. Be
sure to allow even more time for testing, more
testing, and patching.
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).