City CarShare Reservation System: A VoiceXML Case Study
(Continued from Part 1)
Speech Recognition
Automatic Speech Recognition (ASR) captures a caller's utterances, compares them against grammars of acceptable responses, and returns an accurate result. Generally, as the number of allowable utterances in a grammar grows, complexity increases and recognition accuracy decreases, in many cases non-linearly. True natural language processing is thus not yet practicable. However, advances in recognition algorithms and lower infrastructure costs, for both servers and telephony minutes, have driven adoption of speech technology, and voice recognition has come a long way in the last few years.
In designing the City CarShare prompts, indigo egg aimed to elicit dissimilar caller utterances, reducing or eliminating the need for disambiguation. In many applications, however, this is not possible, and the question arises of how to disambiguate between rhyming or otherwise similar-sounding utterances. Some techniques for improving speech recognition include:
- Grammar tuning can reduce many types of recognition errors. For example, cross-wording can fix utterances containing words that run together into phrases. Adding representative probabilities to confusion pairs (words the recognizer commonly mistakes for one another) can reduce substitution errors. Finally, adding out-of-grammar elements can reduce false accepts and increase correct rejects.
- N-best lists, which return multiple results with associated confidence levels, give the application more control in deciding between various interpretations of a captured utterance.
- Using multiple interpretation results for disambiguation can also improve accuracy and the user experience.
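The N-best technique can be sketched in a few lines of JavaScript. This is a minimal sketch, not the City CarShare implementation: VoiceXML 1.0 leaves access to N-best results platform-specific, so the input is assumed to be an array of objects with `utterance`, `confidence` (0.0 to 1.0) and `interpretation` fields, similar to the `application.lastresult$` array later standardized in VoiceXML 2.0. The function name `chooseInterpretation` and both thresholds are hypothetical and would need tuning against real call data.

```javascript
// Hypothetical thresholds; real values would come from tuning on recorded calls.
var ACCEPT_THRESHOLD = 0.6;   // minimum confidence needed to act on a result
var MARGIN_THRESHOLD = 0.15;  // top two results closer than this => ask the caller

// nbest: assumed array of { utterance, confidence, interpretation } objects.
function chooseInterpretation(nbest) {
  if (!nbest || nbest.length === 0) {
    return { action: "reject" };
  }
  // Sort a copy by descending confidence so the caller's array is not mutated.
  var ranked = nbest.slice().sort(function (a, b) {
    return b.confidence - a.confidence;
  });
  var best = ranked[0];
  if (best.confidence < ACCEPT_THRESHOLD) {
    // Even the best guess is too uncertain: reprompt instead of guessing.
    return { action: "reject" };
  }
  var second = ranked[1];
  if (second && best.confidence - second.confidence < MARGIN_THRESHOLD) {
    // Two plausible candidates: ask a short disambiguation question.
    return {
      action: "disambiguate",
      candidates: [best.interpretation, second.interpretation]
    };
  }
  return { action: "accept", value: best.interpretation };
}
```

When two candidates come back close together, the dialog can ask the caller to choose between them rather than rejecting the input outright, which is usually less frustrating than a plain reprompt.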
Technical Issues
There were technical challenges in the City CarShare application, as well as design challenges in creating a smooth and usable dialog flow. We have all felt the pain of long downloads on the Internet. In many cases, the latency associated with large audio files and streaming media makes a web application frustrating to use. These issues are exacerbated on the telephone because of the limited feedback mechanisms inherent to audio interfaces.
To keep the caller's wait as short as possible, indigo egg used several techniques. For quick delivery of audio files, we set caching of all audio to fast and enabled prefetching as well. As many decisions as possible are made in the VoiceXML itself rather than by calling a server-side routine. For example, embedded JavaScript translates the return value from the caller's date utterance into an audio file name, so the application can repeat the date back for verification. For login, we use a subdialog instead of switching pages. This separates the dynamic content from the main page, which can then be cached. The call to the server is no faster, but the return is faster because the subdialog hands control back to the original page rather than loading a new one. Finally, the caller is always informed when a database lookup begins and when it completes, so that they are never left hanging on the line.
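The date-to-audio translation described above might look something like the following sketch. It assumes the date field returns a fixed-length `yyyymmdd` string, with question marks for any part the caller did not say (the format defined for the VoiceXML builtin `date` type); the function name `dateToAudioFiles` and the `month_MM.wav` / `day_DD.wav` / `year_YYYY.wav` naming scheme are hypothetical.

```javascript
// Translates a builtin "date" result such as "20010704" into audio file names
// so the date can be read back to the caller for verification.
// Assumption: "????" (year) or "??" (month, day) marks parts the caller omitted.
// The file naming scheme is hypothetical.
function dateToAudioFiles(dateStr, basePath) {
  if (typeof dateStr !== "string" || dateStr.length !== 8) {
    return []; // not a yyyymmdd string; let the dialog reprompt
  }
  var year = dateStr.substring(0, 4);
  var month = dateStr.substring(4, 6);
  var day = dateStr.substring(6, 8);
  var files = [];
  if (month !== "??") {
    files.push(basePath + "month_" + month + ".wav"); // e.g. "July"
  }
  if (day !== "??") {
    files.push(basePath + "day_" + day + ".wav");     // e.g. "fourth"
  }
  if (year !== "????") {
    files.push(basePath + "year_" + year + ".wav");
  }
  return files;
}
```

Because the function only manipulates strings, it runs in the interpreter's script engine with no server round trip. How the resulting file names are then played depends on the platform; VoiceXML 2.0, for instance, lets `<audio>` take its URI from an `expr` attribute.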
The following general strategies for improving application performance can be applied to any VoiceXML implementation. Note, however, that each vendor platform has its own idiosyncrasies when it comes to performance tuning.
- Keep resources as close to the VoiceXML interpreter as possible; this reduces the need to fetch resources across the Internet and the risk of delays.
- Where possible, cache resources to avoid network access delays.
- When using <submit>, use the GET method rather than POST if the result can be cached for later use (POST results generally expire immediately).
- Where possible, avoid extensive page transitions. One large document usually performs better than several small ones, because each additional document means another server hit. Likewise, transitioning between forms in the current document will usually be faster than transitioning to another document.
- For computational tasks, use JavaScript functions instead of sending data to a server and waiting for a result.
- Application root documents stay loaded while the caller moves between the application's documents, which makes them useful for storing variables and preserving the application's state across document transitions.
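As an illustration of the JavaScript guideline above, the following hypothetical helper computes a reservation's end time from its start time and duration locally, rather than submitting both values to the server. The name `computeEndTime`, the 24-hour `HHMM` input format and the minute-based duration are assumptions made for this sketch; the start time is assumed to be well formed and the duration non-negative.

```javascript
// Hypothetical helper: computes a reservation's end time in the interpreter
// instead of submitting the start time and duration to a server routine.
// startHHMM: 24-hour "HHMM" string, e.g. "1730"; durationMinutes: >= 0.
function computeEndTime(startHHMM, durationMinutes) {
  var hours = parseInt(startHHMM.substring(0, 2), 10);
  var minutes = parseInt(startHHMM.substring(2, 4), 10);
  var total = hours * 60 + minutes + durationMinutes;
  var minutesPerDay = 24 * 60;
  var dayOffset = Math.floor(total / minutesPerDay); // days after the start date
  var minuteOfDay = total % minutesPerDay;
  // Zero-pad a number to two digits for the "HHMM" result.
  var pad = function (n) { return (n < 10 ? "0" : "") + n; };
  return {
    time: pad(Math.floor(minuteOfDay / 60)) + pad(minuteOfDay % 60),
    dayOffset: dayOffset
  };
}
```

For example, a 90-minute reservation starting at "1730" ends at "1900" on the same day, while a three-hour reservation starting at "2300" ends at "0200" the next day (`dayOffset` of 1). In a VoiceXML application, a function like this could be defined in a `<script>` element in the application root document, so every document in the application can call it.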
Results
The City CarShare reservation system has been deployed for only a short time, but so far user feedback has been excellent. Callers feel very comfortable with the system. Comments from users include:
- "Great system! Quick, straightforward, and it even lets you screw up (enter in incorrect info) without much hassle!"
- "It's frightening how much it's like talking to a real person."
Having commercially deployed a variety of voice applications, including indigo egg's City CarShare application, BeVocal has seen the positive results that can be achieved with the VoiceXML standard as it exists today. While many challenges remain in pushing the standard forward to give developers more sophisticated functionality, VoiceXML 1.0 can already solve real-world problems, delivering higher levels of service at lower cost. We hope the techniques set forth in this article will help you increase the usability, and thus the use, of your VoiceXML 1.0 applications.
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).