VoiceXML Review - Feature Articles

Volume 1, Issue 5 - May 2001

Ten Steps to a Commercial-grade VoiceXML Application

By T. Todd Elvins

(Continued from Part 1)

5. Tune grammars. Simply entering a list of words for the recognizer to discern will yield some level of speech recognition. However, achieving high-accuracy ASR requires the developer to invest considerable additional effort in tuning the grammars. Tuning tasks include: (1) selecting multi-syllabic commands and phrases, and their synonyms, (2) specifying alternate pronunciations of each, (3) ensuring that the voice commands are sufficiently dissimilar and easily discernible by the recognizer, (4) allowing for extraneous words in the utterances, such as "um," "please," "eh," etc., and (5) adding probabilities for each word or phrase in the grammar.
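
A few of the tuning tasks above can be roughed out offline before touching the recognizer. The following Python sketch is only an illustration under stated assumptions: the command phrases, weights, and function names are hypothetical, and spelling similarity is a crude stand-in for the acoustic similarity a real recognizer cares about.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical command set: phrase -> relative weight (how often callers say it).
COMMANDS = {
    "check my balance": 60,
    "transfer funds": 25,
    "speak to an agent": 15,
}
FILLERS = ["um", "please", "eh"]  # extraneous words allowed around a command


def normalize(weights):
    """Turn raw weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {phrase: w / total for phrase, w in weights.items()}


def confusable_pairs(phrases, threshold=0.7):
    """Flag phrase pairs whose spelling is suspiciously similar.

    Assumption: spelling similarity only approximates acoustic similarity.
    """
    pairs = []
    for a, b in combinations(sorted(phrases), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


def strip_fillers(utterance, fillers=FILLERS):
    """Drop filler words so 'um check my balance please' matches a command."""
    return " ".join(w for w in utterance.lower().split() if w not in fillers)
```

Calling normalize(COMMANDS) gives the per-phrase probabilities to enter in the grammar, and confusable_pairs lists candidate commands that may be worth renaming. How probabilities and fillers are actually written depends on the grammar format of the recognizer in use.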

6. Tune ASR parameters. The next step starts with collecting 10,000 or more utterances from several hundred individual talkers. Then transcribe the utterances so that the actual words spoken are known. (*) Pass the recorded utterances through the ASR (recognizer) in batch mode, together with the transcriptions, so the recognizer knows what spoken command was actually issued. Use ASR tools to further adjust the recognizer parameters. There are dozens of recognizer parameters that can be adjusted, for example, "confidence threshold" and "signal-to-noise threshold." The goal of this step is to minimize the false negatives and false positives generated by the recognizer. Repeat from (*) above until the ASR accuracy cannot be further increased. This step may also reveal some problems with the words and phrases in the grammar. If the grammar is modified in any way, the developer must start over and collect another 10,000 utterances for tuning the recognizer parameters. When tuning ASR parameters for the first time, it is wise to enlist the speech recognition vendor's professional services.
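
To make the batch-tuning loop concrete, here is a minimal Python sketch of sweeping one parameter, the confidence threshold, over a set of transcribed utterances. It assumes the recognizer's batch output has already been loaded into memory; the Result record, its field names, and both functions are hypothetical and do not correspond to any vendor's tool.

```python
from dataclasses import dataclass


@dataclass
class Result:
    transcript: str    # what the caller actually said (human transcription)
    hypothesis: str    # what the recognizer returned
    confidence: float  # recognizer confidence score, assumed 0.0 to 1.0
    in_grammar: bool   # whether the transcript is covered by the grammar


def error_rates(results, threshold):
    """Return (false_accept_rate, false_reject_rate) at one threshold."""
    false_accepts = false_rejects = 0
    for r in results:
        accepted = r.confidence >= threshold
        correct = r.in_grammar and r.hypothesis == r.transcript
        if accepted and not correct:
            false_accepts += 1   # wrong or out-of-grammar result was accepted
        elif not accepted and correct:
            false_rejects += 1   # correct result was thrown away
    n = len(results)
    return false_accepts / n, false_rejects / n


def best_threshold(results, candidates):
    """Pick the candidate threshold with the lowest combined error rate."""
    return min(candidates, key=lambda t: sum(error_rates(results, t)))
```

Weighting the two error types equally is a simplification; an application in which a false accept is costly (a funds transfer, for example) would likely penalize it more heavily than a false reject.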

7. Create a VoiceXML generator. While it is useful to write an application the first time in static VoiceXML--for debugging and usability testing--an application built entirely from static VoiceXML is unlikely to be very interesting. Making database queries from within static VoiceXML code yields a more dynamic application, but also has a number of disadvantages. For these reasons, a program that generates VoiceXML just-in-time may be the smartest course. A VoiceXML generator can be modeled after an HTML-generating middleware package, and some of the same modules can be repurposed. The Apache webserver with the appropriate plug-in modules is a good starting point.
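
As a rough illustration of just-in-time generation, the Python sketch below builds a small VoiceXML menu from data that could equally come from a database query. The function name, prompt text, and target URLs are hypothetical; a production generator would sit behind the web server and add grammars, error handling, and caching.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt, choices):
    """Return a VoiceXML document (as a string) for a simple menu.

    `choices` maps a spoken phrase to the URL of the next dialog.
    """
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt
    for phrase, target in choices.items():
        # Each <choice> names the phrase to listen for and where to go next.
        choice = ET.SubElement(menu, "choice", next=target)
        choice.text = phrase
    return '<?xml version="1.0"?>\n' + ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu("Say weather or news.",
                     {"weather": "weather.vxml", "news": "news.vxml"}))
```

Building the document through an XML library rather than string concatenation means that text pulled from a database is escaped automatically, so the interpreter always receives well-formed markup.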

8. Obtain a VoiceXML interpreter and platform. High-quality VoiceXML interpreters are available today from multiple vendors. Writing a VoiceXML interpreter is a large project and should be undertaken only by companies with significant resources. When selecting a VoiceXML interpreter, carefully evaluate the scalability of the interpreter and platform and their compliance with the VoiceXML specification. To "stress test" the VoiceXML interpreter and platform, VoiceXML application developers may want to consider renting or buying a Hammer (a test tool that can inject large numbers of calls into the voice platform and interact with the application using test scripts). Once a VoiceXML interpreter and platform have been selected, they should be installed and maintained at a carrier-grade co-location facility. To make the total package carrier-grade, failover procedures must exist for the VoiceXML generator, database, and telephony gateway (VoiceXML interpreter and platform). Unlike a web server, a commercial-grade voice application must be available 24 hours a day, 7 days a week. There are a number of companies that provide VoiceXML ASP services for VoiceXML application developers who want to avoid owning and operating a VoiceXML platform themselves. The names of these companies can be found at the VoiceXML Forum's web site (www.voicexml.org).

9. Ensure that your grammar compiler is adding alternate pronunciations. The speech recognition grammar compiler is a powerful tool and, if performing correctly, will add alternate pronunciations to the words in the grammar being compiled. The grammars are sometimes kept in a dynamic grammar database for convenient access by the recognizer.

10. Establish a rigorous test suite. Develop an exhaustive written test suite that exercises all the major and minor paths through the voice application. Run this test suite every day, or whenever a code change is made. Many inexplicable events occur on phone lines, so be sure to allow extra time to fix unexpected problems. Dedicate several staff members as full-time testers.
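
One way to check that a written test suite really covers every path is to describe the call flow as a graph and compare it against the test scripts. The Python sketch below is illustrative only: the dialog states, caller inputs, and function names are hypothetical, and a real call flow would also include timeouts, no-match events, and hang-ups.

```python
# Hypothetical dialog graph: state -> {caller input: next state}.
DIALOG = {
    "main": {"weather": "weather", "news": "news", "help": "help"},
    "weather": {"repeat": "weather", "main menu": "main", "goodbye": "exit"},
    "news": {"main menu": "main", "goodbye": "exit"},
    "help": {"main menu": "main"},
    "exit": {},
}


def enumerate_paths(graph, start="main", end="exit"):
    """List every loop-free sequence of caller inputs from start to end."""
    paths = []

    def walk(state, visited, spoken):
        if state == end:
            paths.append(spoken)
            return
        for utterance, nxt in graph[state].items():
            if nxt not in visited:
                walk(nxt, visited | {nxt}, spoken + [utterance])

    walk(start, {start}, [])
    return paths


def covered(graph, path, start="main"):
    """Replay one test script; return the (state, input) transitions it hits."""
    state, seen = start, set()
    for utterance in path:
        seen.add((state, utterance))
        state = graph[state][utterance]
    return seen


def missing_transitions(graph, scripts, start="main"):
    """Return the transitions that no test script exercises."""
    everything = {(s, u) for s, edges in graph.items() for u in edges}
    hit = set()
    for script in scripts:
        hit |= covered(graph, script, start)
    return everything - hit
```

In this toy graph the loop-free paths alone leave the "repeat," "help," and "main menu" transitions untested, which is the kind of gap that missing_transitions is meant to surface before the daily run.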

Conclusion

Developing a carrier-grade VoiceXML application is not an undertaking to be taken lightly. Many resource-intensive tasks are required to make the application usable and accurate. Even after completing the 10 steps above, the application is only nearing the point of maturity at which it may be deployed for "friendly users." Comments from these users will prompt further tuning and refining. Be sure to allow even more time for testing, more testing, and patching.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).