Volume 2, Issue 3 - April/May 2002

Some Questions on VoiceXML 2.0

By Matt Oshry

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org and be sure to read future issues of VoiceXML Review for the answer.

Q: While testing my VoiceXML 2.0 application, I noticed there are sometimes large gaps of dead air when I transition from page to page. What can I do about that?

A: You're wise to recognize that gaps of silence are one of a voice application developer's worst enemies. Since audio is the only sensory input a user receives when using a voice application, dead air is to be avoided, and there are several things you can do.

Transitions between VoiceXML documents involve zero or more HTTP requests between the VoiceXML interpreter and the HTTP server that hosts your voice application. Regardless of whether your VoiceXML documents are static or dynamic, you should investigate how your HTTP server facilitates caching.
By specifying a future expiration time on your VoiceXML documents, you grant the interpreter the right to hang onto a copy of the content until that time passes. Until then, the interpreter need not make an HTTP request for the document, because expiry data that specifies a future date and time indicates that the cached copy is still current. VoiceXML interpreters may optimize further by storing a parsed version of the document in their caches.

You don't need to set the expiration time very far in the future to see immediate performance gains. Realize, however, that setting an expiry on a document limits your ability to make changes to the document and guarantee that a VoiceXML interpreter will pick up those changes, so be careful!

If the document to which you're transitioning pulls data from a backend database and cannot be cached, you should verify that your Web infrastructure is designed to handle the load. This means making sure your HTTP servers have an adequate connection to the Internet. You'd also be wise to provide one or more additional HTTP servers to handle the load and to handle the possibility of failure.

Some applications make HTTP requests to a lengthy process, such as one that connects to a flight reservation system to search for available flights based on criteria such as date, time, origin, and destination. No amount of caching or load balancing is going to alleviate that latency. In that case, it's time to take advantage of the fetchaudio attribute. The fetchaudio attribute lets you fill the gap of silence with an audio clip while a resource is being requested. According to the VoiceXML 2.0 specification, this attribute is supported on the following elements:

  • choice
  • link
  • goto
  • subdialog
  • submit

You set the value of the fetchaudio attribute to the URI of an audio clip to be played by the interpreter while it is fetching the resource bound to the element. For example, let's say you use the submit element to make a request to a CGI that performs a time-consuming process on the backend. The submit might look like the following:

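<!-- The fetchaudio URI is an illustrative placeholder; use your own clip. -->
<submit next="http://www.acme-air.net/agent.cgi"
        namelist="origin_city dest_city start_date return_date"
        fetchaudio="http://www.acme-air.net/audio/please_wait.wav"/>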

The interpreter submits the four variables in the namelist to agent.cgi while simultaneously playing the audio clip associated with the fetchaudio attribute.
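
The same attribute works on the other elements in the list. As a minimal sketch, here is how it might look on choice elements inside a menu. The document names and audio file paths are hypothetical placeholders, not part of any real application:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- All document names and audio paths below are hypothetical. -->
  <menu id="main">
    <prompt>Say flights or hotels.</prompt>
    <!-- The clip plays while the interpreter fetches the target document. -->
    <choice next="flights.vxml" fetchaudio="audio/one_moment.wav">flights</choice>
    <choice next="hotels.vxml" fetchaudio="audio/one_moment.wav">hotels</choice>
  </menu>
</vxml>
```

Reusing one clip across several transitions means the interpreter can fetch it once and serve it from its cache afterward, provided your server allows the clip to be cached.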

Note that the interpreter does not loop the audio associated with the fetchaudio attribute, so make sure the clip is at least as long as the fetchtimeout value. The default fetchtimeout is platform dependent, but you can set the attribute explicitly on any of the aforementioned elements. You'll also want to check with the vendor of the VoiceXML interpreter you're using to see if they support streaming media. If not, the interpreter won't start playing the audio associated with the fetchaudio attribute until it has completely fetched the audio file. If you use fetchaudio, you should definitely configure your Web server to allow the interpreter to cache this audio file.
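
To illustrate setting the timeout explicitly, the sketch below sets fetchtimeout both as a document-wide property and as an attribute on a single submit. The timeout values, the sample city values, and the audio path are assumptions chosen for illustration. Check your interpreter's documentation to confirm that it supports the fetchtimeout property.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-wide default; 15s is an arbitrary illustrative value. -->
  <property name="fetchtimeout" value="15s"/>

  <form id="search">
    <!-- Hard-coded sample values stand in for fields a caller would fill. -->
    <var name="origin_city" expr="'Boston'"/>
    <var name="dest_city" expr="'Denver'"/>
    <block>
      <!-- The attribute overrides the property for this one request.
           The audio path is a hypothetical placeholder; per the advice
           above, the clip should run at least 30 seconds. -->
      <submit next="http://www.acme-air.net/agent.cgi"
              namelist="origin_city dest_city"
              fetchaudio="audio/searching.wav"
              fetchtimeout="30s"/>
    </block>
  </form>
</vxml>
```

A longer timeout on the slow request, with a shorter default elsewhere, keeps ordinary page transitions from hanging while still giving the backend search time to finish.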



Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).