VoiceXML Review - Columns

Volume 1, Issue 10 - November 2001

Answers to Your Questions About VoiceXML

By Jeff Kunins

(Continued from Part 1)

Q: I'm confused about how VoiceXML really interacts with HTTP. How are things like caching and cookies supposed to work?
A: The fundamental premise of VoiceXML is to bring the Internet architecture to the telephone. As a result, VoiceXML applications make extensive use of the HTTP protocol. VoiceXML applications use HTTP to retrieve VoiceXML, grammar, script, and audio documents.

Consequently, the specific mechanisms and properties governing how VoiceXML platforms behave when they request and process documents via HTTP are critical for performance, reliability, and robustness.

The three governing principles of how VoiceXML interacts with HTTP are:

"When in doubt, it works just like the Web!" Of course, this is because VoiceXML is the Web.
"Specifically, support and follow all relevant HTTP conventions." Most VoiceXML platforms take this to mean, at a minimum, honoring HTTP response codes such as redirects; leading VoiceXML platforms extend the principle as far as possible, supporting cookies and SSL and fully following the HTTP response headers that direct caching behavior.
"Developers can specify a preference for behavior, but platforms are generally free to bypass these preferences in favor of alternative behavior that is known to better optimize performance without disturbing functionality." Performance and reliability are ultimately what matter most, and since applications can run on multiple platforms/networks it's critical that platforms be free to make appropriate local optimization decisions.

More specifically, VoiceXML platforms follow certain prescribed (and some optional) behaviors for requesting, retrieving, processing, and caching documents via HTTP. In addition, some of these behaviors are programmatically controllable. For example:

Fetching and Initializing New Documents

Several VoiceXML elements (e.g. <link> and <submit>) specify transitions to a new VoiceXML dialog via a URI. If that URI refers to another dialog in the same document (e.g. "#top"), then a new HTTP fetch is not required and the transition proceeds immediately.
Transitions to another document trigger a new document request. This request can either result in an actual HTTP request to the originating Web server or be fulfilled from the platform's internal cache (see "Caching Policies" below).
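
As a rough illustration of the distinction above, the following Python sketch classifies a transition URI as either a same-document reference (fragment only, such as "#top") or a reference that requires a document request. The function name plan_transition and its return shape are hypothetical and are not part of VoiceXML or of any platform API.

```python
from urllib.parse import urlsplit


def plan_transition(target: str):
    """Return (needs_document_request, dialog_name) for a transition URI.

    A target made up of nothing but a fragment (e.g. "#top") names a dialog
    in the current document, so no document request is needed.
    """
    parts = urlsplit(target)
    # Same-document reference: no scheme, host, path, or query component.
    same_document = not (parts.scheme or parts.netloc or parts.path or parts.query)
    # The fragment, if present, names the dialog to enter.
    dialog = parts.fragment or None
    return (not same_document, dialog)
```

For example, plan_transition("#top") reports that no request is needed, while plan_transition("menu.vxml#start") reports that a document request is needed, which the platform may still satisfy from its cache.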

Regardless of whether the document came from the cache or from the server, the newly retrieved document is processed in the following manner:

If specified, the application root document is fetched and initialized.
Any document scope variables are initialized.
Any document scope scripts are executed.
The requested dialog (or the first dialog if none is specified) is initialized and execution of the dialog begins.
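
The ordering of these steps can be sketched in Python as follows. This is only an illustration of the sequence: the dictionary layout (keys "application_root", "variables", "scripts", "dialogs") and the function name initialize_document are assumptions made for the example, not a real interpreter interface.

```python
def initialize_document(doc: dict, requested_dialog=None) -> list:
    """Return the ordered list of initialization steps for a fetched document."""
    steps = []
    # Step 1: the application root document, if one is specified.
    if doc.get("application_root"):
        steps.append("fetch and initialize root " + doc["application_root"])
    # Step 2: document scope variables.
    for name in doc.get("variables", []):
        steps.append("initialize variable " + name)
    # Step 3: document scope scripts.
    for script in doc.get("scripts", []):
        steps.append("execute script " + script)
    # Step 4: the requested dialog, or the first dialog if none was requested.
    dialogs = doc["dialogs"]
    dialog = requested_dialog if requested_dialog is not None else dialogs[0]
    if dialog not in dialogs:
        raise KeyError("no such dialog: " + dialog)
    steps.append("enter dialog " + dialog)
    return steps
```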

Caching Policies

One of the fundamental benefits of the VoiceXML architecture is the ability to cleanly separate where the application lives (the Web server) from where the interpreter/platform lives. In practice, this means that smart and effective caching policies can dramatically improve the performance of commercially deployed VoiceXML applications. The effect is amplified by the fact that VoiceXML applications tend to reference very large resources, such as long audio files and complex grammars.

VoiceXML platforms are required to adhere to the cache correctness rules of HTTP 1.1, as specified in RFC 2616 (see http://www.ietf.org/rfc/rfc2616.txt?number=2616). In particular, the "Expires" and "Cache-Control" response headers must be honored. Generally speaking, this means the following:

IF (resource is not in the cache) THEN (fetch it from the server using GET)
ELSE
- IF (maxage is specified) THEN
  - IF (age of cached resource <= maxage) THEN
    - IF (cached resource has passed its Expires time) THEN
      - IF (maxstale is specified) AND ( (time elapsed since the Expires time) <= maxstale ) THEN (use the cached copy)
      - ELSE (fetch from the server using GET)
    - ELSE (use the cached copy)
  - ELSE (fetch from the server using GET)
- ELSEIF (cached resource has passed its Expires time) THEN
  - IF (maxstale is specified) AND ( (time elapsed since the Expires time) <= maxstale ) THEN (use the cached copy)
  - ELSE (fetch from the server using GET)
- ELSE (use the cached copy)
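
The decision tree above can be written as a small Python function. This is a minimal sketch, not a platform implementation: the function name cache_decision and its parameters are hypothetical, and it assumes the caller has already computed the cached copy's age and how many seconds have elapsed since its Expires time (a negative value meaning it has not yet expired).

```python
from typing import Optional


def cache_decision(in_cache: bool,
                   age: float,
                   seconds_past_expiry: float,
                   maxage: Optional[float] = None,
                   maxstale: Optional[float] = None) -> str:
    """Return "fetch" or "use_cache", following the policy listed above.

    age                 -- seconds since the cached copy was received
    seconds_past_expiry -- current time minus the Expires time, in seconds
    maxage, maxstale    -- developer preferences in seconds; None if unspecified
    """
    if not in_cache:
        return "fetch"
    # If maxage is specified, a copy older than maxage is always refetched.
    if maxage is not None and age > maxage:
        return "fetch"
    # A copy that has not reached its Expires time is used as-is.
    if seconds_past_expiry < 0:
        return "use_cache"
    # Expired copies are usable only within the maxstale allowance.
    if maxstale is not None and seconds_past_expiry <= maxstale:
        return "use_cache"
    return "fetch"
```

Flattening the nested branches this way gives the same outcomes as the listing, because the expiry and maxstale checks are identical in both the maxage and non-maxage branches.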

NOTE: Platforms may perform an additional optimization, particularly for long files, by issuing a "GET if modified" on a cached document when the policy requires a fetch from the server.
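
In HTTP terms, a "GET if modified" is typically a GET carrying an If-Modified-Since header, to which the server may answer 304 (Not Modified) instead of resending the body. The Python sketch below only builds such a request and picks which body to use; the helper names and the example URL are hypothetical, and nothing is actually sent over the network.

```python
from email.utils import formatdate
from urllib.request import Request


def conditional_get(url: str, cached_at_epoch: float) -> Request:
    """Build a GET request that asks for the resource only if it has changed."""
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    http_date = formatdate(cached_at_epoch, usegmt=True)
    return Request(url, headers={"If-Modified-Since": http_date})


def body_to_use(status: int, cached_body: bytes, fetched_body: bytes) -> bytes:
    """On 304 Not Modified keep the cached copy; otherwise use the new body."""
    return cached_body if status == 304 else fetched_body
```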

NOTE: For documents requested using protocols other than HTTP that do not support the notion of age or staleness, platforms must compute a resource's age from the time it was received and assume that resources expire immediately upon receipt.

Streaming Audio

VoiceXML 2.0 does not require any particular behavior for streaming audio. However, because applications often reference very large audio files, streaming audio can be an extremely beneficial performance optimization in practice for commercially deployed applications.

VoiceXML 2.0 specifies that platforms may at their discretion stream any audio resource as a performance optimization.

Future versions of VoiceXML are likely to include a streaming attribute and/or property that will enable developers to indicate their preference for streaming behavior.
