Answers to Your Questions About VoiceXML
(Continued from Part 1)
Q: I'm confused about how VoiceXML really interacts with HTTP.
How are things like caching and cookies supposed to work?
A: The fundamental premise of VoiceXML is to bring the Internet
architecture to the telephone. As a result, VoiceXML applications
make extensive use of HTTP: they use it to retrieve VoiceXML,
grammar, script, and audio documents.
Consequently, the specific mechanisms and properties governing how
VoiceXML platforms behave when they request and process documents
via HTTP are critical for performance, reliability, and robustness.
The three governing principles of how VoiceXML interacts with HTTP are:
- "When in doubt, it works just like the Web!"
  Of course, this is because VoiceXML is the Web.
- "Specifically, support and follow all relevant HTTP conventions."
  Most VoiceXML platforms take this to mean, at a minimum, honoring
  HTTP response codes such as redirects. Leading VoiceXML platforms
  tend to extend this principle as far as possible to include support
  for things like cookies and SSL, and to follow the HTTP response
  headers completely when directing caching behavior.
- "Developers can specify a preference for behavior, but platforms
  are generally free to bypass these preferences in favor of
  alternative behavior that is known to better optimize performance
  without disturbing functionality."
  Performance and reliability are ultimately what matter most, and
  since applications can run on multiple platforms and networks, it
  is critical that platforms be free to make appropriate local
  optimization decisions.
More specifically, VoiceXML platforms follow certain prescribed
(and some optional) behaviors for requesting, retrieving,
processing, and caching documents via HTTP. In addition, some of
these behaviors are programmatically controllable. For example:
Fetching and Initializing New Documents
Several VoiceXML elements (e.g. <link> and <submit>) specify
transitions to a new VoiceXML dialog via a URI. If that URI refers
to another dialog in the same document (e.g. "#top"), then a new
HTTP fetch is not required and the transition proceeds immediately.
Transitions to another document trigger a new document request.
That request can either result in an actual HTTP request to the
originating Web server or be fulfilled from the platform's internal
cache (see "Caching Policies" below).
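The same-document check described above can be sketched in Python. This is only an illustration, not how any particular platform implements it: the function name `needs_fetch` and the example URIs are hypothetical, and the sketch assumes that comparing the resolved URIs with their fragments removed is enough to decide whether two URIs name the same document.

```python
from urllib.parse import urldefrag, urljoin


def needs_fetch(current_document_uri: str, target_uri: str) -> bool:
    """Return True when the transition target lives in a different document.

    Hypothetical helper: a fragment-only target such as "#top" resolves to
    the current document, so no new document request is needed.
    """
    # Resolve the (possibly relative) target against the current document.
    resolved = urljoin(current_document_uri, target_uri)
    # Drop the fragment (the dialog name) so only the document part remains.
    target_document, _dialog = urldefrag(resolved)
    current_document, _ = urldefrag(current_document_uri)
    return target_document != current_document
```

For example, with a current document of `http://example.com/app/main.vxml`, the target `#top` stays in the same document, while `menu.vxml#start` refers to a different document and would trigger a request (which the cache may still satisfy).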
Regardless of whether the document was cached or not, the newly
retrieved document is processed in the following manner:
- If specified, the application root document is fetched and
  initialized.
- Any document scope variables are initialized.
- Any document scope scripts are executed.
- The requested dialog (or the first dialog if none is specified)
  is initialized and execution of the dialog begins.
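The processing order above can be written out as a small Python sketch. The `Document` class, its field names, and `initialization_steps` are hypothetical names introduced only for illustration; the sketch lists the steps in order rather than performing them, and it assumes a document contains at least one dialog.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved VoiceXML document."""
    uri: str
    root_uri: Optional[str] = None          # application root, if specified
    dialogs: List[str] = field(default_factory=list)


def initialization_steps(doc: Document,
                         requested_dialog: Optional[str] = None) -> List[str]:
    """List, in order, the steps taken for a newly retrieved document."""
    steps = []
    if doc.root_uri is not None:
        steps.append(f"fetch and initialize application root {doc.root_uri}")
    steps.append("initialize document scope variables")
    steps.append("execute document scope scripts")
    # Fall back to the first dialog when the URI named none.
    dialog = requested_dialog if requested_dialog is not None else doc.dialogs[0]
    steps.append(f"initialize and execute dialog {dialog}")
    return steps
```

Calling `initialization_steps(Document("main.vxml", root_uri="root.vxml", dialogs=["top", "menu"]))` yields four steps ending with the dialog `top`, since no dialog was requested explicitly.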
Caching Policies
One of the fundamental benefits of the VoiceXML architecture is the
ability to cleanly separate where the application lives (the Web
server) from where the interpreter/platform lives. In practice, this
means that smart and effective caching policies can dramatically
improve the performance of commercially deployed VoiceXML
applications. The effect is magnified by the fact that VoiceXML
applications tend to reference very large documents such as long
audio files and complex grammars.
VoiceXML platforms are required to adhere to the cache correctness
rules of HTTP 1.1, as specified in RFC 2616
(see http://www.ietf.org/rfc/rfc2616.txt?number=2616).
In particular, the "Expires" and "Cache-Control" response headers
must be honored. Generally speaking, this means the following:
- IF (resource is not in the cache) THEN (fetch it from the server using GET)
- ELSE
  - IF (maxage is specified) THEN
    - IF (age of cached resource <= maxage) THEN
      - IF (age of cached resource >= Expires header) THEN
        - IF (maxstale is specified) AND ( (age of cached resource - Expires header) <= maxstale ) THEN (use the cached copy)
        - ELSE (fetch from the server using GET)
      - ELSE (use the cached copy)
    - ELSE (fetch from the server using GET)
  - ELSEIF (age of cached resource >= Expires header) THEN
    - IF (maxstale is specified) AND ( (age of cached resource - Expires header) <= maxstale ) THEN (use the cached copy)
    - ELSE (fetch from the server using GET)
  - ELSE (use the cached copy)
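The decision procedure above can be condensed into a short Python sketch. It is a flattened but equivalent form of the nested rules, not a platform implementation. The names `CachedResource`, `freshness_lifetime`, and `use_cached_copy` are hypothetical, and the sketch assumes all values are in seconds, with the "Expires header" comparison modeled as the number of seconds a resource stays fresh after it was received.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResource:
    """Hypothetical cache entry; both values are in seconds."""
    age: float                 # time since the resource was received
    freshness_lifetime: float  # time from receipt until it expires


def use_cached_copy(cached: Optional[CachedResource],
                    maxage: Optional[float] = None,
                    maxstale: Optional[float] = None) -> bool:
    """Return True to use the cached copy, False to fetch with GET."""
    if cached is None:
        return False  # not in the cache
    if maxage is not None and cached.age > maxage:
        return False  # older than the developer is willing to accept
    if cached.age >= cached.freshness_lifetime:
        # Expired: acceptable only within the maxstale allowance.
        staleness = cached.age - cached.freshness_lifetime
        return maxstale is not None and staleness <= maxstale
    return True  # fresh, and within maxage if one was given
```

For instance, an entry that is 70 seconds old with a 60-second freshness lifetime is reused when `maxstale` is 20 but refetched when `maxstale` is 5 or unspecified.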
NOTE: Platforms may perform an additional optimization and perform
a "GET if modified" on a cached document when the policy requires a
fetch from the server, particularly for long files.
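As a rough illustration of the "GET if modified" optimization, the Python sketch below builds (but does not send) a conditional request using the standard HTTP `If-Modified-Since` header. The function name `build_conditional_get` and the example URL are hypothetical, and the sketch assumes the platform recorded the cached copy's last-modified time as a Unix timestamp.

```python
from email.utils import formatdate
from urllib.request import Request


def build_conditional_get(url: str, cached_last_modified: float) -> Request:
    """Build a GET that asks for the body only if it changed.

    cached_last_modified is a Unix timestamp (an assumption of this sketch).
    """
    request = Request(url, method="GET")
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header("If-Modified-Since",
                       formatdate(cached_last_modified, usegmt=True))
    return request
```

If the server answers such a request with status 304 (Not Modified), the cached copy can be reused without transferring the body again, which is where the saving for long audio files comes from.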
NOTE: For documents requested using protocols other than HTTP that
do not support the notion of age or staleness, platforms must
compute a resource's age from the time it was received and assume
that resources expire immediately upon receipt.
Streaming Audio
VoiceXML 2.0 does not explicitly specify or require any behaviors
for streaming audio. However, because audio files tend to be large,
streaming audio can be an extremely beneficial performance
optimization in practice for commercially deployed applications.
VoiceXML 2.0 specifies only that platforms may, at their
discretion, stream any audio resource as a performance optimization.
Future versions of VoiceXML are likely to include a streaming
attribute and/or property that will enable developers to indicate
their preference for streaming behavior.
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and
Technology Organization (IEEE-ISTO).