VoiceXML Review - Columns

Volume 1, Issue 8 - August/September 2001

Caching and Prefetching in VoiceXML

By Rob Marchand

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Last month we got started with ECMAScript, and showed you how you can take advantage of scripting in your VoiceXML application. This month, finally, we're going to talk about caching. (And no, this isn't about cashing in on our on-line pizza franchise!)

One of the great strengths of VoiceXML is the web technology on which it is built. This means we can take advantage of existing libraries, tools, and development techniques to build voice-enabled applications. We can also access applications on a local area network, a wide area network, or the Internet.

However, along with this flexibility come some caveats. Many voice applications will make use of pre-recorded audio files, possibly large ones. Others might use large grammars or VoiceXML pages. While we have been conditioned to accept delays and 'jitter' in the visual web environment, this behavior isn't acceptable in the voice user interface domain.

As a result, the authors of the original VoiceXML specification gave the VoiceXML developer several features for keeping these delays under control. These features include:

Support for caching within the VoiceXML context;
Localized control of caching and file prefetch with tag attributes;
Scoped control of caching and file prefetch with properties.

Caching lets the VoiceXML gateway reuse a copy of a file that is stored 'closer' to it than the original resource on the network or Internet. Caching reduces network-related latencies (for file delivery) and load (on the origin server).

Prefetching allows the early retrieval of a file in anticipation of its need by the VoiceXML gateway. This can reduce or eliminate latencies perceived by the user, at the expense of possibly fetching files that aren't needed, or that may be inconsistent with what is desired (if the file changes over time, for example).

How Do Caching and Prefetching Work?

Because the user of a voice interface can't force a 'page refresh' the way a web browser user can, it is up to the application developer to control when content is refreshed. This can be done using elements defined in the VoiceXML specification.

The application developer can control when files are fetched using the prefetch attribute (named fetchhint in the VoiceXML 1.0 specification) or the corresponding properties. The attribute can take the values:

safe - indicating that the file should be fetched only when it is needed by the VoiceXML page;
prefetch - indicating that the platform should attempt to prefetch files when the page is being compiled, or as the page begins execution; or
stream - indicating that the file may be large, and that the platform should begin processing the data as it arrives (for example, audio data delivered with fetchhint='stream' may begin playback immediately rather than waiting for the entire file).

Using a prefetch policy means that it is more likely a resource will be present locally or in the cache when it is needed, but may lead to fetching of unneeded files, or inconsistencies due to files being fetched too soon (if they change over time, for example).

And now the nuts and bolts.

A few definitions:

Origin server: the web server that serves the original document to the VoiceXML gateway;
Cache: the VoiceXML cache, which stores cached documents;
Document: any VoiceXML component which may be referenced by a URI;
HTTP request: one of get, post, or get-if-modified (for the purposes of this discussion).

HTTP requests sent with the post method are never eligible for caching. Requests sent with the get or get-if-modified methods will be eligible for caching, depending upon the VoiceXML caching parameters, and assorted HTTP information that the origin server provides in the response to the VoiceXML gateway (things like expiry time, cache control headers, and the like). Note that these may be eligible for caching even in the event that the get request includes parameters, and the response is generated dynamically (watch out for this one!).
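
The parameter caveat is easier to see in a toy model. The Python below is only a sketch under loud assumptions: ToyCache and is_cache_eligible are hypothetical names, the cache is keyed on the full URI (including any query string), and expiry times and cache control headers are ignored entirely. It is not how any particular VoiceXML gateway is implemented.

```python
from typing import Callable, Dict

# Methods from the definitions above whose responses may be cached.
CACHEABLE_METHODS = ("get", "get-if-modified")


def is_cache_eligible(method: str) -> bool:
    """post responses are never cached; get and get-if-modified may be."""
    return method.lower() in CACHEABLE_METHODS


class ToyCache:
    """Hypothetical cache keyed on the full URI, query string included.

    Expiry and HTTP headers are deliberately left out of this sketch.
    """

    def __init__(self, origin: Callable[[str], str]) -> None:
        self.origin = origin              # stands in for the origin server
        self.store: Dict[str, str] = {}   # URI -> cached response body

    def fetch(self, method: str, uri: str) -> str:
        eligible = is_cache_eligible(method)
        if eligible and uri in self.store:
            # A dynamically generated response is reused as-is.
            return self.store[uri]
        body = self.origin(uri)
        if eligible:
            self.store[uri] = body
        return body
```

In this model, two get requests for the same URI and parameters return the same stored body, even if the origin server would have generated a different response the second time. A post to the same URI always reaches the origin server.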

The caching attribute selects one of two types of caching, fast or safe, which behave as follows.

For fast caching:

If the document exists in the cache, and is not expired, then use the cached copy;
If the document is not in the cache, or has expired, then fetch the document with a get; optionally use a get-if-modified (which will only transfer the document if it has changed since the last time the cached copy was retrieved).

For safe caching:

If the document is in the cache and has not expired, fetch the document from the origin server using get-if-modified;
If the document is expired, or is not present in the cache, then fetch the document using get; optionally use a get-if-modified if the document is in the cache and has expired.
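
The two sets of rules above can be written down as one small decision function. The Python below is a sketch of those rules as this column states them, not part of VoiceXML or of any gateway's API. The names Request, plan_fetch, and revalidate_expired are made up for illustration; revalidate_expired stands in for the 'optionally use a get-if-modified' clause, which this sketch assumes only applies when an expired copy is still in the cache.

```python
from enum import Enum


class Request(Enum):
    """What the gateway does for one document, in this sketch."""
    USE_CACHE = "use cached copy"
    GET = "get"
    GET_IF_MODIFIED = "get-if-modified"


def plan_fetch(policy: str, in_cache: bool, expired: bool,
               revalidate_expired: bool = False) -> Request:
    """Pick the action for a document under 'fast' or 'safe' caching.

    revalidate_expired is a hypothetical flag modelling the optional
    get-if-modified for a copy that is cached but has expired.
    """
    if policy not in ("fast", "safe"):
        raise ValueError(f"unknown caching policy: {policy!r}")

    if not in_cache:
        # Nothing to reuse or revalidate: a plain get is the only option.
        return Request.GET

    if expired:
        # Both policies refetch; the conditional request is optional.
        return Request.GET_IF_MODIFIED if revalidate_expired else Request.GET

    # A fresh copy is in the cache: this is where the policies differ.
    if policy == "fast":
        return Request.USE_CACHE
    return Request.GET_IF_MODIFIED
```

Under this sketch the policies differ only when an unexpired copy is in the cache: fast uses it without contacting the origin server, while safe checks with the origin server first.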

The default behavior for caching is fast. The default for prefetch is safe.

Note that the cache itself still manages some aspects of caching; in particular, it may choose to refresh a file from the origin server once the expiry time has passed. Therefore, in VoiceXML 1.0, you can't control everything you might wish to control.

Continued...
