VoiceXML Review - Columns

Volume 5, Issue 4 - July/Aug 2005

First Words

Welcome to “First Words” – the VoiceXML Review’s column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.

VoiceXML 2.1

In this lesson, we’re going to continue investigating VoiceXML 2.1.

You may recall that as VoiceXML platform vendors and application developers began to widely deploy VoiceXML applications, they began to identify potential future extensions to the language. The result of this experience is a collection of field-proven features that are candidates for addition to the VoiceXML language. These features are being proposed as part of VoiceXML 2.1.

VoiceXML 2.1 has recently advanced to the Candidate Recommendation state. Here is a pointer:

http://www.w3.org/TR/2005/CR-voicexml21-20050613/

Note: if you’re reading this article after VoiceXML 2.1 has been finalized and published as a full Recommendation, you should spend a few minutes tracking down the final specification rather than this link, as the specification may have undergone minor changes.

The new features proposed for VoiceXML 2.1 are based on feedback from application developers and VoiceXML platform developers. The features we’ve covered already include:

Referencing Grammars Dynamically – Generation of a grammar URI reference with an expression;
Referencing Scripts Dynamically – Generation of a script URI reference with an expression;
Recording user utterances while attempting recognition – Provides access to the actual caller utterance, for use in the user interface, or for submission to the application server;
Adding namelist to <disconnect> – The ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to the call);
Using <mark> to detect barge-in during prompt playback – Placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred;
Concatenating Prompts Dynamically using <foreach>.

Here are the links to the previous articles in this series:

http://www.voicexmlreview.org/Sep2004/columns/sep2004_first_words.html
http://www.voicexmlreview.org/Nov2004/columns/nov2004_first_words.html
http://www.voicexmlreview.org/Feb2005/columns/Feb2005_first_words.html
http://www.voicexmlreview.org/Apr2005/columns/Apr2005_first_words.html
http://www.voicexmlreview.org/Jun2005/columns/Jun2005_first_words.html

This issue, we’re going to look at:

Using <data> to fetch XML information without requiring a dialog transition;
This is perhaps one of the most powerful new features in VoiceXML 2.1.

Fetching XML Information with <data>

VoiceXML 2.0 is built around the HTTP request/response model, where most interesting applications are ‘personalized’ or dynamically built by the web server. Information gathered from earlier caller input is often used to construct subsequent pages, which requires a round-trip to the web server to build each page.

VoiceXML 2.1 standardizes a commonly implemented extension to VoiceXML 2.0 – the <data> element – which allows a VoiceXML 2.1 application to make a synchronous request for information without having to leave the currently executing VoiceXML page.

There are a few reasons why this is very interesting:

Improved decoupling of the presentation layer, and business logic;
Different development models are now useable – for example, AJAX-like applications can be built;
VoiceXML pages become much more cacheable (they can be static VoiceXML, rather than being dynamically generated using server-side technologies);
Application server load is typically reduced, due to the reduced requirement for dynamic page generation;
Particular security models are more easily verified (due to the improved separation of the voice interface software and the data model).

The <data> tag supports the following attributes:

src – A URI that will return the XML data of interest;
name – The ECMAScript variable which will expose the DOM data defined by the XML returned from the application server;
srcexpr – An alternative to ‘src’ – in this case, an ECMAScript expression that evaluates to a URI;
method – The HTTP method by which to make the request – either ‘get’ or ‘post’.
namelist – A list of in-scope ECMAScript variables to be sent to parameterize the request;
enctype – The HTTP request encoding type for the request;
fetchaudio – Audio to be played while fetching the data;
fetchhint – Should we try to prefetch the data or not?
fetchtimeout – How long we should wait for the data;
maxage – If we are willing to use cached data, the maximum age that it can have;
maxstale – If we are willing to use cached data, how stale it can be;

For fetchaudio, fetchhint, fetchtimeout, maxage, and maxstale, please see Section 6.1 of the VoiceXML 2.0 specification (http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml6.1). There are also several properties (datafetchhint, datamaxage, datamaxstale) that can be used to set default settings for documents fetched by the <data> tag.

The <data> tag can be used in executable content or as a child of <form> or <vxml> (just like the <var> tag for declaring variables).

What Can Go Wrong?

As always, there are some restrictions on what we can do. Only one of ‘src’ or ‘srcexpr’ can be used (which makes sense). Our application also needs to be prepared to handle a problem with retrieving the data. This might be seen as a timeout (the application server isn’t answering), or as an authorization issue (the application doesn’t think we should be able to get the data). These will result in the usual types of errors generated for this situation in VoiceXML, such as error.badfetch for a failed fetch or error.noauthorization for an access problem.

In addition to this, the returned data should be well-formed XML. The VoiceXML 2.1 specification allows for other data types as well, but the specification itself focuses on XML. There are also errors that can be generated as a result of mapping the returned XML into a DOM object. And one final note – the DOM object is read-only, so if you want to manipulate the returned data, you will have to make a copy of it.
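Since the DOM object is read-only, one way to get something we can change is to copy the values we care about into an ordinary ECMAScript object. The following is a minimal sketch of that idea, not code from the specification. The helper name copyFields is made up for this example, and the nodeList, textNode, and element functions are only stand-ins for the object a VoiceXML interpreter would hand us, so that the sketch can run outside an interpreter. It assumes the platform exposes childNodes, nodeType, nodeName, and firstChild as described in the DOM Level 2 Core binding.

```javascript
// Node type constant for elements, as defined by DOM Level 2 Core.
var ELEMENT_NODE = 1;

// Hypothetical helper: copies the name and text of each child element
// into a new, writable ECMAScript object. Whitespace text nodes between
// the elements are skipped by the nodeType check.
function copyFields(element) {
  var copy = {};
  var children = element.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children.item(i);
    if (child.nodeType === ELEMENT_NODE) {
      copy[child.nodeName] = child.firstChild ? child.firstChild.data : "";
    }
  }
  return copy;
}

// Stand-ins for the DOM objects the interpreter would supply (assumption:
// only the members used above are modeled here).
function nodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}
function textNode(data) {
  return { nodeType: 3, nodeName: "#text", data: data,
           firstChild: null, childNodes: nodeList([]) };
}
function element(name, children) {
  return { nodeType: 1, nodeName: name,
           firstChild: children.length ? children[0] : null,
           childNodes: nodeList(children) };
}

var quoteElement = element("quote", [
  textNode("\n   "),
  element("ticker", [textNode("F")]),
  element("last", [textNode("30.00")])
]);

var editable = copyFields(quoteElement);
editable.last = "31.00"; // changes our copy only, never the DOM itself
```

In a real page, the argument to copyFields would be the documentElement of the variable named by the ‘name’ attribute, and the stand-in functions would not be needed.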

An Example

By sending the parameters defined in the ‘namelist’ to the URI specified by ‘src’ or ‘srcexpr’, our local variable defined by ‘name’ will receive a DOM object specified by the XML data that is returned from the application server.

Here is a brief example from the VoiceXML 2.1 specification. The first snippet is an XML document which might be returned by an application server providing an interface to a stock quotation service.

<?xml version="1.0" encoding="UTF-8"?>
<quote>
   <ticker>F</ticker>
   <name>Ford Motor Company</name>
   <change>1.00</change>
  <last>30.00</last>
</quote>

This tells us that the quote for ticker symbol ‘F’, representing the Ford Motor Company, has changed by $1.00, and that the last quote was $30.00. Note that this is only data, with no user interface information embedded at all.

The next snippet shows a VoiceXML 2.1 fragment that first retrieves this document (using the <data> tag), and then assigns part of the information to a local variable ‘price’.

<data name="quote" src="quote.xml"/>
<script><![CDATA[
var price = quote.documentElement.getElementsByTagName("last").item(0).firstChild.data;
]]></script>

Note that to extract the right bit of data from the DOM tree, we need to use the ECMAScript binding to the DOM. We look up the right element (getElementsByTagName) and retrieve the data using the appropriate DOM properties. It is a little wordy, but pretty straightforward once you get used to it. It can also be useful to define ECMAScript functions to encapsulate the actual navigation of the DOM tree. There is a great example in the VoiceXML 2.1 specification – see the link below.
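To give a feel for what such an encapsulating function might look like, here is a small sketch of our own (it is not the example from the specification). The names getText and fakeDocument are hypothetical. The fakeDocument function only imitates the few DOM members used here so that the sketch can run outside a VoiceXML interpreter; in a real page the ‘quote’ variable would come from the <data> tag. It also assumes the platform's node lists expose a ‘length’ property, as in DOM Level 2 Core.

```javascript
// Hypothetical helper: returns the text of the first <tagName> element in
// the document, or the fallback value when the element or its text is
// missing.
function getText(doc, tagName, fallback) {
  var nodes = doc.documentElement.getElementsByTagName(tagName);
  if (nodes.length === 0) {
    return fallback;
  }
  var first = nodes.item(0).firstChild;
  return (first && first.data !== undefined) ? first.data : fallback;
}

// Stand-in for the object that <data name="quote" .../> would create.
// Only the DOM members used by getText are modeled (assumption).
function fakeDocument(fields) {
  return {
    documentElement: {
      getElementsByTagName: function (name) {
        var found = fields.hasOwnProperty(name) ? [fields[name]] : [];
        return {
          length: found.length,
          item: function (i) {
            return { firstChild: { data: found[i] } };
          }
        };
      }
    }
  };
}

var quote = fakeDocument({ ticker: "F", last: "30.00", change: "1.00" });
var price = getText(quote, "last", "unknown");
var volume = getText(quote, "volume", "unknown"); // no such element
```

The fallback argument keeps the dialog from failing on a missing element, which the one-line version above would do by calling a property on a null node.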

This is a trivial example. However, imagine that we were sending parameters along with the request, allowing the application server to perform a database lookup or host query in order to generate the XML data. We now have a dynamic application – because the data is dynamic – while using the same voice user interface page to manage this data.
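The usual way to send parameters is the ‘namelist’ attribute, which leaves the encoding to the platform. If we would rather build the request address ourselves and hand it to ‘srcexpr’, a sketch along the following lines might do. The helper name buildUri, the ‘quote’ address, and the ‘ticker’ parameter are all assumptions made up for this illustration.

```javascript
// Hypothetical helper: builds a request URI from a base address and a set
// of parameters, escaping each name and value.
function buildUri(base, params) {
  var pairs = [];
  for (var key in params) {
    if (params.hasOwnProperty(key)) {
      pairs.push(encodeURIComponent(key) + "=" +
                 encodeURIComponent(params[key]));
    }
  }
  return pairs.length ? base + "?" + pairs.join("&") : base;
}

// "quote" and "ticker" are illustrative names, not part of any real service.
var quoteUri = buildUri("quote", { ticker: "BRK B" });
```

The page could then fetch the data with a tag such as <data name="quote" srcexpr="quoteUri"/>, since ‘srcexpr’ accepts any ECMAScript expression that evaluates to a URI.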

Summary

The <data> tag provides a method for accessing XML data from a VoiceXML page, without requiring a page transition or a dynamic page generation by the application server. The <data> tag is also particularly useful when combined with some of the other additions in VoiceXML 2.1 – namely <foreach> and the addition of URI expressions to <grammar> and <script>. The combination of these features allows the use of static VoiceXML for the management of the voice user interface, while the ‘customization’ of the interface is done by retrieving caller specific data using the <data> tag.
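One practical detail when combining <data> with <foreach>: <foreach> expects an ECMAScript array, while the DOM returns node lists. A small conversion function bridges the two. The sketch below is our own illustration; the names toArray, textOf, and fakeNodeList are hypothetical, the second company name is invented sample data, and fakeNodeList merely imitates what getElementsByTagName would return so that the sketch can run on its own.

```javascript
// Hypothetical helper: walks a DOM node list and collects the converted
// value of each node into a real ECMAScript array.
function toArray(nodes, convert) {
  var result = [];
  for (var i = 0; i < nodes.length; i++) {
    result.push(convert(nodes.item(i)));
  }
  return result;
}

// Returns the text inside an element, or an empty string if it has none.
function textOf(node) {
  return node.firstChild ? node.firstChild.data : "";
}

// Stand-in for the list returned by getElementsByTagName("name") on a
// response holding several quotes (assumption: only 'length' and 'item'
// are modeled).
function fakeNodeList(texts) {
  return {
    length: texts.length,
    item: function (i) { return { firstChild: { data: texts[i] } }; }
  };
}

var nameNodes = fakeNodeList(["Ford Motor Company", "General Motors"]);
var names = toArray(nameNodes, textOf);
```

With ‘names’ in scope, a prompt could iterate over it with something like <foreach item="company" array="names">, keeping the VoiceXML page itself static.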

Here is the direct link to the description of the <data> tag:

http://www.w3.org/TR/2005/CR-voicexml21-20050613/#sec-data

You’re going to want to have a look at this as well:

http://www.w3.org/TR/2005/CR-voicexml21-20050613/#sec-data-dom

This is a reference describing how the DOM is mapped into the ECMAScript name space. It is a good idea to become familiar with the structure of the DOM and the ECMAScript functions available for navigating the DOM tree. This will allow you to try out different development models based on VoiceXML 2.1, and to build better applications using the <data> tag.

Brad Porter also wrote a detailed article in the last issue of the VoiceXML Review on how the <data> tag can be used to support development of AJAX applications in VoiceXML. You can read that article at:

http://www.voicexmlreview.org/Jun2005/columns/Jun2005_speak_listen.html

Well, that wraps up our overview of VoiceXML 2.1. We’ll be digging into some other interesting topics next issue. As always, if you have questions or topics for VoiceXML 2.0 or 2.1, drop us a line!
