Volume 5, Issue 4 - July / Aug 2005
 
   
   
 

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org and be sure to read future issues of VoiceXML Review for the answer.

By Matt Oshry

Q: I've been tasked with presenting the user with a list of items. When they hear an item in which they're interested they should be able to say "tell me more" to obtain more information about the item. How might I implement that?

A: There are several ways to implement a 'pick list' in VoiceXML. If you have access to an interpreter that implements the features described in the VoiceXML 2.1 specification, you can use a combination of the <foreach> and <mark> tags to implement a pick list.

The following example declares a list of fruits and vegetables in the ECMAScript array aItems. Each item in the array is an ECMAScript object with three properties: 'id', 'name', and 'detail'. The 'name' and 'detail' properties are used for queuing prompts in the 'picklist' and 'details' dialogs. The 'id' property uniquely identifies the item and is used to name the <mark> that is executed for that item. We also use the id in the <filled> of the 'item' <field> to determine which item was selected.

A pause is included after each item to give the user the opportunity to say the magic phrase, 'tell me more', which causes the interpreter to execute the 'details' dialog.

<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<script> 
<![CDATA[
var idPicked = null; // the id of the selected item

// a list of items
var aItems = [{'id' : 'i0', 'name' : 'eggplant', 
   'detail' : 
   'a purple vegetable best breaded, fried, and smothered in marinara and mozzarella'}, 
 {'id' : 'i1', 'name' : 'endive', 
   'detail' : 
   'a tangy, tender vegetable known as white gold by belgians the world over'}, 
 {'id' : 'i2', 'name' : 'mango', 
   'detail' : 
   'a large oval tropical fruit having smooth skin, juicy aromatic pulp, and a large hairy seed'},
 {'id' : 'i3', 'name' : 'papaya', 
   'detail' : 
   'a pear-shaped fruit with yellow skin and bright orange flesh ' +
 'with small black seeds clustered in the center' }];

// given an id, return the corresponding object in the list
function GetItemById(id)
{
  var ret = null;
  for (var i = 0; i < aItems.length; i++) {
    if (aItems[i].id == id) {
      ret = aItems[i];
    }
  }
  return ret;
}
]]>
</script>

<form id="picklist">
  <block name="init">
  <prompt>
    here's the list of available produce.
    when you hear the one you want, say 'tell me more'.
  </prompt>
  </block>
  <field name="item">
    <property name="bargeintype" value="hotword"/>
    <property name="timeout" value="1.5s"/>
    <prompt>
      <foreach item="curItem" array="aItems">
        <mark nameexpr="curItem.id"/>
        <value expr="curItem.name"/>
        <break time="500ms"/>
      </foreach>
     </prompt>
    <grammar type="application/srgs+xml" 
      root="itemRules" mode="voice">
    <rule id="itemRules" scope="public">
      <one-of>
        <item>tell me more</item>
      </one-of>
    </rule>
    </grammar>
    <nomatch>
     Sorry. Didn't get that.
     <reprompt/>     
    </nomatch>
    <noinput>
      That's the end of the list.
      <goto next="#replay"/>
    </noinput>
    <filled>
       <!-- make sure a mark was executed -->
       <if cond="typeof item$.markname == 'string'">
         <assign name="idPicked" expr="item$.markname"/>
         <goto next="#details"/>
       <else/>
         Sorry. Didn't get that. 
         <assign name="item" expr="undefined"/>
       </if>
    </filled>
  </field>
</form>

<form id="replay">
  <field name="again" type="boolean">
    <prompt>
      Do you want to hear the list again?
    </prompt>
    <catch event="nomatch noinput">
      To hear the list again, say yes.
      Otherwise, say no.
    </catch>
    <filled>
      <if cond="again">
        <goto next="#picklist"/>
      <else/>
        <exit/>
      </if>
    </filled>
  </field>
</form>

<form id="details">
  <var name="obj" expr="GetItemById(idPicked)"/>
  <field name="action">
    <property name="bargeintype" value="hotword"/>
    <prompt>
      <value expr="obj.name"/>
      <break time="300ms"/>
      <value expr="obj.detail"/>
      <break time="300ms"/>
      To hear this detail again, say repeat.
      Otherwise, say go back.
      <break time="300ms"/>
    </prompt>
    <grammar type="application/srgs+xml" 
      root="rule1" mode="voice">
    <rule id="rule1" scope="public">
      <one-of>
        <item>repeat</item>
        <item>stop</item>
        <item>go back</item>
      </one-of>
    </rule>
    </grammar>
  </field>
  <noinput>
    <!-- end of details; return to the list -->
    <goto next="#picklist"/>
  </noinput>  
  <nomatch>
    <goto next="#replay"/>
  </nomatch>
  <filled>
    <if cond="action == 'repeat'">
      <clear/>
    <elseif cond="action == 'stop'"/>
      <goto next="#replay"/>
    <else/>
      <goto next="#picklist"/>
    </if>
  </filled>
</form>

</vxml>
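
To make the hand-off between the two dialogs concrete: when the recognizer matches 'tell me more', item$.markname holds the name of the last <mark> executed, which is the id of the item being read, and GetItemById maps that id back to the object. The sketch below mimics the lookup outside the interpreter with a shortened two-item list. The literal ids passed in stand in for item$.markname, and the id 'i9' is an assumption chosen only to show the failure case.

```javascript
// A shortened copy of the list from the example above.
var aItems = [
  {'id' : 'i0', 'name' : 'eggplant', 'detail' : 'a purple vegetable'},
  {'id' : 'i1', 'name' : 'endive', 'detail' : 'a tangy, tender vegetable'}
];

// Same lookup as in the example: return the item with the given id, or null.
function GetItemById(id)
{
  var ret = null;
  for (var i = 0; i < aItems.length; i++) {
    if (aItems[i].id == id) {
      ret = aItems[i];
    }
  }
  return ret;
}

// Stand-ins for the value of item$.markname.
var picked = GetItemById('i1');   // the endive object
var missing = GetItemById('i9');  // null, because no item has this id
```

Note that an unknown id yields null, so the 'details' dialog depends on idPicked always being one of the ids in aItems. In the example that holds because every <mark> is named from the same array.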

Q: Your example only shows text-to-speech. What if I want to improve the quality of my application by playing back recorded audio for each item in the list?

A: Because the array aItems consists of objects, you can do one of the following: a) Extend each object with properties that store the URIs of the recorded audio for that item. b) Use the existing 'id' property not only to name the <mark> but also as part of the URI of each item's recordings.

Option (b) is appropriate if the recordings share a common location and you control how they are named. Let's consider that option in more detail.

Let's say the audio is located at the following base URI: http://audio.acmegrocer.net/produce/

For eggplant, the id is 'i0', and we need two recordings - one for the name and one for the detail. We'll name the corresponding files 'i0_name.wav' and 'i0_detail.wav'.

To reference the name recording from the item <field> of the picklist dialog, we replace

  <value expr="curItem.name"/>

with

  <audio expr="'http://audio.acmegrocer.net/produce/' + curItem.id + '_name.wav'">
    <value expr="curItem.name"/>
  </audio>

The TTS emitted by the execution of the <value> tag is only played if the recording can't be fetched.
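
If the same URI pattern is needed in several prompts, the string concatenation can be moved into a small ECMAScript function in the document's <script> block. This is only a sketch: the helper name GetAudioURI and its 'kind' parameter are hypothetical, and audioBaseURI is assumed to hold the base URI given above.

```javascript
// Base URI of the recordings (the example location used above).
var audioBaseURI = 'http://audio.acmegrocer.net/produce/';

// Hypothetical helper: build the URI of a recording for an item.
// 'kind' is either 'name' or 'detail', matching the file naming scheme
// '<id>_name.wav' and '<id>_detail.wav'.
function GetAudioURI(id, kind) {
  return audioBaseURI + id + '_' + kind + '.wav';
}

// For eggplant (id 'i0') this yields the two file names chosen above.
var nameURI = GetAudioURI('i0', 'name');
var detailURI = GetAudioURI('i0', 'detail');
```

With such a helper in place, the expr attribute of the <audio> tag could be written as GetAudioURI(curItem.id, 'name'), which keeps the naming scheme in one place if it ever changes.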

Updating the action <field> of the 'details' dialog to use name and details recordings is left as an exercise to the reader. (Or you can cheat by looking at the revised example below.)

Q: How do I allow the user to say 'stop' at any time to terminate playback of the list?

A: Extend the itemRules grammar to include another keyword, and add code in the <form> to handle the semantic result associated with that keyword.

Q: Your example included a static list of items. Can I extend the code to read back an arbitrary list of items?

A: Yes, using the <data> tag described in VoiceXML 2.1. The <data> tag can consume any XML data source accessible via a URL. That XML can be generated on the fly by a server-side script (e.g. JSP). It can also be stored in a static file that you create by periodically exporting data from a DBMS. A static file is a good solution if the data doesn't change frequently and the list of items is not too large.

Here's what an XML document representing the list from the previous example might look like:

<?xml version="1.0" ?> 
<?access-control allow="*"?> 
<items>
  <item id="i0">
    <name>eggplant</name> 
    <detail>a purple vegetable best breaded, fried, and smothered in marinara and mozzarella</detail> 
  </item>
  <item id="i1">
    <name>endive</name> 
    <detail>a tangy, tender vegetable known as white gold by belgians the world over</detail> 
  </item>
  <item id="i2">
    <name>mango</name> 
    <detail>a large oval tropical fruit having smooth skin, juicy aromatic pulp, and a 
    large hairy seed</detail> 
  </item>
  <item id="i3">
    <name>papaya</name> 
    <detail>a pear-shaped fruit with yellow skin and bright orange flesh with small black 
    seeds clustered in the center</detail> 
  </item>
</items>
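
A document of this shape could be generated from a list of records rather than written by hand. The sketch below does so in ECMAScript to stay consistent with the rest of the code in this column; a real deployment would use whatever server-side language is available (JSP, for instance). The function names EscapeXml and ItemsToXml are hypothetical, and the second record is invented purely to show why escaping matters.

```javascript
// Escape characters that have special meaning in XML text and attributes.
// The ampersand must be replaced first so the other entities are not
// escaped twice.
function EscapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical generator: serialize an array of {id, name, detail}
// objects into the <items> document format shown above.
function ItemsToXml(aItems) {
  var lines = ['<?xml version="1.0" ?>',
               '<?access-control allow="*"?>',
               '<items>'];
  for (var i = 0; i < aItems.length; i++) {
    var it = aItems[i];
    lines.push('  <item id="' + EscapeXml(it.id) + '">');
    lines.push('    <name>' + EscapeXml(it.name) + '</name>');
    lines.push('    <detail>' + EscapeXml(it.detail) + '</detail>');
    lines.push('  </item>');
  }
  lines.push('</items>');
  return lines.join('\n');
}

// Example input; 'peas & carrots' is made up to exercise the escaping.
var xml = ItemsToXml([
  {'id' : 'i0', 'name' : 'eggplant', 'detail' : 'a purple vegetable'},
  {'id' : 'i4', 'name' : 'peas & carrots', 'detail' : 'a mixed side dish'}
]);
```

Escaping is the step most easily forgotten: an unescaped ampersand or angle bracket in a name would make the document ill-formed, and the fetch of the data would then fail.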

Here's an example of a data tag that fetches the XML data containing the list of items from the URI "http://data.acmegrocer.net/produce/items.xml".

<data name="dom" src="http://data.acmegrocer.net/produce/items.xml"/>

If the document at the other end of that URI is well-formed, the VoiceXML browser exposes it to your application via the variable named "dom". The variable "dom" is an ECMAScript object that implements the read-only methods and properties of the W3C Document Object Model (DOM) Level 2.
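
As a rough illustration of that read-only interface, the sketch below collects the id of every <item> element under the root. So that it can run outside a VoiceXML browser, it first builds a small hand-made stand-in for the "dom" object. MakeNodeList, MakeElement, MakeText, and the locally declared Node constants are hypothetical scaffolding, not part of VoiceXML; only the loop at the end reflects what application code would do.

```javascript
// Node type constants with the values defined by the W3C DOM, declared
// locally so the sketch runs outside a VoiceXML browser.
var Node = { ELEMENT_NODE : 1, TEXT_NODE : 3 };

// Hypothetical scaffolding: wrap an array in a NodeList-like object.
function MakeNodeList(aNodes) {
  return {
    length : aNodes.length,
    item : function (i) { return aNodes[i]; }
  };
}

// Hypothetical scaffolding: build an element with attributes and children.
function MakeElement(name, attrs, aChildren) {
  return {
    nodeType : Node.ELEMENT_NODE,
    nodeName : name,
    childNodes : MakeNodeList(aChildren),
    getAttribute : function (attr) { return attrs[attr]; }
  };
}

// Hypothetical scaffolding: build a text node.
function MakeText(data) {
  return { nodeType : Node.TEXT_NODE, nodeName : '#text', data : data,
           childNodes : MakeNodeList([]) };
}

// A stand-in for the object the <data> tag would expose as "dom".
var dom = {
  documentElement : MakeElement('items', {}, [
    MakeText('\n  '),
    MakeElement('item', {'id' : 'i0'}, []),
    MakeText('\n  '),
    MakeElement('item', {'id' : 'i1'}, []),
    MakeText('\n')
  ])
};

// Read-only access: collect the id of every <item> child of the root.
var aIds = [];
var root = dom.documentElement;
for (var i = 0; i < root.childNodes.length; i++) {
  var ch = root.childNodes.item(i);
  if (ch.nodeType == Node.ELEMENT_NODE && ch.nodeName == 'item') {
    aIds.push(ch.getAttribute('id'));
  }
}
// aIds now holds 'i0' and 'i1'; the whitespace text nodes were skipped.
```

The stand-in includes whitespace text nodes between the elements because a parsed document typically contains them, which is why code that walks childNodes should check nodeType before treating a child as an element.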

To use that data with the <foreach> tag you'll need to convert it into an ECMAScript array. Here's some code that does that. Call ParseItems and pass in the variable named by the <data> tag's name attribute.

// Given a DOM, return an array of the 'item' elements
// parsing out the data we care about
function ParseItems(dom) {
  var aItems = [];
  var r = dom.documentElement;
  for (var i = 0; i < r.childNodes.length; i++) {
    var ch = r.childNodes.item(i);
    if (ch.nodeType == Node.ELEMENT_NODE && ch.nodeName == 'item') {
      var name = GetChildTextOf(ch, 'name');
      var detail = GetChildTextOf(ch, 'detail');
      aItems.push({'id' : ch.getAttribute('id'), 
        'name' : name, 'detail' : detail});
    }
  }
  return aItems;
}

// Get the text content of the named child
function GetChildTextOf(parent, name)
{
  var ret = null;
  for (var i = 0; i < parent.childNodes.length; i++) {
    var ch = parent.childNodes.item(i);
    if (ch.nodeType == Node.ELEMENT_NODE && ch.nodeName == name) {
      if (ch.childNodes.length > 0) {
        ret = "";
        for (var j = 0; j < ch.childNodes.length; j++) {
          var ch2 = ch.childNodes.item(j);
          if (ch2.nodeType == Node.TEXT_NODE ||
             ch2.nodeType == Node.CDATA_SECTION_NODE) {         
            ret += ch2.data;
          }
        }
      }
      break;
    }
  }  
  return ret;
}

Here are the changes you'll need to make to the VoiceXML sample code above to use dynamic data:

1) Instead of assigning a hard-coded array to aItems, initialize it as an empty array:

  var aItems = new Array();

2) Add the ParseItems and GetChildTextOf functions described above to the <script> tag.

3) Add a <data> tag to the picklist dialog at either dialog scope (within the <form> tag) or at anonymous scope within the <block> named 'init'.

  <data name="domProduce" src="items.xml"/>

4) Add an <assign> tag to the 'init' <block> (after the <data> tag) to populate the aItems array with the data fetched by the <data> tag. Assuming you set the value of the name attribute of the <data> tag to 'domProduce', the <assign> should be:

  <assign name="aItems" expr="ParseItems(domProduce)"/>

Here's the revised code:

<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<script> 
<![CDATA[
var audioBaseURI = ''; // base URI to the recordings
var idPicked = null; // the id of the selected item

// a list of items
var aItems = new Array();

// given an id, return the corresponding object in the list
function GetItemById(id)
{
  var ret = null;
  for (var i = 0; i < aItems.length; i++) {
    if (aItems[i].id == id) {
      ret = aItems[i];
    }
  }
  return ret;
}

// Given a DOM, return an array of the 'item' elements
// parsing out the data we care about
function ParseItems(dom) {
  var aItems = [];
  var r = dom.documentElement;
  for (var i = 0; i < r.childNodes.length; i++) {
    var ch = r.childNodes.item(i);
    if (ch.nodeType == Node.ELEMENT_NODE && ch.nodeName == 'item')
    {
      var name = GetChildTextOf(ch, 'name');
      var detail = GetChildTextOf(ch, 'detail');
      aItems.push({'id' : ch.getAttribute('id'), 
        'name' : name, 'detail' : detail});
    }
  }
  return aItems;
}

// Get the text content of the named child
function GetChildTextOf(parent, name)
{
  var ret = null;
  for (var i = 0; i < parent.childNodes.length; i++) {
    var ch = parent.childNodes.item(i);
    if (ch.nodeType == Node.ELEMENT_NODE && ch.nodeName == name) {
      if (ch.childNodes.length > 0) {
        ret = "";
        for (var j = 0; j < ch.childNodes.length; j++) {
          var ch2 = ch.childNodes.item(j);
          if (ch2.nodeType == Node.TEXT_NODE ||
             ch2.nodeType == Node.CDATA_SECTION_NODE) {         
            ret += ch2.data;
          }
        }
      }
      break;
    }
  }  
  return ret;
}
]]>
</script>

<form id="picklist">
  <block name="init">
  <data name="domProduce" src="items.xml"/>
  <assign name="aItems" expr="ParseItems(domProduce)"/>
  <prompt>
    here's the list of available produce.
    when you hear the one you want, say 'tell me more'.
  </prompt>
  </block>
  <field name="item">
    <property name="bargeintype" value="hotword"/>
    <property name="timeout" value="1.5s"/>
    <prompt>
      <foreach item="curItem" array="aItems">
        <mark nameexpr="curItem.id"/>
        <audio expr="audioBaseURI + curItem.id + '_name.wav'">
          <value expr="curItem.name"/>
        </audio>
        <break time="500ms"/>
      </foreach>
     </prompt>
    <grammar type="application/srgs+xml" 
      root="itemRules" mode="voice">
    <rule id="itemRules" scope="public">
      <one-of>
        <item>tell me more</item>
      </one-of>
    </rule>
    </grammar>
    <nomatch>
     Sorry. Didn't get that.
     <reprompt/>     
    </nomatch>
    <noinput>
      That's the end of the list.
      <goto next="#replay"/>
    </noinput>
    <filled>
       <!-- make sure a mark was executed -->
       <if cond="typeof item$.markname == 'string'">
         <assign name="idPicked" expr="item$.markname"/>
         <goto next="#details"/>
       <else/>
         Sorry. Didn't get that. 
         <assign name="item" expr="undefined"/>
       </if>
    </filled>
  </field>
</form>

<form id="replay">
  <field name="again" type="boolean">
    <prompt>
      Do you want to hear the list again?
    </prompt>
    <catch event="nomatch noinput">
      To hear the list again, say yes.
      Otherwise, say no.
    </catch>
    <filled>
      <if cond="again">
        <goto next="#picklist"/>
      <else/>
        <exit/>
      </if>
    </filled>
  </field>
</form>

<form id="details">
  <var name="obj" expr="GetItemById(idPicked)"/>
  <field name="action">
    <property name="bargeintype" value="hotword"/>
    <prompt>
      <audio expr="audioBaseURI + obj.id + '_name.wav'">
        <value expr="obj.name"/>
      </audio>
      <break time="300ms"/>
      <audio expr="audioBaseURI + obj.id + '_detail.wav'">
        <value expr="obj.detail"/>
      </audio>
      <break time="300ms"/>
      To hear this detail again, say repeat.
      Otherwise, say go back.
      <break time="300ms"/>
    </prompt>
    <grammar type="application/srgs+xml" 
      root="rule1" mode="voice">
    <rule id="rule1" scope="public">
      <one-of>
        <item>repeat</item>
        <item>stop</item>
        <item>go back</item>
      </one-of>
    </rule>
    </grammar>
  </field>
  <noinput>
    <!-- end of details; return to the list -->
    <goto next="#picklist"/>
  </noinput>  
  <nomatch>
    <goto next="#replay"/>
  </nomatch>
  <filled>
    <if cond="action == 'repeat'">
      <clear/>
    <elseif cond="action == 'stop'"/>
      <goto next="#replay"/>
    <else/>
      <goto next="#picklist"/>
    </if>
  </filled>
</form>

</vxml>

 



Copyright © 2001-2005 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).