VoiceXML Review - Columns

Volume 3, Issue 2 - March/April 2003

VoiceXML Events

By Rob Marchand

Welcome to "First Words" - the VoiceXML Review's column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Since our last column, the VoiceXML 2.0 Candidate Recommendation has been released! The W3C published the CR on January 28th, with a minor improvement to the schema on February 20th.

http://www.w3.org/TR/voicexml20/

Be sure to have a look at it, as this provides the benchmark for implementation of a VoiceXML Interpreter, and is your reference for developing portable applications in VoiceXML. Changes between now and the time that VoiceXML 2.0 is published as a full Recommendation should be minor.

And now, back to the task at hand….

Handling Complex Recognition Results

We have spent the last couple of issues understanding how VoiceXML maps results from form-level and field-level grammars into the fields that you're trying to collect from the caller. We return here to our earlier pizza sample (yes, I'm sick of it too - something new next issue, we promise).

We have talked about reusing grammar components to ease maintenance. In the sample below, we've defined a form with two fields and a form-level grammar. This allows the same form to handle both 'directed dialog' (where each field is filled in turn, as driven by the form interpretation algorithm, or FIA) and 'mixed initiative' (where the user can drive the collection of the data).

Here is the sample, with discussion afterwards:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
     <form>
          <initial>
               <prompt>
                    Welcome to the Voice X M L review pizza franchise
               </prompt>
          </initial>
          <!-- Form-level grammar: an optional count followed by an optional food item -->
          <grammar type="application/srgs+xml" xml:lang="en-US" version="1.0" root="order">
               <rule id="order" scope="public">
                    <item repeat="0-1">
                         <ruleref uri="digits.grxml"/>
                         <tag> c=digits.value; </tag>
                    </item>
                    <item repeat="0-1">
                         <ruleref uri="food.grxml"/>
                         <tag> i=food.name; </tag>
                    </item>
                    <!-- Return the slots that the FIA maps onto the fields below -->
                    <tag> orderCount=c; orderItem=i; </tag>
               </rule>
          </grammar>
          <!-- Each field reuses one of the component grammars for directed dialog -->
          <field name="orderItem">
               <grammar type="application/srgs+xml" src="food.grxml"/>
               <prompt>
                    What would you like to order?
                    We have pizza, drinks, salad or wings.
               </prompt>
               <noinput>
                    Say pizza, drinks, salad, or wings.
               </noinput>
               <nomatch>
                    You can say pizza, drinks, salad, or wings.
               </nomatch>
               <help>
                    <reprompt/>
               </help>
          </field>
          <field name="orderCount">
               <grammar type="application/srgs+xml" src="digits.grxml"/>
               <prompt>
                    How many <value expr="orderItem"/> would you like?
               </prompt>
               <noinput>
                    I need to know how many <value expr="orderItem"/> you want.
               </noinput>
               <nomatch>
                    Please say how many <value expr="orderItem"/> you want.
               </nomatch>
               <help>
                    You should tell me how many items you want.
               </help>
          </field>
          <!-- Runs once both fields are filled -->
          <block>
               <prompt>
                    One moment while I add
                    <value expr="orderCount"/>
                    <value expr="orderItem"/>
                    to your order.
               </prompt>
               <submit next="pizzaCart.php" namelist="orderItem orderCount"/>
          </block>
     </form>
</vxml>

Here are some things to note about this sample:

The contents of the <tag> elements are representative only - as the language for semantic interpretation has not yet been finalized, the techniques used to propagate results through grammar hierarchies, and to the VoiceXML context, will depend upon the interpreter, and perhaps the recognizer that you're using. It seems likely that some subset of ECMAScript will be used, so this example reflects that.

The fields and the form-level grammar share the same component grammars.

The grammars will return values representative of the collected data (the 'interpretation'), allowing maintenance and tuning of the grammars to be isolated to those files.

We haven't done anything special with the prompts, event handlers, and so on, but we should in order to make this a better application.
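To make the idea of a shared component grammar more concrete, here is a minimal sketch of what food.grxml might contain. The file contents are an assumption, since only the file name appears in the sample. The <tag> contents follow the same representative style as the sample and are not a finalized syntax, so check what your interpreter and recognizer expect.

```xml
<?xml version="1.0"?>
<!-- food.grxml: hypothetical contents; the sample only references this file by name -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="food">
  <rule id="food" scope="public">
    <one-of>
      <!-- Singular and plural forms return the same 'name' value -->
      <item>
        <one-of><item>pizza</item><item>pizzas</item></one-of>
        <tag> name="pizza"; </tag>
      </item>
      <item>
        <one-of><item>drink</item><item>drinks</item></one-of>
        <tag> name="drinks"; </tag>
      </item>
      <item>
        <one-of><item>salad</item><item>salads</item></one-of>
        <tag> name="salad"; </tag>
      </item>
      <item>
        wings
        <tag> name="wings"; </tag>
      </item>
    </one-of>
  </rule>
</grammar>
```

Because the rule returns its value under 'name', the form-level grammar can read it as food.name. Adding a new menu item then means touching only this file.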

When the caller is prompted, they can respond with an input like 'five pizzas', in which case the form-level grammar will return the two slots 'orderCount' and 'orderItem'. The FIA will then use these to populate the two fields of the same names (or, as we learned in earlier issues, fields referring to those slots by name with the 'slot' attribute).
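As a reminder of the 'slot' alternative, here is a small sketch of the two fields renamed but still mapped to the slots returned by the form-level grammar. The field names 'item' and 'quantity' are made up for illustration; only the slot names come from the sample.

```xml
<!-- Fragment of a form: 'item' and 'quantity' are hypothetical field names -->
<field name="item" slot="orderItem">
  <grammar type="application/srgs+xml" src="food.grxml"/>
  <prompt>What would you like to order?</prompt>
</field>
<field name="quantity" slot="orderCount">
  <grammar type="application/srgs+xml" src="digits.grxml"/>
  <!-- Prompts and expressions now refer to the field name, not the slot name -->
  <prompt>How many <value expr="item"/> would you like?</prompt>
</field>
```

With this change, anything that refers to the field variables, such as the namelist on the <submit>, would need to use 'item' and 'quantity' as well.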

Similarly, if the caller simply responds to the prompt by saying what type of food they want, or how many they want, the appropriate field will be filled, and the FIA will drive the collection of the rest of the data.

The main points to take away from the last few articles are these:

Read (and understand!) Section 3.1.6 of the VoiceXML Candidate Recommendation. This will help you understand how to use form and field level grammars to build flexible applications.

Think about the ways fields can be filled, and the requirements for slot naming, and how ECMAScript objects are returned to the application. This will make life easier when debugging or understanding how applications are behaving.

ECMAScript is fundamental to a number of these issues (among others in VoiceXML), so it doesn't hurt to understand this as well.

Take advantage of the shadow variables that are provided in VoiceXML for things like the interpretation, utterances, and so on.

Understand how your interpreter and recognizer handle semantic interpretation. It can be a bit frustrating that there are variances in these behaviors, but understanding how these work will allow you to build a better application.
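To illustrate the point about shadow variables, here is a minimal sketch of a <filled> handler that inspects them for the orderItem field. The 0.5 confidence threshold and the prompt wording are arbitrary assumptions for illustration, not recommended values.

```xml
<field name="orderItem">
  <grammar type="application/srgs+xml" src="food.grxml"/>
  <prompt>What would you like to order?</prompt>
  <filled>
    <!-- orderItem$ describes the recognition that filled this field -->
    <log>
      Heard: <value expr="orderItem$.utterance"/>
      Confidence: <value expr="orderItem$.confidence"/>
      Input mode: <value expr="orderItem$.inputmode"/>
    </log>
    <!-- Hypothetical threshold: clear the field and ask again on a weak match -->
    <if cond="orderItem$.confidence &lt; 0.5">
      <prompt>Sorry, I may have misheard you.</prompt>
      <clear namelist="orderItem"/>
    </if>
  </filled>
</field>
```

Logging these values while you test is an easy way to see what the recognizer actually returned, which helps when a field is filled with something you did not expect.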

Summary

Over the last few issues, we've had a pretty close look at how grammars provide results to fields. We hope these articles have helped you understand the flexibility and power that VoiceXML gives you in this regard.

Next issue, we'll tackle a new subject, and try to help you learn more about VoiceXML. Don't forget, if there is anything you'd like to see discussed here, drop us a line at editor@voicexmlreview.org.
