Typically, we ask more of our applications than simply playing prompts. An ASR or IVR application also needs elements that control dialog flow, perform recognition tasks, and handle events.

Suppose we want to create an application for ordering a pizza. The first thing we might want to find out from a caller is their phone number. How would this be done?

Gathering user input with <field>:

The <field> item encapsulates a dialog unit which <prompt>s a user for input, recognizes the input according to the rules supplied by a <grammar>, and may <catch> any events appropriate to that portion of the dialog. Below is an example of what our phone number gathering field might look like:

<form id="getPhoneNumber">
  <field name="PhoneNumber">
    <prompt>What's your phone number?</prompt>
    <grammar src="../grammars/phone.gram" type="application/srgs+xml" />
    <help> Please say your ten digit phone number. </help>
  </field>
</form>

In the above field, the field item variable “PhoneNumber” is filled with the caller's recognized response to the prompt, “What’s your phone number?” The allowable response (in this case an utterance of ten digits) is defined by the grammar of type “application/srgs+xml”* located at the relative URL given by the src attribute, “../grammars/phone.gram”. If the caller asks for “help,” a help event is thrown, the <help> element catches it, and the prompt, “Please say your ten digit phone number.”, is played.
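
The contents of phone.gram are not shown here, so the following is only a minimal sketch of what a ten-digit grammar could look like, assuming the XML syntax of the final SRGS 1.0 specification. The rule names “phone” and “digit” are hypothetical, and the sketch omits semantic interpretation tags, so how the recognized words end up as a value in “PhoneNumber” depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical contents of ../grammars/phone.gram; the original file is not shown. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="phone">

  <!-- Root rule: exactly ten spoken digits in a row. -->
  <rule id="phone" scope="public">
    <item repeat="10"><ruleref uri="#digit"/></item>
  </rule>

  <!-- A single spoken digit; "oh" is accepted as a synonym for zero. -->
  <rule id="digit">
    <one-of>
      <item>zero</item>
      <item>oh</item>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
      <item>five</item>
      <item>six</item>
      <item>seven</item>
      <item>eight</item>
      <item>nine</item>
    </one-of>
  </rule>
</grammar>
```

A production grammar would likely also allow for pauses or grouping between the area code and the rest of the number, which this sketch does not attempt.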

* This specifies the MIME type of the XML form of the W3C’s Speech Recognition Grammar Specification (SRGS). Other supported grammar formats are specified in the W3C’s VoiceXML 2.0 Last Call Working Draft at: http://www.w3.org/TR/voicexml20/.
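
The event handling described for <help> extends to other common events. As a rough sketch, the same field could also handle silence and unrecognized input with the <noinput> and <nomatch> shorthand catch elements, and confirm the result in a <filled> block. The wording of the added prompts is illustrative only, and the enclosing <vxml> element assumes the VoiceXML 2.0 namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="getPhoneNumber">
    <field name="PhoneNumber">
      <prompt>What's your phone number?</prompt>
      <grammar src="../grammars/phone.gram" type="application/srgs+xml"/>

      <!-- Caller asked for help: play guidance, then listen again. -->
      <help>Please say your ten digit phone number.</help>

      <!-- Caller said nothing before the timeout expired. -->
      <noinput>
        <prompt>I didn't hear anything.</prompt>
        <reprompt/>
      </noinput>

      <!-- Caller said something the grammar does not allow. -->
      <nomatch>
        <prompt>Sorry, I didn't understand that.</prompt>
        <reprompt/>
      </nomatch>

      <!-- Runs once PhoneNumber has been filled by a successful recognition. -->
      <filled>
        <prompt>I heard <value expr="PhoneNumber"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The <reprompt/> element asks the interpreter to play the field's original prompt again when it returns to the field; without it, the interpreter simply resumes listening after the handler's message.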