VoiceXML: Where Speech Meets the Web
(Continued from Part 1)
Where's the Beef?
In the first section, we mentioned that VoiceXML brings
together some interesting technologies. Let's bring
a few more of these components into the mix. In Example
3, we'll actually prompt the user, collect some information,
play it back to them, and then submit it to a Web server
for further processing.
Example 3: Advanced Hello World

<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Example 3 for VoiceXML Review -->
  <form>
    <block>
      <prompt>
        <audio src="http://www.voicexml.org/audio/helloworld.wav">
          Hello, World!
        </audio>
      </prompt>
    </block>

    <field name="greeting">
      <prompt> What say you? </prompt>
      <grammar> hello | howdy | greetings | hey | password </grammar>
      <help> You can say hello, howdy, greetings, hey, or password </help>
      <filled> You said <value expr="greeting"/> </filled>
    </field>

    <block>
      <!-- Decide whether to continue talking to this caller -->
      <submit next="http://www.voicexml.org/cgi-bin/friend_or_foe.cgi"/>
    </block>
  </form>
</vxml>
There are a number of new events occurring in this example.
The form now contains a field, or an item of
data to be collected from the user. This field has a
number of components:
- A prompt element, which will be played to indicate to the user that some input is required.
- A grammar element, which defines what is acceptable as input in this field.
- A help element, which defines what to do if the user asks for help.
- A filled element, which indicates some action to undertake when the input is successfully gathered.
Prompt elements can contain text or URIs that refer to pre-recorded
audio (among other things), which is played to the
user to indicate that input is required.
Grammar elements constrain what the user can
say. In the case of Example 3, acceptable input consists
of one of the words "hello", "howdy", "greetings", "hey",
or "password". If the user does not provide one of these
inputs, they will hear a system-dependent message (Example 3
defines no message of its own) indicating that they did
not respond, or that their response was not understood.
The user is then re-prompted and given another chance
to provide input. Handling of conditions such as no
input can be customized in many different ways. If the
user says "help", the content of the help element
will be played.
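As one illustration of such customization, the field from Example 3 could carry its own handlers for silence and for unrecognized speech. This is a minimal sketch, assuming the noinput, nomatch, and reprompt elements and the count attribute of VoiceXML 1.0; the wording of the messages and the choice of a third-attempt escalation are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="greeting">
      <prompt> What say you? </prompt>
      <grammar> hello | howdy | greetings | hey | password </grammar>

      <!-- Played when the caller says nothing at all -->
      <noinput>
        I did not hear anything.
        <reprompt/>
      </noinput>

      <!-- Played when the caller says something outside the grammar -->
      <nomatch>
        I did not understand that.
        <reprompt/>
      </nomatch>

      <!-- Hypothetical escalation: on the third failed match, list the options -->
      <nomatch count="3">
        Please say hello, howdy, greetings, hey, or password.
        <reprompt/>
      </nomatch>

      <help> You can say hello, howdy, greetings, hey, or password </help>
      <filled> You said <value expr="greeting"/> </filled>
    </field>
  </form>
</vxml>
```

With handlers like these in place, the caller hears the messages written in the document rather than the platform's default ones.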
Once input gathering is successful, the actions specified
in the <filled> element are then processed.
In the case of Example 3, the user is told what he or
she has said. The processing of this field is now complete
(as is collection of input for this form). The order in which
the elements within a form are processed is defined by the
VoiceXML Form Interpretation Algorithm (FIA).
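To give a feel for that ordering, the form from Example 3 can be annotated with the sequence in which its items would be visited. The comments below are an informal reading of the FIA, not the normative algorithm; the working assumption is that the interpreter repeatedly selects the first form item that has not yet been completed.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- Visited first. A block runs once and is then considered done. -->
    <block>
      <prompt> Hello, World! </prompt>
    </block>

    <!-- Visited next, and revisited until the variable "greeting" has a value.
         Silence or an unrecognized reply leaves it unset, so the prompt repeats. -->
    <field name="greeting">
      <prompt> What say you? </prompt>
      <grammar> hello | howdy | greetings | hey | password </grammar>
      <!-- Runs as soon as "greeting" is filled by a recognized utterance. -->
      <filled> You said <value expr="greeting"/> </filled>
    </field>

    <!-- Visited last, because every earlier item is now complete. -->
    <block>
      <submit next="http://www.voicexml.org/cgi-bin/friend_or_foe.cgi"/>
    </block>
  </form>
</vxml>
```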
The form element contains one more new sub-element in addition
to the field: a block holding a submit element. The submit element
packages the collected data and sends it to a Web server for further
processing. The underlying mechanism for this is exactly the same
as submitting an HTML form from a visual Web browser.
The invoked CGI program would presumably decide how
to proceed based on the user input thus far. A sample
conversation might go like this:
Computer: Hello world!
Computer: What say you?
Human: Foo
Computer: I do not understand
Computer: What say you?
Human: Help
Computer: You can say 'hello', 'howdy', 'greetings', 'hey', or 'password'.
Computer: What say you?
Human: Howdy
Computer: You said Howdy
[...form is submitted to Web Server...]
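To make the final step of this conversation more concrete, here is a sketch of the submit element with its optional attributes spelled out. It assumes the namelist and method attributes of submit; Example 3 leaves both out and relies on the platform defaults. The request shown in the comment is illustrative only, since the exact value sent depends on what the recognizer returns.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="greeting">
      <prompt> What say you? </prompt>
      <grammar> hello | howdy | greetings | hey | password </grammar>
    </field>

    <block>
      <!-- Send only the "greeting" variable, using an HTTP GET.
           Assumed resulting request (illustrative):
           http://www.voicexml.org/cgi-bin/friend_or_foe.cgi?greeting=howdy -->
      <submit next="http://www.voicexml.org/cgi-bin/friend_or_foe.cgi"
              namelist="greeting" method="get"/>
    </block>
  </form>
</vxml>
```

Naming the variables explicitly can be useful once a form collects several fields and only some of them should reach the server.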
VoiceXML Developer Resources
The VoiceXML Forum has developed a number of resources
that will allow those new to VoiceXML to get started.
Here are a few pointers:
A number of VoiceXML Forum Members provide access to developer
sites and tool kits that will allow you to try out VoiceXML
for yourself. A few of these are:
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).