First Words
Welcome to “First Words” – the VoiceXML Review’s column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.
VoiceXML 2.1
In this lesson, we’re going to continue investigating VoiceXML 2.1. As I write this, the Voice Browser Working Group of the W3C is working hard to finalize the VoiceXML 2.1 specification, as part of its face-to-face meeting activities in Turin, Italy.
You may recall that as VoiceXML platform vendors and
application developers began to widely deploy VoiceXML
applications, they began to identify potential future
extensions to the language. The result of this experience
is a collection of field-proven features that are candidates
for addition to the VoiceXML language. These features
are being proposed as part of VoiceXML 2.1.
Just as a reminder, VoiceXML 2.1 has been released
as a Last Call Working Draft. Here is a pointer:
http://www.w3.org/TR/2004/WD-voicexml21-20040728/
Note: if you’re reading this article after VoiceXML 2.1 has been finalized and published, you should spend a few minutes tracking down the final specification rather than using this link, as the specification may have undergone minor changes.
The new features proposed for VoiceXML 2.1 are based on feedback from application developers and VoiceXML platform developers.
The features we looked at last issue were:
- Referencing Grammars Dynamically – generation of a grammar URI reference with an expression;
- Referencing Scripts Dynamically – generation of a script URI reference with an expression.
Here is a link to the article:
http://www.voicexmlreview.org/Sep2004/columns/sep2004_first_words.html
In future issues, we’re going to look at these:
- Using <mark> to detect barge-in during prompt playback – placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred;
- Using <data> to fetch XML without requiring a dialog transition – retrieval of XML data, and construction of a related DOM object, without requiring a transition to another VoiceXML page;
- Concatenating prompts dynamically using <foreach> – building of prompt sequences dynamically using ECMAScript;
- Adding type to <transfer> – support for additional transfer flexibility (in particular, a supervised transfer), among other capabilities.
This issue, we’re going to look at:
- Recording user utterances while attempting recognition – provides access to the actual caller utterance, for use in the user interface or for submission to the application server;
- Adding namelist to <disconnect> – the ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to this call).
Recording User Utterances
Collection of user utterances can be useful in a number of ways. This feature allows the application to request that the platform collect these utterances for application use.
Utterance recording is enabled with the ‘recordutterance’ property. When it is set to ‘true’, the platform will set three shadow variables as part of any input collection:
- recording – a reference to the recorded audio;
- recordingsize – the size of the recording in bytes;
- recordingduration – the duration of the recording in milliseconds.
After any successful input collection, these shadow variables are set on the form item variable; they are always set on the application.lastresult$ object. Note that support for this feature on the <record> and <transfer> elements is optional (as is speech recognition support when processing these elements).
Here is an example from the VoiceXML 2.1 Last Call Working Draft.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <property name="recordutterance" value="true"/>
    <field name="city_state">
      <prompt>
        Say a city and state.
      </prompt>
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <nomatch>
        I'm sorry. I didn't get that.
        <reprompt/>
      </nomatch>
      <nomatch count="3">
        <var name="the_recording" expr="lastresult$.recording"/>
        <submit method="post"
                enctype="multipart/form-data"
                next="upload.cgi"
                namelist="the_recording"/>
      </nomatch>
    </field>
  </form>
</vxml>
In this example, utterance recording is enabled within the scope of the form by setting the ‘recordutterance’ property to ‘true’. The form attempts to collect a city and state from the user. If three ‘nomatch’ events occur in a row, the matching event handler above (<nomatch count="3">) will be triggered. This event handler submits the utterance to the application server, where it would presumably be stored in a file or database.
This example collects only those user utterances that are problematic. Note, though, that only the last utterance in the sequence of three nomatch events is saved in this case. Note also the use of the ‘multipart/form-data’ encoding in the submission – this is required by VoiceXML 2.1.
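The “use in the user interface” mentioned earlier can also be sketched directly. In this hypothetical fragment (the field name and grammar URI are illustrative), the recording shadow variable on application.lastresult$ is played back to the caller for confirmation via the ‘expr’ attribute of the <audio> element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <property name="recordutterance" value="true"/>
    <field name="city_state">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <filled>
        <!-- Play the caller's own utterance back as part of confirmation -->
        <prompt>
          I heard
          <audio expr="application.lastresult$.recording"/>
          Is that correct?
        </prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Playing the caller’s actual utterance back, rather than a TTS rendering of the recognition result, can make confirmation dialogs feel more natural.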
There is an additional related property – ‘recordutterancetype’ – which can be used to define the media type used when recording the utterance. Should the requested type not be supported by the platform, an error.unsupported.format event will be thrown.
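A minimal sketch of how these two properties might be combined (the ‘audio/x-wav’ value is just one common media type, and the grammar URI is illustrative; platform support varies), including a handler for the event thrown when the type is unsupported:

```xml
<form>
  <property name="recordutterance" value="true"/>
  <property name="recordutterancetype" value="audio/x-wav"/>
  <field name="account">
    <prompt>Say your account number.</prompt>
    <grammar type="application/srgs+xml" src="account.grxml"/>
  </field>
  <catch event="error.unsupported.format">
    <!-- The platform cannot record in the requested media type -->
    <prompt>Sorry, we cannot process your call right now.</prompt>
    <disconnect/>
  </catch>
</form>
```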
This capability supports a number of uses, including application tuning, application-server-tier speaker verification, and confirmation of caller input for regulatory purposes.
Passing Data Using Disconnect
VoiceXML 2.0 allows an application to return data to the VoiceXML interpreter context using the ‘namelist’ attribute of the <exit> element. This can be useful when
one wishes to pass data to other network elements.
Depending upon the platform in use, this might include
Computer Telephony Integration (CTI) subsystems, Call
Control XML (CCXML) interpreters, or other components.
For more information on CCXML, you might want to have
a look at:
http://www.w3.org/TR/2004/WD-ccxml-20040430/
As the CCXML specification nears completion, this
feature will provide an additional mechanism for communication
between the CCXML interpreter and VoiceXML dialogs
under its control.
VoiceXML 2.1 adds this capability to the <disconnect> element. This is useful for particular applications and provides a consistent mechanism for delivering this data to the interpreter context.
The use of the ‘namelist’ attribute with <disconnect> is very straightforward:

<disconnect namelist="accountNumber accountType transactionType"/>
In this example, the ECMAScript variables listed in the ‘namelist’ attribute will be passed to the interpreter context. Both the <exit> and <disconnect> elements can be used in a document; in this case, the values from both are passed to the interpreter context for further processing.
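Placing that <disconnect> in context, here is a minimal, self-contained sketch (the variable names and values are illustrative): the document declares the variables at document scope, then passes them back to the interpreter context when it hangs up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <var name="accountNumber" expr="'12345678'"/>
  <var name="accountType" expr="'checking'"/>
  <var name="transactionType" expr="'balance'"/>
  <form>
    <block>
      <prompt>Thank you. Goodbye.</prompt>
      <!-- Pass the collected values to the interpreter context
           (for example, to a controlling CCXML session) -->
      <disconnect namelist="accountNumber accountType transactionType"/>
    </block>
  </form>
</vxml>
```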
Summary
Here are the direct links to these two new features.
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-disconnect
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-reco_reco
Watch for more information on VoiceXML 2.1 in our
forthcoming issues.
VoiceXML 2.1 proposes some useful additional features for VoiceXML 2.0, based on real-world deployment experience. We’re going to continue drilling down into these features in forthcoming issues. As always, if you have questions or topics for VoiceXML 2.0 or 2.1, drop us a line!
Copyright © 2001-2004 VoiceXML Forum. All rights reserved. The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).