Answers to Your Questions About VoiceXML
In this monthly column, an industry expert will answer
common questions about VoiceXML and related technologies.
Readers are encouraged to submit questions about VoiceXML,
including development, voice-user interface design,
and speech technology in general, or how VoiceXML is
being used commercially in the marketplace. If you have
a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org
and be sure to read future issues of VoiceXML Review
for the answer.
This month we received a few more excellent questions from
readers. It's great to see momentum building here, and
I look forward to the point where more questions arrive
each month than we have room to answer.
Q: I'm working with VoiceXML, and want to know how I
can submit recorded sound? In which format is the recorded
sound stored, and how is it matched with our written
grammar?
A: VoiceXML supports recording audio from the caller,
such as a personal voicemail message. Once the audio
has been recorded, it can be played back to the caller,
posted back to your Web server for offline processing
and permanent storage, or both. The <record> element
in VoiceXML initiates a recording and stores the result
in a variable, which the <value> and <submit> elements
can then use for playback and submission.
For example (from the VoiceXML 1.0 spec):
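<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- Play a tone, then record up to 10 seconds; any DTMF key ends the recording. -->
    <record name="greeting" beep="true" maxtime="10s" dtmfterm="true" type="audio/wav">
      <prompt>At the tone, please say your greeting.</prompt>
      <noinput>I didn't hear anything, please try again.</noinput>
    </record>
    <!-- Boolean field: "yes" sets confirm to true, "no" sets it to false. -->
    <field name="confirm" type="boolean">
      <prompt>Your greeting is <value expr="greeting"/>.</prompt>
      <prompt>To keep it, say yes. To discard it, say no.</prompt>
      <filled>
        <if cond="confirm">
          <!-- POST the recorded audio back to the Web server. -->
          <submit next="save_greeting.pl" method="post" namelist="greeting"/>
        </if>
        <clear/>
      </filled>
    </field>
  </form>
</vxml>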
In this example, the caller is prompted for a simple voicemail
greeting. The recording is played back to the caller
for confirmation and, if the caller approves it, is posted
back to the Web server (presumably for storage as the
caller's official greeting for this voicemail system).
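To make the server side of this exchange concrete, here is a minimal sketch in Python of what a script in the role of save_greeting.pl might do with the posted data. It assumes the platform sends the recording as URL-encoded form data in a field named "greeting"; the function names and the output path are hypothetical, and a platform that posts multipart form data would need different parsing.

```python
from urllib.parse import parse_qs


def extract_recording(body: bytes, field: str = "greeting") -> bytes:
    """Return the raw audio bytes for `field` from a URL-encoded POST body.

    Assumption: the VoiceXML platform posted the recording as
    application/x-www-form-urlencoded data (check your platform's docs).
    """
    # A URL-encoded body is plain ASCII. Decoding the %XX escapes as latin-1
    # maps each escaped byte to exactly one character, so encoding the result
    # back to latin-1 restores the original bytes unchanged.
    fields = parse_qs(body.decode("ascii"), encoding="latin-1",
                      keep_blank_values=True)
    if field not in fields:
        raise KeyError(f"POST body has no field named {field!r}")
    return fields[field][0].encode("latin-1")


def save_recording(body: bytes, path: str) -> int:
    """Write the posted recording to `path` (hypothetical location).

    Returns the number of bytes written.
    """
    audio = extract_recording(body)
    with open(path, "wb") as out:
        return out.write(audio)
```

A real handler would also check the request's Content-Type header before choosing how to parse the body, since the encoding is platform-dependent.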
Here are a few additional points about recorded audio and
VoiceXML:
- Audio Formats. VoiceXML 1.0 does not mandate any particular
file format for recorded audio. The "type" attribute
lets the application developer request a preferred MIME
type. If it is not specified, the recording "defaults
to a platform-specific format". VoiceXML 1.0 does not
specify how a VoiceXML platform should advertise the
formats it supports, or how it should behave if a developer
requests an unsupported format. The documentation
for the VoiceXML platform you're using should provide
this information; however, most of today's commercially
available VoiceXML platforms support standard 8-bit,
8 kHz RIFF-encoded Windows .WAV files.
- Grammars active while recording. VoiceXML 1.0 specifies
the "modal" attribute for the <record> element. If
"modal" is set to 'true' (the default), then no grammars
are active while recording. If "modal" is set to 'false',
then all appropriately scoped grammars are active
while recording. VoiceXML 1.0 does not explicitly specify
what should happen to the audio recorded so far if a
grammar is matched during recording; implementations
may choose to discard that audio and jump to the
appropriate <filled> handler for the matched grammar.
That said, most, if not all, commercially available
VoiceXML platforms today do not support simultaneous
recording and recognition, and so do not support
modal="false" for <record>. The VoiceXML 1.0 specification
explicitly calls out this point.
- HTTP POST and audio data. When using the <submit>
element to POST audio data back to your Web server,
VoiceXML 1.0 does not explicitly specify how the VoiceXML
platform should send the data. Two methods in commercial
use today are HTTP multipart MIME form-data and HTTP
URL-encoded data. URL encoding expands every byte outside
a small set of safe characters into a three-character
%XX escape, so a multipart POST of binary audio can be
up to three times smaller, and correspondingly faster
to send, than the URL-encoded equivalent. Your Web server
must, however, be properly configured to accept multipart
form data. See the documentation of your VoiceXML platform
of choice for details on how audio data is posted.
- Secure POST of audio data. To securely POST audio
data over the Internet using SSL (Secure Sockets
Layer), both your Web server and your VoiceXML platform
of choice must be configured to support SSL. Not all
VoiceXML platforms support SSL for HTTP POST. That
said, this issue is typically only relevant if your
VoiceXML platform runs on a different network from
your application Web servers.
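The size difference between the two POST encodings mentioned above is easy to check for yourself. The following Python sketch builds a one-second, 8-bit, 8 kHz PCM .WAV file in memory as a stand-in for a caller's recording, then encodes it both ways and compares the sizes. The field name "greeting", the helper names, and the synthetic audio data are assumptions made for illustration; a real platform may build its requests differently, and many telephony recordings use mu-law rather than linear PCM.

```python
import io
import uuid
import wave
from urllib.parse import quote_from_bytes


def make_wav(seconds: float = 1.0, rate: int = 8000) -> bytes:
    """Build an 8-bit, mono PCM WAV file in memory (stand-in for a recording)."""
    frame_count = int(seconds * rate)
    # Unsigned 8-bit samples; a repeating ramp stands in for real speech.
    frames = bytes((i * 7) % 256 for i in range(frame_count))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(1)      # 1 byte per sample = 8-bit
        w.setframerate(rate)   # 8000 samples per second by default
        w.writeframes(frames)
    return buf.getvalue()


def urlencoded_body(name: str, data: bytes) -> bytes:
    """Encode one field as URL-encoded form data.

    Every byte outside the unreserved set (letters, digits, "_.-~")
    becomes a three-character %XX escape.
    """
    return (name + "=" + quote_from_bytes(data, safe="")).encode("ascii")


def multipart_body(name: str, data: bytes, boundary: str) -> bytes:
    """Encode one field as multipart/form-data; the audio bytes are sent raw."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; '
        f'filename="{name}.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail


if __name__ == "__main__":
    wav = make_wav()
    url = urlencoded_body("greeting", wav)
    mp = multipart_body("greeting", wav, uuid.uuid4().hex)
    print("raw WAV bytes:      ", len(wav))
    print("URL-encoded body:   ", len(url))
    print("multipart body:     ", len(mp))
    print("URL-encoded / multipart ratio: %.2f" % (len(url) / len(mp)))
```

The exact ratio depends on the audio content: bytes that happen to be letters or digits are sent unescaped, so the URL-encoded body approaches three times the raw size without reaching it, while the multipart overhead is a small fixed number of header bytes.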
Q: Is VoiceXML only usable for telephony applications,
or can it also be used for PC (client) applications?
A: VoiceXML 1.0 is explicitly designed for developing voice-enabled
telephony applications. That is why it includes some
elements for call control (e.g., <transfer>),
as well as the basics of voice recognition and audio
playback. However, nothing would prevent the development
of a VoiceXML interpreter/platform focused on PC applications.
For instance, many companies have begun using Web pages
and HTML as a way to develop self-contained "client
only" applications that don't explicitly require
external Internet access. A VoiceXML platform tailored
for PC applications would likely implement a subset
of the full VoiceXML 1.0 specification.
Copyright
© 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization (IEEE-ISTO).