In this monthly column, an industry expert will
answer common questions about VoiceXML and related technologies.
Readers are encouraged to submit questions about VoiceXML,
including development, voice-user interface design,
and speech technology in general, or how VoiceXML is
being used commercially in the marketplace. If you have
a question about VoiceXML, e-mail it to
speak.and.listen@voicexmlreview.org
and be sure to read future issues of VoiceXML Review
for the answer.
By Matt Oshry
Q: Some dialogs within my voice application
use very large or ambiguous grammars, so the recognition
can be tricky. In those situations I may need to confirm
with the user that my application received the user's
intended response from the recognizer. How do I decide
when to confirm the selected response with the user?
A:
The technique you're referring to is called "confidence-based
confirmation", and there are several ways to implement
it in VoiceXML. Let's start with a simple dialog that
requests a city and state from the user:
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="where">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" mode="voice" src="citystate.grxml"/>
      <noinput>
        Sorry. I didn't hear you.
        <reprompt/>
      </noinput>
      <nomatch>
        Sorry. I didn't get that.
        <reprompt/>
      </nomatch>
      <filled>
        <log>where = <value expr="where"/></log>
      </filled>
    </field>
  </form>
</vxml>
If
the filled element gets executed, we know that the
recognizer was at least as confident as the value of
the confidencelevel property. Otherwise a nomatch
event is thrown. According to section 6.3.2 of the
VoiceXML 2.0 specification, the default confidencelevel
is 0.5, but you can tweak the value to be as low
as 0.0 or as high as 1.0. Assuming you haven't modified
the property, we know the confidence score for the
result is somewhere between 0.5 and 1.0. We can determine
the exact value by checking either of the following:
where$.confidence
application.lastresult$.confidence
Using
the confidence score from lastresult$ can get tricky
if your grammar fills multiple slots, and section 2.3.1 of
the VoiceXML 2.0 specification states
that the "distinction between field and utterance
level confidence is platform dependent", so we'll
use the confidence shadow variable of the field for
maximum portability across voice browser implementations.
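As a rough illustration, the fragment below logs the field-level score each time the field is filled. The explicit confidencelevel property is optional and simply restates the 0.5 default; everything else mirrors the dialog shown earlier.

```xml
<field name="where">
  <!-- Optional: restates the default rejection threshold of 0.5 -->
  <property name="confidencelevel" value="0.5"/>
  <prompt>Say a city and state.</prompt>
  <grammar type="application/srgs+xml" mode="voice" src="citystate.grxml"/>
  <filled>
    <!-- where$.confidence is the field's shadow variable (0.0 to 1.0) -->
    <log>where = <value expr="where"/>, confidence = <value expr="where$.confidence"/></log>
  </filled>
</field>
```

Reviewing these log entries against real calls is one way to gather the data your speech expert will want when choosing a threshold.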
Now
that we know how to access the confidence score,
we need to pick a threshold. When the confidence
score meets or exceeds the threshold, we'll assume the recognizer
is correct; otherwise, we'll confirm the recognizer's
selection
with the user. For this example I'll choose 0.75,
but you should consult your resident speech expert,
since the threshold will vary depending on the grammar
you're using.
Here's
an implementation of a confidence-based confirmation
dialog that leverages the Form Interpretation Algorithm
(FIA) by embedding the confirmation dialog in the
same form as the field that collects the city and
state. The cond attribute of the confirm field controls
whether or not the interpreter executes it.
If the cond attribute evaluates to false, the FIA skips
the confirm field and selects and executes the block.
If the cond attribute evaluates to true, the FIA selects
the confirm field. If the user says 'no' in response
to the confirm field, execution of the clear resets
the guard condition on the where and confirm fields
making them eligible for selection again during the
next iteration of the FIA's main loop. In fact, the
block will not be executed until either
1)
the form item variable where is filled, and the associated
confidence score is greater than or equal to 0.75,
or
2)
the user says 'yes' in response to the confirm field's
prompt.
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <catch event="noinput nomatch">
    Sorry. I didn't get that.
    <reprompt/>
  </catch>
  <form>
    <field name="where">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" mode="voice" src="citystate.grxml"/>
    </field>
    <!-- Visited only when where is filled with a confidence below 0.75 -->
    <field name="confirm"
           cond="typeof(where) != 'undefined' &amp;&amp; where$.confidence &lt; 0.75">
      <prompt>I heard you say <value expr="where"/>, is that correct?</prompt>
      <grammar type="application/srgs+xml" src="yesno.grxml"/>
      <filled>
        <if cond="confirm == 'no'">
          <clear namelist="confirm where"/>
        </if>
      </filled>
    </field>
    <block>
      <submit next="listing.cgi" namelist="where"/>
    </block>
  </form>
</vxml>
What
if the recognizer never obtains a confidence score
greater than or equal
to 0.75, and the user repeatedly responds 'no' to the
confirm dialog? After the second and certainly the
third confirmation attempt, the user is likely to
give up. You can preempt this situation by keeping
track
of the number of times the user is asked to confirm
her choice and taking an appropriate action, such as
transferring the user to your call center. If you don't
have that luxury, you can attempt to obtain
the information from the user in a different way,
for example, via DTMF ("type the first few letters..."),
or by presenting a list of choices.
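To make the DTMF alternative a little more concrete, here's a minimal sketch of a keypad-only fallback form. The form id whereByKeypad, the field name cityKeys, and the lookup.cgi script are all hypothetical, and the sketch assumes your platform supports the built-in digits type, which is optional in VoiceXML 2.0.

```xml
<form id="whereByKeypad">
  <!-- Hypothetical fallback: collect three key presses instead of speech -->
  <field name="cityKeys" type="digits?minlength=3;maxlength=3">
    <!-- Accept key presses only for this field -->
    <property name="inputmodes" value="dtmf"/>
    <prompt>
      Using your telephone keypad, enter the first three letters of the city name.
    </prompt>
    <filled>
      <!-- lookup.cgi is an assumed server-side script that maps keys to cities -->
      <submit next="lookup.cgi" namelist="cityKeys"/>
    </filled>
  </field>
</form>
```

Since several letters share each key, the server side would typically answer with a short list of matching cities for the user to choose from.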
Here's
a slightly modified version of the previous example
that keeps track of confirmation attempts using a
variable named confirmCount. Because the confirmCount
variable
is declared at form scope, it automatically gets
initialized each time you enter the form. When the
confirmCount reaches 2 in the confirm dialog, the
application throws the event "com.yourcompany.yourapp.transfertoagent".
The handler for this event presumably navigates to
a form containing a transfer element that whisks
the user off to a customer care representative.
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <catch event="noinput nomatch">
    Sorry. I didn't get that.
    <reprompt/>
  </catch>
  <form>
    <var name="confirmCount" expr="0"/>
    <field name="where">
      <prompt>Say a city and state.</prompt>
      <grammar type="application/srgs+xml" mode="voice" src="citystate.grxml"/>
    </field>
    <field name="confirm"
           cond="typeof(where) != 'undefined' &amp;&amp; where$.confidence &lt; 0.75">
      <prompt>I heard you say <value expr="where"/>, is that correct?</prompt>
      <grammar type="application/srgs+xml" src="yesno.grxml"/>
      <filled>
        <!-- Only a 'no' counts as a failed attempt; a 'yes' falls through to the block -->
        <if cond="confirm == 'no'">
          <if cond="++confirmCount &lt; 2">
            <clear namelist="confirm where"/>
          <else/>
            <throw event="com.yourcompany.yourapp.transfertoagent"/>
          </if>
        </if>
      </filled>
    </field>
    <block>
      <submit next="listing.cgi" namelist="where"/>
    </block>
  </form>
</vxml>
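The handler for the transfer event isn't shown above, so here's one way it might look. Treat it as a sketch: the form id agent, the field name call, and the telephone number are placeholders, and I've assumed a blind transfer (bridge="false"). Both elements would sit directly under the vxml element, alongside the form above.

```xml
<!-- Document-level handler for the application-defined event -->
<catch event="com.yourcompany.yourapp.transfertoagent">
  <goto next="#agent"/>
</catch>

<!-- Hypothetical form that hands the caller to a representative -->
<form id="agent">
  <transfer name="call" dest="tel:+15555550100" bridge="false">
    <prompt>Please hold while I transfer your call.</prompt>
  </transfer>
</form>
```

If you need to regain control of the call after the agent hangs up, a bridged transfer (bridge="true") with a filled handler on the transfer element is the usual alternative.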
You can reuse the confirm field anywhere you require confidence-based
confirmation
by doing the following:
- Copy and paste the confirm field into the form after
the field that requires confidence-based confirmation.
- Replace
the use of the variable 'where' with the value of the
name attribute of the data collection field.
The data collection field is referenced in three places:
  - The cond attribute of the confirm field
  - The prompt
  - The namelist attribute of the clear element
- Adjust
the confidence threshold to an appropriate value.
- If
you track the number of confirmation attempts, be sure
to declare the counter within the form
into which you copy the confirmation field.
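As an illustration of those steps, here's the confirm field pasted after a hypothetical date field. The field name when, the grammar date.grxml, and the 0.6 threshold are assumptions made up for this sketch; only the substitutions matter.

```xml
<form>
  <!-- Hypothetical data collection field; date.grxml is an assumed grammar -->
  <field name="when">
    <prompt>What date would you like to travel?</prompt>
    <grammar type="application/srgs+xml" mode="voice" src="date.grxml"/>
  </field>

  <!-- Pasted confirm field: 'where' became 'when' in the cond, the prompt, and the clear -->
  <field name="confirm"
         cond="typeof(when) != 'undefined' &amp;&amp; when$.confidence &lt; 0.6">
    <prompt>I heard you say <value expr="when"/>, is that correct?</prompt>
    <grammar type="application/srgs+xml" src="yesno.grxml"/>
    <filled>
      <if cond="confirm == 'no'">
        <clear namelist="confirm when"/>
      </if>
    </filled>
  </field>
</form>
```

If a single form needs to confirm more than one field, give each pasted copy its own name (for example confirmWhen), and update the filled condition and the clear namelist to match.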
Q:
According
to the latest draft of VoiceXML 2.1, the markname and
marktime variables store values corresponding to the
mark that was last executed "before
barge-in occurred or the end of audio playback occurred." If
the user listens to the prompt in its entirety, how can
my application accurately detect the user's reaction
time if they speak during the timeout interval?
A: Your
careful reading of the specification is correct. If the
user doesn't barge in on the prompt, the marktime variable
will only reflect the interval between the last mark that
was executed and the end of audio playback. It will not
include the timeout interval. You can work around this
by setting the timeout property to zero seconds
and by adding to the end of the prompt queue a silent audio
file (e.g. timeout.wav), the duration of which is equivalent
to the desired timeout. Because the silence is now part of the
prompt, a user who speaks during it is barging in, and marktime
is measured from the mark that precedes the silence. Here's some sample code:
<field name="city">
  <property name="timeout" value="0s"/>
  <grammar mode="voice" src="citystate.srgs"/>
  <prompt>
    <mark name="pre"/>
    <audio src="citystate.wav">say a city and state</audio>
    <mark name="timeout"/>
    <audio src="timeout.wav"/>
  </prompt>
  <filled>
    <log>markname=<value expr="city$.markname"/>, marktime=<value expr="city$.marktime"/></log>
    Okay. <value expr="city"/>
  </filled>
</field>
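Here's one way you might interpret those two values, assuming the mark names from the sample above and that marktime is reported in milliseconds, as the VoiceXML 2.1 draft describes. This filled element is a sketch that could stand in for the one in the sample.

```xml
<filled>
  <if cond="city$.markname == 'timeout'">
    <!-- The spoken prompt finished; the user spoke during the silent audio -->
    <log>Reaction time: <value expr="city$.marktime"/> ms after the prompt ended</log>
  <else/>
    <!-- The user barged in while citystate.wav was still playing -->
    <log>Barge-in: <value expr="city$.marktime"/> ms into the prompt</log>
  </if>
  Okay. <value expr="city"/>
</filled>
```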