N-best Recognition Results
Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.
This month, we're going to take a quick look at N-best results and how to get at them from within VoiceXML.
Today's speech recognizers can usually return not just a single recognition result but a list of candidate results, ordered by some criterion (usually the recognizer's confidence in each candidate).
Why would we want this? The main reason is to give the application more flexibility in how it handles user interaction. With a list of results from the recognizer, a VoiceXML application can recover more gracefully when the 'best' result is not the correct one.
Consider the situation where the user is prompted for a city name for some purpose (say, a travel destination). If the recognizer returns 'Austin' but the user said 'Boston', the application has several options:
- Re-prompt the user, possibly repeating the same misrecognition;
- Constrain the user's response further (Austin, Texas versus Boston, Massachusetts);
- Remove 'Austin' from the grammar and try again.
With a list of results available, the application has another option: offer the user the next result from the list. Or, if the first two results are very close in confidence, the application might ask the user to choose between them. Either way, this capability gives the application designer another tool for building a user-friendly VoiceXML interface.
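Here is a rough sketch of the second approach. The VoiceXML 2.0 document below asks the recognizer for two candidates. When their confidence scores are close, it confirms the top result and falls back to the runner-up if the user rejects it. It relies on the application.lastresult$ array covered in the next section.

Several parts of this sketch are hypothetical choices made for illustration: the three-city grammar, the 0.1 closeness threshold, and the runnerUp and confirmCity names. The inline grammar also assumes a platform that accepts SRGS ABNF.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="destination">
    <!-- Ask the recognizer for up to two candidates in this form. -->
    <property name="maxnbest" value="2"/>
    <!-- Hypothetical: runner-up city, left empty unless the top two were close. -->
    <var name="runnerUp" expr="''"/>

    <field name="city">
      <!-- Hypothetical three-city grammar, for illustration only. -->
      <grammar type="application/srgs">
        #ABNF 1.0;
        mode voice;
        root $city;
        $city = austin | boston | dallas;
      </grammar>
      <prompt>Which city are you travelling to?</prompt>
      <filled>
        <!-- Hypothetical rule: scores within 0.1 of each other count as a near tie. -->
        <if cond="application.lastresult$.length &gt; 1 &amp;&amp;
                  application.lastresult$[0].confidence -
                  application.lastresult$[1].confidence &lt; 0.1">
          <assign name="runnerUp" expr="application.lastresult$[1].interpretation"/>
        </if>
      </filled>
    </field>

    <!-- Visited only when a runner-up was recorded above. -->
    <field name="confirmCity" type="boolean" cond="runnerUp != ''">
      <prompt>Did you say <value expr="city"/>?</prompt>
      <filled>
        <!-- On "no", fall back to the next result from the N-best list. -->
        <if cond="!confirmCity">
          <assign name="city" expr="runnerUp"/>
        </if>
      </filled>
    </field>

    <block>
      Your destination is <value expr="city"/>.
    </block>
  </form>
</vxml>
```

When the top two scores are far apart, the confirmation field's cond is false, so the form item interpretation algorithm skips straight to the closing block.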
N-best in VoiceXML
VoiceXML provides access to N-best results in a well-defined way. We will begin with our tried-and-true pizza interface:
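<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Welcome to the Voice X M L review pizza franchise
      </prompt>
    </block>
    <field name="orderItem">
      <!-- Voice grammar (SRGS ABNF form); with no tags, the matched words fill the field -->
      <grammar type="application/srgs">
        #ABNF 1.0;
        mode voice;
        root $item;
        $item = pizza | pizzas | pie | drinks | salad | wings;
      </grammar>
      <!-- DTMF grammar; literal tags map each key to an item name.
           CDATA keeps the "<" in the tag-format line from breaking the XML. -->
      <grammar type="application/srgs">
        <![CDATA[
        #ABNF 1.0;
        mode dtmf;
        tag-format <semantics/1.0-literals>;
        root $key;
        $key = 1 {pizza} | 2 {drinks} | 3 {salad} | 4 {wings};
        ]]>
      </grammar>
      <prompt>
        What would you like to order? We have pizza, drinks, salad or wings.
      </prompt>
      <noinput>
        Say pizza, drinks, salad, or wings.
      </noinput>
      <nomatch>
        You can say pizza, drinks, salad, or wings.
      </nomatch>
      <filled>
        You said
        <value expr="orderItem"/>
      </filled>
    </field>
  </form>
</vxml>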
We've slimmed this down a bit from our earlier examples to focus on the task at hand. The application prompts the user for what they would like to order. Once the user says something in-grammar, the item is read back.
To extend this example to support N-best, we first need to decide how many results we want to receive. This is done with the maxnbest property:
<property name="maxnbest" value="5"/>
VoiceXML specifies that this value is the maximum number of results you will receive. Your application should be prepared to receive fewer: anywhere from one to maxnbest results can be returned. The default value of maxnbest is 1, so unless you raise it you get only the single best result.
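The property element can be placed at document, form, or field level, and its scope follows that placement. Below is a minimal sketch that requests up to five candidates for the orderItem field alone. It assumes a VoiceXML 2.0 platform that accepts inline SRGS ABNF grammars.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="orderItem">
      <!-- Field-level property: up to 5 candidates for this field only.
           Other fields in this document are unaffected. -->
      <property name="maxnbest" value="5"/>
      <grammar type="application/srgs">
        #ABNF 1.0;
        mode voice;
        root $item;
        $item = pizza | pizzas | pie | drinks | salad | wings;
      </grammar>
      <prompt>What would you like to order?</prompt>
      <filled>
        <!-- Read back the value that filled the field. -->
        You said <value expr="orderItem"/>
      </filled>
    </field>
  </form>
</vxml>
```

Putting the same property element directly under form or vxml would instead apply it to every field in that form or document.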
N-best results are made available in the ECMAScript object application.lastresult$.
You may recall that a field has an associated shadow variable, fieldname$, with the members fieldname$.confidence, fieldname$.utterance, fieldname$.interpretation, and fieldname$.inputmode. These hold:

- the recognizer's confidence;
- the actual utterance (words) matched by the recognizer;
- the semantic interpretation returned by the recognizer;
- the input mode (dtmf or voice) for the collected data.

The variable application.lastresult$ provides a similar structure, except that it is an array of objects, application.lastresult$[n]. Each element has the members application.lastresult$[n].confidence, application.lastresult$[n].utterance, application.lastresult$[n].inputmode, and application.lastresult$[n].interpretation.

The array holds the N-best results, indexed from 0 up to at most maxnbest-1. You should check the length of the array using application.lastresult$.length before performing any operations based on application.lastresult$. Note also that application.lastresult$.confidence, application.lastresult$.utterance, and so on correspond to the 0th element of the application.lastresult$ array.
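The following sketch puts these pieces together. It requests up to five candidates, checks application.lastresult$.length, and reads back each candidate's utterance and confidence. The summary variable and the spoken wording are illustrative assumptions, not part of the VoiceXML specification. Confidence is taken to be the 0.0 to 1.0 score that VoiceXML 2.0 defines.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <property name="maxnbest" value="5"/>
    <!-- Hypothetical: text summary of all N-best candidates, built in filled below. -->
    <var name="summary" expr="''"/>
    <field name="orderItem">
      <grammar type="application/srgs">
        #ABNF 1.0;
        mode voice;
        root $item;
        $item = pizza | pizzas | pie | drinks | salad | wings;
      </grammar>
      <prompt>What would you like to order?</prompt>
      <filled>
        <!-- Loop bound comes from length: anywhere from 1 to maxnbest entries. -->
        <script>
          <![CDATA[
          var results = application.lastresult$;
          for (var i = 0; i < results.length; i++) {
            summary += results[i].utterance + ' at confidence ' +
                       Math.round(results[i].confidence * 100) + ' percent. ';
          }
          ]]>
        </script>
        I heard <value expr="application.lastresult$.length"/> candidates.
        <value expr="summary"/>
      </filled>
    </field>
  </form>
</vxml>
```

In a real application you would more likely use this loop to decide what to do next than to read every candidate aloud. For example, you might confirm application.lastresult$[1] when the top result is rejected.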
Continued...
Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).