Introduction to the W3C Grammar Format
(Continued from Part 1)
Combining Grammars
Much of the power of the W3C grammar format comes from
its ability to build a grammar by combining subgrammars.
This capability supports the building and reuse of
grammar libraries and the componentization of large or
complex natural language grammars. For example, to
support a user who says "I want to fly from Boston to
Miami next Wednesday at 9 a.m." a grammar author might
use library grammars for U.S. city names, dates and times.
The mechanism works by permitting a rule in one grammar
to reference a public rule in another grammar. The
following example illustrates how the travel command
might be structured.
<rule id="travel" scope="public">
  I want to fly from
  <ruleref uri="locations.xml#UScity"/>
  to
  <ruleref uri="locations.xml#UScity"/>
  <count number="optional">on</count>
  <ruleref uri="date.xml"/>
  <count number="optional">at</count>
  <ruleref uri="time.xml"/>
</rule>
It is common when writing a grammar to reference another
grammar repeatedly. The grammar format provides an
<import> element to avoid repeating a long URI. The
import declares a local alias for the URI.
<import uri="http://www.mygrammars.com/cities-states.xml" name="places"/>
When a rule reference uses the import alias, the syntax
is slightly different: the import attribute replaces uri,
and its value is the alias, a "#", and the rule name.
<ruleref import="places#city"/>
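Putting the two pieces together, the first half of the travel rule shown earlier could be written with an alias instead of a repeated URI. This is only a sketch: the library URI http://www.example.com/locations.xml and the alias name "places" are hypothetical, and the library is assumed to expose a public UScity rule.

```xml
<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0" root="travel">
  <!-- Hypothetical library URI: the alias is declared once -->
  <import name="places" uri="http://www.example.com/locations.xml"/>

  <rule id="travel" scope="public">
    I want to fly from
    <!-- alias#rule stands in for the full URI in each reference -->
    <ruleref import="places#UScity"/>
    to
    <ruleref import="places#UScity"/>
  </rule>
</grammar>
```

If the library later moves, only the uri attribute of the <import> element needs to change.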
Example Grammar
The following example is a complete grammar that uses
all the capabilities described above. The root rule is
"basicCommand". It permits users to speak commands such
as "please move the window" and "open a file".
This example also demonstrates usage of the <example>
element, which permits a developer to document a rule
with examples of how the rule might be spoken.
<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0" root="basicCommand">
  <import name="polite"
          uri="http://www.sayplease.com/politeness.xml"/>
  <rule id="basicCommand" scope="public">
    <example>please move the window</example>
    <example>open a file</example>
    <!-- A sequence of 3 rule references -->
    <ruleref import="polite#startPolite"/>
    <ruleref uri="#command"/>
    <ruleref import="polite#endPolite"/>
  </rule>
  <rule id="command">
    <example>move the window</example>
    <!-- A sequence of 2 rule references -->
    <ruleref uri="#action"/> <ruleref uri="#object"/>
  </rule>
  <rule id="action">
    <one-of>
      <item>open</item>
      <item>close</item>
      <item>delete</item>
      <item>move</item>
    </one-of>
  </rule>
  <rule id="object">
    <count number="optional">
      <one-of>
        <item>the</item>
        <item>a</item>
      </one-of>
    </count>
    <one-of>
      <item>window</item>
      <item>file</item>
      <item>menu</item>
    </one-of>
  </rule>
</grammar>
Advanced Features
The W3C Speech Recognition Grammar Format includes a
range of more advanced features. The following is a
summary of those capabilities. Readers can find examples
of these features in the specification available at the
W3C web site (http://www.w3.org/TR/speech-grammar/).
- Internationalization:
Both the ABNF and XML grammar formats permit the use
of a wide range of character encodings. For example,
Shift-JIS can be used for the Japanese character sets
(Kanji, katakana, and hiragana). The grammar format
also allows the mixing of more than one language in
a grammar, or even in a single utterance.
- Tags:
A tag is an annotation to an expansion in a rule definition.
The tag does not affect the recognition performance
(what a user can say or how the recognizer performs
the recognition task). The tags are included as a
placeholder for semantic interpretation capabilities
that will be defined in a future draft of the specification.
- Special rules:
Rules are defined for "NULL", "VOID" and "GARBAGE".
These rules have specialized uses that are beyond the
scope of this article.
- Weights:
Each alternative defined in a <one-of> element
can have a weight attached that indicates the likelihood
that it is spoken. A speech recognizer can use this
information to improve accuracy.
- Root rule:
A grammar may optionally define a root rule.
A rule reference to a grammar that does not indicate
a specific rule contained within the grammar is implicitly
a reference to the root rule of the grammar.
- DTMF grammars:
In addition to supporting speech recognition,
the grammar format can be used to define patterns
of DTMF input to a telephony browser.
- Conformance:
The grammar specification has a detailed conformance
statement that, in combination with the body of the
specification, indicates required behavior for a speech
recognizer or any other processor that supports the
grammar format.
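Two of these features, weights and the root rule, can be pictured with constructs already used in this article. The sketch below is illustrative rather than normative: the file name airports.xml, the rule name, and the city list are hypothetical, and the weight attribute on <item> together with its numeric values is an assumption about the draft syntax that should be checked against the specification.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file: airports.xml -->
<grammar xml:lang="en" version="1.0" root="airport">
  <rule id="airport" scope="public">
    <one-of>
      <!-- Assumed syntax: a larger weight marks a more likely alternative -->
      <item weight="3">Boston</item>
      <item weight="2">Miami</item>
      <item weight="1">Anchorage</item>
    </one-of>
  </rule>
</grammar>
```

Because the grammar declares root="airport", another grammar could write <ruleref uri="airports.xml"/> with no "#" and rule name and still reach the airport rule, in the same way the travel rule earlier refers to date.xml and time.xml.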
Futures
The grammar specification has not yet been finalized by
the W3C Voice Browser Working Group. The W3C processes
support and encourage public review of all draft
specifications before they are formally published as
"Recommendations". The W3C Voice Browser Working Group
has already received public feedback and will continue
to address comments as they are received.
The following are some of the areas in which the working
group is still actively evolving the specification.
- Semantic interpretation:
The current grammar format allows
a developer to define acceptable speech input but
does not indicate how to convert that input into a
computer-actionable form that can be processed by
a VoiceXML application. For example, the order "I'd
like 3 copies of Harry Potter" might be converted
to the form {book: "Harry Potter"; number: 3}.
The working group is currently considering proposals
for a language that can be contained within the tags
to perform this interpretation.
- Pronunciations:
It is important in many applications for the grammar
writer to inform the recognizer of the pronunciation
of words. For example, in English, personal names,
place names and company names often have tricky pronunciations.
The current draft has no mechanism for supporting
pronunciations, though proposals are currently under
consideration.
- MIME types:
The W3C Voice Browser Working Group has
applied for MIME types for grammars so that Web servers
can indicate the content type and so that VoiceXML
browsers and speech recognizers can optimize loading.
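The gap described under semantic interpretation can be seen in a small sketch. The grammar below accepts the book order using only constructs shown earlier in this article. The rule names and word lists are invented for illustration, and the quantity is written as a word on the assumption that the recognizer matches spoken tokens.

```xml
<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0" root="order">
  <rule id="order" scope="public">
    <example>I'd like three copies of Harry Potter</example>
    I'd like
    <ruleref uri="#quantity"/>
    copies of
    <ruleref uri="#title"/>
  </rule>
  <rule id="quantity">
    <one-of>
      <item>two</item>
      <item>three</item>
      <item>four</item>
    </one-of>
  </rule>
  <rule id="title">
    <one-of>
      <item>Harry Potter</item>
      <item>Moby Dick</item>
    </one-of>
  </rule>
  <!-- Nothing above says how to turn a match into
       {book: "Harry Potter"; number: 3}. That step is what
       the semantic interpretation proposals would add. -->
</grammar>
```

As written, the recognizer can only report the words that were spoken; the application still has to map them to a book and a number itself.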
Conclusion
The new W3C Speech Recognition Grammar Format is a
powerful language for developing both simple grammars
and natural language grammars for use in VoiceXML
applications. The availability of a standard grammar
format will increase the interoperability of VoiceXML
applications by allowing each grammar to be authored
once and reused across many VoiceXML browsers.
References
[1] W3C Voice Browser Working Group (http://www.w3.org/Voice/)
[2] W3C Speech Recognition Grammar Format (http://www.w3.org/TR/speech-grammar/)
[3] VoiceXML 1.0 specification (http://www.w3.org/TR/voicexml/)
[4] JSpeech Grammar Format (http://www.w3.org/TR/jsgf/)
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry
Standards and Technology Organization (IEEE-ISTO).