Introduction to the W3C Grammar Format
Introduction
The
W3C
Voice Browser Working Group [1] has released a draft
specification for the W3C
Speech Recognition Grammar Format [2] that promises
to enhance the interoperability of VoiceXML Browsers
and drive the portability of VoiceXML applications.
This article summarizes the key features of the draft
specification and the application of the specification
to VoiceXML application development.
The
role of grammars in a spoken dialog application is to
define for the VoiceXML browser the words and patterns
of words that a user can say at any particular point
in a dialog. For example, the following grammar allows
a caller to say the name of one of four cities: "New
York," "Sydney," "Boston,"
or "Berlin."
<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0">
  <rule id="city" scope="public">
    <one-of>
      <item> new york </item>
      <item> sydney </item>
      <item> boston </item>
      <item> berlin </item>
    </one-of>
  </rule>
</grammar>
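To make the role of such a grammar concrete, the following minimal Python sketch loads the city grammar with the standard library XML parser and checks whether an utterance matches one of the four alternatives. The function names `city_phrases` and `is_allowed` are illustrative only, and the case-insensitive whole-phrase comparison is a simplifying assumption; a real recognizer matches audio against the grammar rather than comparing strings.

```python
import xml.etree.ElementTree as ET

# The city grammar from the article, embedded as a string.
GRAMMAR = """<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0">
  <rule id="city" scope="public">
    <one-of>
      <item> new york </item>
      <item> sydney </item>
      <item> boston </item>
      <item> berlin </item>
    </one-of>
  </rule>
</grammar>"""


def city_phrases(grammar_xml):
    """Return the phrases listed as alternatives in the 'city' rule."""
    root = ET.fromstring(grammar_xml)
    rule = root.find("rule[@id='city']")
    # Collapse surrounding and internal whitespace in each <item>.
    return [" ".join(item.text.split()) for item in rule.iter("item")]


def is_allowed(utterance, grammar_xml=GRAMMAR):
    """Hypothetical helper: plain string comparison, not speech recognition."""
    normalized = " ".join(utterance.lower().split())
    return normalized in city_phrases(grammar_xml)


if __name__ == "__main__":
    print(city_phrases(GRAMMAR))
    print(is_allowed("New York"), is_allowed("Paris"))
```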
Grammar
authoring is a critical facet in the development of
robust, usable telephony speech applications. When an
application's grammars accurately model the speech input
from callers, the usability of the application is enhanced
and caller satisfaction is likely to be higher. With
the rapid growth of the speech technology market and
the increasing deployment of commercial applications,
grammar authoring is becoming an important skill for
speech developers and is increasingly becoming an area
of specialization.
The
VoiceXML
1.0 specification [3] documents the use of the Java
Speech Grammar Format (JSGF) to describe grammars but
does not mandate that browsers support JSGF. Despite
the use of JSGF in the VoiceXML 1.0 specification, the
language is agnostic to the grammar format and it is
acceptable for an application to use any grammar format
supported by a browser.
Current
deployments of VoiceXML and other speech applications
most often use proprietary grammar formats; typically,
the native format of the speech recognizer embodied
in the browser. However, with VoiceXML there is a promise
of platform interoperability for the application and
thus a compelling need to standardize upon a common
cross-platform grammar format.
The
VoiceXML 2.0 specification [see
footnote] being developed in the W3C will require
that all VoiceXML 2.0 browsers support the XML Form
of the W3C Speech Recognition Grammar Format. This
will provide a common baseline for grammar interoperability.
The W3C grammar specification is modeled on the JSpeech
Grammar Format [4], submitted to the W3C by Sun
Microsystems in June 2000. The current grammar draft
is in its Last Call release and is planned to proceed
to finalization by late 2001. The W3C process encourages
open participation and comments on the current draft
are welcome.
Two Grammar Standards!
The
W3C Speech Recognition Grammar Format specification
embodies two equivalent languages.
- XML
Form of the W3C Speech Recognition Grammar Format:
Represents a grammar as an XML document with the logical
structure of the grammar captured by XML elements.
This format is ideal for computer-to-computer communication
of grammars because widely available XML technology
(parsers, XSLT, etc.) can be used to produce and accept
the grammar format.
- Augmented
BNF (ABNF) Form of the W3C Speech Recognition Grammar
Format: The logical structure of the grammar is
captured by a combination of traditional BNF (Backus-Naur
Form) and a regular expression language. This format
is familiar to many current speech application developers,
is similar to the proprietary grammar formats of most
current speech recognizers and is a more compact representation
than XML. However, a special parser is required to
accept this format.
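As a rough illustration of why the XML Form suits computer-to-computer use, the sketch below generates a grammar document from a list of phrases using only Python's standard XML library. The helper name `build_grammar` and its parameters are hypothetical; the element and attribute names are those used in the examples in this article.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the reserved "xml" prefix (used for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_grammar(rule_id, phrases, lang="en"):
    """Return an XML Form grammar with one public rule listing the phrases."""
    grammar = ET.Element(
        "grammar", {"{%s}lang" % XML_NS: lang, "version": "1.0"}
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        # Each alternative becomes one <item> inside <one-of>.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grammar("city", ["new york", "sydney", "boston", "berlin"]))
```

Because the output is ordinary XML, the same document can be read back by any XML parser, which is the property the XML Form is designed around.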
Grammars
written in either format can be converted to the other
format without loss of information (except formatting).
The two formats co-exist because the Working Group found
it important to support both a computer-to-computer communication
format and a more familiar human-readable format (but,
as with all decisions reached by a committee, there
is a spectrum of opinion on these matters).
Importantly,
the Working Group has decided that the XML Grammar Format
is the required grammar format for VoiceXML 2.0;
that is, all compliant VoiceXML 2.0 browsers will be
required to support the XML Grammar Format. Support
for the ABNF format is recommended, but optional.
As
a result, the XML language is used for most examples
in this article. For examples of ABNF see the W3C specification
(http://www.w3.org/TR/speech-grammar/).
Basic Grammar Document
The
body of a grammar defines a set of rules. Each
rule has a name and that name must be unique within
the grammar. The scope of each rule is declared as either
public or private. A public rule may be
activated for recognition; for example, when referenced
by a <grammar> element in VoiceXML. A public rule
may also be imported into other grammars. All
non-public rules are private. Private rules can be referenced
only by other rules within the same grammar but they
can reference public rules imported from other grammars.
This public/private distinction should be familiar to
Java developers.
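The scoping and naming rules above can be checked mechanically. The following sketch groups the rules of a grammar by scope and reports duplicate rule names. The sample grammar and the function names are made up for illustration, and treating a rule with no scope attribute as private is an assumption based on the statement that all non-public rules are private.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical grammar: one public rule that references one private rule.
GRAMMAR = """<grammar xml:lang="en" version="1.0">
  <rule id="command" scope="public">
    call <ruleref uri="#location"/>
  </rule>
  <rule id="location">
    <one-of>
      <item> home </item>
      <item> office </item>
    </one-of>
  </rule>
</grammar>"""


def rules_by_scope(grammar_xml):
    """Group rule names by scope; anything not marked public counts as private."""
    root = ET.fromstring(grammar_xml)
    scopes = {"public": [], "private": []}
    for rule in root.findall("rule"):
        scope = "public" if rule.get("scope") == "public" else "private"
        scopes[scope].append(rule.get("id"))
    return scopes


def duplicate_rule_names(grammar_xml):
    """Return rule names that occur more than once within the grammar."""
    root = ET.fromstring(grammar_xml)
    counts = Counter(rule.get("id") for rule in root.findall("rule"))
    return [name for name, n in counts.items() if n > 1]


if __name__ == "__main__":
    print(rules_by_scope(GRAMMAR))
    print(duplicate_rule_names(GRAMMAR))
```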
Most
importantly, a rule defines an expansion that
declares how the rule is expanded into words, references
to other rules, and patterns of words and rule references.
Rules and Tokens
Words,
or more precisely tokens, are the basic units
of a grammar and indicate those things that a user can
say. Any token is a legal expansion in a rule definition.
If a token contains white space (e.g., "Rio de
Janeiro"), it should be enclosed in quotes. Sequences
of individual tokens are separated by white space and
the sequence is a legal expansion. Tokens can be enclosed
in a <token> element that may be used to indicate
the language of the contained token. For example:
hello
new york
"Rio de Janeiro"
to be or not to be
<token xml:lang="fr">francois corriveau</token> <!-- French -->
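The quoting rule can be sketched in a few lines of Python. This is a simplified illustration of how plain expansion text splits into tokens, not the tokenizer defined by the specification: the `tokenize` helper is hypothetical, it handles only double-quoted and whitespace-separated tokens, and it does not process `<token>` elements.

```python
import re

# A token is either a double-quoted string or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def tokenize(expansion_text):
    """Split plain expansion text into tokens, keeping quoted tokens whole."""
    return [quoted or bare for quoted, bare in _TOKEN.findall(expansion_text)]


if __name__ == "__main__":
    for text in ['hello', 'new york', 'call the "Rio de Janeiro" office']:
        print(tokenize(text))
```

Note that "new york" yields two tokens in sequence, while the quoted "Rio de Janeiro" is a single token containing white space.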
A
rule reference is a legal expansion and is represented
by a <ruleref> element. A rule reference is equivalent
to a non-terminal reference in a traditional grammar.
The referenced rule is provided by a URI. The referenced
rule may be local to the grammar, in which case the
URI is of the form "#rulename". The referenced
rule may be any public rule of another grammar in which
case a relative URI or absolute URI is used. The <ruleref>
element is always an empty element (contains no text
or other elements).
<ruleref uri="#city"/>
<ruleref uri="../locations.xml#city"/>
<ruleref uri="http://myexample.com/grammars/locations.xml#city"/>
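The three URI forms can be resolved with standard URI handling. The sketch below splits each reference into the grammar document it points to and the rule name in the fragment. The base URI of the referencing grammar (`http://myexample.com/grammars/travel/main.xml`) and the helper name `resolve_ruleref` are assumptions made for the example.

```python
from urllib.parse import urldefrag, urljoin

# Assumed location of the grammar that contains the <ruleref> elements.
BASE = "http://myexample.com/grammars/travel/main.xml"


def resolve_ruleref(uri, base=BASE):
    """Return (grammar_location, rule_name, is_local) for a ruleref URI."""
    is_local = uri.startswith("#")
    # Resolve relative references against the referencing grammar's URI.
    absolute = urljoin(base, uri)
    location, rule_name = urldefrag(absolute)
    return location, rule_name, is_local


if __name__ == "__main__":
    for uri in [
        "#city",
        "../locations.xml#city",
        "http://myexample.com/grammars/locations.xml#city",
    ]:
        print(resolve_ruleref(uri))
```

With the assumed base URI, the relative and absolute references both resolve to the same document, while "#city" refers to a rule in the referencing grammar itself.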
Logical Operations
A
sequence of legal expansions is itself a legal expansion.
The sequence may be enclosed in an <item> element
or other elements such as <count> or <rule>.
As mentioned previously, tokens in sequence should be
separated by white space. Sequential elements other
than tokens (the <token>, <ruleref>, <item>,
<count> and <one-of> elements) do not require
white-space separation. The following are each examples
of sequences:
phone home
call the "Rio de Janeiro" office
call<ruleref uri="#location"/>
<item>call <ruleref uri="#location"/></item>
<count num="optional">please</count> call home
The
<one-of> element is used to declare a set of alternative
expansions. The <one-of> element must contain
one or more <item> elements, each of which declares
one of the alternatives. In the following example, each
alternative is a single token but any legal expansion
can be contained within the item.
<one-of>
  <item> new york </item>
  <item> sydney </item>
  <item> boston </item>
  <item> berlin </item>
</one-of>
The
<count> element indicates that the expansion it
contains is optional (zero or one occurrences),
or may occur zero-or-more or one-or-more times.
this is <count num="optional">not</count> good
this is <count num="0+">very</count> good
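To see how sequences, alternatives and counts combine, the following sketch enumerates the phrases that a small expansion accepts. It is an illustration under stated assumptions rather than a conforming implementation: `expand` and `phrases` are hypothetical helpers, only the two `num` values shown above ("optional" and "0+") are handled, unbounded repetition is capped at `MAX_REPEATS` so the enumeration terminates, and quoted tokens and rule references are not processed.

```python
import itertools
import xml.etree.ElementTree as ET

MAX_REPEATS = 2  # assumption: cap "0+" so the list of phrases stays finite

# Repeat ranges for the num values that appear in the article's examples.
COUNT_RANGES = {"optional": (0, 1), "0+": (0, MAX_REPEATS)}


def expand(elem):
    """Return every token sequence (a list of words) that elem can expand to."""
    if elem.tag == "one-of":
        # Alternatives: the union of the expansions of each <item>.
        return [seq for item in elem.findall("item") for seq in expand(item)]
    # Otherwise a sequence: leading text, then each child followed by its tail.
    parts = [[(elem.text or "").split()]]
    for child in elem:
        parts.append(expand(child))
        parts.append([(child.tail or "").split()])
    body = [sum(combo, []) for combo in itertools.product(*parts)]
    if elem.tag == "count":
        low, high = COUNT_RANGES[elem.get("num")]
        # Repeat the contained expansion between low and high times.
        return [
            sum(combo, [])
            for n in range(low, high + 1)
            for combo in itertools.product(body, repeat=n)
        ]
    return body


def phrases(expansion_xml):
    """Parse an expansion fragment and list the phrases it accepts."""
    return [" ".join(seq) for seq in expand(ET.fromstring(expansion_xml))]


if __name__ == "__main__":
    print(phrases('<item>this is <count num="optional">not</count> good</item>'))
    print(phrases('<item>this is <count num="0+">very</count> good</item>'))
```

The cap is purely a device for listing examples; in the grammar itself "0+" places no upper limit on the number of repetitions.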
Footnote 1:
While we fully expect that the dialog language from
the W3C will be called VoiceXML 2.0, it's not official
until we have the first public document from the W3C
using this name.
Copyright
© 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization
(IEEE-ISTO).