VoiceXML Review - Feature Articles

Volume 1, Issue 4 - April 2001

Introduction to the W3C Grammar Format

By Andrew Hunt

(Continued from Part 1)

Combining Grammars

Much of the power of the W3C grammar format comes from its ability to build a grammar by combining subgrammars. This capability supports the building and reuse of grammar libraries and the componentization of large or complex natural language grammars. For example, to support a user who says "I want to fly from Boston to Miami next Wednesday at 9 a.m.", a grammar author might use library grammars for U.S. city names, dates, and times.

This mechanism is supported by permitting a reference from a rule in one grammar to a public rule in another grammar. The following example illustrates how the travel command might be structured.

<rule id="travel" scope="public">
	I want to fly
	from <ruleref uri="locations.xml#UScity"/>
	to <ruleref uri="locations.xml#UScity"/>
	<count number="optional">on</count> <ruleref uri="date.xml"/>
	<count number="optional">at</count> <ruleref uri="time.xml"/>
</rule>

It is common when writing a grammar to repeatedly reference another grammar. The grammar format provides an <import> element to avoid repeating a long URI. The import declares a local alias for the URI.

<import uri="http://www.mygrammars.com/cities-states.xml"
		name="places"/>

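A rule reference that uses the import alias has a slightly different syntax: it carries an import attribute in place of the uri attribute, with the value written as the alias, a "#", and the rule name.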

<ruleref import="places#city"/>
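
Putting the two pieces together, a rule that uses the alias might look like the following minimal sketch. The rule name "route" is hypothetical, and the sketch assumes that cities-states.xml exposes a public rule named "city", as the reference above implies.

```xml
<!-- Declare the alias once, near the top of the grammar -->
<import uri="http://www.mygrammars.com/cities-states.xml"
        name="places"/>

<!-- Hypothetical rule: each reference uses the alias instead of the full URI -->
<rule id="route" scope="public">
  from <ruleref import="places#city"/>
  to <ruleref import="places#city"/>
</rule>
```

If the library grammar moves, only the uri attribute of the import needs to change; the rule references stay as they are.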

Example Grammar

The following example is a complete grammar that uses all the capabilities described above. The root rule is "basicCommand". It permits users to speak commands such as "please move the window" and "open a file".

This example also demonstrates usage of the <example> element, which permits a developer to document a rule with examples of how the rule might be spoken.

<?xml version="1.0"?>


<grammar xml:lang="en" version="1.0" root="basicCommand">

<import name="polite"
        uri="http://www.sayplease.com/politeness.xml"/>

<rule id="basicCommand" scope="public">
  <example>please move the window</example>
  <example>open a file</example>

  <!-- A sequence of 3 rule references -->
  <ruleref import="polite#startPolite"/>
  <ruleref uri="#command"/>
  <ruleref import="polite#endPolite"/>
</rule>

<rule id="command">
  <example>move the window</example>
  <!-- A sequence of 2 rule references -->
  <ruleref uri="#action"/> <ruleref uri="#object"/>
</rule>

<rule id="action">
  <one-of>
    <item>open</item>
    <item>close</item>
    <item>delete</item>
    <item>move</item>
  </one-of>
</rule>

<rule id="object">
  <count number="optional">
    <one-of>
      <item>the</item>
      <item>a</item>
    </one-of>
  </count>
  <one-of>
    <item>window</item>
    <item>file</item>
    <item>menu</item>
  </one-of>
</rule>
</grammar>
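
The imported politeness grammar is not shown in this article. As an illustration only, a grammar at that URI might look like the sketch below. The phrases are invented. The sketch assumes only that the grammar defines public rules named "startPolite" and "endPolite", and that each rule may match nothing, since "open a file" is a valid command.

```xml
<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0">

<!-- Hypothetical: an optional polite lead-in -->
<rule id="startPolite" scope="public">
  <example>could you please</example>
  <count number="optional">
    <one-of>
      <item>please</item>
      <item>could you</item>
      <item>could you please</item>
    </one-of>
  </count>
</rule>

<!-- Hypothetical: an optional polite ending -->
<rule id="endPolite" scope="public">
  <example>thanks</example>
  <count number="optional">
    <one-of>
      <item>please</item>
      <item>thanks</item>
    </one-of>
  </count>
</rule>
</grammar>
```

Both rules are declared public because, as described earlier, only public rules can be referenced from another grammar.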

Advanced Features

The W3C Speech Recognition Grammar Format includes a range of more advanced features, which are summarized below. Readers may find examples of the usage of these features in the specification available at the W3C web site (http://www.w3.org/TR/speech-grammar/).

Internationalization: Both the ABNF and XML grammar formats permit the use of a wide range of character encodings. For example, Shift-JIS can be used for the Japanese character sets (Kanji, katakana, and hiragana). The grammar format also allows the mixing of more than one language in a grammar--or even in a single utterance.
Tags: A tag is an annotation to an expansion in a rule definition. The tag does not affect the recognition performance (what a user can say or how the recognizer performs the recognition task). The tags are included as a placeholder for semantic interpretation capabilities that will be defined in a future draft of the specification.
Special rules: Rules are defined for "NULL", "VOID" and "GARBAGE". These rules have specialized uses that are beyond the scope of this article.
Weights: Each alternative defined in a <one-of> element can have a weight attached that indicates the likelihood that it is spoken. A speech recognizer can use this information to improve accuracy.
Root rule: A grammar may optionally define a root rule. A rule reference to a grammar that does not indicate a specific rule contained within the grammar is implicitly a reference to the root rule of the grammar.
DTMF grammars: In addition to supporting speech recognition, the grammar format can be used to define patterns of DTMF input to a telephony browser.
Conformance: The grammar specification has a detailed conformance statement that, in combination with the body of the specification, indicates required behavior for a speech recognizer or any other processor that supports the grammar format.
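
Two of the features above, weights and the root rule, can be sketched in a few lines. The file name sizes.xml and the rule name "size" are hypothetical. The weight attribute on <item> follows the form used in the later Speech Recognition Grammar Specification 1.0 and is an assumption here; the draft discussed in this article may spell it differently.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file: sizes.xml. The root attribute names the root rule. -->
<grammar xml:lang="en" version="1.0" root="size">

<rule id="size" scope="public">
  <one-of>
    <!-- Assumed syntax: a higher weight marks a more likely alternative -->
    <item weight="10">small</item>
    <item weight="2">medium</item>
    <item>large</item>
  </one-of>
</rule>
</grammar>

<!-- In another grammar, a reference with no rule name resolves to the
     root rule "size":  <ruleref uri="sizes.xml"/>  -->
```

This is the same mechanism the travel example relies on when it references date.xml and time.xml without naming a rule.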

Futures

The grammar specification has not yet been finalized by the W3C Voice Browser Working Group. The W3C processes support and encourage public review of all draft specifications before they are formally pronounced as "Recommendations". The W3C Voice Browser Working Group has already received public feedback and will continue to address comments as they are received.

The following are some of the areas in which the working group is still actively evolving the specification.

Semantic interpretation: The current grammar format allows a developer to define acceptable speech input but does not indicate how to convert that input into a computer-actionable form that can be processed by a VoiceXML application. For example, the order "I'd like 3 copies of Harry Potter" might be converted to the form {book: "Harry Potter"; number: 3}. The working group is currently considering proposals for a language that can be contained within the tags to perform this interpretation.
Pronunciations: It is important in many applications for the grammar writer to inform the recognizer of the pronunciation of words. For example, in English, personal names, place names and company names often have tricky pronunciations. The current draft has no mechanism for supporting pronunciations, though proposals are currently under consideration.
MIME types: The W3C Voice Browser Working Group has applied for MIME types for grammars so that Web servers can indicate the content type and so that VoiceXML browsers and speech recognizers can optimize loading.

Conclusion

The new W3C Speech Recognition Grammar Format is a powerful language for developing both simple grammars and natural language grammars for use in VoiceXML applications. The availability of a standard grammar format will increase the interoperability of VoiceXML applications by allowing each grammar to be authored once and reused across many VoiceXML browsers.

References

[1] W3C Voice Browser Working Group (http://www.w3.org/Voice/)

[2] W3C Speech Recognition Grammar Format (http://www.w3.org/TR/speech-grammar/)

[3] VoiceXML 1.0 specification (http://www.w3.org/TR/voicexml/)

[4] JSpeech Grammar Format (http://www.w3.org/TR/jsgf/)

Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).