Developing
Multimodal Applications using XHTML+Voice
Executive Summary
On
the Internet, people use browsers to visit Web sites,
access documents from networks, and fill out forms. With
this growing capability to retrieve information, communication
between users and their devices is receiving more attention.
As devices become smaller, other means of input -- in
addition to keyboard or tap screen -- are becoming necessary.
Small handheld devices, including cell phones and PDAs,
now contain sufficient processing power to handle multiple
tasks. On some devices it is difficult to perform these
tasks using only keyboard, stylus, or handwriting recognition.
This has led to a new application technology called multimodal:
the use of multiple methods of communication between the
user and a device. These methods include keypad, touch
or tap screen, handwriting recognition, and voice recognition.
This paper illustrates the basic structure and contents
of an XHTML+Voice multimodal application, describing its
fundamental building blocks. It is intended for those
who are familiar with XHTML, VoiceXML, and HTML.
Each of the building blocks is described and coding samples
are provided. A multimodal implementation of a hypothetical
Pizza Order Form application is presented as an example.
The Structure of an XHTML+Voice Application
A basic XHTML+Voice multimodal application consists of
a Namespace Declaration, Visual Part, Voice Part, and
a Processing Part. Figure 1 illustrates these components
and their relationship to each other.
Namespace Declaration
The Namespace Declaration for a typical XHTML+Voice application
is written in XHTML, with additional declarations for
VoiceXML and XML Events. Figure 2 is an example of the
namespace declaration for an XHTML+Voice application.
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice 1.0//EN" "xhtml+voice.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xml:lang="en-US">
Figure 2 -- Namespace declaration
Visual Part
The Visual Part of an XHTML+Voice application is the XHTML
code that displays the various form elements on the
device's screen, if one is available. This can be
ordinary XHTML code and may include check boxes and other
form items. For example, Figure 3 displays
the pizza size choices and their corresponding radio buttons.
Figure 4 illustrates a typical form using XHTML+Voice.
<b>Size:</b><br/>
<input type="radio" name="size" id="sizeSmall"
       ev:event="focus" ev:handler="#voice_size"/> Small 12"
<input type="radio" name="size" id="sizeMedium"
       ev:event="focus" ev:handler="#voice_size"/> Medium 16"
<input type="radio" name="size" id="sizeLarge"
       ev:event="focus" ev:handler="#voice_size"/> Large 22"
Figure 3 -- Visual part of a multimodal application
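Each radio button in Figure 3 names a handler, #voice_size, that runs when the button receives focus. The following is a minimal sketch of what that handler could look like, not the paper's own code. It assumes the vxml prefix is bound as in Figure 2 and that the handler is a VoiceXML form placed in the document head. The prompt wording and the size.jsgf grammar file are hypothetical.

```xml
<!-- Hypothetical voice handler for the size radio buttons.
     The id must match the fragment in ev:handler="#voice_size". -->
<vxml:form id="voice_size">
  <vxml:field name="size">
    <!-- Assumed prompt text; spoken when the handler is activated -->
    <vxml:prompt>What size pizza would you like?</vxml:prompt>
    <!-- Assumed grammar file listing the words small, medium, and large -->
    <vxml:grammar src="size.jsgf" type="application/x-jsgf"/>
  </vxml:field>
</vxml:form>
```

Because all three radio buttons point at the same id, a single voice form can serve the whole group.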
Voice Part
The Voice Part of an application is the section of code
that prompts the user for a desired field within
a form. This VoiceXML code uses an external grammar
to define the possible field choices. If there are many
choices, or a combination of choices is required, the
external grammar can handle the valid combinations.
For example, there are multiple ways to say a selection
of vegetable toppings for a pizza. The VoiceXML
code in Figure 5 is used with the vegtoppings.jsgf grammar
file to prompt the user to select vegetable toppings for
the pizza. To add vegetable topping choices,
modify the vegtoppings.jsgf file.
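To make the division of labor between the VoiceXML code and the grammar file concrete, here is a rough sketch under stated assumptions. It is not a reproduction of Figure 5. The form id voice_veg, the field name, the prompt text, and the grammar contents shown in the comment are all hypothetical. Only the file name vegtoppings.jsgf comes from the text above.

```xml
<!-- Hypothetical voice form; the vxml prefix is assumed bound as in Figure 2 -->
<vxml:form id="voice_veg">
  <vxml:field name="vegtoppings">
    <vxml:prompt>Which vegetable toppings would you like?</vxml:prompt>
    <!-- The valid utterances live in the external grammar file -->
    <vxml:grammar src="vegtoppings.jsgf" type="application/x-jsgf"/>
  </vxml:field>
</vxml:form>

<!-- Assumed contents of vegtoppings.jsgf, shown for illustration only:

     #JSGF V1.0;
     grammar vegtoppings;
     public <vegtoppings> = <topping> ([and] <topping>)*;
     <topping> = olives | mushrooms | onions | peppers;

     Under this assumed grammar, adding a topping means adding one
     alternative to the <topping> rule; the VoiceXML form is unchanged.
-->
```

With a grammar of this shape, "olives", "olives and onions", and "olives onions peppers" would all be accepted, which is one way to cover the multiple phrasings mentioned above.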
Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards
and Technology Organization (IEEE-ISTO).