Human Factors and Voice Applications
It's 2001 and most VoiceXML applications do not quite
sound like Space Odyssey's Dave and HAL. But there have
been giant leaps in speech recognition technology and
voice services over the last several years. While the
computer has not attained the sophisticated processing
power of a human, if you understand the rules of user
interface design and you know the limitations of the
technology, you can create effective human-machine speech
interfaces.
So you want to design a user interface for speech applications?
Can you make it appear natural, conversational, and
maybe even fun? This article introduces some basic
procedures for bringing human factors design principles
to the design of voice applications.
Human factors refers to the impact of human cognition and
behavior on how systems are used. In this article, my
focus is on human-machine behavior when the application
is a speech recognition application, in particular a
speaker-independent application run over the telephone.
At least three things define a good user experience:
1. The application must be easy and efficient to use.
2. The application must provide something useful.
3. The application should be compelling.
The user interface (UI) design shapes the quality of the
user experience. Following a user centered design methodology
helps create a good user experience.
Since the goal of user centered design is to ensure that applications
are easy to use, the application should be designed
from the perspective of the end user, not from the perspective
of the programming language or platform.
In any user interface we must deal with a human, who is
often less predictable than a machine. In a speech user
interface, human behavior is even less predictable than
with other technologies because the inputs are unconstrained.
The user can say anything. In contrast, an IVR or a
web application constrains the possible inputs via the
telephone keypad or PC keyboard. In addition, with a
speech application, the machine recognition is not going
to be 100% accurate and the recognition will not be
consistent across user populations and environments
(e.g., background noise levels).
User centered design methods help identify the range of
user speech behaviors and constrain the user interface
as much as possible.
1 - User Centered Design for Voice Applications
The following sections define a user centered design methodology
for speech applications. Of course, depending on the
needs of the project (e.g., complexity or time frame),
one might be tempted to cut corners. But each
of the steps should be addressed at least to some degree.
Some practical contextual questions include: What type of
service is it? Who will be using it? What tasks are
they trying to complete? What information are they trying
to retrieve? Where will they be (physically)? How risky
are the transactions? If the answers to these questions
are not known, then some of the design decisions may
not be optimal.
Use case analysis: Identify the context and
user scenarios that are likely to be the most common.
When an application tries to do too many things, the
primary uses may be hard to find. With a narrow focus,
the application design has a better chance of
meeting the user's needs. Once those needs are understood,
the design can be tailored to meet them.
Review market research: When you build an application
that is similar to existing products, an analysis
of those products will serve as a starting point from
which improvements can be made. This research can
provide insights into what features people use and
what commands make the interaction successful.
Some practical contextual questions include: What is the
structure of the flow? What prompts will elicit the
desired spoken commands? Will there be help upon request?
How will errors be handled? Are there several applications
that will be used together? Answers to these questions
guide the initial design.
Initial Design: The design can be represented in
many different formats. One that is easy to review,
develop from, and test against is a flow diagram.
An initial design will show all
of the application states, define the grammars or
option lists for each state, and define how help should
be handled. It will define how to handle system misrecognitions
and cases where no speech energy is detected. It will
specify how many errors of each type the application
will permit and what useful message the application
will play back for each error.
The initial design will also specify what command
words will be supported by the application and whether
they are context-specific or universal in nature.
For example, "Chicago, Illinois" may be
context-specific because it is only accepted
at the prompt for a location. "Main Menu"
could be accepted anywhere and may always return you
to the application's main menu. "Help" may
be accepted anywhere, but could go to a message that
is context-sensitive to the state from which it is
requested.
Design review: In a design review, the user
centered designer will meet with the marketing team
and the development team. The marketing team needs
to verify the feature set. The developer needs to
assess whether the design is feasible and to assess
Revise design: The result of a good design
review is the input needed for a revised design.
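
To make the initial design items above concrete, here is
a minimal Python sketch of one application state and its
error handling. It is an illustration under my own
assumptions, not a VoiceXML platform API: the names
DialogState and handle_turn, the limit of two errors, the
prompt wording, and the "give_up" outcome are all
hypothetical.

```python
from dataclasses import dataclass, field

# Universal commands, accepted in every state (illustrative names).
MAIN_MENU = "main menu"
HELP = "help"


@dataclass
class DialogState:
    name: str
    prompt: str
    grammar: set           # context-specific phrases accepted only here
    help_message: str      # context-sensitive help for this state
    max_errors: int = 2    # assumed limit before the dialog gives up
    nomatch_prompts: list = field(default_factory=list)
    noinput_prompts: list = field(default_factory=list)


def handle_turn(state, utterance, error_count):
    """Return (action, message, new_error_count) for one caller turn.

    utterance is None when no speech energy was detected.
    """
    if utterance is None:
        kind, prompts = "noinput", state.noinput_prompts
    else:
        phrase = utterance.strip().lower()
        if phrase == HELP:
            # Universal command, but the message depends on the state.
            return ("help", state.help_message, error_count)
        if phrase == MAIN_MENU:
            return ("goto_main_menu", "", 0)
        if phrase in state.grammar:
            return ("accept", phrase, 0)
        kind, prompts = "nomatch", state.nomatch_prompts

    error_count += 1
    if error_count > state.max_errors:
        return ("give_up", "Sorry, I am having trouble.", error_count)
    # Escalate through the error prompts; reuse the last one if needed.
    if prompts:
        message = prompts[min(error_count, len(prompts)) - 1]
    else:
        message = state.prompt
    return (kind, message, error_count)


location = DialogState(
    name="get_location",
    prompt="Which city and state?",
    grammar={"chicago, illinois", "boston, massachusetts"},
    help_message="Say a city and state, such as Chicago, Illinois.",
    nomatch_prompts=[
        "Sorry, I didn't understand. Please say a city and state.",
        "Sorry, I still didn't get that. For example, say Chicago, Illinois.",
    ],
    noinput_prompts=["Sorry, I didn't hear you. Which city and state?"],
)

print(handle_turn(location, "Chicago, Illinois", 0))
print(handle_turn(location, "Help", 1))
print(handle_turn(location, None, 0))
```

Keeping the error limits and error messages in the state
definition, rather than scattered through the call flow,
makes them easy to check against the flow diagram during
a design review.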
Some practical contextual questions include: When
prompted, do users say what you intended? Can users
complete the important tasks without errors? Is the
recognition accuracy acceptable? This part of the user
centered design process may be the most important,
yet it is frequently overlooked. This phase demonstrates
whether user utterances conform to the design and whether
the utterances are recognized by the system.
Prototype: Like a design review, an application
prototype provides an opportunity to review the design,
this time from a usage perspective. It is common for
even a carefully constructed initial design to miss
some important cases. It is also common to learn that
some of the grammars or option lists contain words
that the recognizer does not recognize with acceptable
accuracy. It is also instructive to observe that some
voice prompts elicit user commands that are not covered
in the grammar.
When building the prototype, it is faster and cheaper
to use text-to-speech (TTS) instead of stored speech,
at least initially. The TTS will allow you to make
quick changes to the design. After the design seems
stable, it may be appropriate to upgrade from TTS to
recorded prompts.
Test: The usability test is a more formal evaluation
in which prospective users of the service
actually place calls to the prototype application
to assess its ease of use, error handling, and overall
acceptability. Data are collected and analyzed to
obtain an empirical basis for the next round of design.
Revise design: As with the paper design
review, the usability test will almost always identify
areas where the application design needs to be improved.
What can be tricky here is that fixing one problem
can create a new one.
Prompt and Grammar Tuning: This step will make
or break the success of the application. It is imperative
to ensure that allowable words are not acoustically
so similar that the system might confuse them.
It is also imperative to ensure that the prompts elicit
allowable utterances. By establishing a goal and taking
measurements, the application can be tuned toward
reaching the goal.
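
The measurements mentioned above can be as simple as two
ratios taken from usability test calls. The Python sketch
below is one possible way to compute them, with my own
assumptions clearly marked: the function names, the 90%
accuracy goal, and the 0.6 similarity threshold are
hypothetical, and spelling similarity is only a crude
stand-in for acoustic similarity, which really calls for
a phonetic comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(phrases, threshold=0.6):
    """Flag grammar phrases whose spellings are very similar.

    Spelling is only a rough proxy for how alike two phrases sound.
    """
    flagged = []
    for a, b in combinations(sorted(phrases), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


def tuning_report(turns, grammar, goal=0.90):
    """Summarize test calls against an accuracy goal.

    turns is a list of (said, heard) pairs: what the caller actually
    said (transcribed by hand) and what the recognizer returned, or
    None if it rejected the utterance.
    """
    if not turns:
        raise ValueError("no turns to analyze")
    in_grammar = [(said, heard) for said, heard in turns if said in grammar]
    # Share of utterances the prompt steered into the grammar.
    in_grammar_rate = len(in_grammar) / len(turns)
    # Share of in-grammar utterances the recognizer got right.
    correct = sum(1 for said, heard in in_grammar if said == heard)
    accuracy = correct / len(in_grammar) if in_grammar else 0.0
    return {
        "in_grammar_rate": in_grammar_rate,
        "accuracy": accuracy,
        "meets_goal": accuracy >= goal,
    }


grammar = {"fifty", "fifteen", "ninety"}
turns = [
    ("fifty", "fifty"),
    ("fifteen", "fifty"),        # misrecognition of a similar word
    ("fifty", "fifty"),
    ("um fifty dollars", None),  # out-of-grammar utterance
]

print(confusable_pairs(grammar))
print(tuning_report(turns, grammar))
```

A low in-grammar rate points at the prompts, since callers
are saying things the grammar does not cover. A low
accuracy on in-grammar utterances points at the grammar
itself, for example at word pairs that sound too much
alike.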
© 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
Industry Standards and Technology Organization