VoiceXML Review - Feature Articles

Volume 1, Issue 6 - June 2001

Human Factors and Voice Applications

By Ed Halpern

The Promise

So it's 2001, and most VoiceXML applications do not quite sound like Dave and HAL in 2001: A Space Odyssey. But there have been giant leaps in speech recognition technology and voice services over the last several years. Although the computer has not attained the sophisticated processing power of a human, if you understand the rules of user interface design and know the limitations of the technology, you can create effective human-machine speech dialogues.

The Practicalities

Do you want to design a user interface for speech applications? Can you make it appear natural, conversational, and maybe even fun? This article will introduce some basic procedures for bringing human factors design principles to the design of voice applications.

Human factors refers to the impact of human cognition and behavior on how systems are used. In this article, I focus on human-machine behavior in speech recognition applications, in particular speaker-independent applications that run over the telephone network.

At least three things define a good user experience:

1. The application must be easy and efficient to use.
2. The application must provide something useful.
3. The application should be compelling.

The user interface (UI) design shapes the quality of the user experience. Following a user centered design methodology helps create a good user experience.

User Centered Design

Since the goal of user centered design is to ensure that applications are easy to use, the application should be designed from the perspective of the end user, not from the perspective of the programming language or platform.

In any user interface we must deal with a human, who is often less predictable than a machine. In a speech user interface, human behavior is even less predictable than with other technologies because the inputs are unconstrained: the user can say anything. In contrast, a touch-tone IVR or a web application constrains the possible inputs to what the telephone keypad or PC keyboard can enter. In addition, machine recognition of speech is not going to be 100% accurate, and accuracy will not be consistent across user populations and environments (e.g., background noise levels).

User centered design methods help identify the range of user speech behaviors and constrain the user interface as much as possible.

Figure 1 - User Centered Design for Voice Applications

User Centered Methods

The following sections define a user centered design methodology for speech applications. Depending on the needs of the project (e.g., complexity or time frame), one might be tempted to cut corners, but each of the steps should be addressed at least to some degree.

Use Case Analysis

Some practical contextual questions include: What type of service is it? Who will be using it? What tasks are they trying to complete? What information are they trying to retrieve? Where will they be (physically)? How risky are the transactions? If the answers to these questions are not known, then some of the design decisions may not be optimal.

Use case analysis: Identify the context and the user scenarios that are likely to be most common. When an application tries to do too many things, the primary uses may be hard to find. With a narrow focus, the design has a better chance of meeting users' needs, and understanding those needs lets the designer tailor the design to them.
Review market research: When you build an application that is similar to existing products, an analysis of those products serves as a starting point from which improvements can be made. This research can provide insights into which features people use and which commands make the interaction successful.

User Interface Design

Some practical contextual questions include: What is the structure of the flow? What prompts will elicit the desired spoken commands? Will there be help upon request? How will errors be handled? Are there several applications that will be used together? Answers to these questions guide the initial design.

Initial Design: The design can be represented in many different formats. One that is easy to review, develop from, and test against is a flow diagram. An initial design will show all of the application states, define the grammars or option lists for each state, and define how help should be handled. It will define how to handle system misrecognitions and cases where no speech energy is detected. It will also specify how many errors of each type the application will permit and what useful message the application will play back for each error.

The initial design will also specify which command words the application supports and whether each is context-specific or universal. For example, "Chicago, Illinois" may be context-specific because it is accepted only at the prompt for a location. "Main Menu" could be accepted anywhere and always return you to the application's main menu. "Help" may be accepted anywhere but lead to a message that is specific to the state from which it is requested.
Design review: In a design review, the user centered designer meets with the marketing team and the development team. The marketing team needs to verify the feature set. The development team needs to assess whether the design is feasible and weigh efficiency considerations.
Revise design: The result of a good design review is the input needed for a revised design.
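The flow-diagram decisions described above (states, per-state grammars, universal versus context-specific commands, context-sensitive help, and a permitted number of errors of each type with a message for each) can also be written down as data that a quick prototype reads. The Python sketch below is one hypothetical way to do that, not a method from this article: the weather service, state names, phrases, messages, and the "say goodbye after too many errors" fallback are all illustrative assumptions. The two error types borrow the names of VoiceXML's `noinput` and `nomatch` events; `heard` stands in for the recognizer's result, with `None` meaning no speech energy was detected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class State:
    prompt: str
    grammar: dict[str, str]    # context-specific phrase -> next state
    help: str                  # help message specific to this state
    # Error type -> one message per permitted error, escalating in detail.
    errors: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical flow for a small weather service (illustrative only).
STATES = {
    "main_menu": State(
        prompt="Main menu. Say weather, or say goodbye.",
        grammar={"weather": "location", "goodbye": "exit"},
        help="Say weather to hear a forecast. You can say main menu at any time.",
        errors={
            "noinput": ["Sorry, I didn't hear you. Say weather, or goodbye."],
            "nomatch": ["Sorry, I didn't get that. Say weather, or goodbye.",
                        "Please say just one word: weather, or goodbye."],
        },
    ),
    "location": State(
        prompt="For which city and state?",
        grammar={"chicago, illinois": "exit", "boston, massachusetts": "exit"},
        help="Say a city and state, for example, Chicago, Illinois.",
        errors={
            "noinput": ["Sorry, I didn't hear you. Which city and state?"],
            "nomatch": ["Sorry, which city and state?",
                        "Please say a city and state, such as Chicago, Illinois."],
        },
    ),
    "exit": State(prompt="Goodbye.", grammar={}, help=""),
}


def step(state: str, heard: str | None, counts: dict[str, int]) -> tuple[str, str]:
    """Apply one recognition result and return (next_state, message_to_play).

    heard is None when no speech energy was detected (a no-input event).
    counts holds per-type error counts for the current state and is
    updated in place.
    """
    spec = STATES[state]
    text = (heard or "").strip().lower()
    if text == "main menu":              # universal: same effect in every state
        counts.clear()
        return "main_menu", STATES["main_menu"].prompt
    if text == "help":                   # universal, but the message is per state
        return state, spec.help
    if text in spec.grammar:             # context-specific: only valid here
        counts.clear()
        nxt = spec.grammar[text]
        return nxt, STATES[nxt].prompt
    kind = "noinput" if heard is None else "nomatch"
    counts[kind] = counts.get(kind, 0) + 1
    messages = spec.errors.get(kind, [])
    if counts[kind] > len(messages):     # more errors than this state permits
        counts.clear()
        # Hypothetical fallback once the error limit is exceeded.
        return "exit", "Sorry, we seem to be having trouble. Goodbye."
    return state, messages[counts[kind] - 1]
```

With this spec, "Chicago, Illinois" advances the dialog only from the `location` state; said at the main menu it counts as a no-match. "Main menu" works everywhere, and "help" plays a different message in each state. Keeping the spec as data makes it easy to revise after a design review without touching the dialog logic.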

Evaluation

Some practical contextual questions include: When prompted, do users say what you intended? Can users complete the important tasks without errors? Is the recognition accuracy acceptable? This part of the user centered design process may be the most important, yet it is frequently overlooked. This phase demonstrates whether user utterances conform to the design and whether the system recognizes them.
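One way to answer the first and third questions with data is to transcribe what callers actually said during test calls and compare each transcription with the grammar of the state being prompted and with the recognizer's result. The Python sketch below assumes hypothetical log records with three fields (the state, a hand transcription, and the recognizer's result, or `None` for a rejection). It separates turns where the prompt elicited something outside the grammar from in-grammar turns that were misrecognized, since the first points at the prompt or grammar coverage and the second at recognition accuracy.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    state: str          # dialog state whose prompt the caller was answering
    said: str           # what the caller actually said (hand transcription)
    heard: str | None   # recognizer result; None if the input was rejected


def classify(turn: Turn, grammars: dict[str, set[str]]) -> str:
    """Sort one turn into one of three outcomes."""
    said = turn.said.strip().lower()
    if said not in grammars[turn.state]:
        # The prompt elicited a phrase the design does not cover.
        return "out_of_grammar"
    if turn.heard is not None and turn.heard.strip().lower() == said:
        return "correct"
    # In-grammar, but rejected or recognized as something else.
    return "misrecognized"


def summarize(turns: list[Turn], grammars: dict[str, set[str]]) -> dict[str, dict]:
    """Per state: share of in-grammar turns, accuracy on those turns, and the
    out-of-grammar phrases callers used."""
    outcomes: dict[str, Counter] = defaultdict(Counter)
    oog_phrases: dict[str, Counter] = defaultdict(Counter)
    for t in turns:
        outcome = classify(t, grammars)
        outcomes[t.state][outcome] += 1
        if outcome == "out_of_grammar":
            oog_phrases[t.state][t.said.strip().lower()] += 1
    report = {}
    for state, c in outcomes.items():
        total = sum(c.values())
        in_grammar = c["correct"] + c["misrecognized"]
        report[state] = {
            "in_grammar_rate": in_grammar / total,
            "accuracy": c["correct"] / in_grammar if in_grammar else None,
            "out_of_grammar_phrases": oog_phrases[state].most_common(),
        }
    return report


if __name__ == "__main__":
    # Hypothetical grammars and test-call turns, for illustration only.
    grammars = {
        "main_menu": {"weather", "traffic"},
        "location": {"chicago, illinois", "boston, massachusetts"},
    }
    turns = [
        Turn("main_menu", "weather", "weather"),
        Turn("main_menu", "traffic", "weather"),
        Turn("main_menu", "the weather please", None),
        Turn("location", "Chicago, Illinois", "chicago, illinois"),
        Turn("location", "Chicago", "chicago, illinois"),
    ]
    for state, stats in summarize(turns, grammars).items():
        print(state, stats)
```

The out-of-grammar phrase list is the part a designer reads most closely: a frequent phrase such as "the weather please" suggests either rewording the prompt or adding the phrase to the grammar.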

Prototype: Like a design review, an application prototype provides an opportunity to review the design, this time from a usage perspective. It is common for even a carefully constructed initial design to miss some important cases. It is also common to learn that some of the grammars or option lists contain words that the recognizer does not recognize with acceptable accuracy, and that some voice prompts elicit user commands that the grammar does not cover.

When building the prototype, it is faster and cheaper to use text-to-speech (TTS) instead of stored speech, at least initially, because TTS lets you make quick changes to the design. Once the design seems stable, it may be appropriate to replace the TTS with recorded prompts.
Usability Test: In a more formal evaluation, the usability test is one where prospective users of the service actually place calls to the prototype application to assess its ease of use, error handling, and overall acceptability. Data are collected and analyzed to obtain an empirical basis for the next round of design decisions.
Revise design: As with the paper design review, the usability test will almost always identify areas where the application design needs to be improved. The tricky part is that fixing one problem can create a new one.
Prompt and Grammar Tuning: This step will make or break the success of the application. It is imperative to ensure that allowable words are not so acoustically similar that the system confuses them, and that the prompts elicit allowable utterances. By establishing a measurable goal and taking measurements against it, you can tune the application toward that goal.
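Tuning toward a goal can be made concrete with two measurements from test calls: recognition accuracy on in-grammar utterances, compared against a target, and a count of which word pairs the recognizer substitutes for each other most often, which points at vocabulary that may be too acoustically similar. The Python sketch below assumes hand-transcribed `(said, heard)` pairs with consistent spelling and case; the function name, the 0.95 goal, and the city names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tuning_report(results: list[tuple[str, str | None]], goal: float) -> dict:
    """Summarize in-grammar test utterances against an accuracy goal.

    results holds (said, heard) pairs from hand-transcribed test calls;
    heard is None when the recognizer rejected the utterance. A rejection
    counts against accuracy but is not a substitution.
    """
    if not results:
        raise ValueError("need at least one test utterance")
    correct = sum(1 for said, heard in results if heard == said)
    accuracy = correct / len(results)
    # Count substitutions as unordered pairs, so "austin" heard as "boston"
    # and "boston" heard as "austin" add up to one candidate confusable pair.
    confusions = Counter(
        tuple(sorted((said, heard)))
        for said, heard in results
        if heard is not None and heard != said
    )
    return {
        "accuracy": accuracy,
        "meets_goal": accuracy >= goal,
        "confusable_pairs": confusions.most_common(),
    }


if __name__ == "__main__":
    # Hypothetical results from a handful of test calls to a city prompt.
    sample = [
        ("austin", "austin"), ("boston", "austin"), ("austin", "boston"),
        ("boston", "boston"), ("dallas", "dallas"), ("dallas", None),
        ("houston", "houston"), ("austin", "austin"),
    ]
    print(tuning_report(sample, goal=0.95))
```

Counting substitutions observed in test calls is a more direct signal than judging similarity by spelling, since what matters is how the recognizer behaves on callers' speech. A pair that tops the list is a candidate for rewording one of the options, and rerunning the report after each change shows whether the goal has been reached.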

Continued...

Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).