Turning GUIs into VUIs:
Dialog Design Principles for Making Web Applications Accessible By Telephone
(Continued from Part 1)
Social Factors
Recent research has shown that people tend to treat computers,
television, and other new media as real people, whether
they're interacting with an animated figure on a computer
screen or a computer-generated voice on a telephone
[1]. As a result, the voice featured on a VUI, even
if it's synthetic, is perceived as an individual
with a unique personality. This has several important
consequences: First, before dialog flows or prompt wording
can be decided on, designers must understand "who"
is talking and carefully develop the character who will
be featured in the application. How friendly, efficient,
casual, chatty, young, humorous, experienced, or forgiving
is he or she? The answers to these questions depend
on the type of application and the company behind it.
Think about the difference between a stockbroker and
a music store clerk, a major bank and a major
Hollywood studio, or an application that gives you traffic
updates and one that lets you change the percentage
of your 401(k) plan. Second, the personality must be consistent
throughout the application. It doesn't make sense for
the application to seem warm and forgiving in one state
and then cold and impatient in the next. While people
will grow to like a personality different from their
own, no one gets used to someone whose personality is
unpredictable. Finally, it's crucial to find a voice
actor talented enough to play this role consistently
and a director who can ensure that the character originally
developed for the application is the one that ends up
being portrayed in the dialog.
Linguistic Factors
If VUIs are inherently social, it follows that the language
they use should be as close to naturally occurring spoken
discourse as possible. However, while everyone can tell
what sounds natural and what doesn't when they hear
it, spoken discourse is much more complicated than most
people realize, and replicating it in a voice application
requires a certain amount of linguistic expertise. Let
me give a few examples of linguistic principles that
play an important role in VUI design, some general,
some more detailed.
Speaking "Correctly"
Most of us were taught from an early age (either directly
or indirectly) that there is a "correct" and
"incorrect" way to speak and write, and that
if we drift too far from the "correct" way,
we might as well hide our heads in shame. Perhaps you'll
remember some of the old favorites still found in English
grammar books (and grammar checkers): Don't leave your
prepositions dangling. You mustn't split infinitives.
It's "the woman whom I love" not "who
I love". You can't start a sentence with "but".
But as it turns out, many of these rules come from an
eighteenth-century fad in which scholars tried to
force the structure of English to be more like Latin,
while others have no basis at all [2]. What's more,
the language we use and expect to hear in our everyday
conversations with friends, neighbors, bank tellers,
stock brokers, store clerks, and human resource managers
has never followed the rules of standard written grammar.
It has its own rules and patterns which have evolved
naturally over hundreds of years and which every speaker
intuitively follows from a very young age.
The problem is, the pressure to be "correct" causes
many prompt writers to produce overly formal or even
stilted-sounding applications, as shown in the following
prompts.
Odd: []
Better: []
Odd: []
Better: []
Sometimes the clients themselves ask for this formality. In one extreme
case the clients were so worried about sounding "correct"
that they banned the use of contractions in the entire
application. Needless to say, the result was odd at
best. In other cases, jargon can creep into prompts,
especially phrases related to speech recognition. Take
the following prompt, for example: []
Concerned that callers might not realize they could use their
own voice to interact with the system, this prompt writer
decided to make it clear by using "speak your response".
But this phrase is technical jargon typically used by
engineers to describe speech recognition input and doesn't
fit with an application directed at the general public.
In general, VUI designers need to understand how spoken
discourse works in order to give users a quality experience.
Otherwise users are asked to interact conversationally
with a system that doesn't sound at all conversational.
Information Structure and Word Order
Information structure refers to the way "old" or presupposed
information and "new" or asserted information
are reflected in sentence structure [3]. In English,
new information typically comes at the end
of the sentence while old information comes at the beginning.
For example, if I asked you, "Why did you hit that
guy?" your answer might be "I hit that guy
because he insulted me". However, in this context
you certainly wouldn't say "Because that guy insulted
me I hit him." This is because "I hit that guy" is
now the old or presupposed information and should come
in the first part of the sentence, while "because he insulted
me" is the new or asserted information and should come
at the end. The importance of information structure
is especially clear in help prompts. For example, suppose
the user needs to know what phrase he should use in
order for the system to play the rest of a message.
Since the phrase he's looking for constitutes the "new"
information, it should come at the end of the help prompt.
[]
Putting the phrase at the beginning in this context
doesn't conform to English information structure. []
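One way to keep this ordering consistent across an application is to build
help prompts from a template that always states the caller's goal first and
the phrase to say last. The Python sketch below assumes a hypothetical helper,
help_prompt, and an invented command phrase; neither comes from the
application discussed above.

```python
def help_prompt(goal: str, phrase: str) -> str:
    """Build a help prompt with the goal (old information) first
    and the phrase to say (new information) last."""
    return f"To {goal}, say '{phrase}'."


# Hypothetical goal and command phrase, for illustration only.
print(help_prompt("hear the rest of the message", "keep going"))
```

Because the phrase is always the final element, the caller hears it last and
can repeat it straight away.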
Phonetics and Phonology
Even the simplest voice application typically involves a
lot of prompt concatenation. And while a good ear is
indispensable, a clear understanding of intonation patterns,
stress, and the way people pronounce conversational
language helps to make the prompt boundaries disappear
when you hear the application in real time. In addition,
knowledge of these patterns makes it easier for designers
to adjust the grammars for better recognition. The following
pair of prompts shows just how important it is to pay
attention to phonetics and phonology in VUI design.
Odd: []
Better: []
Future Directions
Language is a dynamic and collaborative process. That is, in
any given conversation there's no way to plan what we're
going to say next until we've heard the other person's
contribution. As the conversation progresses, its participants
in turn modify what they say and how they say it to
accommodate the growing pool of shared knowledge [4,
5]. In other words, they get to know each other. However,
very few voice applications today have any built-in
mechanisms for adapting to users. As a result, people
are often annoyed with the repetitive nature of even
the best applications after only a few weeks. While
a thorough discussion of ways to make VUIs more dynamic
goes beyond the scope of this article, let me say that
we can begin to mimic this behavior by simply keeping
track of certain user events and then responding with
different prompts and dialog flows accordingly. Some
of these events include the number of times logged in,
domains visited, time elapsed since last login, error
rates, and changes in user preferences.
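As a rough illustration of the event tracking described above, the following
Python sketch picks a greeting based on a caller's history. Everything in it
is an assumption made for illustration: the CallerHistory fields, the
thresholds, and the prompt wording are hypothetical and do not come from any
particular platform or from the applications discussed here.

```python
from dataclasses import dataclass


@dataclass
class CallerHistory:
    """Events tracked per caller (hypothetical field names)."""
    login_count: int = 0               # number of times logged in
    days_since_last_login: float = 0.0
    error_rate: float = 0.0            # share of recent turns misrecognized


# Hypothetical prompt wording, keyed by greeting style.
GREETINGS = {
    "intro": "Welcome! You can ask for things like traffic, weather, or news. "
             "What would you like?",
    "guided": "Welcome back. Just say the name of what you want, "
              "like 'traffic' or 'weather'.",
    "reminder": "Welcome back. It's been a while. You can say 'traffic', "
                "'weather', or 'news'.",
    "brief": "Hi again. What would you like?",
    "standard": "Welcome back. What would you like to do?",
}


def choose_greeting_style(history: CallerHistory) -> str:
    """Return a greeting style; the thresholds are illustrative guesses."""
    if history.login_count == 0:
        return "intro"       # first call: explain what the system offers
    if history.error_rate > 0.3:
        return "guided"      # caller is struggling: give example phrases
    if history.days_since_last_login > 30:
        return "reminder"    # long absence: restate the main options
    if history.login_count >= 10:
        return "brief"       # frequent caller: keep it short
    return "standard"


def greeting_for(history: CallerHistory) -> str:
    """Look up the prompt text for the chosen style."""
    return GREETINGS[choose_greeting_style(history)]
```

A caller on a twelfth login with a low error rate would get the "brief"
greeting, while the same caller after a run of recognition errors would get
the "guided" one. Checking the error rate before the login count reflects one
possible priority (help struggling callers first); a real design would settle
that order through usability testing.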
Conclusions
VoiceXML has simplified the deployment of voice applications
and has given dialog developers an easier way to implement
their designs. However, the principles of dialog design
have not changed. An application's usability depends
on how well its designers can ensure system flexibility
and user control and on how well they understand the
linguistic and social principles that affect the users'
perception of the voice or "character" being
portrayed. Finally, it's important to remember that
VUIs cannot completely replace web applications. Rather,
they are best when used to enhance them.
References
1. Reeves, B. & Nass, C. (1996). The media equation.
New York: Cambridge.
2. Pinker, S. (1994). The language instinct.
New York: W. Morrow and Co.
3. Lambrecht, K. (1994). Information structure and
sentence form: topic, focus, and the mental representations
of discourse referents. New York: Cambridge.
4. Karttunen, L. & Peters, S. (1975). Conventional
implicature in Montague grammar. Berkeley Linguistics
Society, 1, 266-278.
5. Clark, H.H. (1992). Arenas of language use.
Chicago: University of Chicago Press.
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).