What is VoiceXML?
Introduction
VoiceXML is a language for creating voice user interfaces, particularly
for the telephone. It uses speech recognition and touchtone
(DTMF keypad) for input, and pre-recorded audio and
text-to-speech synthesis (TTS) for output. It is based
on the World Wide Web Consortium's (W3C's) Extensible
Markup Language (XML), and leverages the web paradigm
for application development and deployment. By having
a common language, application developers, platform
vendors, and tool providers can all benefit from code
portability and reuse.
With VoiceXML, speech recognition application development
is greatly simplified by using familiar web infrastructure,
including tools and Web servers. Instead of using a
PC with a Web browser, any telephone can access VoiceXML
applications via a VoiceXML "interpreter" (also known
as a "browser") running on a telephony server. Whereas
HTML is commonly used for creating graphical Web applications,
VoiceXML can be used for voice-enabled Web applications.
There are two schools of thought regarding the use of
VoiceXML:
- As a way to voice-enable a Web site, or
- As an open-architecture solution for building next-generation
  interactive voice response telephone services.
One popular type of application is the voice portal,
a telephone service where callers dial a phone number
to retrieve information such as stock quotes, sports
scores, and weather reports. Voice portals have received
considerable attention lately, and demonstrate the power
of speech recognition-based telephone services. Voice portals,
however, are certainly not the only application for
VoiceXML. Other application areas, including voice-enabled
intranets and contact centers, notification services,
and innovative telephony services, can all be built
with VoiceXML.
By separating application logic (running on a standard
Web server) from the voice dialogs (running on a telephony
server), VoiceXML and the voice-enabled Web allow for
a new business model for telephony applications known
as the Voice Service Provider. This permits developers
to build phone services without having to buy or run
equipment.
While VoiceXML was originally designed for building telephone services,
other applications, such as speech-controlled home appliances,
are starting to be developed.
VoiceXML Features
The rapid growth of the Web was due largely to its open
architecture and high-level common interfaces to differing
computing resources. HTML and HTTP hide much of the
complexity of building interactive applications. Just
as an HTML developer doesn't need to know how bits paint
the screen of a web user's PC, VoiceXML shields developers
from many of the complexities of telephony platforms.
VoiceXML has features to control audio output; audio
input; presentation logic and control flow; event handling;
and basic telephony connections. These and other features
are described as follows:
- Dialogs <menu>, <form>
- Audio Output <prompt>
  - Speech synthesis controls (text-to-speech, or TTS) <emp>, <pros>, etc.
  - Pre-recorded audio (files or streams) <audio>
- Audio Input
  - Speech recognition (ASR)
  - Audio recording <record>
  - Touchtone (Dual-tone Multi-Frequency, or DTMF) <dtmf>
- Presentation logic
  - Control flow <if>, <else>, etc.
  - ECMAScript client-side scripting <script>
  - Server-side/dynamic content generation <submit>
- Event handling
  - Bad input <noinput>, <nomatch>
  - Shorthand <help>
  - <catch>, <throw>
- Basic Connection Control
  - Call transfer and bridging <transfer>
  - Disconnect <disconnect>
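To make the feature list more concrete, here is a minimal sketch of a single dialog that combines several of the elements named above: a `<form>` with one `<field>`, a `<prompt>`, handlers for `<noinput>`, `<nomatch>`, and `<help>`, and a `<submit>` back to the application server. The markup is held in a Python string only so that its well-formedness can be checked with the standard library. The form id, the field name `zip`, the prompt wording, and the `http://example.com/weather` URL are all illustrative assumptions, and the `type="digits"` attribute assumes the interpreter supports that built-in field type.

```python
import xml.etree.ElementTree as ET

# A small VoiceXML 1.0-style document. All names, wording, and the
# submit URL are hypothetical and chosen only for illustration.
VXML_DOC = """\
<vxml version="1.0">
  <form id="weather">
    <field name="zip" type="digits">
      <prompt>Please say or key in your five digit zip code.</prompt>
      <noinput>I did not hear anything. <reprompt/></noinput>
      <nomatch>I did not understand that. <reprompt/></nomatch>
      <help>Say the five digits of your zip code, one at a time.</help>
    </field>
    <block>
      <!-- Hand the collected value to the application server. -->
      <submit next="http://example.com/weather" namelist="zip"/>
    </block>
  </form>
</vxml>
"""


def elements_used(vxml_text):
    """Parse the document and return the sorted, distinct element names.

    ET.fromstring raises ParseError if the markup is not well-formed XML,
    so a successful return also confirms the document parses.
    """
    root = ET.fromstring(vxml_text)
    return sorted({element.tag for element in root.iter()})
```

Calling `elements_used(VXML_DOC)` lists the element names in the dialog. This checks only that the markup is well-formed XML; it does not validate it against the VoiceXML DTD or run it on an interpreter.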
Beyond the scope of the language are application logic, state
management, dialog generation and sequencing, database
operations, and interfaces to legacy systems (e.g.,
"screen scraping"). These are handled by traditional
Web application programming techniques.
Architecture
A VoiceXML application consists of several components,
as shown in Figure 1:
- Application Server: Typically a Web server,
  which runs the application logic, and may contain
  a database or interfaces to an external database or
  transaction server.
- VoiceXML Telephony Server: A platform that
  runs a VoiceXML interpreter that acts as a
  client to the application server. The interpreter
  understands VoiceXML dialogs and controls speech and
  telephony resources. These resources include ASR, TTS,
  audio play and record functions, as well as a telephone
  network interface.
- Internet-style network: A TCP/IP-based packet
  network that connects the application server and telephony
  server via HTTP.
- Telephone Network: Typically the Public
  Switched Telephone Network (PSTN), but could be
  a private telephone network (e.g., PBX), or VoIP packet
  network.
- Caller: Any telephone that can connect to
  the telephone network.
Figure 1: Components of a VoiceXML Application
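The application server's role in this architecture can be sketched as an ordinary HTTP handler that returns VoiceXML instead of HTML. The example below is a rough illustration using only the Python standard library, not a description of any particular platform. The `FORECASTS` table, the `zip` query parameter, the `build_dialog` and `serve` helpers, the port, and the `text/xml` content type are all assumptions made for the sketch; a real deployment would use whatever media type and URL scheme its VoiceXML interpreter expects.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from xml.sax.saxutils import escape

# Hypothetical data standing in for a database or transaction server.
FORECASTS = {"94103": "sunny", "10001": "rainy"}


def build_dialog(zip_code):
    """Application logic: turn a lookup result into a VoiceXML document."""
    forecast = FORECASTS.get(zip_code)
    if forecast is None:
        message = "Sorry, I have no forecast for that zip code."
    else:
        message = "The forecast for %s is %s." % (zip_code, forecast)
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        '  <form>\n'
        '    <block>\n'
        '      <prompt>' + escape(message) + '</prompt>\n'
        '      <disconnect/>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


class WeatherHandler(BaseHTTPRequestHandler):
    """Answers the interpreter's HTTP GET with a generated dialog."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        zip_code = query.get("zip", [""])[0]
        body = build_dialog(zip_code).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")  # assumed media type
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch server until interrupted."""
    HTTPServer(("", port), WeatherHandler).serve_forever()
```

With `serve()` running, the VoiceXML interpreter on the telephony server would fetch a URL such as `/weather?zip=94103` over HTTP and play the returned prompt to the caller. The speech and telephony resources stay on the telephony server; the application server only produces markup.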
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and
Technology Organization (IEEE-ISTO).