OpenVXI:
Fostering VoiceXML via Open Source
By Brian Eberman
Introduction
SpeechWorks has provided an open source VoiceXML interpreter
since April of 2001 to reduce barriers for developers
considering building VoiceXML solutions. SpeechWorks
partnered with Carnegie Mellon University (CMU) to make
OpenVXI software available from the CMU site
(http://fife.speech.cs.cmu.edu/openvxi/index.html),
as well as to host and archive a mailing list. Since
its inception, OpenVXI has been downloaded almost
1,500 times. The mailing list has also seen
significant activity, with the open source community
helping with technical issues and providing
comments and suggestions for enhancements.
OpenVXI was designed from the outset to be vendor
agnostic and technology independent. This required a
design with clear functional boundaries, so that each
component could be replaced or implemented for different
ASR or TTS technologies, or even extended to support a broader
range of speech applications such as multi-modal implementations.
Results to date include the adoption of OpenVXI as the
VoiceXML interpreter within several major IVR platforms,
multi-modal platforms, and research systems. OpenVXI
has also been integrated with a wide range of ASR,
telephony, and management systems.
1. DESIGN
Figure 1: OpenVXI Abstract Distributed System Model
Figure 1 shows an abstract distributed reference model for
a VoiceXML gateway and component servers. SpeechWorks
used this reference model in designing OpenVXI. There
are several important points to this architecture.
First, OpenVXI is only one component of an overall platform.
OpenVXI is designed to exclusively provide VoiceXML
interpretation. Integrators then incorporate the OpenVXI
with other components to build a VoiceXML gateway. In
this abstract model, the other components of the platform
architecture are:
- A telephony services layer which terminates the call,
  including signaling, from a switch or the PSTN;
- A call control agent that manages the call, via the
  mediating telephony services layer;
- The ASR and TTS resources that may or may not be distributed
  into different processes or network servers;
- The platform integration that mediates between the
  OpenVXI and the ASR and TTS resources;
- The application server that consists of one or more
  web servers or application servers, and contains the
  application logic, backend connectivity, grammars,
  and prompts.
Component separation means that call control operates independently
of the OpenVXI interpreter. When a call is received,
it is passed from the telephony services component to
the call control layer, which then determines the
treatment of the call. The call control
agent can choose to handle the call with VoiceXML, in
which case it brings OpenVXI into the call. The
call control agent can also force the termination of
a call by communicating directly with the platform integration
or telephony services component. Lastly, the main execution
function of the OpenVXI interpreter can be invoked as
a subroutine call by the call control agent. A number
of implementers have chosen to invoke the OpenVXI interpreter
one VoiceXML page at a time. This technique allows tighter call
control, because execution can escape back to a call
handling agent or interact with a previously defined IVR
development environment between pages.
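
The page-by-page technique described above can be sketched roughly as follows. This is only an illustrative sketch and not OpenVXI code: `CallControlAgent`, `PageResult`, `interpret_page`, the routing table, and the example URLs are all hypothetical names introduced here, and the real interpreter entry point has its own signature. The sketch assumes the interpreter returns after each page and reports where the dialog wants to go next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PageResult:
    """Outcome of interpreting one VoiceXML page (hypothetical type)."""
    next_url: Optional[str]  # None means the dialog has finished
    hangup: bool = False     # True if the caller or the page ended the call


# Stand-in for the interpreter's main execution function (assumed signature).
Interpreter = Callable[[str], PageResult]


@dataclass
class CallControlAgent:
    """Decides call treatment and drives the interpreter one page at a time."""
    interpret_page: Interpreter
    max_pages: int = 10
    log: List[str] = field(default_factory=list)

    def handle_call(self, called_number: str, routes: Dict[str, str]) -> str:
        url = routes.get(called_number)
        if url is None:
            # The agent is free to treat the call without VoiceXML at all.
            self.log.append("no VoiceXML route; handled by legacy IVR")
            return "legacy"
        pages = 0
        while url is not None:
            if pages >= self.max_pages:
                # Control returns here between pages, so the agent can
                # force termination without involving the interpreter.
                self.log.append("page limit reached; forcing termination")
                return "terminated"
            result = self.interpret_page(url)  # plain subroutine call
            pages += 1
            self.log.append(f"interpreted {url}")
            if result.hangup:
                return "hangup"
            url = result.next_url
        return "completed"


def fake_interpreter(url: str) -> PageResult:
    """Scripted interpreter used only for this demonstration."""
    pages = {
        "http://app.example/start.vxml": PageResult("http://app.example/menu.vxml"),
        "http://app.example/menu.vxml": PageResult(None),
    }
    return pages[url]


agent = CallControlAgent(fake_interpreter)
outcome = agent.handle_call("5551000", {"5551000": "http://app.example/start.vxml"})
```

Because the loop lives in the call control agent rather than in the interpreter, policies such as the page limit shown here can be applied between pages without changing the interpreter itself.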
The reference architecture shows that ASR and TTS technologies
can receive their audio directly from the underlying
telephony services without having it pass through the
platform integration code. This is a platform implementation
decision that is left entirely to the developer who
is incorporating OpenVXI. OpenVXI defines a
set of abstract platform interfaces (VXIrec, VXIprompt,
and VXItel), which the developer must implement for the particular
speech and telephony technologies being incorporated
into the platform. These are shown in detail in Figure 2.
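
To illustrate the idea of abstract platform interfaces, the sketch below models the three roles as Python abstract base classes with scripted stand-in implementations. It is an analogy only: the class names `Rec`, `Prompt`, and `Tel` mirror VXIrec, VXIprompt, and VXItel, but every method name, the `run_field` helper, and the example URI are assumptions made for this sketch. The actual interface definitions are those shipped with OpenVXI and are not reproduced here. Disconnecting when nothing is recognized is a simplification chosen to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Rec(ABC):
    """Recognition role (stand-in for VXIrec); method names are hypothetical."""

    @abstractmethod
    def load_grammar(self, grammar_uri: str) -> None: ...

    @abstractmethod
    def recognize(self) -> Optional[str]: ...


class Prompt(ABC):
    """Prompting role (stand-in for VXIprompt); method names are hypothetical."""

    @abstractmethod
    def queue(self, ssml: str) -> None: ...

    @abstractmethod
    def play(self) -> None: ...


class Tel(ABC):
    """Telephony role (stand-in for VXItel); method names are hypothetical."""

    @abstractmethod
    def disconnect(self) -> None: ...


def run_field(rec: Rec, prompt: Prompt, tel: Tel,
              ssml: str, grammar_uri: str) -> Optional[str]:
    """Interpreter-side logic that depends only on the abstract roles."""
    prompt.queue(ssml)
    rec.load_grammar(grammar_uri)
    prompt.play()
    answer = rec.recognize()
    if answer is None:
        tel.disconnect()  # simplification: give up when nothing is recognized
    return answer


class ScriptedRec(Rec):
    """Test double that returns a fixed recognition result."""

    def __init__(self, answer: Optional[str]) -> None:
        self.answer = answer
        self.grammars: List[str] = []

    def load_grammar(self, grammar_uri: str) -> None:
        self.grammars.append(grammar_uri)

    def recognize(self) -> Optional[str]:
        return self.answer


class RecordingPrompt(Prompt):
    """Test double that records what would have been played."""

    def __init__(self) -> None:
        self.queued: List[str] = []
        self.played: List[str] = []

    def queue(self, ssml: str) -> None:
        self.queued.append(ssml)

    def play(self) -> None:
        self.played.extend(self.queued)
        self.queued.clear()


class CountingTel(Tel):
    """Test double that counts disconnect requests."""

    def __init__(self) -> None:
        self.disconnects = 0

    def disconnect(self) -> None:
        self.disconnects += 1


rec, prompt, tel = ScriptedRec("sales"), RecordingPrompt(), CountingTel()
answer = run_field(rec, prompt, tel,
                   "<speak>Which department?</speak>",
                   "http://app.example/dept.grxml")
```

Swapping `ScriptedRec` for a class that talks to a real recognizer would leave `run_field` unchanged, which is the property the abstract interfaces are meant to provide.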
In order to make these interfaces generic, OpenVXI
assumes that the platform interfaces can be implemented
to support the base W3C speech services specifications:
namely, that an implementation of the VXIrec interface
can support the W3C Speech Recognition Grammar Specification
(SRGS), and that an implementation of the VXIprompt interface
can support the W3C Speech Synthesis Markup Language (SSML)
specification. OpenVXI therefore assumes that all
recognition, grammar management, prompting (including
the TTS markup), and parsing of this information can
be entirely delegated to these interfaces. Consequently,
an implementation of these interfaces must be able
to support HTTP retrieval of information from the application
server. This separation of services and the requirement
to support HTTP are directly supported by implementations
that make use of servers that support the Media Resource
Control Protocol (MRCP, http://www.ietf.org).
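
One consequence of this delegation is that the component behind the recognition interface, not the interpreter, fetches grammars over HTTP. A minimal sketch of that responsibility is shown below, assuming a hypothetical `GrammarStore` class and example URI. The per-URI cache is an extra assumption added for illustration and is not something the text above requires. The fetch function is injectable so the sketch can be exercised without a network.

```python
from typing import Callable, Dict
from urllib.request import urlopen


def http_fetch(uri: str, timeout: float = 5.0) -> bytes:
    """Fetch a resource over HTTP using the standard library."""
    with urlopen(uri, timeout=timeout) as response:
        return response.read()


class GrammarStore:
    """Hypothetical helper living on the recognizer side of the interface.

    The interpreter would hand over only a grammar URI; retrieval and
    parsing of the grammar document happen here.
    """

    def __init__(self, fetch: Callable[[str], bytes] = http_fetch) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.fetch_count = 0

    def load(self, uri: str) -> bytes:
        # Fetch each URI at most once (caching is an assumption of this sketch).
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
            self.fetch_count += 1
        return self._cache[uri]


# Demonstration with a fake fetcher instead of a live application server.
store = GrammarStore(fetch=lambda uri: b"<grammar/>")
first = store.load("http://app.example/dept.grxml")
second = store.load("http://app.example/dept.grxml")
```

In a deployment, the default `http_fetch` would retrieve the grammar from the application server.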
Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards
and Technology Organization (IEEE-ISTO).