Elvira - a VoiceXML Platform for Research
Introduction
Research in the field of dialogue systems often involves creating an experimental application that is used for testing new ideas, collecting statistical data, taking measurements, running performance tests, and so on. A significant amount of time is spent on creating such an application before the scientific work itself can start.
This article introduces Elvira (http://gin2.itek.norut.no/elvira/), a VoiceXML platform focused on the specific needs of researchers. Elvira can be used to set up sophisticated research environments quickly and allows rapid design of a dialogue system, making it possible to concentrate on the scientific problem itself. During its design, special attention was paid to flexibility and easy extensibility, so that it can be used in a wide variety of research tasks and experiments.
The development of Elvira started in January 2001 in the Laboratory of Speech and Dialogue (http://www.fi.muni.cz/lsd/) at the Faculty of Informatics, Masaryk University, Brno, Czech Republic. The initial impetus was the need for a suitable tool for creating AudiC, an experimental dialogue system that allowed visually impaired people to program in C.
The first versions of Elvira implemented a very limited subset of VoiceXML 1.0. Despite this, Elvira helped bring the project to a successful finish. The development of AudiC revealed many requirements for a good VoiceXML interpreter for research and influenced Elvira's later design.
Since October 2001, Elvira has been developed in cooperation with Norut IT (http://www.itek.norut.no/), an applied research institute located in Tromsø, Norway. Based on experience from the AudiC project and influenced by the first public draft of VoiceXML 2.0, Elvira's architecture was completely redesigned to achieve better flexibility and easier extensibility.
The extensibility and flexibility of our VoiceXML platform are achieved by using the component paradigm for its design and development. The component-based architecture makes the system highly modular. A component can be viewed as a self-contained binary object that provides its services to the outer world through a set of precisely defined interfaces. Elvira is a system formed of such components. The components are selected at run-time, and hence Elvira can operate in dozens of different configurations, with features and capabilities that depend on the components currently in use.
Elvira's general architecture is depicted in the following figure, where components are represented by gray rectangles.
[Figure: VoiceXML platform Elvira - system architecture]
The heart of the platform is Elvira Core, which interprets VoiceXML and controls the other components. The Core is the only component that is never meant to be replaced by a custom implementation. Our aim is therefore to concentrate as many tasks as possible within the Core, so that the other components can be as simple as possible.
Input collection is handled by an input component. One input component may process a voice stream delivered by the telephony component and another may support a microphone, but components are not required to support only voice input. It is possible to use, for example, a keyboard to simulate speech, as well as more "exotic" devices such as a stylus with handwriting recognition, touch screens, haptic devices for people with disabilities, or any combination of such devices. The output component, which is responsible for output generation, has a similar degree of freedom.
The wide diversity of devices implies an equally wide diversity of capabilities among input and output components. Some operations make sense only for some components; for example, prosody modeling is useful only for speech synthesis. To embrace this diversity, components have to implement only a few mandatory interfaces that are essential for the system to run correctly. They can provide extended functionality by implementing other interfaces where this is meaningful for the supported devices and useful for the current application. If an interface is not implemented, Elvira Core detects this and handles it. This principle allows users to deal only with the issues relevant to their current work and to keep things as simple as possible.
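The split between mandatory and optional interfaces can be sketched as follows. This is only an illustration in Python, not Elvira's actual C++ interfaces: the names `OutputComponent`, `ProsodyControl`, `core_play_prompt` and the two concrete components are hypothetical, and the way the sketch reacts to a missing interface (it skips the request) is an assumption, since the article does not say how Elvira Core handles that case.

```python
from abc import ABC, abstractmethod
from typing import Optional


class OutputComponent(ABC):
    """Hypothetical mandatory interface: every output component implements it."""

    @abstractmethod
    def render(self, text: str) -> str:
        ...


class ProsodyControl(ABC):
    """Hypothetical optional interface: meaningful only for speech synthesis."""

    @abstractmethod
    def set_pitch(self, percent: int) -> None:
        ...


class TextConsoleOutput(OutputComponent):
    """Implements only the mandatory interface."""

    def render(self, text: str) -> str:
        return f"[console] {text}"


class SpeechSynthesisOutput(OutputComponent, ProsodyControl):
    """Implements the mandatory interface plus the optional prosody one."""

    def __init__(self) -> None:
        self.pitch = 100

    def set_pitch(self, percent: int) -> None:
        self.pitch = percent

    def render(self, text: str) -> str:
        return f"[tts pitch={self.pitch}%] {text}"


def core_play_prompt(output: OutputComponent, text: str,
                     pitch: Optional[int] = None) -> str:
    """Stand-in for the Core: use prosody only if the component offers it."""
    if pitch is not None and isinstance(output, ProsodyControl):
        output.set_pitch(pitch)
    # Assumption: when the optional interface is absent, the pitch request
    # is simply skipped and the prompt is rendered without it.
    return output.render(text)
```

With this layout, a keyboard-and-console test setup needs only `render`, while a speech synthesis component can expose prosody control without forcing every other output component to do the same.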
Every component is characterized by a unique name and by a category. Names are typically used to specify which component should be used for a specific task in the system (e.g. input collection and output generation). A component can also be selected based on its category. This is used, for instance, to support new grammar types automatically. Each grammar-analysis component defines its category so that it contains the MIME type of the supported grammar format. When Elvira Core needs a grammar analyzer for a specific grammar format, it simply uses the component with the matching category. Thus, all that is needed to support a new grammar type is to copy the appropriate component into a location where Elvira Core can find it.
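Category-based selection of a grammar component might look roughly like the sketch below. It is written in Python for brevity and is not Elvira's real API: the `ComponentRegistry` class, the component classes, and the `grammar:<mime-type>` category format are all assumptions made for illustration. The two MIME types are the ones registered for the XML and ABNF forms of SRGS grammars.

```python
class GrammarComponent:
    """Hypothetical base class for grammar-analysis components."""
    name = ""
    category = ""  # assumed format: "grammar:<mime-type>"

    def parse(self, source: str) -> str:
        raise NotImplementedError


class SrgsXmlGrammar(GrammarComponent):
    name = "srgs-xml"
    category = "grammar:application/srgs+xml"

    def parse(self, source: str) -> str:
        return f"parsed {len(source)} characters as SRGS XML"


class SrgsAbnfGrammar(GrammarComponent):
    name = "srgs-abnf"
    category = "grammar:application/srgs"

    def parse(self, source: str) -> str:
        return f"parsed {len(source)} characters as SRGS ABNF"


class ComponentRegistry:
    """Stand-in for the place where the Core looks components up."""

    def __init__(self) -> None:
        self._by_name = {}
        self._by_category = {}

    def register(self, component: GrammarComponent) -> None:
        # A component is reachable both by its unique name and by category.
        self._by_name[component.name] = component
        self._by_category[component.category] = component

    def by_name(self, name: str):
        return self._by_name.get(name)

    def grammar_for(self, mime_type: str):
        # Exact match on the category; returns None if nothing supports it.
        return self._by_category.get(f"grammar:{mime_type}")
```

In this sketch, registering `SrgsAbnfGrammar()` plays the role of copying a new component into the search location: after that call, `grammar_for("application/srgs")` starts returning an analyzer and no other code has to change.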
The same principle is used to select stream components, which fetch a resource according to the protocol specified in its URI, and to select a component that can handle a resource with a specific MIME type (the grammar components mentioned in the previous paragraph are examples of such resource components).
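Selecting a stream component from the protocol part of a URI could be sketched like this. Again, this is a Python illustration under stated assumptions rather than Elvira's implementation: the `stream:<scheme>` category format, the `select_stream` helper, and the stub components are hypothetical, and the `fetch` methods only return a description instead of performing any I/O.

```python
from urllib.parse import urlsplit


class StreamComponent:
    """Hypothetical base class for components that fetch resources."""
    category = ""  # assumed format: "stream:<uri-scheme>"

    def fetch(self, uri: str) -> str:
        raise NotImplementedError


class HttpStream(StreamComponent):
    category = "stream:http"

    def fetch(self, uri: str) -> str:
        return f"would fetch {uri} over HTTP"


class FileStream(StreamComponent):
    category = "stream:file"

    def fetch(self, uri: str) -> str:
        return f"would read {uri} from the local file system"


def select_stream(components, uri: str):
    """Return the first component whose category matches the URI scheme."""
    scheme = urlsplit(uri).scheme.lower()
    wanted = f"stream:{scheme}"
    for component in components:
        if component.category == wanted:
            return component
    return None  # no installed component supports this protocol
```

A call such as `select_stream([HttpStream(), FileStream()], "file:///tmp/menu.vxml")` picks the `FileStream` stub, while a URI with an unsupported scheme yields `None`, which the caller would have to report or handle.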
Extensions for Research Purposes
Researchers in the field of human language technologies need enough flexibility to perform virtually any task their work requires. We decided to keep the number of VoiceXML extensions as low as possible and instead to provide a general, unified mechanism that addresses the problem. The mechanism is implemented in Elvira Core and allows external functions written in C++ to be called from any ECMAScript expression within VoiceXML.
Continued...
Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).