2004 VoiceXML Forum Membership Survey
By Jim Ferrans
VoiceXML Forum Technical Council Chair
Introduction
The talented and dedicated people serving on the
VoiceXML Forum's technical committees are doing many things, such as setting up open source tool
development projects, organizing an independent conformance program, creating a
developer certification program, and publishing the VoiceXML Review. More technical work is starting all the time: in addition
to the current Conformance, Education, and Tools Committees, we've just
chartered a new Accessibility Committee, and are contemplating the formation of
one or two new standards-oriented committees later this year.
The Forum's Technical Council is chartered with
coordinating this technical work for the Forum. It's our job to listen to
our membership and ensure that their needs are addressed. An
important way we do this is through regular surveys, the first of which was
completed this May. This article reports on what we learned.
The survey covered a wide variety of topics. We
asked our member companies:
- How they used VoiceXML,
- What VoiceXML features they liked and disliked,
- What they were doing with SALT,
- How important VoiceXML conformance was to them,
- What features of voice platforms they thought were most crucial,
- What features they wanted to see in VoiceXML "V3", and
- What they thought about the emerging area of multimodality.
The
survey took roughly 30 minutes to an hour to fill out
online. Thirty-one companies took the survey,
nearly ten percent of our membership. We promised
we would not release company-specific information outside
of the Technical Council, and permitted anonymous surveys.
We did however encourage companies to provide their
names and a contact person, to better help us in our
analysis of the results, and to help prevent "gaming"
of the survey. Only four companies chose anonymity.
Of the 27 others, there were only two Sponsors and three
Promoters: a full 22 were at the Supporter level.
There was a representative balance of large and small
companies, and companies from a wide variety of industries.
Many of the companies are not typically associated with
VoiceXML. All of this was encouraging, as we wanted
a representative cross-section of our members, not a
sample drawn only from the most active companies.
VoiceXML Usage
The first questions asked about how the companies were
using VoiceXML. Figure
1 summarizes the responses:
Figure 1: How does your company
use VoiceXML?
Every responding company used VoiceXML in at least one
way (except one which planned on using it in the next year). The average company used it in three ways, and one company even
managed to use it in seven.
VoiceXML application development. The most
common use was application development: 25 out of the 31 companies developed
VoiceXML applications: 22 for others, and 16 for themselves.
VoiceXML platforms. A full 18 of the 31
companies provided VoiceXML platforms (hardware and/or software) to other
companies. This seemed fairly high to us.
VoiceXML training. Eleven of the companies
are involved in training. Only one focused exclusively on training.
Seventy-three percent of companies doing training also deployed platforms, 27
percent did hosting, 82 percent did application development, and 45 percent
developed tools. Training is therefore primarily an adjunct to the main
business of our respondents.
VoiceXML tools. Ten of the companies sell
VoiceXML tools. It would be interesting to probe deeper into this area and
see what kinds of tools are being deployed, whether they are used internally as
well as externally, how interested these companies would be in open source tools
efforts, and so on. There was no company focused entirely on tools: every
tool vendor also had a platform product or a hosting service.
VoiceXML hosting. Seven companies did
VoiceXML application hosting for other companies. Only one of the usual VoiceXML
hosting powerhouses was in this list.
As an exercise, we compared our lists of companies
against Ken Rehor's excellent lists
of VoiceXML companies. Astonishingly, there was almost no overlap. Of the
fifteen platform providers for which we had names, only two were in Ken's list
of platform providers. Of the seven identified tool vendors, just one
appeared in Ken's list of tool vendors. And of the six identified hosting
providers, again only one appeared in Ken's hosting list. We concluded
that:
- VoiceXML is proliferating much faster than we had been aware of.
- Ken needs a full-time research assistant.
The
overall impression we got from these responses is that
the VoiceXML market is vibrant, but still in its early
days. We have not yet seen a consolidation among
the scores of platform vendors, and companies are not
yet specializing in tools or training. Specialization
and consolidation will happen as conformance and interoperability
continue increasing.
VoiceXML Applications
We next asked companies what kinds of VoiceXML
applications they've deployed, and how many. We only wanted to count
commercial-grade applications providing real value to customers right now.
These questions were optional because quite a few companies were under NDA or
otherwise wanted to maintain confidentiality in this area.
Types: Only ten of the 25 companies that have
developed applications chose to tell us about the commercial applications they've
deployed using VoiceXML. These ten, about three percent of the Forum's
membership, reported a very substantial list:
- A customizable financial application suite for the
securities industry;
- A major brokerage firm's voice application suite;
- A railroad tracking system;
- Several web portals;
- A phone game suite;
- A voice mail application;
- A business portal automating support for a carrier's
DSL service;
- An automatic wakeup service deployed by a major
European carrier;
- Various commercial applications for the television
industry, and a PDA manufacturer;
- Health care applications;
- Telecom applications;
- A phone banking system for a huge retail bank with 11
million customers, 2 million of whom use phone banking; and
- A carrier's voice portal with two dozen services such
as news, weather, soccer updates, traffic, movie information, and a
television guide.
Number: Our survey was not of a size or scope to
find out how many voice applications are deployed in total, and what proportion are
authored in VoiceXML versus other authoring approaches. We would need a larger sample
drawn from the
entire industry to do that.
Publicly available application counts don't get
close to a definitive answer. They are too colored by individual companies
trying to look their best, and by proponents of one authoring approach
exaggerating application counts to diminish other authoring approaches.
Even the concept of application is fluid: Should only those in service
count? Is a voice portal one application or fifty? Should an
application serving two million calls a day count the same as one that serves
hundreds?
The most reliable sense of VoiceXML's impact can be found in
independent market research done by consultancies like InStat/MDR, Frost and
Sullivan, Zelos Group, Datamonitor, IDC, and Yankee Group. They are reporting very encouraging
findings this year. For instance, the Zelos Group's Dan Miller recently said that
"VoiceXML is the standard scripting language for rendering Web pages over
the telephone. VoiceXML 2.0-compliant products are already on the market from core technology, platform, development tool and hosted services providers, and there is broad industry adoption. More importantly, purchase decision-makers among the major speech-enabled enterprises, including financial services, travel, telcos, see VoiceXML compliance as a requirement. It gives them bargaining leverage across vendors and solutions providers and carries with it the promise of re-usable code and portability."
Art Schoeller of the Yankee Group found this year that "There is a huge momentum behind VoiceXML right
now. Based on corporate requests for proposals (RFPs) and actual deployments, that is easy to see."
This
momentum is amply seen in our survey. We asked
each company to tell us how many deployed applications
were in service currently and how many they expected
to see in service a year from now. Twenty-four
companies responded. We discarded one larger company
that seemed to be gaming the system. The remaining
companies were mainly small, and their answers correlated
quite well with information publicly available on them.
They reported a total of 208 deployed VoiceXML applications
today, and expected to have
862 VoiceXML applications in service a year from now.
This better than quadrupling of the VoiceXML market
size agrees with what market researchers are finding
from wider samples.
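As a quick check on the arithmetic, the growth implied by these two totals can be computed directly. The sketch below uses only the figures reported above; the variable names are illustrative and not part of the survey.

```python
# Totals reported by the 23 companies whose answers were kept
# (24 responded, one was discarded).
deployed_now = 208        # VoiceXML applications in service at survey time
expected_next_year = 862  # applications expected in service a year later

# Ratio of next year's expected total to today's total.
growth_factor = expected_next_year / deployed_now

# The same change expressed as a percentage increase over today's total.
percent_increase = (growth_factor - 1) * 100

print(f"growth factor: {growth_factor:.2f}x")
print(f"increase: {percent_increase:.0f} percent")
```

The factor comes out slightly above four, which is consistent with the "better than quadrupling" description.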
SALT Usage
We next asked our respondents about their use of SALT
for authoring voice and multimodal applications.
The VoiceXML Forum's 333 companies are a major
part of the voice industry. Because of this, 47 of the SALT Forum's 79 members (60
percent) are in the VoiceXML Forum, while 14 percent of our members
are in the SALT Forum. These dual membership companies tend to be
larger, more active, and more serious participants in the industry.
Given this high level of cross-membership, the answers to these questions
should shed light on how much impact SALT will have.
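The cross-membership percentages can be reproduced from the three membership counts given above. This is only a sketch of the arithmetic; the helper name percent is an assumption made for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part expressed as a percentage of whole."""
    return 100.0 * part / whole


dual_members = 47             # companies in both forums
salt_forum_members = 79       # SALT Forum membership
voicexml_forum_members = 333  # VoiceXML Forum membership

# Share of each forum's membership that also belongs to the other forum.
share_of_salt_forum = percent(dual_members, salt_forum_members)
share_of_voicexml_forum = percent(dual_members, voicexml_forum_members)

print(f"SALT Forum members also in the VoiceXML Forum: {share_of_salt_forum:.1f} percent")
print(f"VoiceXML Forum members also in the SALT Forum: {share_of_voicexml_forum:.1f} percent")
```

The first value is roughly 59.5 percent, which the text rounds to 60, and the second is about 14.1 percent.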
The voice industry is quite pragmatic. Companies
are interested in meeting customer needs by deploying commercially valuable
applications and services. They see standards as a key means to this end,
but generally don't want to waste energy by getting polarized about them.
And polarization seems not to be happening. Three factors point to this.
First, SALT Forum companies are very active in the
VoiceXML Forum. After the VoiceXML Forum was restructured in late 2003 to
allow any member to participate at the board level, the original four founding board
members were joined by seven more.
Significantly, five of our new board members are from the SALT Forum: HP,
Verizon, Vocalocity, VoiceGenie, and West. After our August 2004 board elections,
our board's chairperson and vice-chairperson are from SALT Forum companies.
Second, SALT Forum companies are highly committed to
VoiceXML. I keep a list
of their recent announcements on VoiceXML, and the proportion making serious
commercial investments in VoiceXML is surprisingly high, nearly triple the
proportion investing in SALT. I found voice-related product and service
announcements for 58 of the 79 companies. Of these 58, 46 (79 percent)
made very significant commercial bets on VoiceXML. Another 8 (14 percent)
made lesser commitments to VoiceXML.
Finally, the results of our survey indicate that
SALT-oriented companies are deploying an order of magnitude more VoiceXML
applications than SALT applications.
We first asked our sample a series of questions about
how companies use SALT, mirroring the questions for VoiceXML. The results are summarized in
Figure 2.
Figure 2: How does your company
use SALT?
Not surprisingly, this reflects the same proportions we
see for VoiceXML.
Types: We asked about types and quantities of
deployed SALT applications, as we did for
VoiceXML. We could not ascertain what types of applications SALT will be used for, since none of our respondents had yet deployed a SALT application.
But we expect SALT to be used in nearly the same way as VoiceXML.
Number: Five respondents answered that they were
working on SALT applications. The total number of our sample's deployed
SALT applications in May 2004 was 0. The total number of deployed SALT
applications they expect to have by May 2005 is 42. This is contrasted with VoiceXML deployments in Figure 3.
Figure 3: VoiceXML and SALT
deployments (all respondents).
Relative use of VoiceXML and SALT.
Can we say from our data that 100 percent of the markup-based voice applications
are VoiceXML this year, or that next year only 4.6 percent will be SALT?
No: our sample was self-selecting and drawn from only VoiceXML Forum companies. But there
is an interesting thought experiment we can do.
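Before turning to that experiment, the percentages in the question above can be reproduced from the deployment totals our respondents reported. The sketch below is illustrative only: the function name salt_share is an assumption, and the inputs are the sample totals, not industry-wide figures.

```python
def salt_share(voicexml_apps: int, salt_apps: int) -> float:
    """Percentage of markup-based applications that are authored in SALT."""
    total = voicexml_apps + salt_apps
    return 100.0 * salt_apps / total


# May 2004: 208 VoiceXML applications deployed, no SALT applications.
share_2004 = salt_share(208, 0)

# May 2005 (expected): 862 VoiceXML applications and 42 SALT applications.
share_2005 = salt_share(862, 42)

print(f"SALT share in 2004: {share_2004:.1f} percent")
print(f"SALT share in 2005: {share_2005:.1f} percent")
```

Within the sample, the SALT share is zero this year and about 4.6 percent next year, matching the figures in the question.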
Our respondents included an above average
proportion of SALT Forum members: six of 27 identified companies (22.2 percent),
relative to the 14.1 percent ratio for the full VoiceXML Forum. What if we
looked at just these six plus the four other companies reporting that they were
deploying SALT applications? The proportion of SALT applications these
SALT-oriented companies are deploying surely should represent an upper bound for
the industry as a whole. Figure 4 shows the results for the ten companies that fit
the SALT-oriented profile.
Figure 4: VoiceXML and SALT
deployments (SALT-oriented respondents).
The
data shows that, remarkably, even the companies most
interested in SALT will deploy fully 91.1 percent of
their applications in VoiceXML next year, and only 8.9
percent in SALT. Clearly there is no great fragmentation
happening in the voice industry, and the signs are that
VoiceXML will continue to dominate it.
Strengths and Weaknesses of VoiceXML 2.0
Our next series of questions tried to tease out what
features our membership would like to see in VoiceXML "V3".
These were used to prepare a short position paper we are forwarding to the W3C.
Strengths of VoiceXML 2.0. When asked an
open-ended question about VoiceXML 2.0's strengths, our respondents had these
comments. (We present only those comments mentioned by two or more
companies.)
- VoiceXML 2.0 is an open, widely accepted W3C standard;
standardization means low costs, strong core technology, platform
independence, no vendor lock-in, broad developer community. [17 companies]
- It is simple, easy to use, natural, easy to develop
complex applications. [15]
- It uses the web paradigm: internet infrastructure,
separation of logic and presentation, ease of deployment. [10]
- High portability. [5]
- It results in rapid implementation, can be used for
rapid prototyping. [4]
- Powerful. [4]
- Allows switching between ASR and TTS systems. [3]
- Short and concise dialog flow, FIA. [2]
- Flexible. [2]
- Supports ECMAScript. [2]
Weaknesses of VoiceXML 2.0. When asked an
open-ended question about VoiceXML 2.0's weaknesses, our respondents had various
comments. (We leave in comments from only one company, as they may prompt
change requests for VoiceXML "V3").
Mentions of
features for "V3" we later explicitly asked about (discussed in the next section):
- Want more control over ASR settings. [3 companies]
- FIA too complex, non-intuitive in some cases, too
restrictive, sometimes want to define my own [3]
- No support for event-driven programming (e.g.,
asynchronous interrupts). [2]
- Want to see more call control features. [2]
- Want better CCXML integration. [1]
- Need to be modularized for reusability. [1]
- Not extensible (e.g., for video output, multimodal).
[1]
Comments on VoiceXML 2.0 per se:
- Sub dialogs should be more flexible, e.g., allow
running without new execution context. [2]
- Lack of support for multimodal interaction. [1]
- The W3C's VoiceXML 2.0 specification is not clear
enough: many details are not filled in. [1]
- Too much programming. [1]
- Want to see error.badfetch subtypes to aid in problem
determination. [1]
- Want to have expr as well as value in
<grammar>. [1]
- No dynamic vocabulary (want voice and text enrollment).
[1]
- Want more flexibility in accessing recognition
results. [1]
- Want better prompt control. [1]
Comments already addressed in VoiceXML 2.1:
- Want a <data> tag. [1]
- Want to record during recognition for logging and
tuning. [1]
- Want a <grammar> expr attribute for dynamic
grammar generation. [1]
Comments regarding SSML, SRGS, SISR specifications:
- The Semantic Interpretation specification (SISR) is
too complex. [1]
- Using SRGS for DTMF grammars leads to somewhat
lengthy documents. [1]
Comments about the speech and VoiceXML industry:
- Lack of VoiceXML portability due to vendor
limitations, vendor-specific extensions. [4]
- Tools are immature, we need an IDE for VoiceXML. [3]
- Grammar standards (SRGS/SISR) are not yet well
adopted by vendors. [1]
- The server-side VoiceXML generation tools from [...]
and [...] generate too many round trips and result in inefficiencies. [1]
- Want to be able to do open transcription. [1]
Weaknesses
were mentioned only half as much as strengths, and did
not cluster around any one area in particular.
Those areas that got multiple mentions are ones already
identified as areas to consider for "V3",
or are comments on the industry, not the standard.
Features Desired in VoiceXML "V3"
Features desired in VoiceXML "V3".
We also asked a guided series of questions on possible specific VoiceXML
"V3" features. The results are shown in Figure 5:
Figure 5: What features would you
most like to see in VoiceXML "V3".
Crucial features. Our respondents backed
five potential features/capabilities for "V3" very strongly:
- A high level of compatibility is important.
24 of 31 companies are highly interested in compatibility, either by having
rigorously equivalent syntax and semantics, full backwards compatibility, or
automated translation between 2.0 and "V3". Two other
companies wanted "look and feel" compatibility, while four did not
consider compatibility important. This was an
overwhelmingly unified response (see Figure 6).
- The ability to communicate between a VoiceXML
session and external entities is important. 21 companies would
like to permit VoiceXML "V3" sessions to communicate with external
entities outside of the HTTP request/response model.
- Support for call control within VoiceXML remains
important. CCXML is viewed as an important standard; however, 20 respondents indicated that some level of call control capability within a
VoiceXML session continues to be important.
- Additional control over low-level media is
desirable. 17 respondents want to see more control over
low-level media resources in "V3".
- Modularization.
This is viewed as a key "V3" requirement by 16 respondents.
Important features. While failing to address
any of the preceding five items would lead to acceptance issues for "V3",
we also identified two other features that should be seriously considered.
- Speaker
verification. This is viewed as a key "V3" requirement
by 9 of 31 respondents.
- Additional
control over the FIA is desirable.
This is viewed as a key "V3" requirement by 8 respondents.
Figure 6: How much backwards
compatibility with VoiceXML 2.0 should VoiceXML "V3" have?
The
features identified as crucial are mainly those already
identified by the W3C. One key takeaway is that
"V3" should be as compatible as possible with
VoiceXML 2.0 if it is to be relevant to the industry.
Selecting a VoiceXML Platform
Next we asked respondents what their top three factors were in
selecting a VoiceXML platform. The responses are shown in Figure 7.
Figure 7: What factors are most
important when selecting a VoiceXML 2.0 platform.
The answers seem reasonable. To do its job, a
voice platform must be reliable, use an effective speech recognizer, and be
affordable. Once these basic needs are met, it has to adhere to
standards. Below these needs come lesser ones.
We next divided the thirty companies responding to this
question into platform vendors (n=17) versus non-platform vendors (n=13).
On most factors the two groups were in close agreement, but two factors showed
interesting discrepancies:
- Platform
vendors tended to overrate capacity's importance
relative to non-platform vendors.
- Platform
vendors vastly underrated the importance of debugging
support relative to non-platform vendors.
This
suggests that platform vendors should revisit their
application debugging capabilities to ensure that they
are satisfactory.
Conformance
When asked if they authored applications for multiple
voice platforms, 16 respondents said yes, 11 no. The other seven did not
answer, many because they don't author applications. For those who
developed applications for multiple platforms (n=16), we asked how many of their
applications needed to have separate versions maintained for each
platform. Five maintained all applications for separate platforms, ten did
not need to maintain separate applications, and one was in between. We
were not able to ascertain whether conformance itself was the issue, as opposed to other
factors such as dependence on vendor extensions or ASR tuning properties.
Even so, the responses indicate that platform conformance, at least in the past, has
been a serious issue for some respondents.
When they were asked explicitly if interoperability was a key
issue, 11 companies said interoperability is "very
important", seven said it was "important", and six said it was
"somewhat important". Four felt it was not important, and three
didn't answer.
We then asked about specific conformance areas, and got
the results shown in Figure 8.
Figure 8: Which areas of conformance impact you and how severely?
The main factor impacting conformance was
platform-dependent features. Application developers either explicitly take advantage of them,
or are forced into using them (e.g., ASR tuning properties). The W3C is
aware of these issues, and has standardized some of the more common areas of
difference in VoiceXML 2.1 (e.g., the expr attribute on <script> and
<grammar>). It should push ahead further in this area for
"V3", for example by defining standards for speaker
verification. Platform developers should eliminate dependencies that are
not essential to their differentiation strategies.
The next highest issue was VoiceXML 2.0
conformance. This is strong justification for the VoiceXML Forum's
Conformance program, and we expect to see this issue decline in importance as
more platform vendors go through external conformance testing.
SRGS conformance was also a serious matter. The
SRGS and SISR standards took much longer to gel than we expected in the old
VoiceXML 1.0 days, and this is reflected in the level of incompatibilities
reported. We expect this issue will decline in severity too, as platform
and speech technology vendors implement the final versions of these standards.
Happily, improved conformance and interoperability are
being driven by the W3C's Implementation Report tests
for VoiceXML 2.0 and 2.1, SRGS, SSML, and SISR.
They are also starting to be pushed by the Forum's new
conformance program. And as VoiceXML applications
are ported to new platforms and hosting services, the
applications and the platforms are forced to iron out
conformance problems. It will be interesting to
see how this question is answered in our next survey.
Multimodal Applications
There is quite a bit of industry interest in multimodal
user interfaces as a compelling way to improve user experience, especially on
smaller devices. Multimodal interfaces typically combine a visual mode
with a voice mode (and perhaps other modes like touch or gesture). As yet
no standards have been established for multimodal languages, although SALT and
the VoiceXML-based X+V have been put forward.
SALT uses the W3C
standards for speech grammars (SRGS), text to speech markup (SSML), and semantic
interpretation (SISR), but does not build on the W3C's VoiceXML standard.
Whereas W3C's VoiceXML mainly uses declarative constructs to specify dialogs,
SALT relies on low-level ECMAScript programming. SALT has been critiqued
for its verbosity
and other problems, though
in some situations programming "close to the metal" has advantages.
For multimodal use, SALT needs to be combined with a
visual markup language like HTML. The situation with VoiceXML is no
different. The W3C has defined a standard visual markup and a container
called XHTML. They have also specified standard mechanisms for integrating
other kinds of markup into an XHTML container, mechanisms like modularization,
namespaces, and XML Events.
X+V is
just a straightforward use of these mechanisms to add VoiceXML markup into the
XHTML container. Leveraging the W3C's "standards stack" like this results
in a clean model-view-controller architecture, where the views (visual and
voice) are not commingled as in SALT, but independent.
We first asked companies if they were using X+V, and if
so, how were they using it (Figure 9).
Figure 9: How are you using
X+V?
This level of interest in X+V was broader
than we anticipated. Only one of the companies that authored the X+V
specification was in our sample. Respondents reported six deployed X+V applications
this year and expected 28 in 2005. This seems low in comparison to VoiceXML, but
entirely expected: VoiceXML can be reached from any of the three billion phones
on the planet, but there are a relative handful of devices that can support
multimodal interfaces. In a SALT ecosystem, there would be a similar
proportion of voice-only and multimodal applications.
We next decided to look at our subgroup of
SALT Forum companies and SALT developers to see what they were doing
with X+V. We expected that even though they were enthusiastically using
VoiceXML for voice-only applications, they would be planning to work mainly with
SALT for multimodal applications (Figure 10).
Figure 10: How are you using
X+V? (SALT-oriented respondents only.)
Surprisingly, six of the ten SALT-oriented companies
either use X+V today, or plan to use it over the next 12 months. This is a
sizeable level of interest.
We next asked "How important is it that a future
multimodal markup standard be based on the VoiceXML 2.0 standard?" The
results are in Figure 11, and show that of the 27 companies answering this
question, 25 thought it at least somewhat important, while only two thought it
not important.
Figure 11: How important is it
that a future multimodal markup standard build on the VoiceXML 2.0
standard?
(All respondents)
We then looked at how our SALT-oriented group answered this question (Figure 12):
Figure 12: How important is it
that a future multimodal markup standard build on the VoiceXML 2.0
standard?
(SALT-oriented respondents only.)
Interestingly,
the SALT-oriented group was unanimous that it was at
least somewhat important for the future multimodal markup
language to build on top of VoiceXML 2.0. These
results show that the voice community has a strong bias
for open, accepted international standards. There
is a clear mandate from our membership for the VoiceXML
Forum to push for a VoiceXML 2.0-based multimodal standard.
Conclusions
Tools: Our
Tools Committee has correctly seen the need to seed the growth of the VoiceXML
tools industry by starting up open source efforts, working on data logging
standards, and working on a meta-language for inter-tool communication.
Conformance: This survey very clearly justifies
the huge efforts the Conformance Committee is putting into setting up our
conformance program. This remains one of the top goals of the Forum.
Education: The many Education Committee
activities serve important purposes. The Developer Certification program,
conference organization (including the tutorials), and VoiceXML Review e-zine
all serve to educate developers in VoiceXML and related topics. As
developers understand VoiceXML better, they tend to pressure platform providers
to fix conformance problems, for instance.
General:
SALT does not seem to be gaining much traction, so the
industry will likely not see a counter-productive standards
battle. Our membership has given us a clear mandate
to lobby for a VoiceXML-based approach to multimodal
markup.
Acknowledgements
This survey was a group effort by members of the Forum's
Technical Council and members of the IEEE ISTO organization. I would
especially like to thank Chris Cross of IBM for editing the survey questions,
and Joni Brennan of the ISTO for creating the survey web site, managing the
survey, and doing the initial collation of the results.
Rob Marchand of VoiceGenie analyzed the VoiceXML
"V3" results, and wrote the report on them for the W3C. Other
valuable input came from Dan Burnett (Nuance), Gary Jesenick (Lucent), Gerry
McCobb (IBM), Ken Rehor (Vocalocity), Cindy Tiritilli (IEEE), and Les Wilson
(IBM).