VoiceXML Review - Feature - VoiceXML on Auto Row

Volume 4, Issue 3 - September / October 2004

2004 VoiceXML Forum Membership Survey

By Jim Ferrans
VoiceXML Forum Technical Council Chair

Introduction

The talented and dedicated people serving on the VoiceXML Forum's technical committees are doing many things, such as setting up open source tool development projects, organizing an independent conformance program, creating a developer certification program, and publishing the VoiceXML Review. More technical work is starting all the time: in addition to the current Conformance, Education, and Tools Committees, we've just chartered a new Accessibility Committee, and are contemplating the formation of one or two new standards-oriented committees later this year.

The Forum's Technical Council is chartered with coordinating this technical work for the Forum. It's our job to listen to our membership and ensure that their needs are addressed. An important way we do this is through regular surveys, the first of which was completed this May. This article reports on what we learned.

The survey covered a wide variety of topics. We asked our member companies:

How they used VoiceXML,
What VoiceXML features they liked and disliked,
What they were doing with SALT,
How important VoiceXML conformance was to them,
What features of voice platforms they thought were most crucial,
What features they wanted to see in VoiceXML "V3", and
What they thought about the emerging area of multimodality.

The survey took roughly 30 minutes to an hour to fill out online. Thirty-one companies took the survey, nearly ten percent of our membership. We promised we would not release company-specific information outside of the Technical Council, and we permitted anonymous surveys. We did, however, encourage companies to provide their names and a contact person, to help us in our analysis of the results and to help prevent "gaming" of the survey. Only four companies chose anonymity. Of the 27 others, only two were Sponsors and three were Promoters; a full 22 were at the Supporter level. There was a representative balance of large and small companies, drawn from a wide variety of industries. Many of the companies are not typically associated with VoiceXML. All of this was encouraging, as we wanted a representative cross-section of our members, not a sample drawn only from the most active companies.
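As a quick consistency check, the membership-level breakdown above can be tallied in a few lines. This is only a sketch; the Forum's total of 333 member companies is taken from the SALT section of this article.

```python
# Respondents by membership level, as reported in the survey write-up.
anonymous = 4
sponsors, promoters, supporters = 2, 3, 22
forum_members = 333  # total VoiceXML Forum membership cited in the SALT section

identified = sponsors + promoters + supporters
respondents = anonymous + identified
response_rate = 100 * respondents / forum_members

print(identified, respondents, f"{response_rate:.1f}%")
```

The identified companies sum to 27, the total to 31, and 31 of 333 members is about 9.3 percent, consistent with "nearly ten percent."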

VoiceXML Usage

The first questions asked about how the companies were using VoiceXML. Figure 1 summarizes the responses:

Figure 1: How does your company use VoiceXML?

Every responding company but one used VoiceXML in at least one way, and that one planned to start using it within the next year. The average company used it in three ways, and one company even managed to use it in seven.

VoiceXML application development. The most common use was application development: 25 of the 31 companies developed VoiceXML applications, 22 for others and 16 for themselves.

VoiceXML platforms. A full 18 of the 31 companies provided VoiceXML platforms (hardware and/or software) to other companies. This seemed fairly high to us.

VoiceXML training. Eleven of the companies are involved in training. Only one focused exclusively on training. Seventy-three percent of companies doing training also deployed platforms, 27 percent did hosting, 82 percent did application development, and 45 percent developed tools. Training is therefore primarily an adjunct to the main business of our respondents.
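The training percentages above are rounded, but with only eleven training companies each one corresponds to a single whole number of companies. The sketch below recovers those counts by brute force; the counts are inferred from the percentages, not separately reported in the survey.

```python
# Reported percentages of the 11 training companies that also do each activity.
training_total = 11
reported_pct = {
    "platforms": 73,
    "hosting": 27,
    "application development": 82,
    "tools": 45,
}

# Find the whole-company count whose percentage rounds to each reported figure.
inferred_counts = {}
for activity, pct in reported_pct.items():
    candidates = [k for k in range(training_total + 1)
                  if round(100 * k / training_total) == pct]
    assert len(candidates) == 1, f"ambiguous count for {activity}"
    inferred_counts[activity] = candidates[0]

print(inferred_counts)
```

This yields 8 platform providers, 3 hosting providers, 9 application developers, and 5 tool vendors among the 11 training companies.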

VoiceXML tools. Ten of the companies sell VoiceXML tools. It would be interesting to probe deeper into this area and see what kinds of tools are being deployed, whether they are used internally as well as externally, how interested these companies would be in open source tool efforts, and so on. No company focused entirely on tools: every tool vendor also had a platform product or a hosting service.

VoiceXML hosting. Seven companies did VoiceXML application hosting for other companies. Only one of the usual VoiceXML hosting powerhouses was on this list.

As an exercise, we compared our lists of companies against Ken Rehor's excellent lists of VoiceXML companies. Astonishingly, there was almost no overlap. Of the fifteen platform providers for which we had names, only two were in Ken's list of platform providers. Of the seven identified tool vendors, just one appeared in Ken's list of tool vendors. And of the six identified hosting providers, again only one appeared in Ken's hosting list. We concluded that:

VoiceXML is proliferating much faster than we had realized.
Ken needs a full-time research assistant.

The overall impression we got from these responses is that the VoiceXML market is vibrant, but still in its early days. We have not yet seen a consolidation among the scores of platform vendors, and companies are not yet specializing in tools or training. Specialization and consolidation will happen as conformance and interoperability continue increasing.

VoiceXML Applications

We next asked companies what kinds of VoiceXML applications they've deployed, and how many. We only wanted to count commercial-grade applications providing real value to customers right now. These questions were optional because quite a few companies were under NDA or otherwise wanted to maintain confidentiality in this area.

Types: Only ten of the 25 companies that have developed applications chose to tell us about the commercial applications they've deployed using VoiceXML. These ten, about three percent of the Forum's membership, reported a very substantial list:

A customizable financial application suite for the securities industry;
A major brokerage firm's voice application suite;
A railroad tracking system;
Several web portals;
A phone game suite;
A voice mail application;
A business portal automating support for a carrier's DSL service;
An automatic wakeup service deployed by a major European carrier;
Various commercial applications for the television industry and a PDA manufacturer;
Health care applications;
Telecom applications;
A phone banking system for a huge retail bank with 11 million customers, 2 million of whom use phone banking; and
A carrier's voice portal with two dozen services such as news, weather, soccer updates, traffic, movie information, and a television guide.

Number: Our survey was not of a size or scope to find out how many voice applications are deployed in total, and what proportion are authored in VoiceXML versus other authoring approaches. We would need a larger sample drawn from the entire industry to do that.

Publicly available application counts come nowhere near a definitive answer. They are too colored by individual companies trying to look their best, and by proponents of one authoring approach exaggerating application counts to diminish other authoring approaches. Even the concept of an application is fluid: Should only those in service count? Is a voice portal one application or fifty? Should an application serving two million calls a day count the same as one that serves hundreds?

The most reliable picture of VoiceXML's impact comes from independent market research done by consultancies like InStat/MDR, Frost and Sullivan, Zelos Group, Datamonitor, IDC, and Yankee Group. They are reporting very encouraging findings this year. For instance, the Zelos Group's Dan Miller recently said that "VoiceXML is the standard scripting language for rendering Web pages over the telephone. VoiceXML 2.0-compliant products are already on the market from core technology, platform, development tool and hosted services providers, and there is broad industry adoption. More importantly, purchase decision-makers among the major speech-enabled enterprises, including financial services, travel, telcos, see VoiceXML compliance as a requirement. It gives them bargaining leverage across vendors and solutions providers and carries with it the promise of re-usable code and portability." Art Schoeller of the Yankee Group found this year that "There is a huge momentum behind VoiceXML right now. Based on corporate requests for proposals (RFPs) and actual deployments, that is easy to see."

This momentum is amply seen in our survey. We asked each company to tell us how many deployed applications were in service currently and how many they expected to see in service a year from now. Twenty-four companies responded. We discarded one larger company that appeared to be gaming the system. The remaining companies were mainly small, and their answers correlated quite well with information publicly available on them. They reported a total of 208 deployed VoiceXML applications today, and expected to have 862 VoiceXML applications in service a year from now. This more-than-fourfold growth of the VoiceXML market agrees with what market researchers are finding from wider samples.
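The growth claim can be checked directly from the two totals reported above. A minimal sketch:

```python
# Deployed VoiceXML applications reported by the (filtered) respondents.
deployed_2004 = 208
expected_2005 = 862

# Ratio of applications expected next year to applications in service today.
growth = expected_2005 / deployed_2004
print(f"{growth:.2f}x")
```

The ratio is about 4.14, which is what "more than fourfold" refers to.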

SALT Usage

We next asked our respondents about their use of SALT for authoring voice and multimodal applications.

The VoiceXML Forum's 333 companies are a major part of the voice industry. Because of this, 47 of the SALT Forum's 79 members (60 percent) are in the VoiceXML Forum, while 14 percent of our members are in the SALT Forum. These dual membership companies tend to be larger, more active, and more serious participants in the industry. Given this high level of cross-membership, the answers to these questions should shed light on how much impact SALT will have.
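The two cross-membership figures describe the same 47 companies measured against different denominators, which is why they differ so much. A short sketch using only the counts given above:

```python
# Membership counts cited in the text.
salt_members = 79
voicexml_members = 333
dual_members = 47  # companies belonging to both forums

share_of_salt = 100 * dual_members / salt_members        # SALT members also in VoiceXML Forum
share_of_vxml = 100 * dual_members / voicexml_members    # VoiceXML members also in SALT Forum

print(f"{share_of_salt:.1f}% of SALT Forum members are VoiceXML Forum members")
print(f"{share_of_vxml:.1f}% of VoiceXML Forum members are SALT Forum members")
```

This gives 59.5 percent (the "60 percent" above) and 14.1 percent, the same 14.1 percent figure used later as the baseline for the SALT-oriented subgroup.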

The voice industry is quite pragmatic. Companies are interested in meeting customer needs by deploying commercially valuable applications and services. They see standards as a key means to this end, but generally don't want to waste energy by getting polarized about them. And polarization seems not to be happening. Three factors point to this.

First, SALT Forum companies are very active in the VoiceXML Forum. After the VoiceXML Forum was restructured in late 2003 to allow any member to participate at the board level, the original four founding board members were joined by seven more. Significantly, five of our new board members are from the SALT Forum: HP, Verizon, Vocalocity, VoiceGenie, and West. After our August 2004 board elections, our board's chairperson and vice-chairperson are from SALT Forum companies.

Second, SALT Forum companies are highly committed to VoiceXML. I keep a list of their recent announcements on VoiceXML, and the proportion making serious commercial investments in VoiceXML is surprisingly high, nearly triple the proportion investing in SALT. I found voice-related product and service announcements for 58 of the 79 companies. Of these 58, 46 (79 percent) made very significant commercial bets on VoiceXML. Another 8 (14 percent) made lesser commitments to VoiceXML.
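The proportions in the announcement tally follow from the counts above. The sketch also computes the combined share making any VoiceXML commitment, which is derived from those counts rather than stated separately in the text.

```python
# Tally of SALT Forum members' public voice-related announcements.
with_announcements = 58  # of the SALT Forum's 79 members
major_voicexml = 46      # very significant commercial bets on VoiceXML
lesser_voicexml = 8      # lesser VoiceXML commitments

pct_major = 100 * major_voicexml / with_announcements
pct_lesser = 100 * lesser_voicexml / with_announcements
pct_any = 100 * (major_voicexml + lesser_voicexml) / with_announcements

print(f"major: {pct_major:.0f}%, lesser: {pct_lesser:.0f}%, any: {pct_any:.1f}%")
```

Taken together, 54 of the 58 companies (about 93 percent) had made some VoiceXML commitment.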

Finally, the results of our survey indicate that SALT-oriented companies are deploying an order of magnitude more VoiceXML applications than SALT applications.

We first asked our sample a series of questions about how companies use SALT, mirroring the questions for VoiceXML. The results are summarized in Figure 2.

Figure 2: How does your company use SALT?

Not surprisingly, this reflects the same proportions we see for VoiceXML.

Types: We asked about types and quantities of deployed SALT applications, as we did for VoiceXML. We could not ascertain what types of applications SALT will be used for, since none of our respondents had yet deployed a SALT application. But we expect SALT to be used in nearly the same way as VoiceXML.

Number: Five respondents answered that they were working on SALT applications. The total number of our sample's deployed SALT applications in May 2004 was 0. The total number of deployed SALT applications they expect to have by May 2005 is 42. This is contrasted with VoiceXML deployments in Figure 3.

Figure 3: VoiceXML and SALT deployments (all respondents).

Relative use of VoiceXML and SALT. Can we conclude from our data that 100 percent of markup-based voice applications are VoiceXML this year, or that next year only 4.6 percent will be SALT? No: our sample was self-selecting and drawn only from VoiceXML Forum companies. But there is an interesting thought experiment we can do.
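For reference, the 100 percent and 4.6 percent figures are simply the SALT share of all deployed markup-based applications in the full sample, computed from the deployment totals reported earlier. A minimal sketch:

```python
# Deployed applications reported by all respondents (May 2004 actual, May 2005 expected).
voicexml_apps = {"2004": 208, "2005": 862}
salt_apps = {"2004": 0, "2005": 42}

salt_share = {}
for year in ("2004", "2005"):
    total = voicexml_apps[year] + salt_apps[year]
    salt_share[year] = 100 * salt_apps[year] / total
    print(year, f"SALT share: {salt_share[year]:.1f}%")
```

That is 0 of 208 applications this year and 42 of 904 (about 4.6 percent) next year.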

Our respondents included an above average proportion of SALT Forum members: six of 27 identified companies (22.2 percent), relative to the 14.1 percent ratio for the full VoiceXML Forum. What if we looked at just these six plus the four other companies reporting that they were deploying SALT applications? The proportion of SALT applications these SALT-oriented companies are deploying surely should represent an upper bound for the industry as a whole. Figure 4 shows the results for the ten companies that fit the SALT-oriented profile.
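The over-representation of SALT Forum members in the sample, and the size of the SALT-oriented subgroup, can be worked out from the counts above. The "over-representation factor" is just an illustrative ratio of the two percentages, not a figure from the survey.

```python
# SALT Forum members among identified respondents vs. the whole VoiceXML Forum.
salt_in_sample, identified = 6, 27
salt_in_forum, forum_members = 47, 333

sample_ratio = salt_in_sample / identified
forum_ratio = salt_in_forum / forum_members
overrepresentation = sample_ratio / forum_ratio  # illustrative ratio of the two shares

# SALT-oriented subgroup: the six SALT Forum members plus four other SALT developers.
salt_oriented = salt_in_sample + 4

print(f"{100 * sample_ratio:.1f}% vs {100 * forum_ratio:.1f}% "
      f"({overrepresentation:.2f}x); subgroup size {salt_oriented}")
```

SALT Forum members make up 22.2 percent of the identified sample against 14.1 percent of the Forum, roughly 1.57 times their share of the membership.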

Figure 4: VoiceXML and SALT deployments (SALT-oriented respondents).

The data shows that, remarkably, even the companies most interested in SALT will deploy fully 91.1 percent of their applications in VoiceXML next year, and only 8.9 percent in SALT. Clearly there is no great fragmentation happening in the voice industry, and the signs are that VoiceXML will continue to dominate it.

Strengths and Weaknesses of VoiceXML 2.0

Our next series of questions tried to tease out what features our membership would like to see in VoiceXML "V3". These were used to prepare a short position paper we are forwarding to the W3C.

Strengths of VoiceXML 2.0. When asked an open-ended question about VoiceXML 2.0's strengths, our respondents had these comments. (We present only those comments mentioned by two or more companies.)

VoiceXML 2.0 is an open widely accepted W3C standard; standardization means low costs, strong core technology, platform independence, no vendor lock-in, broad developer community. [17 companies]
It is simple, easy to use, natural, easy to develop complex applications. [15]
It uses the web paradigm: internet infrastructure, separation of logic and presentation, ease of deployment. [10]
High portability. [5]
It results in rapid implementation, can be used for rapid prototyping. [4]
Powerful. [4]
Allows switching between ASR and TTS systems. [3]
Short and concise dialog flow, FIA. [2]
Flexible. [2]
Supports ECMAScript. [2]

Weaknesses of VoiceXML 2.0. When asked an open-ended question about VoiceXML 2.0's weaknesses, our respondents had various comments. (Here we include even comments made by only one company, as they may prompt change requests for VoiceXML "V3".)

Mentions of features for "V3" we later explicitly asked about (discussed in the next section):

Want more control over ASR settings. [3 companies]
FIA too complex, non-intuitive in some cases, too restrictive; sometimes want to define my own. [3]
No support for event-driven programming (e.g., asynchronous interrupts). [2]
Want to see more call control features. [2]
Want better CCXML integration. [1]
Needs to be modularized for reusability. [1]
Not extensible (e.g., for video output, multimodal). [1]

Comments on VoiceXML 2.0 per se:

Subdialogs should be more flexible, e.g., allow running without a new execution context. [2]
Lack of support for multimodal interaction. [1]
The W3C's VoiceXML 2.0 specification is not clear enough: many details are not filled in. [1]
Too much programming. [1]
Want to see error.badfetch subtypes to aid in problem determination. [1]
Want to have expr as well as value in <grammar>. [1]
No dynamic vocabulary (want voice and text enrollment). [1]
Want more flexibility in accessing recognition results. [1]
Want better prompt control. [1]

Comments already addressed in VoiceXML 2.1:

Want a <data> tag. [1]
Want to record during recognition for logging and tuning. [1]
Want a <grammar> expr attribute for dynamic grammar generation. [1]

Comments regarding SSML, SRGS, SISR specifications:

The Semantic Interpretation specification (SISR) is too complex. [1]
Using SRGS for DTMF grammars leads to somewhat lengthy documents. [1]

Comments about the speech and VoiceXML industry:

Lack of VoiceXML portability due to vendor limitations, vendor-specific extensions. [4]
Tools are immature, we need an IDE for VoiceXML. [3]
Grammar standards (SRGS/SISR) are not yet well adopted by vendors. [1]
The server-side VoiceXML generation tools from [...] and [...] generate too many round trips and result in inefficiencies. [1]
Want to be able to do open transcription. [1]

Weaknesses were mentioned only half as much as strengths, and did not cluster around any one area in particular. Those areas that got multiple mentions are ones already identified as areas to consider for "V3", or are comments on the industry, not the standard.

Features Desired in VoiceXML "V3"

Features desired in VoiceXML "V3". We also asked a guided series of questions on possible specific VoiceXML "V3" features. The results are shown in Figure 5:

Figure 5: What features would you most like to see in VoiceXML "V3"?

Crucial features. Our respondents backed five potential features/capabilities for "V3" very strongly:

A high level of compatibility is important. 24 of 31 companies are highly interested in compatibility, either by having rigorously equivalent syntax and semantics, full backwards compatibility, or automated translation between 2.0 and "V3". Two other companies wanted "look and feel" compatibility, while four did not consider compatibility important. This was an overwhelmingly unified response (Figure 6).
The ability to communicate between a VoiceXML session and external entities is important. 21 companies would like to permit VoiceXML "V3" sessions to communicate with external entities outside of the HTTP request/response model.
Support for call control within VoiceXML remains important. CCXML is viewed as an important standard; however, 20 respondents indicated that some level of call control capability within a VoiceXML session continues to be important.
Additional control over low-level media is desirable. 17 respondents want to see more control over low-level media resources in "V3".
Modularization. This is viewed as a key "V3" requirement by 16 respondents.

Important features. While failing to address any of the preceding five items would lead to acceptance problems for "V3", we also identified two other features that should be seriously considered.

Speaker verification. This is viewed as a key "V3" requirement by 9 of 31 respondents.
Additional control over the FIA is desirable. This is viewed as a key "V3" requirement by 8 respondents.

Figure 6: How much backwards compatibility with VoiceXML 2.0 should VoiceXML "V3" have?

The features identified as crucial are mainly those already identified by the W3C. One key takeaway is that "V3" should be as compatible as possible with VoiceXML 2.0 if it is to be relevant to the industry.

Selecting a VoiceXML Platform

Next we asked companies to name their top three factors in selecting a VoiceXML platform. The responses are shown in Figure 7.

Figure 7: What factors are most important when selecting a VoiceXML 2.0 platform?

The answers seem reasonable. To do its job, a voice platform must be reliable, use an effective speech recognizer, and be affordable. Once these basic needs are met, it has to adhere to standards. Below these needs come lesser ones.

We next divided the thirty companies responding to this question into platform vendors (n=17) versus non-platform vendors (n=13). On most factors the two groups were in close agreement, but two factors showed interesting discrepancies:

Platform vendors tended to overrate capacity's importance relative to non-platform vendors.
Platform vendors vastly underrated the importance of debugging support relative to non-platform vendors.

This suggests that platform vendors should revisit their application debugging capabilities to ensure that they are satisfactory.

Conformance

When asked if they authored applications for multiple voice platforms, 16 respondents said yes and 11 said no. The other four did not answer, in many cases because they don't author applications. For those who developed applications for multiple platforms (n=16), we asked how many of their applications needed separate versions maintained for each platform. Five maintained separate versions of all their applications, ten did not need to maintain separate versions, and one was in between. This indicates that platform conformance, at least in the past, has been a serious issue for some respondents, although we were not able to ascertain whether conformance was the cause as opposed to other factors such as dependence on vendor extensions, ASR tuning properties, etc.

When they were asked explicitly if interoperability was a key issue, 11 companies said interoperability is "very important", seven said it was "important", and six said it was "somewhat important". Four felt it was not important, and three didn't answer.

We then asked about specific conformance areas, and got the results shown in Figure 8.

Figure 8: Which areas of conformance impact you and how severely?

The main factor impacting conformance was platform-dependent features. Application developers either explicitly take advantage of them, or are forced into using them (e.g., ASR tuning properties). The W3C is aware of these issues, and has standardized some of the more common areas of difference in VoiceXML 2.1 (e.g., the expr attribute on <script> and <grammar>). It should push ahead further in this area for "V3", for example by defining standards for speaker verification. Platform developers should eliminate dependencies that are not essential to their differentiation strategies.

The next highest issue was VoiceXML 2.0 conformance. This is strong justification for the VoiceXML Forum's Conformance program, and we expect to see this issue decline in importance as more platform vendors go through external conformance testing.

SRGS conformance was also a serious matter. The SRGS and SISR standards took much longer to gel than we expected in the old VoiceXML 1.0 days, and this is reflected in the level of incompatibilities reported. We expect this issue will decline in severity too, as platform and speech technology vendors implement the final versions of these standards.

Happily, improved conformance and interoperability are being driven by the W3C's Implementation Report tests for VoiceXML 2.0 and 2.1, SRGS, SSML, and SISR. They are also starting to be pushed by the Forum's new conformance program. And as VoiceXML applications are ported to new platforms and hosting services, the applications and the platforms are forced to iron out conformance problems. It will be interesting to see how this question is answered in our next survey.

Multimodal Applications

There is quite a bit of industry interest in multimodal user interfaces as a compelling way to improve user experience, especially on smaller devices. Multimodal interfaces typically combine a visual mode with a voice mode (and perhaps other modes like touch or gesture). As yet no standards have been established for multimodal languages, although SALT and the VoiceXML-based X+V have been put forward.

SALT uses the W3C standards for speech grammars (SRGS), text-to-speech markup (SSML), and semantic interpretation (SISR), but does not build on the W3C's VoiceXML standard. Whereas W3C's VoiceXML mainly uses declarative constructs to specify dialogs, SALT relies on low-level ECMAScript programming. SALT has been critiqued for its verbosity and other problems, though in some situations programming "close to the metal" has advantages.

For multimodal use, SALT needs to be combined with a visual markup language like HTML. The situation with VoiceXML is no different. The W3C has defined a standard visual markup and container called XHTML. It has also specified standard mechanisms for integrating other kinds of markup into an XHTML container, mechanisms like modularization, namespaces, and XML Events. X+V is just a straightforward use of these mechanisms to add VoiceXML markup into the XHTML container. Leveraging the W3C's "standards stack" like this results in a clean model-view-controller architecture, where the views (visual and voice) are independent rather than commingled as in SALT.

We first asked companies if they were using X+V, and if so, how were they using it (Figure 9).

Figure 9: How are you using X+V?

This level of interest in X+V was broader than we anticipated. Only one of the companies that authored the X+V specification was in our sample. There were six deployed X+V applications this year, and respondents expect 28 in 2005. This seems low in comparison to VoiceXML, but is entirely expected: VoiceXML can be reached from any of the three billion phones on the planet, but only a relative handful of devices can support multimodal interfaces. In a SALT ecosystem, there would be a similar proportion of voice-only and multimodal applications.

We next decided to look at our subgroup of SALT Forum companies and SALT developers to see what they were doing with X+V. We expected that even though they were enthusiastically using VoiceXML for voice-only applications, they would be planning to work mainly with SALT for multimodal applications (Figure 10).

Figure 10: How are you using X+V? (SALT-oriented respondents only.)

Surprisingly, six of the ten SALT-oriented companies either use X+V today, or plan to use it over the next 12 months. This is a sizeable level of interest.

We next asked "How important is it that a future multimodal markup standard be based on the VoiceXML 2.0 standard?" The results are in Figure 11, and show that of the 27 companies answering this question, 25 thought it at least somewhat important, while only two thought it not important.

Figure 11: How important is it that a future multimodal markup standard build on the VoiceXML 2.0 standard?
(All respondents)

We then looked at how our SALT-oriented group answered this question (Figure 12):

Figure 12: How important is it that a future multimodal markup standard build on the VoiceXML 2.0 standard?
(SALT-oriented respondents only.)

Interestingly, the SALT-oriented group was unanimous that it was at least somewhat important for the future multimodal markup language to build on top of VoiceXML 2.0. These results show that the voice community has a strong bias for open, accepted international standards. There is a clear mandate from our membership for the VoiceXML Forum to push for a VoiceXML 2.0-based multimodal standard.

Conclusions

Tools: Our Tools Committee has correctly seen the need to seed the growth of the VoiceXML tools industry by starting up open source efforts, working on data logging standards, and working on a meta-language for inter-tool communication.

Conformance: This survey very clearly justifies the huge efforts the Conformance Committee is putting into setting up our conformance program. This remains one of the top goals of the Forum.

Education: The many Education Committee activities serve important purposes. The Developer Certification program, conference organization (including the tutorials), and VoiceXML Review e-zine all serve to educate developers in VoiceXML and related topics. As developers understand VoiceXML better, they tend to pressure platform providers to fix conformance problems, for instance.

General: SALT does not seem to be gaining much traction, so the industry will likely not see a counter-productive standards battle. Our membership has given us a clear mandate to lobby for a VoiceXML-based approach to multimodal markup.

Acknowledgements

This survey was a group effort by members of the Forum's Technical Council and members of the IEEE ISTO organization. I would especially like to thank Chris Cross of IBM for editing the survey questions, and Joni Brennan of the ISTO for creating the survey web site, managing the survey, and doing the initial collation of the results.

Rob Marchand of VoiceGenie analyzed the VoiceXML "V3" results, and wrote the report on them for the W3C. Other valuable input came from Dan Burnett (Nuance), Gary Jesenick (Lucent), Gerry McCobb (IBM), Ken Rehor (Vocalocity), Cindy Tiritilli (IEEE), and Les Wilson (IBM).
