Volume 7, Issue 1 - April/May 2007
 
   
   
 

The 2006 VoiceXML Forum Survey

Mark Eichten and Jim Ferrans

Introduction

Last summer we surveyed our membership to see how they were using VoiceXML, what they saw for the future, and how the Forum could better help them.  We’d like to thank everyone who took the survey for their time and input.

There continues to be strong momentum towards the use of VoiceXML as the standard markup language for voice applications: the transition from proprietary Interactive Voice Response (IVR) applications is in full swing, and companies both large and small are adopting the language, and deploying VoiceXML applications on their own platforms or through hosting companies.  This is apparent in the Forum’s current membership, and through mechanisms such as this survey we hope to continue to provide our members a good venue to help them participate in our industry’s continued growth.

The survey was divided into several subcategories, and this summary is broken out in a similar fashion to frame the findings.

Who Took the Survey?

We asked people for optional contact information, and of the 41 respondents, 17 remained anonymous.  This was a much more secretive group than those who took our 2004 survey, when only four of the 31 respondents chose anonymity.  We strongly suspect that this is due to the substantial commercial uptake of the standard over the intervening two years: companies don’t want to give competitors too much information.  This trend will probably continue as more and more VoiceXML is deployed.  In this year’s survey, 19 of the 41 respondents reported on their current and future VoiceXML applications, and expected the total to more than double, from 183 in 2006 to 398 twelve months later.

A wide range of respondents completed the survey this year, including both large and small companies.  The respondents were drawn from the spectrum of employees from executives to application developers as well as Forum members and non-members.  Universities and companies not yet utilizing IVR applications were also surveyed for their needs and inquiries.  Compared to our 2004 survey (http://www.voicexmlreview.org/Sep2004/features/sep2004_survey.html), the companies identifying themselves in this year’s survey tend to be much smaller and less well known.  We’ve also seen a significant increase in academic respondents this year.

Another difference from 2004 was that we opened up the 2006 survey to non-Forum members.  Twenty-nine respondents answered whether or not they were Forum members: of these, ten (34.5%) said yes, and 19 (65.5%) said no.  Of the non-members, six (20.7% of the total) said they planned to join the Forum in the “near future”.  The VoiceXML community thus appears to be expanding as the language continues to mature.  We believe the adoption rate will continue to increase.

Companies’ Use of VoiceXML

The respondents currently use VoiceXML in a multitude of ways (Figure 1).  Of the 31 companies answering this question, 18 (58.1%) develop applications for their own use, 18 (58.1%) develop applications for other companies, a full 12 (38.7%) develop VoiceXML platforms for other companies, 8 (25.8%) sell VoiceXML tools, and 7 (22.6%) host applications for other companies.

Figure 1: Companies’ use of VoiceXML

VoiceXML 2.1 is experiencing a surge of uptake.  At the time of the survey (mid-2006), 25% of respondents were already using 2.1, 35.7% were planning on starting to use it in the next six months, and another 14.3% were planning to begin using it in the next year.   The remaining 25% either had not made plans yet, or were planning to use it a year or more out.

Strengths and Weaknesses of VoiceXML

VoiceXML’s strengths mentioned, in order of frequency, were application portability due to open standards, the greater ease of developing and deploying applications using standard web infrastructure, the strength of the community growing around the standard, greater flexibility, and greater leverage from tools.

Platform portability remains a weakness.  The standard is not simple, and some features can be interpreted in various ways.  Thus moving to a new voice platform can require a non-trivial amount of porting.  Proprietary extensions are also an issue.  This is why the Forum’s platform conformance program remains crucial, and why the Conformance Committee is developing related conformance programs for SSML and SRGS.

The scope of VoiceXML gives rise to other perceived weaknesses.  One respondent points out its lack of “features to make it a full IVR/contact center programming language”.  Another mentions that it lacks “fax detection and handling”, controls “for controlling speed and volume of audio playback”, and features for communicating with external entities like a CCXML interpreter or another browser.  A third cites the inability to do fine-grained re-recognition on an utterance to do rescoring. 

These scope issues revolve around architectural principles: what properly belongs in the markup language as opposed to the web application generating that markup, how much detailed control the developer should have over the user interface, and whether the markup language renderer should have connections into other components.

It might help to draw on the analogy to HTML.  This would suggest that a “full IVR programming language” is probably outside the scope of VoiceXML, ie, it is probably a metalanguage interpreted on the web server that generates VoiceXML pages as a byproduct.  The analogy would also tell us that we should limit the amount of fine-grained control, that VoiceXML is for the great majority of cases, and that a standard programming language like C++ should be used for the exceptions.  Finally, the analogy cautions us not to be too quick to connect markup language rendering with other components, and when we do so, to do it via bindings in the DOM, AJAX, and other standard mechanisms.

We’re keeping the W3C apprised of these concerns, however, and it is definitely taking such feedback into account as VoiceXML 3.0 begins to take shape.  The members of the Forum’s Tools Committee are actively looking at metalanguages, as a number of companies are already working with such languages and middleware to generate VoiceXML dialogs.

Next, as VoiceXML continues to evolve, issues of backward compatibility arise.  We asked people how critical this was for VoiceXML 3.0 (see Figure 2).

Figure 2: VoiceXML 3.0 and backward compatibility

Eighteen people answered this question.  Six wanted full backward compatibility; another eight wanted “convertible compatibility”, so that tools could apply any needed changes or implementations could continue to process older content at run-time.  Another one wanted “look and feel” compatibility, while the remaining three didn’t feel a need for compatibility.  Comparing this question with the same one in the 2004 survey, the impression we get is that while compatibility remains a key concern, companies are slightly more flexible about changes today.

VoiceXML 2.0 Platforms

What are the criteria to use when selecting a new VoiceXML platform?  The results are given in Figure 3.  The top three criteria mentioned by the participants are: the level of adherence to the language standard, the cost per port to deploy, and the availability of a high quality speech recognizer.  The goal of the VoiceXML standard is to make the language platform agnostic, and the community is working towards this objective, but the reality that our respondents find is that vendor implementation differences and extensions are still barriers to portability.  This is something that the Forum will need to continue to push with its conformance program, though companies will probably still seek differentiation through extended features.

Figure 3: Important selection criteria for VoiceXML platforms

Comparing this against our 2004 data, the results are nearly identical.  There’s a slight sense that reliability is a bit more taken for granted, and that as a result cost/capacity issues are somewhat more important.  When we tried to tease out the issues around cost/capacity by asking about performance measurement, we found that 73.4% of respondents said that a performance benchmarking program was very important or important, and an additional 20% said it was somewhat important.  The Forum should carefully consider how to deploy a performance benchmarking program.

Speech recognizers used include: Nuance, Scansoft, Loquendo, ViaVoice, SpeechPearl, Dragon, Phonetic, and Fonix.  Speech synthesizers include Nuance, Scansoft, ViaVoice, Loquendo, Microsoft, Elan, and Fonix. 

VoiceXML 2.0 Conformance

There is great interest in our conformance efforts, and in expanding their coverage.  In addition to SRGS and SSML conformance, the Call Control eXtensible Markup Language (CCXML) for call management and the Media Resource Control Protocol (MRCP) for controlling speech engines are seen as integral parts of many voice platforms, and hence seem important extensions for our VoiceXML Platform Conformance efforts.

When asked about how important it is that a voice platform go through conformance testing, 66.7% said it was very important or important and a full 93.3% of respondents said it was at least somewhat important.  Slightly over half of respondents authoring VoiceXML applications report needing to support multiple voice platforms, and of these, about half need to have all their applications run cross-platform.

We then asked what areas of conformance were most crucial (see Figure 4).  The most serious impact on conformance is the presence of platform-dependent properties, object elements, and other platform-dependent features.  Platform vendors should seek to minimize these extensions (eg, by dropping support for proprietary properties where there is a VoiceXML equivalent), developers should use discipline to avoid these features, and the W3C should study these extensions carefully to see which (like speaker verification) ought to be standardized.

After platform dependencies, the next most serious concern is about how well VoiceXML interpreters conform to the VoiceXML 2.0 specification.  SRGS and SSML conformance follow next.  There was moderate concern for VoiceXML 2.1 conformance.  Relative to our 2004 survey, basic 2.0 conformance is less of a worry, while overall platform-dependence remains a key concern.  The Forum is working to increase its breadth of conformance support to cover SRGS and SSML as well as VoiceXML 2.0, and also VoiceXML 2.1 as that standard reaches Recommendation status.

Figure 4: Areas of conformance and their impact

VoiceXML Tools

An advantage of a standard is that it makes the economics of related tool development more attractive.  In the case of VoiceXML, this includes basic tools like syntax-aware editors, more ambitious ones like grammar development and visual dialog development tools, and even large efforts like metalanguages.

We asked people what proportion of their applications used server software to dynamically generate VoiceXML, as opposed to serving up static VoiceXML pages.  Of those that answered, 15.4% said that “nearly all” their content was static, another 15.4% said that “most” of their content was static, and the remaining 69.2% used mostly dynamic content. 
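
To make the static/dynamic distinction concrete, here is a minimal sketch of server-side code that builds a VoiceXML 2.0 menu page at request time.  It is only an illustration: the function name `render_menu`, the prompt text, and the target URLs are all hypothetical, and a real application would typically sit behind a web framework and pull its choices from a back-end system.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def render_menu(prompt_text, choices):
    """Build a VoiceXML 2.0 menu page from data known only at request time.

    `choices` is a list of (spoken_phrase, target_url) pairs.  Both the
    function name and the data shape are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt_text
    for phrase, url in choices:
        # One <choice> per option; the phrase doubles as the grammar.
        choice = ET.SubElement(menu, "choice", {"next": url})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical per-caller data; a static page would hard-code these choices.
page = render_menu(
    "Say balance or transfers.",
    [("balance", "balance.vxml"), ("transfers", "transfers.vxml")],
)
print(page)
```

A static deployment would instead store the same markup in a file and serve it unchanged to every caller.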

Of nine respondents that reported using tools for authoring voice applications, two used tools that never require you to work at the VoiceXML level, while the other seven used tools that either enabled you to do some authoring at the VoiceXML level or else required you to always do so.  Two of these nine people reported that their tools generated VoiceXML at compile time, six said their tools generated the VoiceXML at execution time, and the last person said their tools generated VoiceXML at both compile time and execution time.  Of the tooling that generated VoiceXML at run time, five used a “custom run-time engine” and two used a “generic run-time engine”.

VoiceXML 2.1 adds features like the ‘data’ and ‘foreach’ elements to allow applications to use more static content (and improve efficiency). Fifteen respondents answered a question on their use: five used these elements already, three did not, and seven didn’t know.  One person pointed out that a ‘foreach’-like construct is also needed for speech grammars.
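
As a rough illustration of how these two elements let a page stay static, the sketch below holds a VoiceXML 2.1 page in a Python string and parses it only to confirm that it is well-formed.  The page itself never changes; the interpreter fetches fresh XML with ‘data’ and loops over the result with ‘foreach’.  The URL, the variable names, and the assumed shape of the fetched XML (a list of ‘gate’ elements) are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/vxml}"

# A static page: nothing in it is generated per request.
STATIC_PAGE = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="departures">
    <!-- Fetch XML at run time; example.com and the <gate> elements
         it is assumed to return are hypothetical. -->
    <data name="board" src="http://example.com/gates.xml"/>
    <script><![CDATA[
      // Copy the text of each <gate> node into an ECMAScript array,
      // since <foreach> iterates over arrays.
      var gates = [];
      var nodes = board.documentElement.getElementsByTagName('gate');
      for (var i = 0; i < nodes.length; i++) {
        gates.push(nodes.item(i).firstChild.data);
      }
    ]]></script>
    <block>
      <prompt>
        Departing gates are
        <foreach item="gate" array="gates">
          <value expr="gate"/>
          <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>"""

# Parsing only checks well-formedness; it does not validate against
# the VoiceXML 2.1 schema or run the page.
root = ET.fromstring(STATIC_PAGE)
data = root.find(f".//{NS}data")
loop = root.find(f".//{NS}foreach")
print("fetches:", data.get("src"))
print("iterates:", loop.get("item"), "over", loop.get("array"))
```

Without these elements, the list of gates would normally have to be written into the page by server-side code on every request.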

Several companies have actually gone out and developed their own metalanguages to generate VoiceXML dialogs from more abstract representations.  Three of our respondents use such languages.  It would be ideal for companies that need to use a metalanguage to use one based on an industry standard.  Thus the Forum’s Tools Committee should continue pursuing the development of a standard VoiceXML metalanguage, as this will allow the industry to benefit as a whole.

Data Logging

Another effort of the Tools Committee is a standard format for logging the data generated by the voice server.  We asked a series of questions about this area.

The respondents see the Forum’s work on items such as a Data Logging standard as key to the future growth of toolsets.  The most prominent purpose for data logging is seen as system monitoring followed by application performance tuning and application debugging.  Other relevant purposes are speech recognition performance and business intelligence (see Figure 5). 

Figure 5: Current purposes of logging data

The VoiceXML ‘log’ element is used to generate log entries by six of 15 people (40%), and another seven (46.7%) say they have plans to leverage this feature.  Only two (13.3%) of the respondents do not expect to leverage this feature, ie, their data logs are generated at the platform level exclusively, not the markup level.
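
For readers who have not used markup-level logging, the following sketch shows one way an application generator might emit a ‘log’ element with a consistent label, so that analysis tools can later filter entries by purpose.  The helper name `log_entry`, the label `app.tuning`, and the variable `choice` are hypothetical; where the entry ends up is platform-dependent.

```python
import xml.etree.ElementTree as ET


def log_entry(label, text, expr=None):
    """Return a VoiceXML <log> element (hypothetical helper).

    `label` tags the entry for later filtering; `expr`, if given, is an
    ECMAScript expression that the interpreter evaluates at run time.
    """
    log = ET.Element("log", {"label": label})
    log.text = text
    if expr is not None:
        ET.SubElement(log, "value", {"expr": expr})
    return log


# Attach a tuning-oriented log entry to a <block> of executable content.
block = ET.Element("block")
block.append(log_entry("app.tuning", "caller chose ", expr="choice"))
markup = ET.tostring(block, encoding="unicode")
print(markup)
```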

Asked whether they used tools to analyze logged data, of the fourteen respondents working in this area, nine (64.3%) do; the others manually examine logs.  Do platforms aggregate logs from multiple sources into a single log?  Yes, in 75% of the cases.  Is there a need for a standard log format that would help this analysis and aggregation?  Yes, said 58.3% of the respondents.

VoiceXML Accessibility

Accessibility to voice services is an issue for some of us due to accents, hearing problems, cognitive issues, and so on.  Well-designed platforms, services, and standards can help more people make use of voice services.  To take one example, a deaf person may need to interact with a voice service indirectly via a TDD device to a human operator: if the voice service has aggressive timeouts, the experience can be quite frustrating.  The Forum has begun an Accessibility effort to help the community address these needs.

Some respondents (five of fifteen) try to make their applications accessible to users with disabilities, but three know their applications aren’t accessible, and seven others don’t know.  Some respondents report they’ve consciously used VoiceXML to open up services for people with visual impairments.  A small majority of people aware of accessibility issues are not aware of government regulations in this area.  Clearly there is a role for clear guidelines and tutorials on how to make voice applications accessible to people with disabilities: 92% of respondents want to have these, and 42% say they don’t yet have the technical know-how.  The Forum needs to publish such materials.

As an aside, the Forum is looking for companies and individuals to help out in this effort, and the Accessibility Committee is open to anyone who would like to help the community, whether or not they work for a member company.  This is a very important area.

Multimodal User Interfaces

Multimodal user interfaces combine two or more user interface modalities.  The most common case is adding a voice modality to a visual modality, but the term also encompasses modalities such as touch screen, handwriting recognition, haptic input (via motion, eg) and output (vibrations, eg), and so on.  Well-designed multimodal applications can greatly increase usability and accessibility.

Research in this area has been going on for years.  We’re now seeing more and more commercial and standards activity, and there are some very effective prototype systems.  We expect that commercial use of multimodal interfaces will grow substantially from today’s small base.  The Forum will continue to monitor the usage and deployments of multimodal user interfaces, and we are looking for possible opportunities to aid in the growth of this area.

XHTML+Voice (X+V) is one early standard for multimodal interfaces, with the voice interface relying on VoiceXML and the visual interface handled by a web browser like Opera.  The survey asked about X+V’s use, and five of fourteen respondents are either working with it now or plan to in the next year. 

More generally, six respondents are working in multimodal, and they would like to see these kinds of interfaces on a wide variety of devices, even tablets, kiosks, and desk phones.  Nine people had opinions on whether multimodal systems should leverage VoiceXML: four thought it very important, one important, two somewhat important, and two not important.

Activity in multimodal interfaces of all types is still relatively low among this group of respondents.  We think this indicates that people are waiting for standards to emerge, for tools and platforms to become available, and for a few signature multimodal experiences to appear and ignite commercial interest (mobile search, transcription of email and SMS messages, directory assistance, eg).

Development Practices

The VoiceXML developer community continues to grow, and with this growth the use of the language continues to evolve.  Through this evolution developers continue to try to develop portable applications by not using any proprietary extensions.  They are deploying more dynamically generated code: code generated at run-time instead of being statically hard coded. 

There is a split down the middle between respondents who reuse code and those who always develop new code.  A little over half of respondents (53.8%) feel that the Forum should not create reusable components such as grammars, dialogs, and packaged applications for everyone’s use.  However, 46.2% thought it would be a good idea and that those components could be placed in the framework of the VoiceXML Forum’s website.

VoiceXML Forum Website

We next asked about our website.  Respondents see the VoiceXML Forum’s website as providing a valuable networking tool.  Yet it could be improved upon.

Respondents currently aren’t visiting the site as much as we’d like: on average, about once a month.  Some think the content is good, though one thought it “not representative of the industry as a whole”.  The look and feel looked fine to some, but “dull” to another.  The message board is currently used by very few respondents.  People want to see more technical content, vendor information, and application examples along with how-to’s on the web site.

This reaction is about what we expected, and it mirrors our own thinking.  One part of the problem is inflexibility, both in the web technology we currently use, and in our organization.  To post a new item of interest, the Forum needs to convey a change request to the IEEE’s ISTO, a management organization for industry and standards groups.  The ISTO then has to forward it to the web design firm.  This process can take weeks.  The solution is to add in new technologies that support self-editing, such as blogs and Wikis, and then use these to directly manage sections of the site.  We also want to hook up RSS feeds so that people can subscribe to them and immediately find out when we have something new on the site.  We hope to bring about these changes in 2007.   

Beyond this inflexibility, we need to be sure that we state our objectives clearly and then create content that builds upon and clarifies these goals.  For instance, we would like to address a broader range of industry concerns rather than just VoiceXML, so we’d like to see the web site be a resource for information about products and services such as ASR and TTS systems, available platforms, tools, hosting options, etc.  The Forum could implement a best practices portal which would include: archives of articles, collections of sample VoiceXML documents and SRGS grammars, etc.  We would like to open up this section to allow for customization and use by members.

We also need to determine what we can do to increase the use of the message boards, perhaps by improving delivery options (better digests, RSS feeds), but more likely by making sure that each board is properly moderated so that questions get answered.  Spam attacks have periodically hit the boards, so better defenses need to be put up.

Another area for improvement is to help people understand the various standards processes.  For instance, the W3C standards start at the First Working Draft stage and eventually reach the coveted Recommendation.  We need to define these various stages to help our members understand the W3C process and determine when it’s safe to build out implementations that use W3C standards.  We could likewise describe the IETF standards process and how accessibility guidelines and standards are set.

VoiceXML in Education

Education is a key area of our activities, and some educational institutions are beginning to get more involved in VoiceXML and speech applications.  We asked a number of questions to see how we can improve in this regard.

First, people were interested in having the Forum set up an academic membership level for institutions: 69.3% thought it was at least somewhat important.  In addition, 53.8% wanted to see individual academic memberships.  The argument for this is that many benefits would be realized: it would help to grow the VoiceXML developer base, increase the interest in the developer certification process, increase academic interest in speech applications as a study/research area, and increase the Forum membership and resources.

On developer certification, 46.2% of respondents say they would be influenced in a hiring decision if the candidate was a certified VoiceXML developer.

The VoiceXML Review ezine was visited by only 30.8% of respondents in the two months prior to the survey.  When asked about relevant content, 54.5% thought the content was “relevant”, 18.2% “not relevant”, and 27.3% “undecided/neutral/not applicable”.  People want to see more long articles with more technical content, and “fewer vendors”. The Forum needs to understand why this outlet is not being used more, and make the necessary changes. We’ve been trying to improve matters: one recent issue had an article on how VoIP is affecting how voice applications are deployed and another was a short tutorial on the MRCP interface to speech engines. We need to keep doing this.  We should consider allowing people to post comments and feedback on the articles, and use RSS as a mechanism to notify people of new material.

Final Thoughts

The Forum will continue to strive to increase its presence in the community by delivering a clear message and providing great value to the industry as a whole.  We thank you again for all of your input, and we would like you to know that your feedback to the Forum is welcome anytime and it is always appreciated.

We’d also love to have your help.  If you’d like to assist with the VoiceXML Review, contact its editorial staff.  Would you like to join an existing technical committee or help organize a new one?  Take a look here.  Do you work for a sponsor-level company and want to help out on our Board of Directors, the Technical Council, or the Marketing Committee?  Please talk with your board member.  Interested in getting your company to join the Forum?  Check here (Supporter membership is very reasonable).  It’s even possible to help out if you don’t work for a member company: our Accessibility Committee is open to all, and other committees sometimes have “outside experts” from non-profits, etc.

Finally, we thank Paolo Baggia, James Larson, Ian Sutherland, Cindy Tiritilli, and Andrew Wahbe for their help in running this survey and analyzing the results.



Copyright © 2001-2007 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).