Volume 5, Issue 4 - July/Aug 2005
 
   
   

Turning Too Much into Just Enough:
A Professional Approach to VoiceXML Application Management

By Tiemo Winterkamp

Introduction

Abstract: The growing use of voice-enabled applications and the increasingly sophisticated capabilities demanded of these applications have challenged enterprises’ abilities to manage their voice-driven self-service systems. A voice application management system (VAMS) with a voice application server at its hub is presented as the solution, and the capabilities and characteristics of a fully functioning VAMS are detailed.

VoiceXML applications have grown increasingly popular as enterprises seek to improve customer relationships and trim customer support costs. Datamonitor forecasts that the worldwide market for VoiceXML-enabled platforms will grow at a compound annual growth rate (CAGR) of 20%, which implies a market size of over $300 million by 2009. More interestingly, this growth reflects not only larger numbers of enterprises seeking to capture the benefits of voice, but also a trend toward larger, more ambitious applications and self-service portals that offer callers multiple applications. Gartner reports that the number of ASR (Automatic Speech Recognition) ports deployed increased from about 98,000 in 2002 to more than 150,000 in 2004.

A parallel trend has been established by early-adopting enterprises seeking either to multiply the benefits they have already enjoyed by increasing the size and scope of their deployments or to replace a poorly performing voice-automated system with an even more substantial one. Postbank, Germany’s largest retail bank, illustrates both cases. Over two years, it struggled with two marginally successful voice banking applications before starting from scratch and launching a comprehensive voice banking portal that handled more than 100,000 calls per day within its first two months in production. Encouraged by this success, Postbank has publicly announced its intention to double the capacity of its voice portal and expand its functionality over the next several months.

Alongside these usage trends, enterprises are demanding more and more sophistication from their voice-enabled applications, especially in three areas. First, complete integration with the balance of the enterprise’s IT infrastructure is required. Voice applications must coordinate with remaining call center resources via CTI (Computer Telephony Integration), and they must be integrated and consistent with an enterprise’s Web presence. Back-end integration with corporate databases and data-driven applications, such as CRM, is also a must. Second, dynamic personalization of the application and its voice user interface is increasingly required, as the Web continues to demonstrate the power of personalized, 1-to-1 customer contact. Finally, voice applications are being deployed in multiple languages, to reach broader audiences, and with multiple personae, to complement and enhance enterprises’ other branding efforts.

One consequence of these converging trends is that many enterprises are becoming overwhelmed by the complexity they have created. For example, a multi-application portal requires a massive number of prompts, in multiple languages and delivered by multiple personae, and simply managing them becomes a significant challenge in light of modern update cycles. Enterprises that grow their voice-driven capabilities incrementally from modest beginnings commonly end up with an unmanageable “spaghetti” call flow, with each later application built off the trunk of the former. Worse, in many of these cases, valuable business intelligence is thrown away because the voice applications aren’t systematically integrated with each other or with the enterprise’s call center, databases and other applications.

Additionally, notwithstanding the good intentions of enterprises seeking to harness the power and flexibility of the VoiceXML standard, technology and vendor lock-in takes hold as the growing complexity is addressed ad hoc. Switching to or adding alternative ASR, CTI, TTS (Text-to-Speech) or other resources becomes practically impossible, eliminating one of the biggest benefits of coding to the VoiceXML standard in the first place.

Fortunately, there is good precedent for meeting this challenge. It comes from two sources. First, in the case of the World Wide Web, we observed that, as newly developed Java was added to Web clients, the clients got fat, complex and impossible to maintain. In response, there was a migration to thin clients, server-based business logic and server-based application logic. The resulting server-based architecture has since proven stable and flexible enough to accommodate several successive generations of Web technologies. It is reasonable to believe that a similar architecture might be useful in meeting analogous challenges presented by voice applications. Second, a professional approach to IT management recognizes that lifecycle management is fundamental to managing any IT resource. Synthesizing these lessons, it becomes clear that a voice application management system (VAMS), featuring a voice-application-server-based architecture and dedicated tools for managing the unique lifecycle of sophisticated voice-driven applications, is a necessary foundation for any successful voice-driven system.

A VAMS is the embodiment of a middleware platform architecture that helps automate and structure systems development for voice applications while simultaneously using the existing infrastructure of Web application servers and business rules in these servers and adjacent enterprise back-end systems. A voice application server is the central hub of this architecture. Each voice application server must have at least the following six capabilities:

  • Dynamic generation of VoiceXML at call time to facilitate personalization

  • Easy integration with databases and other back-end systems

  • Easy reuse of existing business logic, to ensure that voice applications are consistent with other communications channels, such as the Web

  • Operation, administration, monitoring and logging functionality, to aid in optimizing application deployments and for system maintenance

  • Analytical capabilities that support dynamic dialog adaptation for better-personalized dialogs

  • Seamless interface to the VAMS’ reporting capabilities, to evaluate and optimize system performance
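The first capability in the list above, dynamic generation of VoiceXML at call time, can be illustrated with a minimal Python sketch. It is not taken from any particular product: the caller profiles, the greeting texts and the `render_greeting` function are hypothetical stand-ins for the database or CRM lookup a real voice application server would perform. Only the VoiceXML 2.0 elements (`vxml`, `form`, `block`, `prompt`) and namespace come from the standard.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace

# Hypothetical caller data; a real server would query a CRM or database.
CALLER_PROFILES = {
    "+15550100": {"name": "Ms. Rivera", "lang": "en-US"},
    "+495550100": {"name": "Herr Schmidt", "lang": "de-DE"},
}

# Hypothetical per-language greeting templates.
GREETINGS = {
    "en-US": "Welcome back, {name}.",
    "de-DE": "Willkommen zurück, {name}.",
}

DEFAULT_LANG = "en-US"
GENERIC_GREETING = "Welcome."


def render_greeting(caller_id: str) -> str:
    """Build a small VoiceXML document for one call, personalized if possible."""
    profile = CALLER_PROFILES.get(caller_id)
    if profile is None:
        # Unknown caller: fall back to a generic, non-personalized greeting.
        lang, text = DEFAULT_LANG, GENERIC_GREETING
    else:
        lang = profile["lang"]
        text = GREETINGS[lang].format(name=profile["name"])

    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": VXML_NS, "xml:lang": lang}
    )
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(render_greeting("+15550100"))
```

Because the document is assembled per request, the same dialog definition can serve different languages and callers without maintaining a separate static VoiceXML file for each combination.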

Every useful VAMS and voice application server will exhibit several characteristics. They will support standards, including VoiceXML, and will abstract the dialect differences (often driven by custom extensions) exhibited by competing vendors’ implementations. It must be easy to add voice application servers to the configuration dynamically to meet increasing caller demand, and the overall request load must be automatically and optimally distributed among the servers so that resources are fully utilized. Likewise, voice application servers must be able to fail or be removed without impacting system availability. Finally, system administrators must be presented with a single logical resource for easy management.
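The scaling and availability requirements described above can be sketched as a small server pool. This is an illustrative simplification under stated assumptions, not a description of how any specific VAMS distributes load: the `ServerPool` class, its method names and the "fewest active calls" routing policy are all hypothetical choices made for the example.

```python
class ServerPool:
    """Single logical resource in front of several voice application servers."""

    def __init__(self):
        # Maps server name -> number of calls currently in progress.
        self._active_calls = {}

    def add_server(self, name):
        """Add a server at runtime; it starts with no active calls."""
        self._active_calls.setdefault(name, 0)

    def remove_server(self, name):
        """Remove a failed or retired server; new calls go to the rest."""
        self._active_calls.pop(name, None)

    def route_call(self):
        """Send a new call to the server with the fewest active calls."""
        if not self._active_calls:
            raise RuntimeError("no voice application servers available")
        name = min(self._active_calls, key=self._active_calls.get)
        self._active_calls[name] += 1
        return name

    def end_call(self, name):
        """Record that a call on the given server has finished."""
        if self._active_calls.get(name, 0) > 0:
            self._active_calls[name] -= 1


if __name__ == "__main__":
    pool = ServerPool()
    pool.add_server("vas-1")
    pool.add_server("vas-2")
    print([pool.route_call() for _ in range(3)])
    pool.remove_server("vas-1")  # simulate a failure
    print(pool.route_call())
```

Callers and administrators interact only with the pool, so servers can join or leave without the rest of the system needing to know which machine handles a given call.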

All of the server components in a modern voice application environment, including the VoiceXML interpreter, the voice application server, the Web application server, the database server, the speech recognition engine, text-to-speech resources, etc., are linked through an IP network and can therefore be geographically distributed. Thus, communication between each voice application and the voice application server is analogous to that in the desktop Web browser model. This design provides enterprises control of voice application development while reusing existing components of the network.

Conclusion

Voice Application Management Systems (VAMS) and their voice application servers are as essential to best IT practices as Web content management systems and Web application servers. No other approach adequately addresses the complexity resulting from enterprises’ deployment of multiple, sophisticated, voice-driven applications. What’s more, the server architecture provides other benefits, like integrated performance analysis and easy maintenance and back-end integration.

***

Tiemo Winterkamp is Vice President of Strategy and Market Research at VoiceObjects, the worldwide leader in Voice Application Management Systems (VAMS). The VoiceObjects X5 product portfolio enables companies to easily create, test, deploy and analyze voice applications with the industry's best IDE on a carrier-grade, server-based platform. The company has strategic partners worldwide, including Comverse, Danet, Genesys, IBM, NextiraOne, SAP, Softlab, T-Com, T-Systems and VoiceGenie. The company is headquartered in Cologne, Germany and has offices in the United States and in the United Kingdom. For more information, please visit www.voiceobjects.com.


Copyright © 2001-2005 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).