Volume 1, Issue 1 - January 2001
   
   
 

Answers to Your Questions About VoiceXML

By Jeff Kunins

(Continued from Part 1)

Q: Is VoiceXML focused on "screen scraping" Web sites to make voice applications?

A: VoiceXML does bring the Web development paradigm to the phone. It lets businesses reuse existing Internet infrastructure and back-end systems to quickly deliver voice applications that are tightly integrated with their Web offerings. That does not mean VoiceXML applications are "converted," "transcoded," or "screen-scraped" HTML.

Dialogue as an interface to information and services presents unique design challenges that demand deep expertise and ongoing refinement. Even the worst visual interfaces are at least somewhat usable, but all but the best voice interfaces frustrate and confuse callers. Traditional touch-tone IVR applications can be maddening. Speech recognition makes it possible to build outstanding interfaces that give callers fast, efficient self-service access to information and services, but achieving this is still a complex art that demands specialized expertise and a deep commitment to quality.

In fact, there is no good way to "convert" HTML into VoiceXML. Several companies market platforms that attempt automatic "transcoding" from one interface markup language to another (e.g., from HTML to WML or VoiceXML), but these approaches generally produce generic, low-quality results that fall far short of commercial viability. Consider how different a spoken conversation or a piece of movie dialogue is from a movie poster or a Web site. The information and the underlying data may be the same, but the interfaces are radically different.

VoiceXML is explicitly designed to describe these voice interfaces, just as HTML is designed to describe visual ones. People continue to adopt a growing array of specialized personal communications devices, and businesses can realize tremendous reach and revenue gains by delivering "anytime, anywhere" access to services via the Web, PDAs, the phone, and other channels. VoiceXML and the Web development paradigm make this practical and cost-effective, letting companies author shared business logic once and invest new effort only in the specific user interface for each device they support.
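The "author business logic once, build one interface per device" idea can be sketched in a few lines. This is a minimal illustration, not a prescribed architecture: the function names, the account data, and the use of VoiceXML 2.0 markup are all assumptions made for the example. One back-end function feeds two renderers, and the voice renderer phrases the result as a spoken sentence rather than reusing the Web page's layout.

```python
from xml.sax.saxutils import escape

# Hypothetical back-end data; a real application would query its own systems.
_BALANCES_CENTS = {"1001": 245075}


def get_account_balance(account_id: str) -> dict:
    """Shared business logic: written once, used by every channel."""
    return {"account": account_id, "balance_cents": _BALANCES_CENTS.get(account_id, 0)}


def render_html(data: dict) -> str:
    """Visual interface: a compact Web page."""
    amount = data["balance_cents"] / 100
    return (
        "<html><body>"
        f"<p>Account {escape(data['account'])}: ${amount:,.2f}</p>"
        "</body></html>"
    )


def render_vxml(data: dict) -> str:
    """Voice interface (assumed VoiceXML 2.0): a sentence written for the ear."""
    dollars, cents = divmod(data["balance_cents"], 100)
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form><block><prompt>"
        f"Your balance is {dollars} dollars and {cents} cents."
        "</prompt></block></form>\n"
        "</vxml>\n"
    )
```

Only the two `render_*` functions differ per channel. Adding a WML or PDA front end would mean writing another renderer, not another copy of `get_account_balance`.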

Q: How does security work for VoiceXML?

A: Security is a fundamental concern for every company delivering mission-critical services. Protecting customer privacy, corporate data, and network infrastructure across all the technology layers of today's multi-tiered distributed systems is a strict requirement. Equally important, services must deliver convenient access with ever-faster performance.

VoiceXML, like HTML, neither inherently provides nor prohibits security. Rather, it works with accompanying standards, such as HTTP, SSL, and cookies, that make it possible to deliver secure solutions for mission-critical applications. Moreover, because VoiceXML is built on existing Internet standards, companies can reuse their existing Web-based framework for security, authentication, and personalization when extending services to the phone.

Critical success factors for secure VoiceXML implementations are:

  • SSL secures HTTP transactions, just as on the Web. SSL (Secure Sockets Layer), the technology that secures over $30 billion in e-commerce annually, can be used to secure network traffic between a VoiceXML "browser" and a business's Web servers.

  • Businesses can reuse their Web authentication architecture. Because VoiceXML applications are simply a new set of "pages" delivered by your Web servers, existing code and infrastructure, such as secure session management through SSL and cookies, extend directly to the phone.

  • Businesses retain full control of corporate data and applications. VoiceXML platforms act strictly as the browser in a traditional Web transaction, and businesses keep their data and application code in their own facilities. Vendors that offer outsourced VoiceXML infrastructure have strong incentives, and typically contractual commitments, to handle the sensitive log data they collect on behalf of enterprise customers appropriately.

  • Network security is as important as it is on the Web. Companies should employ industry best practices such as multiple isolated networks, firewalls, IP filtering, load balancing, and intrusion detection so that neither data nor service quality is compromised.

  • People are comfortable talking on the phone. U.S. consumers did $430 billion in commerce over the phone in 1999, more than 10 times the value of global Web purchases. People routinely share their most confidential information (e.g., health history, stock trades) with live agents over the public telephone network. From a security perspective, using an automated voice application is equivalent to or better than talking to customer service agents who may be anywhere and whose discretion is not guaranteed.
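The session-reuse point above can be sketched as a server-side handler. This is a minimal illustration under stated assumptions: the cookie name `SESSIONID`, the in-memory `SESSIONS` store, and the `login.vxml` page are hypothetical, and the example assumes, as the article describes, that the VoiceXML browser returns the same session cookie the Web site already issues. Transport security (SSL/HTTPS) would be handled by the Web server and is not shown.

```python
from http.cookies import SimpleCookie
from xml.sax.saxutils import escape

# Hypothetical session store shared with the existing Web site.
SESSIONS = {"abc123": {"user": "Pat Smith"}}

VXML_HEADER = (
    '<?xml version="1.0"?>\n'
    '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
)


def handle_vxml_request(cookie_header: str) -> str:
    """Return a VoiceXML page, reusing the Web site's session cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("SESSIONID")  # hypothetical cookie name
    session = SESSIONS.get(morsel.value) if morsel else None
    if session is None:
        # Unknown caller: send them to a (hypothetical) PIN login dialog.
        body = '<form><block><goto next="login.vxml"/></block></form>'
    else:
        name = escape(session["user"])
        body = f"<form><block><prompt>Welcome back, {name}.</prompt></block></form>"
    return f"{VXML_HEADER}\n{body}\n</vxml>"
```

The lookup in `SESSIONS` is the same check a Web page handler would perform, which is the point: the phone channel adds new pages, not a new authentication system.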

Q: What are "dynamic grammars," and why are they interesting?

A: One of the key benefits of Internet-powered speech applications is the ability to quickly build integrated services that leverage existing data and systems. Dynamically generated grammars keep applications current. For example, a voice-activated dialing application could integrate directly with a corporate LDAP directory, so callers can immediately reach newly added names and updated entries.
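A dynamic grammar of this kind is just a document generated from live data on each request. The sketch below assumes the W3C SRGS XML grammar format, which VoiceXML 2.0 platforms are required to support; VoiceXML 1.0 left the grammar format platform-specific. The function name is hypothetical, and a plain list of names stands in for the results of an LDAP directory search.

```python
from xml.sax.saxutils import escape


def build_name_grammar(names: list[str]) -> str:
    """Build an SRGS XML grammar whose single rule matches any listed name."""
    # Drop blanks, deduplicate, and sort so identical directory data
    # always produces byte-identical grammar text.
    unique = sorted({n.strip() for n in names if n.strip()})
    if not unique:
        raise ValueError("an SRGS <one-of> needs at least one <item>")
    items = "\n".join(f"      <item>{escape(n)}</item>" for n in unique)
    return (
        '<?xml version="1.0"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" '
        'xml:lang="en-US" mode="voice" root="name">\n'
        '  <rule id="name" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )
```

A Web server would return this text for a grammar URL (for example, a hypothetical `https://example.com/grammars/names.grxml`) that a VoiceXML `<grammar src="…">` element points at, so each fetch reflects the directory as it is at that moment.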

Any speech recognition platform must "compile" a grammar when it is first loaded, and very large grammars can take several seconds or more to compile. This is not unique to VoiceXML. The key difference is that VoiceXML makes it easy for developers to incorporate dynamic grammars into their applications. Developers building speech applications on any platform, including VoiceXML, must carefully plan the usage characteristics of their applications and decide when to use static versus dynamic grammars. For example, to optimize performance, a catalog retailer could combine a static grammar for its permanent product line with a dynamic grammar for its daily or seasonal specials.
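The catalog example can be sketched as a VoiceXML field that activates two grammars at once: a large, unchanging catalog grammar at a stable URL, and a specials grammar whose URL changes each day. This is a minimal sketch under assumptions: the URLs and function names are hypothetical, the markup assumes VoiceXML 2.0 with SRGS grammars, and whether a platform actually reuses a previously compiled grammar for an unchanged URL depends on the platform and its HTTP caching behavior.

```python
import datetime
from xml.sax.saxutils import quoteattr

# Hypothetical URLs; a real deployment would use its own paths.
STATIC_GRAMMAR_URL = "https://example.com/grammars/catalog.grxml"


def specials_grammar_url(day: datetime.date) -> str:
    """A new URL per day, so only the specials grammar is refetched daily."""
    return f"https://example.com/grammars/specials-{day.isoformat()}.grxml"


def product_field(day: datetime.date) -> str:
    """VoiceXML field whose input may match either the static or the daily grammar."""
    static_src = quoteattr(STATIC_GRAMMAR_URL)  # quoteattr adds the quotes
    daily_src = quoteattr(specials_grammar_url(day))
    return (
        '<field name="product">\n'
        "  <prompt>Which product would you like?</prompt>\n"
        f'  <grammar type="application/srgs+xml" src={static_src}/>\n'
        f'  <grammar type="application/srgs+xml" src={daily_src}/>\n'
        "</field>"
    )
```

Keeping the catalog URL constant gives the platform a chance to cache the expensive grammar, while the changing specials URL keeps the daily additions current.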


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).