VoiceXML is the HTML of the voice web, the open standard markup language for voice applications. VoiceXML harnesses the massive web infrastructure developed for HTML to make it easy to create and deploy voice applications. Like HTML, VoiceXML has opened up huge business opportunities: The Economist even says that “VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world.”

VoiceXML 1.0 was published by the VoiceXML Forum, a consortium of over 500 companies, in March 2000. The Forum then gave control of the standard to the World Wide Web Consortium (W3C), and now concentrates on conformance, education, and marketing. The W3C has just published VoiceXML 2.0 as a Candidate Recommendation. Products based on VoiceXML 2.0 are already widely available.

While HTML assumes a graphical web browser with display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output, audio input, and keypad input. Audio input is handled by the voice browser’s speech recognizer. Audio output consists of both recordings and speech synthesized by the voice browser’s text-to-speech system.
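
As a rough illustration of how these three channels appear in markup, here is a minimal VoiceXML 2.0 sketch. The form id, the field name, and the file name welcome.wav are hypothetical and chosen only for this example. The audio element plays a recording and falls back to synthesized speech if the file cannot be fetched, and each option element accepts either a spoken word or a keypad digit.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical form and field names, for illustration only -->
  <form id="drink_order">
    <field name="drink">
      <prompt>
        <!-- Audio output: a recording, with text-to-speech as the fallback -->
        <audio src="welcome.wav">Welcome.</audio>
        <!-- Audio output: plain text is rendered by text-to-speech -->
        Say coffee or tea, or press 1 for coffee or 2 for tea.
      </prompt>
      <!-- Audio input and keypad input: each option can be spoken or keyed -->
      <option dtmf="1" value="coffee">coffee</option>
      <option dtmf="2" value="tea">tea</option>
      <filled>
        <!-- Runs once the recognizer or the keypad has filled the field -->
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A caller who says “tea” and a caller who presses 2 both fill the field with the value tea, so the rest of the dialog does not need to know which input mode was used.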

A voice browser typically runs on a specialized voice gateway node that is connected both to the Internet and to the public switched telephone network (see Figure 1). The voice gateway can support hundreds or thousands of simultaneous callers, and can be reached from any of the world’s estimated 1.5 billion phones, from antique black candlestick phones up to the very latest mobiles.

VoiceXML takes advantage of several trends:

  • The growth of the World Wide Web and of its capabilities.
  • Improvements in computer-based speech recognition and text-to-speech synthesis.
  • The spread of the Web beyond the desktop computer.