Why is the VoiceXML approach important? First, the phone is important. There are over 1.5 billion phones in use, far more than there are Internet-connected computers. Phones are easy to use and don’t need to be booted up. Telephone networks are much more reliable than data networks.

Mobile phones are achieving large penetration rates too: unlike notebook computers and many PDAs, mobile phones are highly portable, inexpensive, and have long battery lives. Mobiles are a natural match for location-based applications. They can be used while driving (though not always safely).

Second, voice is important on the phone. Voice has always been the natural mode of communication for phones. Even though some mobiles have WAP/XHTML browsers, their small screens and keypads make micro browsers hard to use, especially while driving. The i-mode system is more compelling, though it shares the same limitations.

But there are advantages to combining visual browsing and voice browsing. For instance, complex information is hard to remember when spoken to the user, but easy to remember if it is presented in a persistent visual form. And some misrecognitions of spoken input are easy to correct with keypad entry. Therefore, we should soon begin to see multi-modal applications deployed alongside pure visual applications and pure voice applications.

Third, the Internet is important to voice applications:

  • Voice application development is easier because VoiceXML is a high-level, domain-specific markup language, and because voice applications can now be constructed with plentiful, inexpensive, and powerful web application development tools.
  • Voice applications are now far easier to deploy. No longer must they reside on a special-purpose voice server in a proprietary “walled garden”: they can be placed anywhere on the Internet and accessed from any VoiceXML-compliant voice server.
  • Applications can be cleanly structured into service logic, which runs on the web server, and presentation logic, which lives in the VoiceXML pages delivered to the voice browser. This has many advantages, not the least of which is that a common application back end on the web server can serve up different types of presentation logic based on the user’s device. Because the back end is written once and shared across devices, this factoring leads to huge savings.
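
The split between service logic and presentation logic described in the last point can be sketched roughly as follows. This is only an illustration under stated assumptions, not a prescribed design: the function names (`get_forecast`, `render_vxml`, `render_html`, `respond`), the weather example, and the hard-coded data are all hypothetical, and the markup assumes VoiceXML 2.0 with the `application/voicexml+xml` media type.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape


def get_forecast(city: str) -> dict:
    """Service logic (hypothetical): shared by every kind of client.

    A real back end would query a database or another service;
    the values here are hard-coded purely for illustration.
    """
    return {"city": city, "summary": "sunny", "high_f": 72}


def render_vxml(forecast: dict) -> str:
    """Presentation logic for a voice browser: a small VoiceXML page."""
    text = (
        f"The forecast for {forecast['city']} is {forecast['summary']}, "
        f"with a high of {forecast['high_f']} degrees."
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        f"      <prompt>{xml_escape(text)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


def render_html(forecast: dict) -> str:
    """Presentation logic for a visual browser: a small HTML page."""
    city = html_escape(forecast["city"])
    summary = html_escape(forecast["summary"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>Forecast for {city}</title></head>\n"
        f"<body><p>{city}: {summary}, high {forecast['high_f']}&deg;F</p></body></html>\n"
    )


# Map each (assumed) device class to its renderer and media type.
RENDERERS = {
    "voice": (render_vxml, "application/voicexml+xml"),
    "visual": (render_html, "text/html"),
}


def respond(device: str, city: str) -> tuple[str, str]:
    """Run the shared service logic once, then pick the presentation."""
    render, content_type = RENDERERS[device]
    return content_type, render(get_forecast(city))


if __name__ == "__main__":
    for device in ("voice", "visual"):
        content_type, body = respond(device, "Boston")
        print(content_type)
        print(body)
```

Only the renderer changes between the voice and visual cases; `get_forecast` is untouched, which is where the savings from this factoring would come from. In a deployed system the device class would presumably be derived from the incoming HTTP request rather than passed in directly.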

Finally, voice, and therefore VoiceXML, is important for web devices other than the phone. For example, a voice-actuated “universal remote” could have an on-board voice browser and VoiceXML content generated from all the devices in its vicinity. You could walk into your family room, pull the remote from your shirt pocket, press its push-to-talk button, and say “stereo: off; television: what action movies are playing?”