Volume 6, Issue 1 - Jan/Feb 2006
 
   
   
 

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org and be sure to read future issues of VoiceXML Review for the answer.

By Matt Oshry

Q: What problems was VoiceXML designed to address? What issues motivated its development?

A: The first problem was that the voice industry badly needed standards to grow and capture huge economies of scale. Before VoiceXML, each voice platform had its own programming language and APIs, and the resulting vendor lock-in made voice application development much costlier and riskier than it needed to be. Voice application developers had to commit to learning technologies and languages that could quickly become obsolete, while companies had to commit to vendors who might go out of business or whose products might not turn out to be a good fit. This made it quite hard to justify voice applications: investments had to result in immediate and significant economic benefits to offset the higher risk.

Once VoiceXML and related standards like SRGS, CCXML, and MRCP came on the scene, the situation improved dramatically.  Developers can now commit to voice technologies that are widely used and will remain in use for years to come, while companies can quickly switch voice platform vendors to find the best price, features, and performance for their applications.  Standards have made it possible to use voice application hosting services, dramatically reducing the costs associated with low-volume applications.  They have also reduced the cost of voice platforms themselves by allowing them to be assembled flexibly and quickly: CCXML makes it easier to integrate advanced telephony features, while MRCP makes it easier to integrate speech engines.  Finally, standards make it worthwhile to invest in tools, since those tools apply to an entire industry instead of to one company's products.

The second problem VoiceXML was designed to address was the complexity of developing and deploying voice applications.  The pioneers of the voice web at the Forum's founding companies in the mid-1990s realized that if the web model could be applied to voice applications, it would greatly simplify them and drive their costs down.  What was needed was a markup language analogous to HTML, but geared to the unique characteristics of voice interaction, such as the need to handle temporal ordering, imprecise user input, and the transience of human auditory memory.

The web model brought two huge benefits to voice applications.  First, it eased their development.  VoiceXML unlocked a huge workshop full of web development tools that could now be used to craft voice applications, including XML editors and parsers, J2EE servers, ASP and JSP pages, CGI scripting, AJAX, .NET, HTTPS, web caches, and legacy database integration.  Development was also eased because voice applications could in many cases leverage the same back-end business logic used by HTML applications.

The second benefit of the web model was the new ease of application deployment.  Proprietary approaches often required that applications be co-located on the voice platform along with telephony, speech engines, and other functions, so an application had to be updated on each platform individually.  Now, it needs to be updated only once, on the web server.  This is especially crucial for voice applications, which strongly benefit from continual small usability improvements.

 

Q: I have not been able to find any examples of VoiceXML code using the <object> tag. Can you give me a “hello world” example using the <object> tag?

A: You can find example VoiceXML markup using the <object> tag in the VoiceXML 2.0 specification.  However, you are strongly encouraged not to use the <object> tag in your VoiceXML applications.  The <object> tag is a sort of escape hatch that allows platform implementers to expose platform-specific functionality.  Hence, by using it you can be assured your code will break if you try to run it on another VoiceXML platform.  This is not a good thing!
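For illustration only, a "hello world" sketch might look like the following.  Keep in mind that the classid scheme and parameter names here are invented for the sake of the example, and would only work on a hypothetical platform that happened to define them; that non-portability is precisely the problem with <object>.

<form id="hello_object">
  <!-- classid is entirely platform-specific; this value is invented -->
  <object name="hello"
          classid="method://com.example.platform/say_hello">
    <param name="greeting" expr="'Hello world!'"/>
  </object>
  <block>
    <!-- The object's return value is exposed as an ECMAScript object -->
    The object returned: <value expr="hello.greeting"/>
  </block>
</form>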

 

Q: How can I connect my VoiceXML document with a Java program using the <object> tag? The Java program will store and retrieve speech data based on speech recognition results.

A: You don’t.  We’ve already pointed out the problem with <object> in our answer to the previous question.  The proper way to accomplish what you are describing is to post the data the VoiceXML browser collects to an application server by using the <submit> tag.  Here is an example, taken directly from the VoiceXML 2.0 specification:

<?xml version="1.0" encoding="UTF-8"?> 
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/vxml
http://www.w3.org/TR/voicexml20/vxml.xsd">
  <form id="weather_info">
    <block>Welcome to the weather information service.</block>
    <field name="state">
      <prompt>What state?</prompt>
      <grammar src="state.grxml" type="application/srgs+xml"/>
      <catch event="help">
        Please speak the state for which you want the weather.
      </catch>
    </field>
    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <catch event="help">
        Please speak the city for which you want the weather.
      </catch>
    </field>
    <block>
      <submit next="/servlet/weather" namelist="city state"/>
    </block>
  </form>
</vxml>

In this example, once the user speaks the state and city, the values of these input items are submitted to a weather servlet running on an application server for further processing. Though not shown, the servlet would then generate a VoiceXML document that plays back the weather conditions for the state and city the user spoke.
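The servlet's response is not part of the specification's example, but it might look something like the following sketch (the city, state, and weather text are invented placeholders for whatever the servlet generates dynamically):

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- Text produced dynamically by the weather servlet -->
      The weather in Sacramento, California is sunny,
      with a high of 75 degrees.
    </block>
  </form>
</vxml>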

If you are fortunate enough to have access to a VoiceXML interpreter that supports VoiceXML 2.1, you can also use the new <data> element to accomplish this sort of thing.  Take a look at Rob Marchand's First Words column on the <data> element published earlier this year for more information.
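As a rough sketch of the <data> approach (the servlet URL and the structure of the XML it returns are assumptions for illustration, not part of either specification), the weather lookup could fetch the result without leaving the current document:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <var name="city" expr="'Sacramento'"/>
    <var name="state" expr="'California'"/>
    <!-- Fetch XML from the server without a document transition;
         the result is exposed to ECMAScript as a read-only DOM -->
    <data name="forecast" src="/servlet/weatherxml" namelist="city state"/>
    <block>
      <!-- Assumes the servlet returns, e.g., <weather>sunny</weather> -->
      Current conditions:
      <value expr="forecast.documentElement.firstChild.nodeValue"/>
    </block>
  </form>
</vxml>

Because <data> returns control to the same document rather than transitioning to a new one, it is well suited to the store-and-retrieve interaction the question describes.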

 



Copyright © 2001-2005 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).