VoiceXML Review - Columns - Speak & Listen

Volume 2, Issue 2 - Feb./March 2002

By Jeff Kunins

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org and be sure to read future issues of VoiceXML Review for the answer.

Q: I've just started learning VoiceXML. What exactly is the FIA, and how much do I need to know about it to write a VoiceXML application?

A: The FIA is the Form Interpretation Algorithm, the fundamental set of rules that defines how control flows through a VoiceXML application as it executes. In a sense, VoiceXML *is* the FIA: the FIA is the specification for how VoiceXML interpreters behave, and VoiceXML applications are sets of instructions fed into the FIA at runtime to produce the behavior you're looking for. More precisely, the FIA is what makes VoiceXML a "declarative" language. You simply declare things like forms and fields, and the FIA provides a built-in set of procedural rules to follow when interpreting them. By contrast, a "procedural" language like C requires you to spell out every piece of application logic explicitly.
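To make the declarative idea concrete, here is a minimal sketch. It assumes VoiceXML 2.0 syntax, which was finalized after this column appeared, and an inline SRGS grammar. The form id, field name, and color words are hypothetical. Notice that nothing in it says "loop," "listen," or "try again." The FIA supplies that control flow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Everything here is declared, not scripted: there is no loop,
       no explicit "listen" step, and no retry logic. The FIA supplies it. -->
  <form id="get_color">
    <field name="color">
      <prompt>What is your favorite color?</prompt>
      <!-- Hypothetical inline grammar: this field is filled only when
           the caller's utterance matches one of these items. -->
      <grammar type="application/srgs+xml" root="color" version="1.0">
        <rule id="color">
          <one-of>
            <item>red</item>
            <item>green</item>
            <item>blue</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>You said <value expr="color"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

After the `<filled>` element runs, there is no transition, so the form completes and the session ends. If the caller says something outside the grammar, the platform's default nomatch handling applies and the FIA simply visits the field again.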

So what does this mean for you? It means that understanding the basic rules of the FIA is absolutely critical for understanding how your VoiceXML application will behave. This is the moral equivalent in HTML of needing to know about how things like frames and tables actually behave in Web browsers (like when you resize the browser window) in order to make Web pages that turn out the way you expect them to. The good news, though, is that the basics of the FIA are really quite intuitive and easy to understand.

For basic applications, the main things to understand are:

· Scoping of events, variables, and grammars. VoiceXML has tight hierarchical scoping. For example, variables declared at "document" scope (e.g. a <var> that is a direct child of <vxml>) are visible to all scopes nested within that document (e.g. a form or field). Similarly, an event is handled by the innermost declared handler that matches it. For example, if "nomatch" is thrown within a field, the field's local nomatch handler is executed if it exists. If it doesn't exist, the FIA looks at each successive parent scope until it finds a handler. If it gets all the way up to application scope and there's still no handler, it uses the built-in global handler (if there is one for that specific event).
· Execution begins with the first <form> in a document and ends either with <exit> or when a form completes without specifying a transition. You have to specify every transition from one <form> to the next explicitly; you can't rely on execution to flow from one form to the next in document order.
· Within simple (i.e. not mixed-initiative) forms, execution in a given <field> loops until the field is "filled" or you explicitly transition out of an event handler. This sounds complicated, but once again it's pretty intuitive. If you're asking the user to say something in a field, the prompts play and then the interpreter listens for a response. If the response matches an active grammar, the field is "filled" and execution passes to the <filled> element within that field. If there's a nomatch or noinput, that event is thrown and the innermost handler for it catches it. If those handlers don't do anything fancy, like changing form item variables or using <reprompt/>, the interpreter just listens again (and again, and again) until the caller finishes "filling out" the form. This works just like a form on a Web page that won't let you leave until you hit the "submit" button with the right information entered in all the required fields.
· There's a lot of fancy stuff you can do to gain fine-grained control over execution. This includes modifying form item variables, defining application-scope <link>s, making <field>s modal, and using mixed-initiative forms. You don't have to do any of these things to build simple applications, but many of them are extremely useful for building more sophisticated ones.
· Read the docs, but play around a lot. As with any programming language, the best way to get comfortable is to actually build and test things. There are several great online VoiceXML resource sites with comprehensive documentation and examples (see Rob Marchand's First Words column this month for an extensive list). The most important thing, though, is to take what you read and test lots of variations yourself, so that you become comfortable through first-hand experience.
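Several of the basics above fit into one small document. This sketch covers document-scope variables, innermost-handler event selection, the field loop, and explicit transitions between forms. It assumes VoiceXML 2.0 syntax and the built-in `boolean` field type. The form ids, the `company` variable, and all prompt text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document scope: visible in every form and field below. -->
  <var name="company" expr="'Acme'"/>

  <!-- Document-level noinput handler, used by any field in this
       document that does not declare its own noinput handler. -->
  <noinput>
    <prompt>Sorry, I didn't hear anything.</prompt>
  </noinput>

  <!-- Execution starts here simply because this is the first form. -->
  <form id="main">
    <field name="wants_weather" type="boolean">
      <prompt>
        Welcome to <value expr="company"/>. Would you like the weather?
      </prompt>
      <!-- Field-level nomatch: the innermost handler for this event wins.
           After it runs, the FIA revisits the field and listens again;
           the field prompt above is replayed only if the handler
           ends with <reprompt/>. -->
      <nomatch>
        <prompt>Please say yes or no.</prompt>
      </nomatch>
      <filled>
        <!-- Explicit transitions: the FIA never falls through to the
             next form in document order on its own. -->
        <if cond="wants_weather">
          <goto next="#weather"/>
        <else/>
          <exit/>
        </if>
      </filled>
    </field>
  </form>

  <form id="weather">
    <block>
      <prompt>It will be sunny today. Thanks for calling <value expr="company"/>.</prompt>
    </block>
    <!-- No transition here, so the session ends when this form completes. -->
  </form>
</vxml>
```

In the spirit of "play around a lot," try deleting the field-level `<nomatch>` to watch the platform's default handler take over. Alternatively, remove the `<goto>` to confirm that the `weather` form is never reached by document order alone.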
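The fine-grained controls mentioned in the fourth point can also be sketched briefly. This example assumes VoiceXML 2.0 and the built-in `digits` field type. The "main menu" phrase, the form ids, and the four-digit rule are illustrative only. It shows a document-level `<link>`, a modal field, and `<clear>` resetting a form item variable so that the FIA selects the field again.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document-level link: while this document is active, saying
       "main menu" in any non-modal field jumps to the menu form.
       Placed in the application root document, it would be active
       across the whole application. -->
  <link next="#menu">
    <grammar type="application/srgs+xml" root="mm" version="1.0">
      <rule id="mm">main menu</rule>
    </grammar>
  </link>

  <form id="login">
    <!-- modal="true": only this field's own grammar is active while
         collecting the PIN, so "main menu" cannot interrupt it. -->
    <field name="pin" type="digits" modal="true">
      <prompt>Please say your four digit PIN.</prompt>
      <filled>
        <if cond="pin.length != 4">
          <prompt>That was not four digits.</prompt>
          <!-- Resetting the form item variable to undefined makes the
               FIA select this field again on its next pass. -->
          <clear namelist="pin"/>
        <else/>
          <goto next="#menu"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="menu">
    <block>
      <prompt>Welcome to the main menu.</prompt>
    </block>
  </form>
</vxml>
```

Setting `modal` back to its default (false) would let "main menu" interrupt PIN entry, which is often exactly what you don't want in the middle of a login.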

More Questions on VoiceXML
