VoiceXML Review - Feature Articles

[NOTE: This article is published with the express permission of Unisys Corporation. Unisys
Corporation retains ownership of the copyright of this article.]

Natural Language Semantics Markup Language (NLSML) is an XML-based markup for representing the meaning of a natural language utterance. NLSML gives applications a standardized way of representing the output from a speech recognizer in terms that are meaningful to a dialog manager such as a VoiceXML interpreter. Although NLSML isn't widely used yet, it has the potential to support richer and more complex interactions between users and systems in the future.

A semantics format capable of providing a fully general representation of natural language meanings would be very ambitious. NLSML has a much more modest goal: meanings are represented in terms that are relevant only to a specific application. It was designed as the bridge between input components, such as speech recognizers, and voice browsers. That is, it conveys the semantics of users' utterances to the voice browser so that the browser can take the appropriate action based on what the user said. In particular, the kinds of information that NLSML represents are especially relevant to form-filling applications.

NLSML is being developed by the World Wide Web Consortium (W3C) as part of the activities of the Multimodal Interaction group. In addition to representing meaning, NLSML includes information about the circumstances under which the utterance was produced and processed. While the most common input device today is a speech recognizer, NLSML was designed to grow to accommodate the semantics of additional multimodal input devices such as handwriting and keyboards. The first W3C Working Draft describing NLSML was published in November 2000 by the W3C Voice Browser group. Because a standardized meaning representation is especially important in a multimodal context, responsibility for NLSML was transferred to the Multimodal Interaction group when it was chartered in February 2002.

There are three important components of an NLSML document: content, side information, and a data model.
We can see these illustrated in the following NLSML example:

<result>
    <interpretation x-model="http://dataModel" confidence="100">
        <instance>
            <airline>
                <to_city>Pittsburgh</to_city>
            </airline>
        </instance>
        <input mode="speech" confidence="0.5"
               timestamp-start="2000-04-03T00:00:00"
               timestamp-end="2000-04-03T00:00:00.2">
            I want to go to pittsburgh
        </input>
    </interpretation>
</result>

Suppose we're in an airline reservations application, and the browser has just asked the user where he or she wants to go. The user responds, "I want to go to Pittsburgh." The semantics of the user's utterance are contained in the application-specific XML element <airline>, which is wrapped in the <instance> element. The XML elements within the instance (<airline> and <to_city>) are application-specific and are designed by the application developer. This gives the developer a great deal of flexibility in representing the meanings of utterances: generally, any element that is meaningful to the application and that the input interpreter can generate can be included in the meaning representation component of an NLSML document.

The downside of this flexibility is that the developer is responsible for keeping corresponding NLSML and VoiceXML information consistent. If the NLSML result is going to be used in a VoiceXML browser where, for example, there is a corresponding "to_city" field, the NLSML elements have to match the appropriate VoiceXML fields. Right now there is no way to enforce consistency between the VoiceXML and NLSML markup. We'll see how this situation might improve below, when we discuss the data model component of NLSML.
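
To make the consistency problem concrete, here is a minimal sketch of how an application might pull the slot values out of the <instance> element and compare their names against the fields of a VoiceXML form. It is an illustration only: the function names and the VOICEXML_FIELDS set (including "from_city") are hypothetical and are not part of NLSML or VoiceXML.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the article's example result.
NLSML = """
<result>
  <interpretation confidence="100">
    <instance>
      <airline>
        <to_city>Pittsburgh</to_city>
      </airline>
    </instance>
  </interpretation>
</result>
"""

# Hypothetical: the field names declared in the application's VoiceXML form.
VOICEXML_FIELDS = {"to_city", "from_city"}


def extract_slots(nlsml_text):
    """Return {element name: text} for every leaf element under <instance>."""
    root = ET.fromstring(nlsml_text)
    slots = {}
    for instance in root.iter("instance"):
        for elem in instance.iter():
            # A leaf has no child elements; its text is the slot value.
            if elem is not instance and len(elem) == 0:
                slots[elem.tag] = (elem.text or "").strip()
    return slots


def unmatched_slots(slots, field_names):
    """Return the slot names that have no corresponding VoiceXML field."""
    return sorted(name for name in slots if name not in field_names)


slots = extract_slots(NLSML)
print(slots)
print(unmatched_slots(slots, VOICEXML_FIELDS))
```

A check like this only catches name mismatches after the fact; it does not enforce anything at authoring time.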

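The second major component of an NLSML document is what might be called "side information": information about the utterance and how it was processed. In the example above, this includes the input mode (speech), the confidence values on the <interpretation> and <input> elements, the start and end timestamps, and the recognized words themselves.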
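
As a rough sketch of how a consumer might read the side information carried by the example result, the snippet below collects the attributes of the <interpretation> and <input> elements into a dictionary. The function name and the dictionary keys are hypothetical, and the attribute values are left as strings rather than interpreted.

```python
import xml.etree.ElementTree as ET

# The article's example result, with the instance kept on one line.
NLSML = """
<result>
  <interpretation confidence="100">
    <instance><airline><to_city>Pittsburgh</to_city></airline></instance>
    <input mode="speech" confidence="0.5"
           timestamp-start="2000-04-03T00:00:00"
           timestamp-end="2000-04-03T00:00:00.2">
      I want to go to pittsburgh
    </input>
  </interpretation>
</result>
"""


def side_information(nlsml_text):
    """Collect the side information of the first interpretation in a result."""
    root = ET.fromstring(nlsml_text)
    interpretation = root.find("interpretation")
    inp = interpretation.find("input")
    return {
        "interpretation_confidence": interpretation.get("confidence"),
        "mode": inp.get("mode"),
        "input_confidence": inp.get("confidence"),
        "start": inp.get("timestamp-start"),
        "end": inp.get("timestamp-end"),
        # Collapse the surrounding whitespace around the recognized words.
        "text": " ".join((inp.text or "").split()),
    }


info = side_information(NLSML)
print(info["mode"], info["input_confidence"], info["text"])
```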

The third major component of an NLSML document is an optional data model. As discussed above, the developer has complete flexibility in designing the XML elements that describe the utterance semantics. In essence, the developer is creating an XML language for representing the concepts, and the relationships among concepts, that are meaningful to a specific application. Standard XML provides two ways of defining XML languages: Document Type Definitions (DTDs) and XML Schemas. DTDs are the older and more common way of defining XML documents, but they lack expressive power and are not XML documents themselves. XML Schemas are a newer, more powerful, and XML-based way of defining XML languages. If the developer chooses to provide a formal definition of the language used to express utterance meanings, an XML Schema can be provided as a data model. The main benefit of providing a data model for XML documents is that the documents can then be validated by other processors.
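
To illustrate what validation against a data model buys, here is a small sketch that checks the content of an <instance> against a description of the allowed elements. Note the substitution: this is not XML Schema validation. The Python standard library has no schema validator, so a hand-written dictionary stands in for the schema, and its contents (including "from_city") are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a data model: each element name maps to the
# set of child element names it may contain.
DATA_MODEL = {
    "airline": {"to_city", "from_city"},
    "to_city": set(),
    "from_city": set(),
}


def violations(instance_xml, model=DATA_MODEL):
    """List the problems found in the content of an <instance> element."""
    problems = []

    def check(elem):
        if elem.tag not in model:
            problems.append(f"unknown element <{elem.tag}>")
            return
        for child in elem:
            if child.tag not in model[elem.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
            else:
                check(child)

    # Check each top-level element inside <instance>.
    for top in ET.fromstring(instance_xml):
        check(top)
    return problems


good = "<instance><airline><to_city>Pittsburgh</to_city></airline></instance>"
bad = "<instance><airline><destination>Pittsburgh</destination></airline></instance>"
print(violations(good))
print(violations(bad))
```

A real XML Schema can express much more than element nesting, such as data types and occurrence constraints, which is part of what makes it attractive as a data model.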

W3C Natural Language Semantics Markup Language

By Deborah Dahl