Volume 2, Issue 3 - April/May 2002
   
   
 

W3C Natural Language Semantics Markup Language

By Deborah Dahl

What is the W3C Natural Language Semantics Markup Language?

[NOTE: This article is published with the express permission of Unisys Corporation. Unisys
Corporation retains ownership of the copyright of this article.]

Natural Language Semantics Markup Language (NLSML) is an XML-based markup for representing the meaning of a natural language utterance. With NLSML, applications have a standardized way of representing the output from a speech recognizer in terms that are meaningful to a dialog manager such as a VoiceXML interpreter. Although NLSML isn't widely used yet, it has great potential for playing a role in supporting richer and more complex interactions between users and systems in the future.

Certainly, a semantics format capable of providing a fully general representation of natural language meanings would be a very ambitious undertaking. NLSML has a much more modest goal: meanings are represented in terms that are relevant only to a specific application. NLSML was designed as the bridge between input components, such as speech recognizers, and voice browsers. That is, it conveys the semantics of users' utterances to the voice browser so that the browser can take the appropriate action based on what the user said. In particular, the kinds of information represented in NLSML are especially relevant to form-filling applications.

NLSML is being developed by the World Wide Web Consortium (W3C) as part of the activities of its Multi-modal Interaction group. In addition to representing meaning, NLSML also includes information about the circumstances under which the utterance was produced and processed. While today the most common input device is a speech recognizer, NLSML was designed to grow to accommodate the semantics of additional multi-modal input devices such as handwriting and keyboards. The first W3C Working Draft describing NLSML was published in November 2000 by the W3C Voice Browser group. Because a standardized meaning representation is especially important in a multi-modal context, responsibility for NLSML was transferred to the Multi-modal group when it was chartered in February 2002.

What's in an NLSML Document?

There are three important components of an NLSML document: content, side information, and a data model.
We can see these illustrated in the following NLSML example:

<result>
    <interpretation x-model="http://dataModel" confidence="100">
        <instance>
            <airline>
                <to_city>Pittsburgh</to_city>
            </airline>
        </instance>
        <input mode="speech" confidence="0.5"
               timestamp-start="2000-04-03T00:00:00"
               timestamp-end="2000-04-03T00:00:00.2">
            I want to go to Pittsburgh
        </input>
    </interpretation>
</result>


Content:

Let's say we're in an airline reservations application, and the browser has just asked the user where s/he wants to go. The user responds, "I want to go to Pittsburgh." The semantics of the user's utterance is contained in the application-specific XML element "<airline>", wrapped in the <instance> element. The specific XML elements within the instance (<airline> and <to_city>) are application-specific and are designed by the application developer. The fact that the details of the meaning representation are application-specific gives the developer a great deal of flexibility in representing the meanings of utterances. Generally, any element that is meaningful to the application and that can be generated by the input interpreter can be included in the meaning representation component of an NLSML document. The downside to this flexibility is that the developer is responsible for making sure that corresponding NLSML and VoiceXML information is consistent. If the NLSML result is going to be used in a VoiceXML browser, where, for example, there might be a corresponding "to_city" field, the NLSML elements have to match the appropriate VoiceXML fields. Right now there's no way to enforce consistency between the VoiceXML and NLSML markup. We'll see how this situation might improve below, when we discuss the data model component of NLSML.
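To make that correspondence concrete, here is a minimal sketch of what a matching VoiceXML form might look like. The form, field, and grammar names are illustrative assumptions rather than part of the example above; the point is simply that the VoiceXML field name "to_city" must line up with the application-specific <to_city> element in the NLSML result.

<form id="airline">
    <field name="to_city">
        <prompt>Where would you like to go?</prompt>
        <!-- "cities.grxml" is a hypothetical grammar file name -->
        <grammar src="cities.grxml" type="application/srgs+xml"/>
        <filled>
            <prompt>Flying to <value expr="to_city"/>.</prompt>
        </filled>
    </field>
</form>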

Side Information:

The second major component of an NLSML document is what might be called "side information": details about the utterance and how it was processed. This includes such things as:

  • Timestamps for when the utterance started and stopped, or even timestamps for the beginning and ending points of each word if there's a need to go down to that level of granularity. (included in the components of the <input> element)
  • The natural language interpreter's confidence in its interpretation. (included as the "confidence" attribute of the <interpretation> element.)
  • The speech recognizer's confidence in its recognition. If desired, this can even be represented down to the confidence on a per-word basis (in the <input> element)
  • The actual recognized utterance (in the <input> element)
  • The modality of the input, for example, speech, DTMF, or potentially "keyboard", "handwriting", etc. (in the <input> element's "mode" attribute)
  • Alternate interpretations (n-best) resulting from either speech recognition ambiguity (did the user say "Boston" or "Austin"?) or natural language interpretation ambiguity (if the user said "fried onions and peppers", are the peppers to be fried or not?) (included as multiple <interpretation> elements; see the sketch following this list)
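For instance, if the recognizer couldn't decide between "Boston" and "Austin", the alternatives could be returned as two <interpretation> elements under a single <result>. The following is a minimal sketch that follows the structure of the earlier example; the confidence values are invented purely for illustration.

<result>
    <interpretation confidence="60">
        <instance>
            <airline>
                <to_city>Boston</to_city>
            </airline>
        </instance>
        <input mode="speech" confidence="0.6">I want to go to Boston</input>
    </interpretation>
    <interpretation confidence="40">
        <instance>
            <airline>
                <to_city>Austin</to_city>
            </airline>
        </instance>
        <input mode="speech" confidence="0.4">I want to go to Austin</input>
    </interpretation>
</result>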

Data Model:

The third major component of an NLSML document is an optional data model. As discussed above, the developer has complete flexibility in designing the XML elements that describe the utterance semantics. In essence, the developer is creating an XML language for representing the concepts, and relationships among concepts, that are meaningful to a specific application. Standard XML provides two ways of defining XML languages: Document Type Definitions (DTDs) and XML Schemas. DTDs are the older and more common way of defining XML documents, but they suffer from a lack of expressive power and from the fact that they are not XML documents themselves. XML Schemas are a newer, more powerful, and XML-based way of defining XML languages. If the developer chooses to provide a formal definition of the language used to express utterance meanings, an XML Schema can be provided as a data model. The main benefit of providing a data model for XML documents is that the documents can then be validated by other processors.
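As a rough sketch of what such a data model might look like, the following XML Schema describes the <airline> instance from the earlier example. The schema itself is illustrative and not taken from the NLSML specification; an NLSML document would point to a data model like this through the x-model attribute shown earlier.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- Declares an <airline> element containing a single <to_city> string -->
    <xsd:element name="airline">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="to_city" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

With a data model like this in place, a validating XML processor can check that the content of any <instance> it receives actually conforms to the elements the application expects.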

Continued...


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).