W3C Natural Language Semantics Markup Language
What is the W3C Natural Language Semantics Markup Language?
[NOTE: This article is published with the express permission of Unisys Corporation. Unisys Corporation retains ownership of the copyright of this article.]
Natural Language Semantics Markup Language (NLSML) is an XML-based markup for representing the meaning of a natural language utterance. With NLSML, applications have a standardized way of representing the output from a speech recognizer in terms that are meaningful to a dialog manager such as a VoiceXML interpreter. Although NLSML isn't widely used yet, it has great potential to play a role in supporting richer and more complex interactions between users and systems in the future.
Certainly, a semantics format sufficient to provide a fully general representation of natural language meanings would be very ambitious. NLSML has a much less ambitious goal. In NLSML, meanings are represented in terms that are relevant just to a specific application. It was designed as the bridge between input components, such as speech recognizers, and voice browsers. That is, it conveys the semantics of users' utterances to the voice browser so that it can take the appropriate action based on what the user said. In particular, the kinds of information that are represented in NLSML are especially relevant to form-filling applications.
NLSML is being developed by the World Wide Web Consortium (W3C) as part of the activities of the Multimodal Interaction group. In addition to representing meaning, NLSML also includes some additional information about the circumstances under which the utterance was spoken and processed. While today the most common input device is a speech recognizer, NLSML was designed to be able to grow to accommodate the semantics of additional multimodal input devices such as handwriting and keyboards. The first W3C Working Draft describing NLSML was published in November of 2000 by the W3C Voice Browser group. Because having a standardized meaning representation is especially important in a multimodal context, responsibility for NLSML was transferred to the Multimodal Interaction group when it was chartered in February 2002.
What's in an NLSML Document?
There are three important components of an NLSML document: content, side information, and a data model. We can see these illustrated in the following NLSML example:
<result>
  <interpretation x-model="http://dataModel" confidence="100">
    <instance>
      <airline>
        <to_city>Pittsburgh</to_city>
      </airline>
    </instance>
    <input mode="speech" confidence="0.5"
           timestamp-start="2000-04-03T0:00:00"
           timestamp-end="2000-04-03T0:00:00.2">
      I want to go to Pittsburgh
    </input>
  </interpretation>
</result>
Content:
Let's say we're in an airline reservations application, and the browser has just asked the user where s/he wants to go. The user responds, "I want to go to Pittsburgh." The semantics of the user's utterance is contained in the application-specific XML element called "<airline>", wrapped in the <instance> element. The specific XML elements within the instance (<airline> and <to_city>) are application-specific, and are designed by the application developer. The fact that the details of the meaning representation are application-specific gives the developer a great deal of flexibility in representing the meanings of utterances. Generally, any element that is meaningful to the application and that can be generated by the input interpreter can be included in the meaning representation component of an NLSML document. The downside to this flexibility is that the developer is responsible for making sure that corresponding NLSML and VoiceXML information is consistent. If the NLSML result is going to be used in a VoiceXML browser, where, for example, there might be a corresponding "to_city" field, the NLSML elements have to match the appropriate VoiceXML fields. Right now there's no way to enforce consistency between the VoiceXML and NLSML markup. We'll see how this situation might improve below, when we discuss the data model component of NLSML.
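To make the correspondence concrete, a VoiceXML form consuming this result might look roughly like the following sketch. The form id and grammar file name are hypothetical, and nothing here is prescribed by either specification; the point is simply that the field name "to_city" has to line up with the NLSML <to_city> element by convention.

<form id="make_reservation">
  <field name="to_city">
    <prompt>Where would you like to go?</prompt>
    <!-- Grammar whose results populate this field (file name is illustrative) -->
    <grammar src="airline.grxml"/>
    <filled>
      <prompt>Flying to <value expr="to_city"/>.</prompt>
    </filled>
  </field>
</form>

Nothing in either document checks that the field and the element actually match; keeping them consistent is entirely the developer's job, which is the gap the data model discussed below is meant to help close.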
Side Information:
The second major component of an NLSML document is what might be called "side information". Side information refers to information about the utterance and how it was processed. This includes such information as:
- Timestamps for when the utterance started and stopped, or even timestamps for the beginning and ending points of each word if there's a need to go down to that level of granularity (included in the components of the <input> element)
- The natural language interpreter's confidence in its interpretation (included as the "confidence" attribute of the <interpretation> element)
- The speech recognizer's confidence in its recognition; if desired, this can even be represented down to a per-word confidence (in the <input> element)
- The actual recognized utterance (in the <input> element)
- The modality of the input, for example, speech, DTMF, or potentially "keyboard", "handwriting", etc. (in the <input> element's "mode" attribute)
- Alternate interpretations (n-best) resulting from either speech recognition ambiguity (did the user say "Boston" or "Austin"?) or natural language interpretation ambiguity (if the user said "fried onions and peppers", are the peppers to be fried or not?), included as multiple <interpretation> elements; a sketch of such a result appears after this list
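For instance, a result carrying two competing hypotheses for an ambiguous city name might look roughly like this. The structure mirrors the earlier example, but the confidence values and the second city are invented purely for illustration:

<result>
  <!-- Most likely hypothesis first -->
  <interpretation confidence="60">
    <instance>
      <airline>
        <to_city>Boston</to_city>
      </airline>
    </instance>
    <input mode="speech" confidence="0.6">I want to go to Boston</input>
  </interpretation>
  <!-- Competing hypothesis from recognition ambiguity -->
  <interpretation confidence="40">
    <instance>
      <airline>
        <to_city>Austin</to_city>
      </airline>
    </instance>
    <input mode="speech" confidence="0.4">I want to go to Austin</input>
  </interpretation>
</result>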
Data Model:
The third major component of an NLSML document is an optional data model. As discussed above, the developer has complete flexibility in designing the XML elements that describe the utterance semantics. In essence, the developer is creating an XML language for representing the concepts, and the relationships among concepts, that are meaningful to a specific application. Standard XML provides two ways of defining XML languages: Document Type Definitions (DTDs) and XML Schemas. DTDs are the older and more common way of defining XML documents, but they suffer from a lack of expressive power and from the fact that they are not XML documents themselves. XML Schemas are a newer, more powerful, and XML-based way of defining XML languages. If the developer chooses to provide a formal definition of the language used to express utterance meanings, an XML Schema can be provided as a data model. The main benefit of providing a data model for XML documents is that the documents can then be validated by other processors.
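As a rough illustration, a minimal XML Schema covering the <airline> and <to_city> elements from the earlier example might look like the following sketch. Declaring to_city as a plain string is an assumption made just for this example, and presumably a schema of this kind is what the x-model attribute shown in the first example would point to.

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Application-specific root element that appears inside <instance> -->
  <xsd:element name="airline">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Destination city, declared here as a plain string -->
        <xsd:element name="to_city" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>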
Continued...