The Interface between Next-Generation Application Servers and Media Servers: SIP and VoiceXML
The
innovation and reach of the Web combined with the power
of real-time voice opens up all kinds of possibilities
for enhanced services and applications. In the IP environment
of the next generation network (NGN), it is easier to
combine web content with real-time, interactive communications.
This is bringing about new types of converged services
that go far beyond the PSTN replacement services of
voice mail, messaging and IVR to wireless web media,
network gaming and web conferencing.
While
softswitches and media gateways form the foundation
of the access and transport infrastructure of a next
generation network, application servers and media
servers are the core components of the emerging application
and enhanced services infrastructure. Application server
and media server components have evolved to power these
enhanced services. However, the interface from the application
server to the media server has not yet been fully defined.
This
article proposes Session Initiation Protocol (SIP) and
VoiceXML as the best interface between next-generation
application servers and media servers. It examines how
these open technologies enable flexibility and how,
because SIP and XML are already entrenched in the web
development community, they will propel the development
of more applications and in turn reduce vendor lock-in.
Readers should gain an understanding of the interface
requirements, the challenges, and the details and
advantages of the proposed interfaces.
From
its inception, the development of SIP was a very open,
collaborative effort. Having its origins in the Internet
Engineering Task Force (IETF) and university development,
SIP began as a standard for packet-based multimedia
conferencing. Other competing interface standards such
as MGCP and H.248 had different origins. MGCP, which
was originally developed to bridge communications between
the PSTN and IP networks, appeared in a variety of
flavors that depended on the vendor putting
it forward. H.248, also designed as an interface
between standard telephony and IP, was developed through
the ITU process.
SIP is an application-layer control (signaling) protocol
for creating, modifying and terminating sessions with
one or more participants over a network. These sessions
go beyond simple conferencing to include content services
such as Internet telephone calls and multimedia distribution.
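
As a rough illustration of what creating, modifying and terminating a session looks like on the wire, the Python sketch below assembles three simplified SIP requests for a single session. The helper `build_request`, the addresses and the Call-ID value are invented for this example, and a real request carries further mandatory headers (Via, Max-Forwards, tags on From and To) and usually a session description body, all omitted here.

```python
# Sketch only: simplified SIP requests for one session.
# Addresses and identifiers are made-up examples, not taken from the article.

def build_request(method, target, call_id, cseq, sender, recipient):
    """Return a simplified, body-less SIP request as text."""
    lines = [
        f"{method} {target} SIP/2.0",   # request line: method, URI, version
        f"From: <{sender}>",
        f"To: <{recipient}>",
        f"Call-ID: {call_id}",          # same value for every request in the session
        f"CSeq: {cseq} {method}",       # sequence number plus method name
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


session = dict(
    target="sip:bob@example.com",
    call_id="a84b4c76e66710@host.example.com",
    sender="sip:alice@example.com",
    recipient="sip:bob@example.com",
)

create = build_request("INVITE", cseq=1, **session)     # create the session
modify = build_request("INVITE", cseq=2, **session)     # re-INVITE modifies it
terminate = build_request("BYE", cseq=3, **session)     # BYE ends it

print(create)
```

All three requests share one Call-ID, which is how both ends recognize them as belonging to the same session.
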
VoiceXML
is the standard with which voice response applications
are developed on the Internet. Although the inventors
of VoiceXML were thinking in terms of speech recognition,
there is nothing about VoiceXML that prevents it from
being used for other applications such as interactive
voice response. In fact, when coupled with SIP, VoiceXML
has been shown to be very applicable to other modes
of input such as touch-tone access.
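
To make the touch-tone point concrete, here is a small sketch: a VoiceXML menu whose choices are selected by key press, wrapped in a few lines of Python that read the key-to-destination mapping back out. The menu wording, the form ids and the helper `dtmf_routes` are made up for illustration. The Python part only inspects the document; it is not a VoiceXML interpreter.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML document: each <choice> is bound to a touch-tone key.
VXML_MENU = """<?xml version="1.0"?>
<vxml version="1.0">
  <menu id="main">
    <prompt>For sales, press 1. For support, press 2.</prompt>
    <choice dtmf="1" next="#sales">sales</choice>
    <choice dtmf="2" next="#support">support</choice>
  </menu>
  <form id="sales">
    <block><prompt>Connecting you to sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>Connecting you to support.</prompt></block>
  </form>
</vxml>
"""


def dtmf_routes(vxml_text):
    """Map each touch-tone key to the dialog its <choice> jumps to."""
    root = ET.fromstring(vxml_text)
    return {c.get("dtmf"): c.get("next") for c in root.iter("choice")}


print(dtmf_routes(VXML_MENU))
```

Because the document is ordinary XML, any general-purpose XML tool can check or transform it, with no telephony-specific tooling involved.
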
SIP
and VoiceXML can be used together for initiating
and terminating sessions of all types, not just signaling
and control sessions but also content sessions. These
sessions could convey simple presence information, such
as 'I'm in my car now' (so call me on my car phone)
or 'I'm at my desk' (so send documents or other media
to me there). The ability to establish these sessions means
that a host of innovative services become possible and
economical, such as voice-enriched e-commerce, web page
click-to-dial, instant voice chat with buddy lists,
and IP Centrex services.
SIP
is a request-response protocol that closely resembles
HTTP, the basis of the World Wide Web. Using
SIP, telephony becomes another web application and integrates
easily into other Internet services.
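
The resemblance to HTTP can be seen by running one parser over both kinds of request. The sketch below is illustrative only: `parse_request` is a hypothetical helper, the two example messages are invented, and it ignores details such as header folding, repeated headers and SIP's compact header forms.

```python
def parse_request(raw):
    """Split an HTTP-style request into start line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")          # blank line ends the headers
    start_line, *header_lines = head.split("\r\n")
    method, target, version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")           # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


# Invented example messages with the same overall layout.
http_request = "GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
sip_request = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: abc123@host.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)

for raw in (http_request, sip_request):
    parsed = parse_request(raw)
    print(parsed["method"], parsed["target"], parsed["version"])
```

The same function handles both messages unchanged; only the method names, the URI scheme and the version token differ.
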
VoiceXML
is built on open web markup languages such
as XML and HTML. HTTP, XML and HTML have a much larger
set of support tools, such as XML editors, syntax
checkers and debuggers, than the more proprietary
telecom protocols and languages. There
are many development packages for creating XML-based
applications. These packages are equally accessible
to anyone, from the largest corporations to programmers
at home. The opposite is true of MGCP and H.248.
SIP
and VoiceXML require much less time and far fewer resources
to learn than the more proprietary PSTN-oriented protocols
such as MGCP and H.248. SIP and VoiceXML are familiar
to a wider base of programmers. Service development
time is also much shorter using these open Internet
standards.
SIP
and VoiceXML use a model that is very familiar to the
general IP workforce, whereas the MGCP and H.248 model
is familiar to a much smaller, more specialized group
of telephony programmers. This is a very small group
when contrasted with the number of webmasters and Java
programmers worldwide.
Continued...
Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).