Are you ready for VoiceXML?

VoiceXML is a language for creating voice user interfaces, particularly for the telephone. It uses speech recognition and touch-tone (DTMF phone keypad) entry for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. It is based on the World Wide Web Consortium's (W3C's) Extensible Markup Language (XML), and it leverages Web methodologies for application development and deployment. VoiceXML greatly simplifies the development of speech recognition applications because it lets you use familiar Web infrastructure, including tools and Web servers. Instead of using a PC with a Web browser, a caller on any telephone can access VoiceXML applications via a VoiceXML "interpreter" (also known as a "browser") running on a telephony server. Whereas HTML is commonly used for creating graphical Web applications, VoiceXML is used for voice-enabled Web applications.

One popular type of application is the voice portal, which is used to voice-enable a Web site. A voice portal is a telephone service where callers dial a phone number to retrieve information such as stock quotes, sports scores, and weather reports. Voice portals have been used in recent applications to demonstrate the power of speech recognition-based telephone services. Other kinds of applications, including voice-enabled intranets and contact centers, notification services, and innovative telephony services, can all be built with VoiceXML.

VoiceXML and the voice-enabled Web allow for a new business model for telephony applications known as the Voice Service Provider. By separating application logic (running on a standard Web server) from the voice dialogs (running on a telephony server), VoiceXML can function as an open-architecture solution for building next-generation interactive voice response telephone services. This approach permits developers to build phone services without having to buy or run telephony equipment. Just as an HTML developer doesn't need to know how bits paint the screen of a Web user's PC, VoiceXML shields developers from many of the complexities of telephony platforms.

While originally designed for building telephone services, other applications of VoiceXML, such as speech-controlled home appliances, are starting to be developed. VoiceXML has features to control audio output; audio input; presentation logic and control flow; event handling; and basic telephony connections.
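To make the feature list above more concrete, here is a minimal sketch of what a small VoiceXML 2.0 dialog might look like, wrapped in a short Python script that only checks that the markup is well-formed XML. The dialog itself is an illustration, not taken from any real service: the form and field names, the audio file welcome.wav, and the URL http://example.com/portal are all placeholders.

```python
import xml.etree.ElementTree as ET

# Illustrative VoiceXML 2.0 document; the file name and URL are placeholders.
PORTAL_DOCUMENT = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="portal">
    <field name="service">
      <!-- Audio output: a recording, with TTS text as the fallback. -->
      <prompt>
        <audio src="welcome.wav">Welcome to the voice portal.</audio>
        Say stock quotes, sports scores, or weather.
      </prompt>
      <!-- Audio input: each option accepts speech or a DTMF key. -->
      <option dtmf="1" value="stocks">stock quotes</option>
      <option dtmf="2" value="sports">sports scores</option>
      <option dtmf="3" value="weather">weather</option>
      <!-- Event handling: silence and unrecognized input. -->
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <!-- Control flow: hand the answer back to the application server. -->
      <filled>
        <submit next="http://example.com/portal" namelist="service"/>
      </filled>
    </field>
  </form>
</vxml>
"""


def element_names(document):
    """Parse the document and return its element names in document order."""
    root = ET.fromstring(document)
    # ElementTree reports tags as "{namespace}name"; keep only the name part.
    return [element.tag.split("}", 1)[-1] for element in root.iter()]


if __name__ == "__main__":
    print(element_names(PORTAL_DOCUMENT))
```

The script does not run the dialog; a real VoiceXML interpreter on a telephony server would do that. It is only meant to show that a VoiceXML page is ordinary XML that standard Web tooling can read.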

A VoiceXML application consists of several components:

Application Server: Typically a Web server, which runs the application logic and may contain a database or interfaces to an external database or transaction server.

VoiceXML Telephony Server: A platform that runs a VoiceXML interpreter, which acts as a client to the application server. The interpreter understands VoiceXML dialogs and controls speech and telephony resources. These resources include ASR, TTS, audio play and record functions, as well as a telephone network interface.

Internet-style network: A TCP/IP-based packet network that connects the application server and telephony server via HTTP.

Telephone Network: Typically the Public Switched Telephone Network (PSTN), but could be a private telephone network (e.g., a PBX) or a VoIP packet network.

Caller: Any telephone that can connect to the telephone network.
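The division of labor between these components can be sketched with a toy application server. The example below is a minimal sketch using only the Python standard library; it assumes the telephony server's interpreter fetches pages with a plain HTTP GET. The names render_vxml, VoiceXMLHandler, and port 8080 are hypothetical, and the media type application/voicexml+xml is the one later registered for VoiceXML, which some platforms may not require.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape


def render_vxml(prompt_text):
    """Application logic: build a one-prompt VoiceXML page as a string."""
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        # escape() keeps characters such as & and < from breaking the XML.
        "    <block><prompt>%s</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    ) % escape(prompt_text)


class VoiceXMLHandler(BaseHTTPRequestHandler):
    """Answers every GET with a VoiceXML page, as a Web server would."""

    def do_GET(self):
        body = render_vxml("Hello from the application server.").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # The VoiceXML interpreter on the telephony server would be pointed at
    # this address; the port number is an arbitrary choice for the sketch.
    HTTPServer(("127.0.0.1", 8080), VoiceXMLHandler).serve_forever()
```

Nothing in this server knows about telephones. Speech recognition, TTS, and the connection to the caller all stay on the telephony server, which is the separation the component list describes.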

VoiceXML is a powerful yet simple language for building voice dialogs. It leverages Web architecture, tools, and technology to enable innovative new telephone applications. Thanks to the standardization efforts of the VoiceXML Forum and the W3C, it is gaining widespread adoption, especially among the 350-plus members of the VoiceXML Forum. New language features in the recently published draft of VoiceXML 2.0, and new call control features currently under development, promise an even richer voice-enabled Web.

There are several issues that must be resolved. Development tools and runtime software on the VoiceXML page server must use the same meta language. The meta language is usually unique to a given tool vendor. Therefore, runtime software on the VoiceXML page server will only work with development tools from the same vendor. One unfortunate consequence of this limitation is that applications written with one toolset will not necessarily run on a page server built by a different vendor. This incompatibility is a potential roadblock to the primary advantage for which VoiceXML was created: application portability between platforms. If the target application code is written in VoiceXML, then the systems may be compatible, but some have observed that many applications are represented in a vendor-proprietary form (ASP, scripts, etc.) and then converted to VoiceXML at runtime.
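The runtime conversion step described above can be illustrated with a small sketch. Everything vendor-specific here is invented for the example: the CALL_FLOW dictionary stands in for a proprietary meta-language description of a call flow, and call_flow_to_vxml stands in for one vendor's page-server code. Real tools use their own, different formats, which is exactly the source of the incompatibility.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor-specific description of a one-question call flow.
CALL_FLOW = {
    "id": "balance",
    "field": "account",
    "type": "digits",  # VoiceXML built-in grammar for digit strings
    "prompt": "Please say or key in your account number.",
    "submit_to": "http://example.com/balance",  # placeholder URL
}


def call_flow_to_vxml(flow):
    """Convert the hypothetical call-flow description into VoiceXML text."""
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    form = ET.SubElement(vxml, "form", {"id": flow["id"]})
    field = ET.SubElement(
        form, "field", {"name": flow["field"], "type": flow["type"]}
    )
    ET.SubElement(field, "prompt").text = flow["prompt"]
    # Once the field is filled, send its value back to the application.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled, "submit", {"next": flow["submit_to"], "namelist": flow["field"]}
    )
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(call_flow_to_vxml(CALL_FLOW))
```

The generated VoiceXML could in principle run on any conforming interpreter, but the CALL_FLOW description can only be read by the converter written for it. A page server from another vendor would expect its own format.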

In an effort to solve this incompatibility, the VoiceXML Tools Committee is studying ways to standardize the meta language. Vendors would then use the standard meta language to represent parameters of the call flow, even if vendor tools otherwise provide different features. There are presently two proposals under consideration: 1) the XForms standard under development by the W3C, and 2) an XML-based standard in which style sheets convert between the formats used by different vendors. This rather ambitious goal will, if successful, improve the interoperability of development and runtime tools and make applications portable across vendors.

Stay tuned... VoiceXML is coming. And when it is fully implemented, you will understand why cell phones have the little graphics screen embedded just above the keypad.

Updated on 1 January 2003