WO2000005708A1 - Voice browser for interactive services and methods thereof - Google Patents

Publication number
WO2000005708A1
Authority
WO
WIPO (PCT)
Prior art keywords
markup language
user
input
value
attribute
Prior art date
Application number
PCT/US1999/016776
Other languages
French (fr)
Inventor
David Ladd
Gregory Johnson
Original Assignee
Motorola Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed litigation Critical https://patents.darts-ip.com/?family=27377630&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO2000005708(A1) "Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by Motorola Inc.
Priority to EP99937440A (EP1099213A4)
Priority to AU52278/99A (AU5227899A)
Publication of WO2000005708A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42204 Arrangements at the exchange for service or number selection by voice
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/436 Arrangements for screening incoming calls, i.e. evaluating the characteristics of a call before deciding whether to answer it
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/60 Medium conversion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2207/00 Type of exchange or network, i.e. telephonic medium, in which the telephonic communication takes place
    • H04M2207/20 Type of exchange or network, i.e. telephonic medium, in which the telephonic communication takes place hybrid systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H04M3/382 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42229 Personal communication services, i.e. services related to one subscriber independent of his terminal and/or location
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/44 Additional connecting arrangements for providing access to frequently-wanted subscribers, e.g. abbreviated dialling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/46 Arrangements for calling a number of substations in a predetermined sequence until an answer is obtained
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/53 Centralised arrangements for recording incoming messages, i.e. mailbox systems
    • H04M3/5307 Centralised arrangements for recording incoming messages, i.e. mailbox systems for recording messages comprising any combination of audio and non-audio components

Definitions

  • the present invention generally relates to information retrieval, and more particularly, to methods and systems to allow a user to access information from an information source.
  • On-line electronic information services are being increasingly utilized by individuals having personal computers to retrieve various types of information.
  • a user having a personal computer equipped with a modem dials into a service provider such as an Internet gateway, an on-line service (such as America Online, CompuServe, or Prodigy), or an electronic bulletin board to download data representative of the information desired by the user.
  • the information from the service provider is typically downloaded in real-time (i.e., the information is downloaded contemporaneously with a request for the information). Examples of information downloaded in this manner include electronic versions of newspapers, books (i.e., an encyclopedia), articles, financial information, etc.
  • the information can include both text and graphics in any of these examples.
  • FIG. 1 is a block diagram of an embodiment of a system in accordance with the present invention;
  • FIG. 2 is a flow diagram of a method of retrieving information from an information source;
  • FIG. 3 is an exemplary block diagram of another embodiment of a system in accordance with the present invention;
  • FIG. 4 is a block diagram of a voice browser of the system of FIG. 3;
  • FIGS. 5a-5c are flow diagrams of a routine carried out by the voice browser of FIG. 4;
  • FIG. 6 is an exemplary markup language document;
  • FIG. 7 is a diagrammatic illustration of a hierarchical structure of the markup language document of FIG. 6;
  • FIG. 8 is an exemplary state diagram of a markup language document;
  • FIG. 9 is another exemplary state diagram of an exemplary application of a markup language document.
  • the system 100 generally includes one or more network access apparatus 102 (one being shown), an electronic network 104, and one or more information sources or content providers 106 (one being shown).
  • the electronic network 104 is connected to the network access apparatus 102 via a line 108, and the electronic network 104 is connected to the information source 106 via a line 110.
  • the lines 108 and 110 can include, but are not limited to, a telephone line or link, an ISDN line, a coaxial line, a cable television line, a fiber optic line, a computer network line, a digital subscriber line, or the like.
  • the network access apparatus 102 and the information source 106 can wirelessly communicate with the electronic network.
  • the electronic network 104 can provide information to the network access apparatus 102 by a satellite communication system, a wireline communication system, or a wireless communication system.
  • the system 100 enables users to access information from any location in the world via any suitable network access device.
  • the users can include, but are not limited to, cellular subscribers, wireline subscribers, paging subscribers, satellite subscribers, mobile or portable phone subscribers, trunked radio subscribers, computer network subscribers (i.e., internet subscribers, intranet subscribers, etc.), branch office users, and the like.
  • the users can preferably access information from the information source 106 using voice inputs or commands.
  • the users can access up-to-date information, such as, news updates, designated city weather, traffic conditions, stock quotes, calendar information, user information, address information, and stock market indicators.
  • the system also allows the users to perform various transactions (i.e., order flowers, place orders from restaurants, place buy and sell stock orders, obtain bank account balances, obtain telephone numbers, receive directions to various destinations, etc.).
  • a user utilizes the network access apparatus 102 of the system 100 to communicate and/or connect with the electronic network 104.
  • the electronic network 104 retrieves information from the information source 106 based upon speech commands or DTMF tones from the user.
  • the information is preferably stored in a database or storage device (not shown) of the information source 106.
  • the information source 106 can include one or more server computers (not shown).
  • the information source can be integrated into the electronic network 104 or can be remote from the electronic network (i.e., at a content provider's facilities). It will also be recognized that the network access apparatus 102, the electronic network 104, and the information source 106 can be integrated in a single system or device.
  • the information of the information source 106 can be accessed over any suitable communication medium.
  • the information source 106 can be identified by an electronic address using at least a portion of a URL (Uniform Resource Locator), a URN (Uniform Resource Name), an IP (Internet Protocol) address, an electronic mail address, a device address (i.e. a pager number), a direct point to point connection, a memory address, etc.
  • a URL can include: a protocol, a domain name, a path, and a filename.
  • URL protocols include: "file:" for accessing a file stored on a local storage medium; "ftp:" for accessing a file from an FTP (file transfer protocol) server; "http:" for accessing an HTML (hypertext markup language) document; "gopher:" for accessing a Gopher server; "mailto:" for sending an e-mail message; "news:" for linking to a Usenet newsgroup; "telnet:" for opening a telnet session; and "wais:" for accessing a WAIS server.
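  • As an illustration of the URL parts listed above (protocol, domain name, path, and filename), the following minimal sketch splits an address with Python's standard library. The function name split_url and the example.com addresses are assumptions made for this example and do not appear in the specification.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the four parts named in the text.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Separate the directory portion of the path from the final filename.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,   # e.g. "http", "ftp", "mailto"
        "domain": parts.hostname,   # None when the URL has no host part
        "path": directory,
        "filename": filename,
    }


example = split_url("http://www.example.com/weather/chicago.html")
```

  • Protocols without a host part, such as "mailto:", leave the domain empty in this sketch, which is one reason a look-up step may only need "at least a portion" of the address.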
  • the electronic network 104 can include an open, wide area network such as the Internet, the World Wide Web (WWW), and/or an on-line service.
  • the electronic network 104 can also include, but is not limited to, an intranet, an extranet, a local area network, a telephone network (i.e., a public switched telephone network), a cellular telephone network, a personal communication system (PCS) network, a television network (i.e., a cable television system), a paging network (i.e., a local paging network, a regional paging network, or a national or global paging network), an email system, a wireless data network (i.e., a satellite data network or a local wireless data network), and/or a telecommunication node.
  • the network access apparatus 102 of the system 100 allows the user to access (i.e., view and/or hear) the information retrieved from the information source.
  • the network access apparatus can provide the information to the user as machine readable data, human readable data, audio or speech communications, textual information, graphical or image data, etc.
  • the network access apparatus can have a variety of forms, including but not limited to, a telephone, a mobile phone, an office phone, a home phone, a pay phone, a paging unit, a radio unit, a web phone, a personal information manager (PIM), a personal digital assistant (PDA), a general purpose computer, a network television, an Internet television, an Internet telephone, a portable wireless device, a workstation, or any other suitable communication device.
  • the network access device can be integrated with the electronic network.
  • the network access device, the electronic network, and/or the information source can reside in a personal computer.
  • the network access apparatus 102 may also include a voice or web browser, such as a Netscape Navigator® web browser, a Microsoft Internet Explorer® web browser, a Mosaic® web browser, etc. It is also contemplated that the network access apparatus 102 can include an optical scanner or bar code reader to read machine readable data, magnetic data, optical data, or the like, and transmit the data to the electronic network 104. For example, the network access apparatus could read or scan a bar code and then provide the scanned data to the electronic network 104 to access the information from the information source (i.e., a menu of a restaurant, banking information, a web page, weather information, etc.).
  • FIG. 2 illustrates a flow diagram of a method of retrieving information from a destination or database of the information source 106.
  • a user calls into the electronic network 104 from a network access apparatus.
  • the electronic network can attempt to verify that the user is a subscriber of the system and/or the type of network access apparatus the user is calling from. For example, the system may read and decode the automatic number identification (ANI) or caller line identification (CLI) of the call and then determine whether the CLI of the call is found in a stored ANI or CLI list of subscribers.
  • the system may also identify the user by detecting a unique speech pattern from the user (i.e., speaker verification) or a PIN entered using voice commands or DTMF tones.
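  • The ANI/CLI check described above can be pictured as a simple look-up against a stored subscriber list. The sketch below is only one possible reading: the SUBSCRIBERS table, the ten-digit normalization rule, and the function names are assumptions for illustration, not details given in the specification.

```python
# Hypothetical stored ANI/CLI list of subscribers (number -> subscriber name).
SUBSCRIBERS = {"3125550100": "Bob"}


def normalize_number(raw):
    """Keep digits only and compare on the last ten (assumed numbering plan)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]


def identify_caller(cli):
    """Return the subscriber name for a decoded CLI, or None if not listed."""
    return SUBSCRIBERS.get(normalize_number(cli))
```

  • A caller who is not found this way would fall through to the other options the text mentions, such as speaker verification or a PIN.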
  • after the electronic network answers the call, the electronic network provides a prompt or announcement to the caller at block 154 (i.e., "Hi. This is your personal agent. How may I help you?").
  • the electronic network can also set grammars (i.e., vocabulary) and personalities (i.e., male or female voices) for the call.
  • the electronic network can load the grammars and personalities based upon the CLI, the network access apparatus, or the identity of the user.
  • the grammars and personalities can be set or loaded depending upon the type of device (i.e., a wireless phone), the gender of the caller (i.e., male or female), the type of language (i.e., English, Spanish, etc.), and the accent of the caller (i.e., a New York accent, a southern accent, an English accent, etc.). It is also contemplated that the personalities and grammars may be changed by the user or changed by the electronic network based upon the speech communications detected by the electronic network.
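  • One way to picture loading grammars and personalities per call is a pair of look-up tables with fallbacks. The table contents, key names, and the select_call_settings function below are hypothetical; the specification names the selection criteria (device, gender, language, accent) but not a concrete data layout.

```python
# Hypothetical grammar table keyed by (language, accent); None means "any accent".
GRAMMARS = {
    ("English", "New York"): "en-newyork",
    ("English", None): "en-generic",
    ("Spanish", None): "es-generic",
}

# Hypothetical personality table keyed by device type; None is the default.
PERSONALITIES = {
    "wireless phone": "concise female voice",
    None: "default female voice",
}


def select_call_settings(language, accent=None, device=None):
    """Pick a grammar and a personality for a call, falling back to defaults."""
    grammar = GRAMMARS.get((language, accent))
    if grammar is None:
        # No accent-specific grammar: fall back to the language-wide one.
        grammar = GRAMMARS.get((language, None), "en-generic")
    personality = PERSONALITIES.get(device, PERSONALITIES[None])
    return grammar, personality
```

  • Because the text allows the settings to change during a call, a function like this could simply be called again when the detected speech suggests a different language or accent.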
  • the electronic network waits for an input or command from the user that corresponds to a destination of the information source desired by the user.
  • the input can be audio commands (i.e., speech) or DTMF tones.
  • the electronic network establishes a connection or a link to the information source at block 158.
  • the electronic network preferably determines an electronic address of the information source (i.e., URL, a URN, an IP address, or an electronic mail address) based upon the inputs from the user (i.e., speech or DTMF tones).
  • the electronic address can be retrieved from a database using a look-up operation based upon at least a portion of the input.
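  • The look-up "based upon at least a portion of the input" could be sketched as a keyword search over the recognized text. The ADDRESS_BOOK entries and the lookup_address function are assumptions for illustration; the addresses use the reserved example.com domain and are not real services.

```python
# Hypothetical database mapping spoken keywords to electronic addresses.
ADDRESS_BOOK = {
    "weather": "http://weather.example.com/today",
    "stock quotes": "http://finance.example.com/quotes",
}


def lookup_address(user_input):
    """Return the address whose keyword appears in the input, else None."""
    text = user_input.lower()
    for keyword, address in ADDRESS_BOOK.items():
        # Only a portion of the input has to match the stored keyword.
        if keyword in text:
            return address
    return None
```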
  • the electronic network retrieves at least a portion of the information from the destination of the information source at block 160.
  • the electronic network processes the information and then provides an output to the user based upon the retrieved information at block 162.
  • the output can include a speech communication, textual information, and/or graphical information.
  • the electronic network can provide a speech communication using text-to-speech technology or human recorded speech.
  • the process then proceeds to block 164 or block 154 as described above. It will be recognized that the above described method can be carried out by a computer.
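  • The flow of FIG. 2 described above (prompt, wait for input, connect, retrieve, output) can be summarized as a small loop. This is a sketch under stated assumptions: the resolve and fetch callables stand in for the address look-up and retrieval steps, and the "did not understand" fallback is invented for the example rather than taken from the specification.

```python
def handle_call(user_inputs, resolve, fetch):
    """Run the prompt / input / retrieve / output cycle for one call.

    resolve(input) -> address or None, and fetch(address) -> information,
    are hypothetical callables supplied by the caller of this sketch.
    """
    # Prompt or announcement to the caller (block 154).
    outputs = ["Hi. This is your personal agent. How may I help you?"]
    for user_input in user_inputs:
        # Determine the electronic address and connect (block 158).
        address = resolve(user_input)
        if address is None:
            outputs.append("Sorry, I did not understand.")  # assumed fallback
            continue
        # Retrieve the information (block 160) and output it (block 162).
        outputs.append(fetch(address))
    return outputs


demo = handle_call(
    ["weather"],
    resolve={"weather": "http://weather.example.com/"}.get,
    fetch=lambda address: f"Report from {address}",
)
```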
  • the system 200 enables a user to access information from any location in the world via a suitable communication device.
  • the system 200 can provide access to yellow pages, directions, traffic, addresses, movies, concerts, airline information, weather information, news reports, financial information, flowers, personal data, calendar data, address data, gifts, books, etc.
  • the user can also perform a series of transactions without having to terminate the original call to the system. For example, the user can access a news update and obtain weather information, all without having to dial additional numbers or terminate the original call.
  • the system 200 also enables application developers to build interactive speech applications using a markup language, such as the VoxML™ voice markup language developed by Motorola, Inc.
  • the system 200 generally includes one or more communication devices or network access apparatus 201, 202, 203 and 204 (four being shown), an electronic network 206, and one or more information sources, such as content providers 208 and 209 (two being shown) and markup language servers.
  • the user can retrieve the information from the information sources using speech commands or DTMF tones.
  • the user can access the electronic network 206 by dialing a single direct access telephone number (i.e., a foreign exchange number, a local number, or a toll-free number or PBX) from the communication device 202.
  • the user can also access the electronic network 206 from the communication device 204 via the internet, from the communication device 203 via a paging network 211, and from the communication device 201 via a local area network (LAN), a wide area network (WAN), or an email connection.
  • the communication devices can include, but are not limited to, landline or wireline devices (i.e., home phones, work phones, computers, facsimile machines, pay phones), wireless devices (i.e., mobile phones, trunked radios, handheld devices, PIMs, PDAs, etc.), network access devices (i.e. computers), pagers, etc.
  • the communication devices can include a microphone, a speaker, and/or a display.
  • the electronic network 206 of the system 200 includes a telecommunication network 210 and a communication node 212.
  • the telecommunication network 210 is preferably connected to the communication node 212 via a high-speed data link, such as a T1 telephone line, a local area network (LAN), or a wide area network (WAN).
  • the telecommunication network 210 preferably includes a public switched telephone network (PSTN) 214 and a carrier network 216.
  • the telecommunication network 210 can also include international or local exchange networks, cable television network, interexchange carrier networks (IXC) or long distance carrier networks, cellular networks (i.e., mobile switching centers (MSC)), PBXs, satellite systems, and other switching centers such as conventional or trunked radio systems (not shown), etc.
  • the PSTN 214 of the telecommunication network 210 can include various types of communication equipment or apparatus, such as ATM networks, Fiber Distributed Data Interface (FDDI) networks, T1 lines, cable television networks and the like.
  • the carrier network 216 of the telecommunication network 210 generally includes a telephone switching system or central office 218. It will be recognized that the carrier network 216 can be any suitable system that can route calls to the communication node 212, and the telephone switching system 218 can be any suitable wireline or wireless switching system.
  • the communication node 212 of the system 200 is preferably configured to receive and process incoming calls from the carrier network 216 and the internet 220, such as the WWW.
  • the communication node can receive and process pages from the paging network 211 and can also receive and process messages (i.e., emails) from the LAN, WAN or email connection 213.
  • the carrier network 216 routes the incoming call from the PSTN 214 to the communication node 212 over one or more telephone lines or trunks.
  • the incoming calls preferably enter the carrier network 216 through one or more "888" or "800" INWATS trunk lines, local exchange trunk lines, or long distance trunk lines.
  • the incoming calls can be received from a cable network, a cellular system, or any other suitable system.
  • the communication node 212 answers the incoming call from the carrier network 216 and retrieves an appropriate announcement (i.e., a welcome greeting) from a database, server, or browser. The node 212 then plays the announcement to the caller.
  • the communication node 212 retrieves information from a destination or database of one or more of the information sources, such as the content providers 208 and 209 or the markup language servers. After the communication node 212 receives the information, the communication node provides a response to the user based upon the retrieved information.
  • the node 212 can provide various dialog voice personalities (i.e., a female voice, a male voice, etc.) and can implement various grammars (i.e., vocabulary) to detect and respond to the audio inputs from the user.
  • the communication node can automatically select various speech recognition models (i.e., an English model, a Spanish model, an English accent model, etc.) based upon a user profile, the user's communication device, and/or the user's speech patterns.
  • the communication node 212 can also allow the user to select a particular speech recognition model.
  • the communication node 212 can by-pass a user screening option and automatically identify the user (or the type of the user's communication device) through the use of automatic number identification (ANI) or caller line identification (CLI).
  • after the communication node verifies the call, the node provides a greeting to the user (i.e., "Hi, this is your personal agent, Maya. Welcome Bob. How may I help you?"). The communication node then enters into a dialogue with the user, and the user can select a variety of information offered by the communication node.
  • when the user accesses the electronic network 206 from a communication device not registered with the system (i.e., a payphone, a phone of a non-subscriber, etc.), the node answers the call and prompts the user to enter his or her name and/or a personal identification number (PIN) using speech commands or DTMF tones.
  • the node can also utilize speaker verification to identify a particular speech pattern of the user. If the node authorizes the user to access the system, the node provides a personal greeting to the user (i.e., "Hi, this is your personal agent, Maya. Welcome Ann. How may I help you?"). The node then enters into a dialogue with the user, and the user can select various information offered by the node. If the name and/or PIN of the user cannot be recognized or verified by the node, the user will be routed to a customer service representative.
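  • The screening behavior described above (automatic identification for registered devices, a name and PIN prompt otherwise, and routing to customer service on failure) can be sketched as a three-way decision. The function name screen_caller, the table layouts, and the returned action labels are assumptions for this example; speaker verification is left out of the sketch.

```python
def screen_caller(cli, registered, pins, name=None, pin=None):
    """Decide how to handle an incoming call.

    registered maps a CLI to a user name; pins maps a user name to a PIN.
    Both tables and the action labels are hypothetical.
    """
    # Registered device: by-pass screening and greet the user directly.
    if cli in registered:
        return ("greet", registered[cli])
    # Unregistered device: accept only a known name with a matching PIN.
    if name in pins and pins[name] == pin:
        return ("greet", name)
    # Name and/or PIN could not be verified.
    return ("customer_service", None)
```

  • For instance, with registered = {"3125550100": "Bob"} and pins = {"Ann": "1234"}, a call from Bob's phone is greeted immediately, while Ann calling from a payphone is greeted only after giving her name and PIN.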
  • the communication node 212 preferably includes a telephone switch 230, a voice or audio recognition (VRU) client 232, a voice recognition (VRU) server 234, a controller or call control unit 236, an Operation and Maintenance Office (OAM) or a billing server unit 238, a local area network (LAN) 240, an application server unit 242, a database server unit 244, a gateway server or router firewall server 246, a voice over internet protocol (VOIP) unit 248, a voice browser 250, a markup language server 251, and a paging server 252.
  • although the communication node 212 is shown as being constructed with various types of independent and separate units or devices, the communication node 212 can be implemented by one or more integrated circuits, microprocessors, microcontrollers, or computers which may be programmed to execute the operations or functions equivalent to those performed by the devices or units shown. It will also be recognized that the communication node 212 can be carried out in the form of hardware components and circuit designs, software or computer programming, or a combination thereof.
  • the communication node 212 can be located in various geographic locations throughout the world or the United States (i.e., Chicago, Illinois).
  • the communication node 212 can be operated by one or more carriers (i.e., Sprint PCS, Qwest Communications, MCI, etc.) or independent service providers, such as, for example, Motorola, Inc.
  • the communication node 212 can be co-located or integrated with the carrier network 216 (i.e., an integral part of the network) or can be located at a remote site from the carrier network 216. It is also contemplated that the communication node 212 may be integrated into a communication device, such as, a wireline or wireless phone, a radio device, a personal computer, a PDA, a PIM, etc.
  • the communication device can be programmed to connect or link directly into an information source.
  • the communication node 212 can also be configured as a standalone system to allow users to dial directly into the communication node via a toll free number or a direct access number.
  • the communication node 212 may comprise a telephony switch (i.e., a PBX or Centrex unit), an enterprise network, or a local area network.
  • the system 200 can be implemented to automatically connect a user to the communication node 212 when the user picks up a communication device, such as a phone.
  • the call control unit 236 sets up a connection in the switch 230 to the VRU client 232.
  • the communication node 212 then enters into a dialog with the user regarding various services and functions.
  • the VRU client 232 preferably generates pre-recorded voice announcements and/or messages to prompt the user to provide inputs to the communication node using speech commands or DTMF tones.
  • the node 212 retrieves information from a destination of one of the information sources and provides outputs to the user based upon the information.
  • the telephone switch 230 of the communication node 212 is preferably connected to the VRU client 232, the VOIP unit 248, and the LAN 240.
  • the telephone switch 230 receives incoming calls from the carrier network 216.
  • the telephone switch 230 also receives incoming calls from the communication device 204 routed over the internet 220 via the VOIP unit 248.
  • the switch 230 also receives messages and pages from the communication devices 201 and 203, respectively.
  • the telephone switch 230 is preferably a digital cross-connect switch, Model No. LNX, available from Excel Switching Corporation, 255 Independence Drive, Hyannis, MA 02601. It will be recognized that the telephone switch 230 can be any suitable telephone switch.
  • the VRU client 232 of the communication node 212 is preferably connected to the VRU server 234 and the LAN 240.
  • the VRU client 232 processes speech communications, DTMF tones, pages, and messages (i.e., emails) from the user.
  • the VRU client 232 routes the speech communications to the VRU server 234.
  • when the VRU client 232 detects DTMF tones, the VRU client 232 sends a command to the call control unit 236.
  • the VRU client 232 can be integrated with the VRU server.
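  • The routing rule described above (speech goes to the VRU server 234, DTMF tones result in a command to the call control unit 236) can be sketched as a small dispatcher. The function name route_input, the "kind" labels, and the returned tuples are assumptions for illustration; pages and messages are outside this sketch.

```python
# Standard DTMF keypad symbols, including the rarely used A-D column.
DTMF_KEYS = set("0123456789*#ABCD")


def route_input(kind, payload):
    """Return (destination, payload) for an input received by the VRU client."""
    if kind == "speech":
        # Speech communications are routed to the VRU server for recognition.
        return ("VRU server 234", payload)
    if kind == "dtmf":
        if not payload or not set(payload) <= DTMF_KEYS:
            raise ValueError(f"not a DTMF sequence: {payload!r}")
        # Detected tones are sent on as a command to the call control unit.
        return ("call control unit 236", payload)
    raise ValueError(f"input kind not covered by this sketch: {kind!r}")
```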
  • the VRU client 232 preferably comprises a computer, such as, a Windows NT compatible computer with hardware capable of connecting individual telephone lines directly to the switch 230.
  • the VRU client preferably includes a microprocessor, random access memory, read-only memory, a T1 or ISDN interface board, and one or more voice communication processing boards (not shown).
  • the voice communication processing boards of the VRU client 232 are preferably Dialogic boards, Model No. Antares, available from Dialogic Corporation, 1515 Route 10, Parsippany, N.J. 07054.
  • the voice communication boards may include a voice recognition engine having a vocabulary for detecting a speech pattern (i.e., a key word or phrase).
  • the voice recognition engine is preferably a RecServer software package, available from Nuance Communications, 1380 Willow Road, Menlo Park, California 94025.
  • the VRU client 232 can also include an echo canceler (not shown) to reduce or cancel text-to-speech or playback echoes transmitted from the PSTN 214 due to hybrid impedance mismatches.
  • the echo canceler is preferably included in an Antares Board Support Package, available from Dialogic.
  • the call control unit 236 of the communication node 212 is preferably connected to the LAN 240.
  • the call control unit 236 sets up the telephone switch 230 to connect incoming calls to the VRU client 232.
  • the call control unit also sets up incoming calls or pages into the node 212 over the internet 220 and pages and messages sent from the communication devices 201 and 203 via the paging network 211 and email system 213.
  • the call control unit 236 preferably comprises a computer, such as a Windows NT compatible computer.
  • the LAN 240 of the communication node 212 allows the various components and devices of the node 212 to communicate with each other via a twisted pair, a fiber optic cable, a coaxial cable, or the like.
  • the LAN 240 may use Ethernet, Token Ring, or other suitable types of protocols.
  • the LAN 240 is preferably a 100 Megabit per second Ethernet switch, available from Cisco Systems, San Jose, California. It will be recognized that the LAN 240 can comprise any suitable network system, and the communication node 212 may include a plurality of LANs.
  • the VRU server 234 of the communication node 212 is connected to the VRU client 232 and the LAN 240.
  • the VRU server 234 receives speech communications from the user via the VRU client 232.
  • the VRU server 234 processes the speech communications and compares the speech communications against a vocabulary or grammar stored in the database server unit 244 or a memory device.
  • the VRU server 234 provides output signals, representing the result of the speech processing, to the LAN 240.
  • the LAN 240 routes the output signal to the call control unit 236, the application server 242, and/or the voice browser 250.
  • the communication node 212 then performs a specific function associated with the output signals.
  • the VRU server 234 preferably includes a text-to-speech (TTS) unit 252, an automatic speech recognition (ASR) unit 254, and a speech-to-text (STT) unit 256.
  • the TTS unit 252 of the VRU server 234 receives textual data or information (i.e., e-mail, web pages, documents, files, etc.) from the application server unit 242, the database server unit 244, the call control unit 236, the gateway server 246, and the voice browser 250.
  • the TTS unit 252 processes the textual data and converts the data to voice data or information.
  • the TTS unit 252 can provide data to the VRU client 232 which reads or plays the data to the user. For example, when the user requests information (i.e., news updates, stock information, traffic conditions, etc.), the communication node 212 retrieves the desired data (i.e., textual information) from a destination of the one or more of the information sources and converts the data via the TTS unit 252 into a response.
  • the response is then sent to the VRU client 232.
  • the VRU client processes the response and reads an audio message to the user based upon the response. It is contemplated that the VRU server 234 can read the audio message to the user using human recorded speech or synthesized speech.
  • the TTS unit 252 is preferably a TTS 2000 software package, available from Lernout and Hauspie Speech Product NV, 52 Third Avenue, Burlington, Mass. 01803.
  • the ASR unit 254 of the VRU server 234 provides speaker independent automatic speech recognition of speech inputs or communications from the user. It is contemplated that the ASR unit 254 can include speaker dependent speech recognition.
  • the ASR unit 254 processes the speech inputs from the user to determine whether a word or a speech pattern matches any of the grammars or vocabulary stored in the database server unit 244 or downloaded from the voice browser. When the ASR unit 254 identifies a selected speech pattern of the speech inputs, the ASR unit 254 sends an output signal to implement the specific function associated with the recognized voice pattern.
  • the ASR unit 254 is preferably a speaker independent speech recognition software package, Model No. RecServer, available from Nuance Communications. It is contemplated that the ASR unit 254 can be any suitable speech recognition unit to detect voice communications from a user.
  • the STT unit 256 of the VRU server 234 receives speech inputs or communications from the user and converts the speech inputs to textual information (i.e., a text message).
  • the textual information can be sent or routed to the communication devices 201, 202, 203 and 204, the content providers 208 and 209, the markup language servers, the voice browser, and the application server 242.
  • the STT unit 256 is preferably a Naturally Speaking software package, available from Dragon Systems, 320 Nevada Street, Newton, MA 02160-9803.
  • the VOIP unit 248 of the telecommunication node 212 is preferably connected to the telephone switch 230 and the LAN 240.
  • the VOIP unit 248 allows a user to access the node 212 via the internet 220 using voice commands.
  • the VOIP unit 248 can receive VOIP protocols (i.e., H.323 protocols) transmitted over the internet 220 and can convert the VOIP protocols to speech information or data. The speech information can then be read to the user via the VRU client 232.
  • the VOIP unit 248 can also receive speech inputs or communications from the user and convert the speech inputs to a VOIP protocol that can be transmitted over the internet 220.
  • the VOIP unit 248 is preferably a Voice Net software package, available from Dialogic Corporation.
  • the telecommunication node 212 also includes a detection unit 260.
  • the detection unit 260 is preferably a phrase or key word spotter unit to detect incoming audio inputs or communications or DTMF tones from the user.
  • the detection unit 260 is preferably incorporated into the switch 230, but can be incorporated into the VRU client 232, the carrier switch 216, or the VRU server 234.
  • the detection unit 260 is preferably included in a RecServer software package, available from Nuance Communications.
  • the detection unit 260 records the audio inputs from the user and compares the audio inputs to the vocabulary or grammar stored in the database server unit 244.
  • the detection unit continuously monitors the user's audio inputs for a key phrase or word after the user is connected to the node 212.
  • the VRU client 232 plays a pre-recorded message to the user.
  • the VRU client 232 responds to the audio inputs provided by the user.
  • the billing server unit 238 of the communication node 212 is preferably connected to the LAN 240.
  • the billing server unit 238 can record data about the use of the communication node by a user (i.e., length of calls, features accessed by the user, etc.).
  • upon completion of a call by a user, the call control unit 236 sends data to the billing server unit 238. The data can be subsequently processed by the billing server unit in order to prepare customer bills.
  • the billing server unit 238 can use the ANI or CLI of the communication device to properly bill the user.
  • the billing server unit 238 preferably comprises a Windows NT compatible computer.
  • the gateway server unit 246 of the communication node 212 is preferably connected to the LAN 240 and the internet 220.
  • the gateway server unit 246 provides access to the content provider 208 and the markup language server 257 via the internet 220.
  • the gateway unit 246 also allows users to access the communication node 212 from the communication device 204 via the internet 220.
  • the gateway unit 246 can further function as a firewall to restrict access to the communication node 212 to authorized users.
  • the gateway unit 246 is preferably a Cisco Router, available from Cisco Systems.
  • the database server unit 244 of the communication node 212 is preferably connected to the LAN 240.
  • the database server unit 244 preferably includes a plurality of storage areas to store data relating to users, speech vocabularies, dialogs, personalities, user entered data, and other information.
  • the database server unit 244 stores a personal file or address book.
  • the personal address book can contain information required for the operation of the system, including user reference numbers, personal access codes, personal account information, contact's addresses, and phone numbers, etc.
  • the database server unit 244 is preferably a computer,
  • the application server 242 of the communication node 212 is preferably connected to the LAN 240 and the content provider 209.
  • the application server 242 allows the communication node 212 to access information from a destination of the information sources, such as the content providers and markup language servers.
  • the application server can retrieve information (i.e., weather reports, stock information, traffic reports, restaurants, flower shops, banks, etc.) from a destination of the information sources.
  • the application server 242 processes the retrieved information and provides the information to the VRU server 234 and the voice browser 250.
  • the VRU server 234 can provide an audio announcement to the user based upon the information using text-to-speech synthesizing or human recorded voice.
  • the application server 242 can also send tasks or requests (i.e., transactional information) received from the user to the information sources (i.e., a request to place an order for a pizza).
  • the application server 242 can further receive user inputs from the VRU server 234 based upon a speech recognition output.
  • the application server is preferably a computer, such as a Windows NT compatible computer.
  • the markup language server 251 of the communication node 212 is preferably connected to the LAN 240.
  • the markup language server 251 can include a database, scripts, and markup language documents or pages.
  • the markup language server 251 is preferably a computer, such as a Windows NT compatible computer. It will also be recognized that the markup language server 251 can be an internet server (i.e., a Sun Microsystems server).
  • the paging server 252 of the communication node 212 is preferably connected to the LAN 240 and the paging network 211.
  • the paging server 252 routes pages between the LAN 240 and the paging network.
  • the paging server 252 is preferably a computer, such as a Windows NT compatible computer.
  • the voice browser 250 of the system 200 is preferably connected to the LAN 240.
  • the voice browser 250 preferably receives information from the information sources, such as the content provider 209 via the application server 242, the markup language servers 251 and 257, the database 244, and the content provider 208.
  • in response to voice inputs from the user or DTMF tones, the voice browser 250 generates a content request (i.e., an electronic address) to navigate to a destination of one or more of the information sources.
  • the content request can use at least a portion of a URL, a URN, an IP, a page request, or an electronic email.
  • the voice browser preferably uses a TCP/IP connection to pass requests to the information source.
  • the information source responds to the requests, sending at least a portion of the requested information, represented in electronic form, to the voice browser.
  • the information can be stored in a database of the information source and can include text content, markup language document or pages, non-text content, dialogs, audio sample data, recognition grammars, etc.
  • the voice browser then parses and interprets the information as further described below. It will be recognized that the voice browser can be integrated into the communication devices 201, 202, 203, and 204.
  • the content provider 209 is connected to the application server 242 of the communication node 212, and the content provider 208 is connected to the gateway server 246 of the communication node 212 via the internet 220.
  • the content providers can store various content information, such as news, weather, traffic conditions, etc.
  • the content providers 208 and 209 can include a server to operate web pages or documents in the form of a markup language.
  • the content providers 208 and 209 can also include a database, scripts, and/or markup language documents or pages.
  • the scripts can include images, audio, grammars, computer programs, etc.
  • the content providers execute suitable server software to send requested information to the voice browser.
  • the voice browser 250 generally includes a network fetcher unit 300, a parser unit 302, an interpreter unit 304, and a state machine unit 306. Although the voice browser is shown as being constructed with various types of independent and separate units or devices, it will be recognized that the voice browser 250 can be carried out in the form of hardware components and circuit designs, software or computer programming, or a combination thereof.
  • the network fetcher 300 of the voice browser 250 is connected to the parser 302 and the interpreter 304.
  • the network fetcher 300 is also connected to the LAN 240 of the communication node 212.
  • the network fetcher unit 300 retrieves information, including markup language documents, audio samples, and grammars, from the information sources.
  • the parser unit 302 of the voice browser 250 is connected to the network fetcher unit 300 and the state machine unit 306.
  • the parser unit 302 receives the information from the network fetcher unit 300 and parses the information according to the syntax rules of the markup language as further described below (i.e., extensible markup language syntax).
  • the parser unit 302 generates a tree or hierarchical structure representing the markup language that is stored in memory of the state machine unit 306.
  • a tree structure of an exemplary markup language document is shown in FIG. 7.
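  • As a rough illustration of the parsing step described above, the following sketch uses Python's standard `xml.etree.ElementTree` module to turn a small markup language document into a tree and list its hierarchy. The document content and the helper name `outline` are assumptions made for illustration; the internals of the parser unit 302 are not specified at this level of detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup language document. The element names follow the text,
# but the content is invented for illustration.
DOCUMENT = """
<DIALOG>
  <STEP NAME="init">
    <PROMPT>Please select a soft drink.</PROMPT>
    <HELP>Your choices are coke or pepsi.</HELP>
  </STEP>
</DIALOG>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOCUMENT)
tree_rows = outline(root)
for depth, tag in tree_rows:
    # Indent each tag by its depth to show the hierarchy.
    print("  " * depth + tag)
```

  • Each child element appears one level below the element that contains it, which is the tree relationship the state machine unit would hold in memory.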
  • the following text defines the syntax and grammar that the parser unit of the voice browser utilizes to build a tree structure of the markup language document.
  • the interpreter unit 304 of the voice browser 250 is connected to the state machine unit 306 and the network fetcher unit 300.
  • the interpreter unit 304 is also connected to the LAN.
  • the interpreter unit 304 carries out a dialog with the user based upon the tree structure representing a markup language document.
  • the interpreter unit sends data to the TTS 252.
  • the interpreter unit 304 can also receive data based upon inputs from the user via a VRU server and can send outputs to the information source based upon the user inputs.
  • the interpreter unit 304 can transition from state to state (i.e., step to step) within a tree structure (i.e., a dialog) of a markup language document or can transition to a new tree structure within the same dialog or another dialog.
  • the interpreter unit determines the next state or step based upon the structure of the dialog and the inputs from the user.
  • when the interpreter unit transitions to a new dialog or page, the address of the new dialog or page is sent to the network fetcher.
  • the state machine 306 of the voice browser 250 is connected to the parser unit 302 and the interpreter unit 304.
  • the state machine 306 stores the tree structure of the markup language and maintains the current state or step that the voice browser is executing.
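  • The role of the state machine, holding the parsed tree and tracking the current step, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `StateMachine`, its methods, and the two-step document are hypothetical and are not taken from the described system.

```python
import xml.etree.ElementTree as ET


class StateMachine:
    """Holds the parsed tree and the name of the step being executed."""

    def __init__(self, root, initial_step="init"):
        self.root = root
        # Index STEP elements by their NAME attribute for quick lookup.
        self.steps = {step.get("NAME"): step for step in root.iter("STEP")}
        self.current = initial_step

    def current_step(self):
        """Return the STEP element for the current state."""
        return self.steps[self.current]

    def transition(self, step_name):
        """Move to another step in the same tree; reject unknown names."""
        if step_name not in self.steps:
            raise KeyError(f"no STEP named {step_name!r}")
        self.current = step_name


# Hypothetical two-step dialog used only to exercise the class.
root = ET.fromstring('<DIALOG><STEP NAME="init"/><STEP NAME="confirm"/></DIALOG>')
machine = StateMachine(root)
machine.transition("confirm")
```

  • A transition to a step that is not in the stored tree raises an error here; in the described browser, that case would instead hand the new address to the network fetcher.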
  • FIGS. 5a-5c illustrate a flow diagram of a software routine executed by the voice browser 250.
  • the software routine allows interactive voice applications.
  • the voice browser 250 determines an initial address (i.e., a URL) and a step element or name.
  • the voice browser then fetches the contents (i.e., a markup language document) of the current address from the information sources (i.e., content providers and markup language servers) at block 402.
  • after the voice browser fetches the address, the voice browser processes the contents and builds a local step table (i.e., a tree structure) at block 404.
  • a prompt can be played to the user via the TTS unit of the system 200 for the current element.
  • the voice browser then waits for an input from the user (i.e., speech or DTMF tones).
  • the voice browser can collect input from the user for the current step element.
  • FIG. 5c shows an exemplary flow diagram of a routine that is executed by the voice browser to determine the grammar for speech recognition.
  • the voice browser determines whether a pre-determined grammar exists for the user input and the markup language. For example, the voice browser determines whether the grammar for the user input is found in a predetermined or pre-existing grammar stored in a database or contained in the markup language. If the grammar is found, the voice browser sends the grammar to the VRU server at block 504.
  • the VRU server compares the user input to the grammar to recognize the user input. After the VRU server recognizes the user input, the process proceeds to block 410 (see FIG. 5a) as described below. If a pre-existing grammar is not found, the voice browser dynamically generates the grammar for the user input.
  • the voice browser looks up the pronunciations for the user in a dictionary at block 508.
  • the dictionary can be stored in a database of the system or stored on an external database (i.e., the voice browser can fetch a dictionary from the processor or from the internet).
  • the voice browser generates the grammar for the user inputs based upon the pronunciations from the dictionary and phonetic rules.
  • a software routine available from Nuance Communications, Model No. RecServer, can be used to generate the grammar.
  • the grammar is sent to the VRU server. The voice browser then attempts to match the grammar to the user input at block 506.
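  • The grammar selection routine described above can be sketched roughly as follows. The store names, the phoneme strings, and the letter-by-letter fallback that stands in for "phonetic rules" are all assumptions made for illustration; this is not the RecServer routine.

```python
# Hypothetical stores. In the described system these would live in a database
# or be fetched over the network. The phoneme strings are invented.
PREDEFINED_GRAMMARS = {"yes_no": ["yes", "no"]}
PRONUNCIATIONS = {"coke": "k ow k", "pepsi": "p eh p s iy"}


def generate_grammar(phrases, dictionary):
    """Build a grammar mapping each phrase to a pronunciation.

    Phrases missing from the dictionary fall back to a naive
    letter-by-letter spelling, standing in for phonetic rules.
    """
    return {p: dictionary.get(p, " ".join(p.replace(" ", ""))) for p in phrases}


def select_grammar(grammar_name, phrases):
    """Return (source, grammar): a stored grammar if one exists, else a generated one."""
    if grammar_name in PREDEFINED_GRAMMARS:
        return "predefined", PREDEFINED_GRAMMARS[grammar_name]
    return "generated", generate_grammar(phrases, PRONUNCIATIONS)


stored = select_grammar("yes_no", [])
built = select_grammar("drinks", ["coke", "tea"])
```

  • In either case the resulting grammar would then be sent to the VRU server, which performs the actual match against the user input.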
  • after the voice browser detects or collects an input from the user at block 408, the voice browser determines whether there is an error at block 410. If the voice browser is having difficulty recognizing the user input, an appropriate error message is played to the user at block 414. For example, if the voice browser detected too much speech from the user or the recognition is too slow, a prompt is played (i.e., "Sorry, I didn't understand you") to the user via the VRU server. If the voice browser receives unexpected DTMF tones, a prompt is played (i.e., "I heard tones. Please speak your response") to the user via the VRU server. If the voice browser does not detect any speech from the user, a prompt is read to the user (i.e., "I am having difficulty hearing you").
  • the voice browser determines whether a re-prompt was specified in the error response or element. If a re-prompt is to be played to the user at block 416, the process proceeds to block 406 as described above. If a re-prompt is not to be played to the user at block 416, the voice browser determines whether there is a next step element specified in the error response at block 420. If another step element is specified in the error response at block 420, the process proceeds to block 402 as described above. If another step element is not specified in the error response at block 420, the process proceeds to block 422.
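  • The error handling path described above can be sketched as a lookup followed by a three-way branch. The prompt wording and the block numbers come from the text; the error kind keys and the function name `handle_error` are hypothetical labels introduced for illustration.

```python
# Prompts quoted from the description; the dictionary keys are invented labels.
ERROR_PROMPTS = {
    "too_much_speech": "Sorry, I didn't understand you",
    "recognition_too_slow": "Sorry, I didn't understand you",
    "unexpected_dtmf": "I heard tones. Please speak your response",
    "no_speech": "I am having difficulty hearing you",
}


def handle_error(kind, reprompt=False, next_step=None):
    """Return (prompt, next_block) for a detected error.

    next_block is the flow-diagram block the process proceeds to:
    406 when a re-prompt is specified, 402 when a next step element
    is specified, and 422 otherwise.
    """
    prompt = ERROR_PROMPTS[kind]
    if reprompt:
        return prompt, 406
    if next_step is not None:
        return prompt, 402
    return prompt, 422


result = handle_error("unexpected_dtmf", reprompt=True)
```

  • The re-prompt check comes first, mirroring the order of blocks 416 and 420 in the description.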
  • the voice browser determines whether the user requested help at block 412. If the user requested help, an appropriate help response is played to the user (i.e., "please enter or speak your pin") at block 424.
  • the voice browser determines whether a re-prompt was specified in the help response or step. If a re-prompt is specified in the help response at block 425, the process proceeds to block 406 as described above. If a re-prompt is not specified in the help response at block 425, the voice browser determines whether a next step element is specified in the help response at block 426. If another step element is specified in the help response at block 426, the process proceeds to block 402 as described above. If another step element is not specified in the help response at block 426, the process proceeds to block 428. At block 430, the voice browser determines whether a cancel request has been indicated by the user. If the voice browser detects a cancel request from the user at block 430, an appropriate cancel message is played to the user at block 434 (i.e., "Do you wish to exit and return to the Main Menu?").
  • the voice browser determines whether a next step element is specified in the cancel response or element. If another step element is specified in the cancel response at block 436, the process proceeds to block 448. If another step element is not specified in the cancel response at block 436, the process proceeds to block 422.
  • the voice browser determines the next step element at block 432.
  • the voice browser determines whether there is an acknowledgement specified in the next step element. If there is no acknowledgement specified in the step element at block 440, the voice browser sets the current step element to the next step element at block 442 and then determines whether the next step element is within the same page at block 444.
  • if the next step element is within the same page as the current step element at block 444, the process proceeds to block 446. If the next step element is not within the same page as the current page at block 444, the process proceeds to block 448.
  • an acknowledgement response is played to the user at block 450.
  • the voice browser determines whether a confirmation is specified in the information (i.e., a markup language document) at block 452. If a confirmation is not specified in the information at block 452, the process proceeds to block 442 as described above. If a confirmation is specified at block 452, the voice browser determines whether the response from the user was recognized at block 454 and then determines whether the response is affirmative at block 456. If the voice browser receives an affirmative response at block 456, the process proceeds to block 442 as described above. If the voice browser does not receive an affirmative response from the user at block 456, the process proceeds to block 448.
  • the following text describes an exemplary markup language processed by the voice browser of the communication node 212.
  • the markup language preferably includes text, recorded sound samples, navigational controls, and input controls for voice applications as further described below.
  • the markup language enables system designers or developers of service or content providers to create application programs for instructing the voice browser to provide a desired user interactive voice service.
  • the markup language also enables designers to dynamically customize their content. For example, designers can provide up-to-date news, weather, traffic, etc.
  • the markup language can be designed to express flow of control, state management, and the content of information flow between the communication node 212 and the user.
  • the structure of the language can be designed specifically for voice applications and the markup language is preferably designed and delivered in units of dialog.
  • the markup language can include elements that describe the structure of a document or page, provide pronunciation of words and phrases, and place markers in the text to control interactive voice services.
  • the markup language also provides elements that control phrasing, emphasis, pitch, speaking rate, and other characteristics.
  • the markup language documents are preferably stored on databases of the information sources, such as the content providers 208 and 209 and the markup language servers 251 and 257.
  • FIG. 6 illustrates an exemplary markup language document that the voice browser of the communication node can process.
  • the markup language document has a hierarchical structure, in which every element (except the dialog element) is contained by another element. Elements contained within another element are defined to be children, or lower elements, of the tree.
  • FIG. 7 illustrates a tree structure of the markup language document of FIG. 6.
  • the markup language document includes tags, denoted by < > symbols, with the actual element between the brackets.
  • the markup language includes start tags ("< >") and end tags ("</ >"). A start tag begins a markup element and the end tag ends the corresponding markup element. For example, in the exemplary markup language document, the DIALOG element (<dialog>) on line 2 begins a markup language document or page, and the DIALOG element (</dialog>) on line 26 indicates that the markup language document has ended.
  • the elements often have attributes which are assigned values as further described below.
  • the DIALOG element and STEP elements of a markup language document provide the basic structure of the document.
  • the DIALOG element defines the scope of the markup language document, and all other elements are contained by the DIALOG element.
  • the STEP elements define states within a DIALOG element (i.e., the STEP element defines an application state).
  • an application state can include initial prompts, help messages, error messages, or cleanup and exit procedures.
  • the DIALOG element and the associated STEP elements of a markup language document define a state machine that represents an interactive dialogue between the voice browser and the user.
  • when the voice browser interprets the markup language document, the voice browser will navigate through the DIALOG element to different STEP elements as a result of the user's responses.
  • the following example illustrates an exemplary markup language document that the voice browser of the communication node can process.
  • the example has one
  • <PROMPT> Please select a soft drink.
  • </PROMPT> <HELP> Your choices are coke, pepsi, 7 up, or root beer.
  • <OPTION NEXT="#confirm"> coke </OPTION>
  • <OPTION NEXT="#confirm">
  • <OPTION NEXT="#confirm"> 7 up </OPTION>
  • when the above markup language document is interpreted by the voice browser, the voice browser initially executes the STEP element called "init".
  • the user will hear the text contained by the PROMPT element (i.e., "Please select a soft drink."). If the user responds "help" before making a selection, the user would hear the text contained within the HELP element (i.e., "Your choices are coke, pepsi, 7 up, or root beer."). After the user makes a selection, the voice browser will execute the STEP element named
  • the STEP elements in a markup language document are executed based on the user's responses, not on the order of the STEP elements within the source file. Although the definition of the "init" STEP element appears before the definition of the "confirm" STEP element, the order in which they are defined has no impact on the order in which the voice browser navigates through them.
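  • The response-driven navigation described above can be sketched as a small interpreter loop. Only the PROMPT, HELP, and OPTION fragments below come from the text; the INPUT wrapper, the contents of the "confirm" step, and the function name `run_dialog` are assumptions made for illustration. The "confirm" step is deliberately defined first to show that source order does not drive execution.

```python
import xml.etree.ElementTree as ET

# Assumed document: the INPUT wrapper and the "confirm" prompt are invented.
DOCUMENT = """
<DIALOG>
  <STEP NAME="confirm">
    <PROMPT>Thank you for your order.</PROMPT>
  </STEP>
  <STEP NAME="init">
    <PROMPT>Please select a soft drink.</PROMPT>
    <HELP>Your choices are coke, pepsi, 7 up, or root beer.</HELP>
    <INPUT TYPE="optionlist">
      <OPTION NEXT="#confirm">coke</OPTION>
      <OPTION NEXT="#confirm">pepsi</OPTION>
    </INPUT>
  </STEP>
</DIALOG>
"""


def run_dialog(document, responses):
    """Walk the dialog from "init", returning everything spoken to the user."""
    steps = {s.get("NAME"): s for s in ET.fromstring(document).iter("STEP")}
    spoken, current, pending = [], "init", list(responses)
    while current is not None:
        step = steps[current]
        spoken.append((step.findtext("PROMPT") or "").strip())
        # Map each spoken option to the step named by its NEXT attribute.
        options = {o.text.strip(): o.get("NEXT").lstrip("#") for o in step.iter("OPTION")}
        if not options:
            break  # no input defined, so the dialog ends after the prompt
        current = None
        while pending:
            reply = pending.pop(0)
            if reply == "help":
                spoken.append((step.findtext("HELP") or "").strip())
            elif reply in options:
                current = options[reply]
                break
            # Unrecognized replies are simply skipped in this sketch.
    return spoken


transcript = run_dialog(DOCUMENT, ["help", "pepsi"])
```

  • With the replies "help" and then "pepsi", the transcript holds the initial prompt, the help text, and the prompt of the "confirm" step, in that order.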
  • the following text describes the markup language elements, their attributes, and their syntax.
  • the DIALOG element includes a BARGEIN attribute.
  • the value of the BARGEIN attribute can be "Y" or "N".
  • the BARGEIN attribute allows the DIALOG element to be interrupted at any time based upon a predetermined response from the user (i.e., wake up).
  • the DIALOG element defines the basic unit of context within an application, and typically, there is one DIALOG element per address (i.e., URL).
  • Each DIALOG element contains one STEP element named "init”.
  • the execution of the DIALOG element begins with the STEP named "init”.
  • the following example of a markup language document or page contains the DIALOG element.
  • the DIALOG element contains a single STEP element named "init”.
  • the STEP element has a single PROMPT element that will be read to the user via the text-to-speech unit 252. Since there is no INPUT element defined in the STEP element, the markup language application will terminate immediately after the PROMPT element is read.
  • the STEP element of the markup language defines a state in a markup language document or page.
  • the STEP element is contained by a DIALOG element.
  • the STEP element includes a NAME attribute, a PARENT attribute, a BARGEIN attribute, and a COST attribute.
  • the value of the NAME and PARENT attributes can be an identifier (i.e., a pointer or a variable name), the value of the BARGEIN attribute can be "Y" or "N", and the value of the COST attribute can be an integer.
  • the STEP element typically has an associated PROMPT element and INPUT element that define the application state.
  • the following example illustrates the use of the STEP element in a markup language document.
  • the example shown above illustrates a STEP element that collects the user's opinion on one of several public television shows.
  • the STEP element uses the PARENT attribute to share a common set of help and error elements with other TV-show-rating STEP elements.
  • the PARENT attribute can contain a HELP element explaining what a rating of 1, 5, and 10 would mean, and a common error message can remind the user that a numeric rating is expected.
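  • One way to picture the sharing of help elements through the PARENT attribute is a lookup that falls back from a STEP element to the element named by its PARENT attribute. The sketch below assumes that PARENT names a CLASS element; the NAME values, the wording, and the function name `find_help` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the assumption that PARENT refers to a CLASS element,
# and all names and wording, are introduced for illustration only.
DOCUMENT = """
<DIALOG>
  <CLASS NAME="tvrating">
    <HELP>Say a number from 1 to 10.</HELP>
  </CLASS>
  <STEP NAME="rate_show" PARENT="tvrating">
    <PROMPT>How would you rate this show?</PROMPT>
  </STEP>
</DIALOG>
"""


def find_help(root, step_name):
    """Return the HELP text for a step, following PARENT links if needed."""
    named = {e.get("NAME"): e for e in root if e.get("NAME")}
    element, seen = named.get(step_name), set()
    # The "seen" set guards against a cycle of PARENT references.
    while element is not None and element.get("NAME") not in seen:
        seen.add(element.get("NAME"))
        text = element.findtext("HELP")
        if text is not None:
            return text.strip()
        element = named.get(element.get("PARENT"))
    return None


help_text = find_help(ET.fromstring(DOCUMENT), "rate_show")
```

  • The step defines no HELP element of its own, so the lookup returns the shared text from the parent.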
  • the PROMPT element of the markup language (i.e., <PROMPT> text </PROMPT>) is used to define content (i.e., text or an audio file) that is presented to the user.
  • the PROMPT element will contain text and several markup elements (i.e., the BREAK or EMP elements as described below) that are read to the user via the text-to-speech unit.
  • the PROMPT element can be contained within a STEP or a CLASS element.
  • the following example illustrates the use of the PROMPT element in markup language document or page.
  • the INPUT element of the markup language is used to define a valid user input within each STEP element.
  • the INPUT element is contained within a STEP element.
  • the INPUT element of the markup language includes an INPUT attribute.
  • the value of the INPUT attribute can be a DATE input, a DIGITS input, a FORM input, a GRAMMAR input, a HIDDEN input, a MONEY input, a NONE element, a NUMBER input, an OPTIONLIST input, a PHONE input, a PROFILE input, a RECORD input, a TIME input, or a YORN element.
  • the DATE input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be the next STEP address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be a get or a post (i.e., an input into a Java Script program or a markup language server), and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the following example illustrates the use of the DATE input in a markup language document.
  • the DATE input is used to gather the user's birthday, store it in a variable
  • the DATE input makes use of an input grammar to interpret the user's response and store that response in a standard format.
  • the DATE input grammar can interpret dates expressed in several different formats.
  • a fully defined date, such as "next Friday, July 10th, 1998", is stored as "07101998|July|10|1998
  • the response "July 4th" is stored as "????????
  • the DIGITS input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, a TIMEOUT attribute, a MIN attribute, and a MAX attribute.
  • the value of the NAME attribute can be an identifier, the value of the NEXT attribute can be a next step address (i.e., a URL), the value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the value of the MIN and MAX attributes can be minimum and maximum integer values, respectively.
  • the following example illustrates the use of the DIGITS input in a markup language document or page.
  • the DIGITS input is used to collect digits from the user, store the number in a variable named "pin", and then go to the STEP named "doit". If the user were to speak "four five six" in response to the PROMPT element, the value "456" would be stored in the variable "pin".
  • the DIGITS input can collect the digits 0 (zero) through 9 (nine), but not other numbers like 20 (twenty). To collect multi-digit numbers (i.e., 20 (twenty) or 400 (four-hundred)), the NUMBER input can be used as further described below.
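  • The behavior of the DIGITS input described above can be sketched as a word-to-digit conversion. The function name `collect_digits` is hypothetical, and the sketch assumes that MIN and MAX bound the number of digits collected, which the text does not state explicitly.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def collect_digits(utterance, min_len=None, max_len=None):
    """Convert spoken digit words to a digit string, or return None if invalid.

    min_len and max_len are assumed to limit how many digits are accepted.
    """
    digits = []
    for word in utterance.lower().split():
        if word not in DIGIT_WORDS:
            return None  # e.g. "twenty" is not a single digit
        digits.append(DIGIT_WORDS[word])
    value = "".join(digits)
    if min_len is not None and len(value) < min_len:
        return None
    if max_len is not None and len(value) > max_len:
        return None
    return value


# Store the collected digits in a variable named "pin", as in the text.
variables = {"pin": collect_digits("four five six")}
```

  • Speaking "four five six" yields the string "456", while a word such as "twenty" is rejected, which is why the NUMBER input exists for larger values.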
  • the FORM input includes a NAME attribute, a NEXT attribute, a METHOD attribute, an ACTION attribute and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL, pointer, or memory address).
  • the value of the METHOD attribute can be a get or a post, and the value of the ACTION attribute is a pointer to a script that processes the input on the server.
  • the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the FORM input makes use of the speech to text unit to convert user input to text.
  • the user input is then sent to the markup language server in a standard HTML <FORM> text format to be processed by a script on the server. If the user said "John Smith", then the text string "john smith" would be sent to the server at the address indicated by the ACTION attribute, using the method indicated by the METHOD attribute, in a <FORM> format.
  • the following is an example of the use of the FORM input in a markup language document.
  • the FORM input is used to collect an order input from the user, store the user input converted to text in the variable named "order", go to the next step named "next order", post the text to the address "http://www.test.com/cgi-bin/post-query", and use a timeout value of 200 milliseconds.
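  • the submission step described above can be sketched in Python with the standard urllib module. This is an illustrative sketch only: build_form_request is a hypothetical helper name, the request is built but never sent, and lowercasing the recognized text simply mirrors the "john smith" example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(action_url, name, recognized_text, method="post"):
    """Build (but do not send) the form-encoded request for a FORM input."""
    # Encode NAME=text the way an HTML form would, e.g. order=john+smith.
    payload = urlencode({name: recognized_text.lower()})
    method = method.lower()
    if method == "get":
        return Request(f"{action_url}?{payload}", method="GET")
    if method == "post":
        return Request(action_url, data=payload.encode("ascii"), method="POST")
    raise ValueError(f"METHOD must be get or post, not {method!r}")


request = build_form_request(
    "http://www.test.com/cgi-bin/post-query", "order", "John Smith", "post"
)
```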
  • the GRAMMAR input includes a SCR attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the SCR attribute can be a grammar address (i.e., a URL), and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the HIDDEN input of the markup language (i.e., <INPUT TYPE="HIDDEN" NAME="value" VALUE="value"/>) is used to store a value in a variable.
  • the HIDDEN input includes a NAME attribute and a VALUE attribute.
  • the value of the NAME attribute can be an identifier, and the value of the VALUE attribute can be a literal value.
  • the following is an example of the use of the HIDDEN input in a markup language document.
  • the HIDDEN input is used to create variables and assign values to those variables.
  • the user has completed the login sequence and certain information is stored in variables as soon as the user's identity has been established. This information could then be used later in the application without requiring another access into the database.
  • <INPUT TYPE="MONEY" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />
  • the MONEY input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the MONEY input makes use of an input grammar to interpret the user's response and store that response in a standard format.
  • the input grammar is able to interpret various ways to express monetary amounts.
  • the data is preferably stored in integer format, in terms of cents. "Five cents" is stored as "5", "five dollars" is stored as "500", and "a thousand" is stored as "100000". In the case where the units are ambiguous, the grammar assumes dollars, so "a thousand" is stored as if the user had said "a thousand dollars".
  • the MONEY input is used to collect the amount of money that the user would like to deposit in his account, store that amount in a variable named "dep”, and then go to the STEP named "deposit".
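  • the storage rule for monetary amounts can be sketched in Python as follows. The sketch assumes the spoken words have already been recognized as a whole-number amount and an optional unit; money_to_cents and its parameters are hypothetical names, not part of the markup language.

```python
def money_to_cents(amount, unit=None):
    """Convert a recognized whole-number amount to integer cents.

    unit is "cents", "dollars", or None when the speaker gave no unit.
    """
    if unit is None:
        unit = "dollars"  # ambiguous units are assumed to be dollars
    if unit == "dollars":
        return amount * 100
    if unit == "cents":
        return amount
    raise ValueError(f"unsupported unit: {unit!r}")


# Store the deposit amount as a string, as in the "dep" variable of the example.
variables = {"dep": str(money_to_cents(1000))}
```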
  • the NONE input includes a NEXT attribute and a NEXTMETHOD attribute.
  • the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
  • the following example illustrates the use of the NONE input in a markup language.
  • the NONE input is used to jump to another STEP element in this dialog without waiting for any user response.
  • the user would hear the phrase "Welcome to the system" followed immediately by the prompt of the main menu.
  • the NUMBER input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the NUMBER input is used to collect numbers from the user, store the number in a variable named "age", and then go to the STEP element named "doit". If the user were to say "eighteen" in response to the PROMPT element, the value "18" would be stored in the variable "age".
  • the NUMBER input will collect numbers like 20 (i.e., twenty), but only one number per input.
  • to collect a series of individual digits, the DIGITS input can be used as described above.
  • the OPTIONLIST input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step URL.
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the OPTIONLIST input is used in conjunction with the OPTION element, which defines the specific user responses and the behavior associated with each OPTION element.
  • the following example illustrates the use of the OPTIONLIST element in a markup language document.
  • the voice browser will go to a different STEP element or state depending on which cola the user selects. If the user said "coke" or "coca-cola", the voice browser would go to the STEP element named "coke".
  • the PHONE input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the PHONE input makes use of an input grammar to interpret the user's response and store that response in a standard format.
  • the phone number is interpreted as a string of digits and stored in a variable. If a user said "One, eight zero zero, seven five nine, eight eight eight eight", the response would be stored as "18007598888".
  • the following is an example of the use of the PHONE input in a markup language document.
  • the PHONE input is used to collect a telephone number from the user, store the number in the variable named "ph", and go to the STEP named "fax".
  • the user profile information is stored in the database 244 of the system.
  • the PROFILE input includes a NAME attribute, a PROFNAME attribute, and a SUBTYPE attribute.
  • the value of the NAME attribute can be an identifier
  • the value of the PROFNAME attribute can be a profile element name (string)
  • the value of the SUBTYPE attribute can be profile element subtype (string).
  • the following is an example of the use of the PROFILE input in a markup language document.
  • the PROFILE input is used to retrieve the user's first name and store the string in a variable named "firstname". The string containing the name is then inserted into the PROMPT element using a VALUE element as further described below.
  • more than one INPUT element can be included in the same STEP element because the PROFILE input is not an interactive INPUT element.
  • Each STEP element contains only one INPUT element that accepts a response from the user.
  • PREFIX prefix e.g. Mr.
  • the notification address shown above can be used to send a user urgent or timely information (i.e., sending information to a pager).
  • the format of the notification address is preferably of an email address provided by the user when his or her subscription is activated.
  • the user's notification address would be stored in a variable named "n_addr".
  • the application could then use this email address to send a message to the user.
  • the PROFILE input can be used in a markup language document in the following manner:
  • NEXT="value" [NEXTMETHOD="value"] />
  • the RECORD input includes a TIMEOUT attribute, a FORMAT attribute, a NAME attribute, a STORAGE attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the TIMEOUT attribute can be the maximum record time represented in milliseconds
  • the value of the FORMAT attribute can be a recorded audio format (audio/wav)
  • the value of the NAME attribute can be an identifier
  • the value of the STORAGE attribute can be file or request
  • the value of the NEXT attribute can be a next step address (i.e., a URL)
  • the value of the NEXTMETHOD attribute can be get, post, or put.
  • the RECORD input is used to record a seven second audio sample, and then
  • the RECORD input is used to record another seven second audio sample.
  • the sample is stored in a file, instead of sent in the HTTP request as it was in the previous example.
  • the name of the file is chosen by the voice browser automatically and is stored in a variable named "theName".
  • after storing the audio sample in the file, the voice browser will continue execution at the URL specified by the NEXT attribute.
  • the value of the variable "theName" will be the name of the audio file.
  • the value of the variable "theName" would be null.
  • TIME input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the TIME input makes use of an input grammar to interpret the user's response and to store that response in a standard format.
  • This grammar will interpret responses of various forms, including both 12-hour and 24-hour conventions. "Four oh three PM" becomes "403P". Note that "P" is appended to the time. Likewise, "Ten fifteen in the morning" becomes "1015A". "Noon" is stored as "1200P", and "Midnight" is stored as "1200A".
  • the following example illustrates the TIME input in a markup language document.
  • the TIME input is used to collect a time of day from the user, store that data in the variable named "wakeup", and then go to the STEP element named "record".
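  • the stored time format can be sketched in Python as follows. The sketch assumes the grammar has already resolved the utterance to a 24-hour hour and a minute; format_time is a hypothetical helper name, and the output shape is inferred only from the four examples given above.

```python
def format_time(hour24, minute):
    """Format a resolved time of day as hour, two-digit minute, and an A or P suffix."""
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    suffix = "P" if hour24 >= 12 else "A"
    hour12 = hour24 % 12 or 12  # 0 and 12 are both spoken as twelve
    return f"{hour12}{minute:02d}{suffix}"


# "Four oh three PM" resolved to 16:03 and stored in the "wakeup" variable.
variables = {"wakeup": format_time(16, 3)}
```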
  • the YORN input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute.
  • the value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL).
  • the value of the NEXTMETHOD attribute can be get or post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
  • the YORN input maps a variety of affirmative and negative responses to the values "Y" and "N".
  • the YORN input stores the value "Y" for affirmative responses and the value "N" for negative responses.
  • Affirmative and negative responses are determined using an input grammar that maps various user responses to the appropriate result.
  • the following example illustrates the use of the YORN input in a markup language document.
  • the YORN input is used to collect a "yes" or "no" response from the user, store that response into a variable named "fire", and then go to the STEP named "confirm".
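  • the mapping of responses to "Y" and "N" can be sketched in Python as follows. The phrase sets are assumptions chosen for illustration, since the source does not list the phrases its grammar accepts, and yorn is a hypothetical helper name.

```python
# Hypothetical phrase lists; a real grammar would be far larger.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NEGATIVE = {"no", "nope", "no thanks"}


def yorn(utterance):
    """Map a recognized phrase to "Y" or "N"; return None when it matches neither set."""
    phrase = utterance.lower().strip(" .!?")
    if phrase in AFFIRMATIVE:
        return "Y"
    if phrase in NEGATIVE:
        return "N"
    return None


# Store the answer in the "fire" variable used in the example above.
variables = {"fire": yorn("Yes")}
```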
  • the OPTION input includes a VALUE attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the VALUE attribute can be a literal value
  • the value of the NEXT attribute can be a next step address (i.e., a URL)
  • the value of the NEXTMETHOD attribute can be get or post.
  • the OPTION element can exist only within the INPUT element, and then only when using the OPTIONLIST input.
  • the example shown above illustrates the use of the OPTION element within the INPUT element.
  • the first OPTION element would be executed when the user responded with "one"
  • the second OPTION would be executed when the user responded with "two".
  • the example shown above illustrates the use of the OPTION element to select one of three applications.
  • the URLs used in the NEXT attributes are full HTTP URLs and, unlike the previous example, each OPTION element has a unique NEXT attribute.
  • the OPTIONS element of the markup language (i.e., <OPTIONS/>)
  • the OPTIONS element can be used in HELP elements to present the user with a complete list of valid responses.
  • the OPTIONS element can be used anywhere that text is read to the user.
  • the OPTIONS element can be contained by a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the ACK element includes a CONFIRM attribute, a BACKGROUND attribute, and a REPROMPT attribute.
  • the value of the BACKGROUND and REPROMPT attributes can be "Y" or "N", and the value of the CONFIRM attribute can be a YORN element as described above.
  • the ACK element can be contained within a STEP element or a CLASS element as further described below. The following is an example of a markup language document containing the ACK element.
  • the ACK element is used to confirm the user's choice of credit card.
  • the PROMPT element is read to the user using text-to-speech unit 252. The system waits until the user responds with "visa", "mastercard", or "discover" and then asks the user to confirm that the type of card was recognized correctly. If the user answers "yes" to the ACK element, the voice browser will proceed to the STEP element named "exp". If the user answers "no" to the ACK element, the text of the PROMPT element will be read again, and the user will be allowed to make his or her choice again. The voice browser then re-enters or executes the STEP element again.
  • the AUDIO element includes a SRC attribute.
  • the value of the SRC attribute can be an audio file URL.
  • the AUDIO element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the following markup language contains the AUDIO element.
  • the AUDIO element is included in a PROMPT element.
  • a prompt (i.e., "At the tone, the time will be 11:59 pm.")
  • the WAV file "beep.wav" will be played to the user as specified by the AUDIO element.
  • the BREAK element of the markup language is used to insert a pause into content or information to be played to the user.
  • the BREAK element includes a MSEC attribute and a SIZE attribute.
  • the value of the MSEC attribute can include a number represented in milliseconds, and the value of the SIZE attribute can be none, small, medium, or large.
  • the BREAK element can be used when text or an audio sample is to be played to the user.
  • the BREAK element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the BREAK element is used with a MSECS attribute, inside a PROMPT element.
  • a prompt (i.e., "Welcome to Earth.")
  • the system will then pause for 250 milliseconds, and play "How may I help you?".
  • SIZE attribute (i.e., "small")
  • the OR element of the markup language (i.e., <OR/>) is used to define alternate recognition results in an OPTION element.
  • the OR element is interpreted as a logical OR, and is used to associate multiple recognition results with a single NEXT attribute.
  • the CANCEL element includes a NEXT attribute and a NEXTMETHOD attribute.
  • the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
  • the CANCEL element can be invoked through a variety of phrases. For example, the user may say only the word "cancel", or the user may say "I would like to cancel, please."
  • the CANCEL element can be contained within a STEP element or a CLASS element.
  • when the voice browser detects "cancel" from the user, the voice browser responds based upon the use of the CANCEL element in the markup language document. If no CANCEL element is associated with a given STEP element, the current prompt will be interrupted (if it is playing), and the voice browser will stay in the same application state and then process any interactive inputs.
  • the following example illustrates a markup language containing the CANCEL element.
  • the example above illustrates the use of the CANCEL element to specify that when the user says "cancel", the voice browser proceeds to the STEP element named "traffic menu", instead of the default behavior, which would be to stop the PROMPT element from playing and wait for a user response.
  • the user can also interrupt the PROMPT element by speaking a valid OPTION element.
  • the user could interrupt the PROMPT element and get the traffic conditions for a different city by saying "new city".
  • the CASE input includes a VALUE attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the VALUE attribute can be a literal value
  • the value of the NEXT attribute can be a next step address (i.e. a URL)
  • the value of the NEXTMETHOD attribute can be get or post.
  • the CASE element can be contained by a SWITCH element or an INPUT element, when using an input type of the INPUT element that collects a single value (i.e., DATE, DIGITS, MONEY, PHONE, TIME, YORN).
  • the following example illustrates a markup language containing a CASE element.
  • the markup language shows the use of the CASE element within the SWITCH element.
  • the CASE elements are used to direct the voice browser to different URLs based on the value of the markup language variable "pizza".
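  • the SWITCH and CASE dispatch described above can be sketched in Python as follows. The example.com URLs, the case values, and the helper name switch are hypothetical; only the variable name "pizza" comes from the description, and the fallback argument is an assumption about what happens when no CASE matches.

```python
def switch(variables, field, cases, default_next=None):
    """Return the NEXT address of the first CASE whose VALUE equals the variable's value."""
    value = variables.get(field)
    for case_value, next_url in cases:
        if case_value == value:
            return next_url
    return default_next  # assumed fallback when no CASE matches


# Hypothetical CASE table for a SWITCH on the variable "pizza".
cases = [
    ("pepperoni", "http://www.example.com/pepperoni.vml"),
    ("sausage", "http://www.example.com/sausage.vml"),
]
next_url = switch({"pizza": "sausage"}, "pizza", cases)
```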
  • the CLASS input includes a NAME attribute, a PARENT attribute, a BARGEIN attribute, and a COST attribute.
  • the value of the NAME and PARENT attributes can be an identifier.
  • the value of the BARGEIN attribute can be "Y" or "N", and the value of the COST attribute can be an integer number.
  • the CLASS element can be used to define the default behavior of an ERROR element, a HELP element, and a CANCEL element, within a given DIALOG element.
  • the CLASS element can be contained by a DIALOG element.
  • the following example shows a markup language document containing the CLASS element.
  • the markup language document illustrates the use of the CLASS element to define a HELP element and an ERROR element that will be used in several steps within this DIALOG element.
  • the markup language also illustrates the use of the PARENT attribute in the STEP element to refer to the CLASS element, and therefore inherit the behaviors defined within it.
  • when interpreted by the voice browser, the STEP element will behave as if the HELP and ERROR elements that are defined in the CLASS element were defined explicitly in the steps themselves.
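  • the inheritance of HELP and ERROR behavior through the PARENT attribute can be sketched in Python as follows. Elements are modeled as plain dictionaries, which is an assumption of this sketch; the class name "defaults", the message texts, and the rule that a STEP's own handler takes precedence over its CLASS are likewise illustrative assumptions.

```python
def resolve_handler(step, classes, handler):
    """Find a handler ("HELP", "ERROR", or "CANCEL") on the STEP or along its PARENT chain."""
    node = step
    seen = set()
    while node is not None:
        if handler in node:
            return node[handler]
        parent = node.get("PARENT")
        if parent is None or parent in seen:
            return None  # no parent left, or a cycle in the PARENT chain
        seen.add(parent)
        node = classes.get(parent)
    return None


# Hypothetical CLASS definitions within one DIALOG element.
classes = {
    "defaults": {
        "HELP": "Say the name of a city.",
        "ERROR": "Sorry, I did not understand.",
    },
}
step = {"NAME": "weather", "PARENT": "defaults"}
help_text = resolve_handler(step, classes, "HELP")
```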
  • the EMP element includes a LEVEL attribute.
  • the value of the LEVEL attribute can be none, reduced, moderate, or strong.
  • the EMP element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the following example of a markup language document contains the EMP element.
  • the EMP element is used to apply "strong" emphasis to the word "really" in the PROMPT element.
  • the actual effect on the speech output is determined by the text-to-speech (TTS) software of the system.
  • the PROS element, as further described below, can be used instead of the EMP element.
  • the ERROR element includes a TYPE attribute, an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the TYPE attribute can be all, nomatch, nospeech, toolittle, toomuch, noauth, or badnext.
  • the value of the ORDINAL attribute can be an integer number
  • the value of the REPROMPT attribute can be "Y" or "N"
  • the value of the NEXT attribute can be a next step address (i.e., a URL)
  • the value of the NEXTMETHOD attribute can be get or post.
  • the default behavior for the ERROR element is to play the phrase "An error has occurred.”, remain in the current STEP element, replay the PROMPT element, and wait for the user to respond.
  • the ERROR element can be contained within a STEP element or a CLASS element.
  • the ERROR element is used to define the application's behavior in response to an error.
  • the error message is defined to be used the first time an error of type "nomatch" occurs in this STEP element.
  • the error message is to be used the second and all subsequent times an error of type "nomatch" occurs in this STEP.
  • the ORDINAL attribute of the ERROR element of the markup language determines which message will be used in the case of repeated errors within the same STEP element.
  • the voice browser can choose an error message based on the following algorithm. If the error has occurred three times, the voice browser will look for an ERROR element with an ORDINAL attribute of "3".
  • if no such ERROR element has been defined, the voice browser will look for an ERROR element with an ORDINAL attribute of "2", and then "1", and then an ERROR element with no ORDINAL attribute defined.
  • if an ERROR element is defined with an ORDINAL attribute of "6" in the STEP element shown above, and the same error occurred six times in a row, the user would hear the first error message one time, then the second error message four times, and finally the error message with the ORDINAL attribute of "6".
  • the HELP element includes an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the ORDINAL attribute can be an integer number, and the value of the REPROMPT attribute can be "Y" or "N".
  • the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
  • the HELP element, like the CANCEL element, can be detected through a variety of phrases. The user may say only the word "help", or the user may say "I would like help, please." In either case, the HELP element will be interpreted.
  • the HELP element can be contained within a STEP element or a CLASS element.
  • when the voice browser detects "help" from the user, the voice browser responds based upon the use of the HELP element in the markup language document. If no HELP element is associated with a given STEP, the current prompt will be interrupted (if it is playing), the user will hear "No help is available.", and the voice browser will stay in the same application state and process any interactive inputs.
  • the following example illustrates the use of the HELP element in a markup language document.
  • the HELP element is used to define the application's behavior in response to the user input "help".
  • the help message is defined to be used the first time the user says "help".
  • the help message is defined to be used the second and all subsequent times the user says "help". It should also be noted that through the use of the REPROMPT attribute, the prompt will be repeated after the first help message, but it will not be repeated after the second help message.
  • the ORDINAL attribute of the HELP element of the markup language determines which message will be used in the case of repeated utterances of "help" within the same STEP element.
  • the voice browser will choose a help message based on the following algorithm. If the user has said "help" three times, the voice browser will look for a HELP element with an ORDINAL attribute of "3". If no such HELP element has been defined, the voice browser will look for a HELP element with an ORDINAL attribute of "2", and then "1", and then a HELP element with no ORDINAL attribute defined.
  • if a HELP element is defined with an ORDINAL attribute of "6" in the STEP element shown above, and the user said "help" six times in a row, the user would hear the first help message one time, then the second help message four times, and finally the help message with the ORDINAL attribute of "6".
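  • the ORDINAL fallback described for both the ERROR and HELP elements can be sketched in Python as follows. The dictionary representation, the message texts, and the helper name choose_message are assumptions of this sketch; a key of None stands for an element with no ORDINAL attribute.

```python
def choose_message(messages, count):
    """Pick the message for the count-th occurrence, falling back to lower ORDINAL values."""
    # Look for ORDINAL == count, then count - 1, and so on down to 1.
    for ordinal in range(count, 0, -1):
        if ordinal in messages:
            return messages[ordinal]
    # Finally fall back to an element defined without any ORDINAL attribute.
    return messages.get(None)


# Messages defined with ORDINAL attributes of 1, 2, and 6, as in the example above.
messages = {1: "first message", 2: "second message", 6: "sixth message"}
heard = [choose_message(messages, n) for n in range(1, 7)]
```

  • under these assumptions, six consecutive occurrences yield the first message once, the second message four times, and then the message with an ORDINAL attribute of "6", matching the behavior described above.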
  • PROMPT PROMPT
  • HELP ERROR
  • CANCEL CANCEL
  • the PROS element includes a RATE attribute, a VOL attribute, a PITCH attribute, and a RANGE attribute.
  • the value of the RATE attribute can be an integer number representing words per minute
  • the value of the VOL attribute can be an integer number representing volume of speech.
  • the value of the PITCH attribute can be an integer number representing pitch in hertz
  • the value of the RANGE attribute can be an integer number representing range in hertz.
  • the PROS element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the RENAME element includes a VARNAME attribute and a RECNAME attribute.
  • the value of the VARNAME and the RECNAME attributes can be identifiers.
  • the RENAME element can exist only within the INPUT element, and then only when using the GRAMMAR input type.
  • the RENAME element is used to account for differences in the variable names collected from a grammar and those expected by another script.
  • a grammar from foo.com is used to provide input to an application hosted by fancyquotes.com. Because, in this example, the grammar and script have been developed independently, the RENAME element is used to help connect the grammar and the stock-quoting application.
  • the RESPONSE element includes a FIELDS attribute, a NEXT attribute, and a NEXTMETHOD attribute.
  • the value of the FIELDS attribute can be a list of identifiers
  • the value of the NEXT attribute can be a next step address (i.e., a URL)
  • the value of the NEXTMETHOD attribute can be get or post.
  • the RESPONSE element enables application developers to define a different NEXT attribute depending on which of the grammar's slots were filled.
  • the RESPONSE element can exist only within an INPUT element, and then only when using an input type of grammar.
  • the example shown above illustrates the use of the RESPONSE element where the user specifies less than all the possible variables available in the grammar.
  • the application can arrange to collect the information not already filled in by prior steps.
  • this example transfers to the "askaccts" STEP element if neither the source nor destination account is specified (i.e., the user said "transfer 500 dollars"), but it transfers to the "askfromacct" STEP element if the user said what account to transfer to, but did not specify a source account (i.e., if the user had said "transfer 100 dollars to savings").
  • the next URL of the INPUT element is used when the user's response does not match any of the defined responses.
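  • the selection of a NEXT address from the filled grammar slots can be sketched in Python as follows. The slot names are borrowed from the banking variables mentioned elsewhere in this description; the FIELDS lists, the "#dotransfer" address, the helper name choose_next, and the rule that a RESPONSE matches only when its FIELDS equal the set of filled slots exactly are all assumptions of this sketch.

```python
def choose_next(filled, responses, input_next):
    """Return the NEXT of the RESPONSE whose FIELDS match the filled slots, else the INPUT's NEXT."""
    filled_names = {name for name, value in filled.items() if value is not None}
    for fields, next_step in responses:
        if set(fields) == filled_names:  # assumed exact-match rule
            return next_step
    return input_next


# Hypothetical RESPONSE table for the transfer example.
responses = [
    (("action", "amount"), "#askaccts"),
    (("action", "amount", "toacct"), "#askfromacct"),
]

# "transfer 100 dollars to savings": the destination is known, the source is not.
slots = {"action": "transfer", "amount": "10000", "fromacct": None, "toacct": "savings"}
next_step = choose_next(slots, responses, "#dotransfer")
```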
  • the SWITCH element includes a FIELD attribute.
  • the value of the FIELD attribute can be an identifier.
  • the SWITCH element is used in conjunction with the CASE element.
  • the SWITCH element can exist only within the INPUT element, and then only when using the grammar input type.
  • the following is an example of the use of the SWITCH element in a markup language document.
  • the SWITCH element is used to determine the next STEP element to execute in response to a banking request.
  • the grammar may fill in some or all of the variables (i.e., "action", "amount", "fromacct", and "toacct"). If the user asks for a transfer or balance action, the next STEP element to execute is the transfer or balance step. If the user asks for a report of account activity, a second SWITCH element determines the next STEP element based on the account type for which a report is being requested (assumed to be available in the "fromacct" variable).
  • the VALUE element includes a FIELD attribute.
  • the value of the FIELD attribute can be an identifier.
  • the VALUE element can be used anywhere that text is read to the user.
  • the VALUE element can be contained by a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
  • the example shown above illustrates the use of the VALUE element to read the user's selections back to the user. As shown above, the value of the variable named "first" would be inserted into the PROMPT element, and the value of the variable named "second" would be inserted into the PROMPT element.
  • the COST attribute of the STEP element of the markup language is used to charge a user for various services.
  • the COST attribute can be used in the definition of one or more STEP or CLASS elements.
  • the value of the COST attribute is the integer number of credits the user is to be charged for viewing the content. For example, to charge 10 credits for listening to a particular STEP element, a provider might write the following markup language:
  • the content provider can maintain records on a per-subscriber basis.
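  • the per-subscriber accounting implied by the COST attribute can be sketched in Python as follows. CreditLedger, the subscriber identifier, and the dictionary form of a STEP element are hypothetical, and treating a missing COST attribute as zero credits is an assumption.

```python
from collections import defaultdict


class CreditLedger:
    """Accumulate the credits charged to each subscriber."""

    def __init__(self):
        self.charges = defaultdict(int)

    def charge(self, subscriber_id, step):
        """Charge the STEP's COST (assumed 0 when absent) and return the amount charged."""
        cost = int(step.get("COST", 0))
        self.charges[subscriber_id] += cost
        return cost


ledger = CreditLedger()
charged = ledger.charge("subscriber-1", {"NAME": "premium", "COST": "10"})
free = ledger.charge("subscriber-1", {"NAME": "welcome"})
```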
  • FIG. 8 shows an exemplary state diagram of the weather application containing states that prompt the user for input in order to access the weather database. After speaking the current or forecast weather information, the application expects the user to say a city name or the word "exit" to return to the main welcome prompt. The user can select to hear the forecast after the current weather conditions prompt. It will be recognized that the application could be designed to address errors, help and cancel requests properly.
  • the markup language set forth below is a static version of the weather application.
  • the initial state or welcome prompt is within the first step, init (lines 11-20).
  • the user can respond with a choice of "weather", "market", "news", or "exit".
  • the next step, weather (lines 21-29), begins.
  • the prompt queries the user for a city name. Valid choices are "London", "New York", and "Chicago".
  • FIG. 9 illustrates the same state diagram for the weather application as shown in FIG. 8 with labels for each dialog boundary.
  • the initial dialog, dialog1, contains the user prompts for welcome and city name.
  • dialog1 also controls the prompts for transitioning to hear a city's current or forecast weather and returning to the main menu.
  • Dialog2 handles access of the weather database for the current conditions of the city specified by the user, and the information is read to the user. Dialog2 then returns control to dialog1 to get the user's next request.
  • dialog3 handles access of the weather database for the forecast of the city requested and speaks the information. It returns control to dialog1 to get the next user input.
  • the markup language set forth below illustrates an example of the weather application corresponding to the dialog boundaries as presented in the state diagram of FIG. 9.
  • the implementation of the application is with Active Server Pages using VBScript. It consists of three files called dialog1.asp, dialog2.asp, and dialog3.asp, each corresponding to the appropriate dialog.
  • there are two help message types, help_top and help_dialog1 (lines 16 and 29).
  • the first step, init, is at line 19.
  • the weather step follows at line 32.
  • Valid city names are those from the citylist table (line 36) of the weather database.
  • Lines 7 and 8 accomplish the database connection via ADO.
  • Line 38 is the start of a loop for creating an option list of all possible city responses. If the user chooses a city, control goes to the step getcurrentweather in dialog2, as shown at line 40. In this case, the city name is also passed to dialog2 via the variable CITY at line 34.
  • the last major step in dialog1 is nextcommand and can be referenced by dialog2 or dialog3. It prompts the user for a cityname or the word forecast. Similar to the weather step, nextcommand uses a loop to create the optionlist (line 53). If the user responds with a city name, the step getcurrentweather in dialog2 is called. If the user responds with the word forecast, the step getforecastweather in dialog3 is called.
  • Dialog2 contains a single step getcurrentweather.
  • the step first reads the city name into local variable strCity (line 95).
  • a database query tries to find a match in the weather database for the city (lines 97 and 98). If there is no weather information found for the city, the application will speak a message (line 101) and proceed to the init step in dialog1 (line 110). Otherwise, the application will speak the current weather information for the city (line 105) and switch to the nextcommand step in dialog1 (line 112).
  • Dialog3 is similar to dialog2. It contains a single step getforecastweather. The database query is identical to the one in dialog2. If there is weather information available for the city, the application will speak the weather forecast (line 105), otherwise a notification message is spoken (line 101). Dialog3 relinquishes control back to dialog1 with either the init step (line 110) or the nextcommand step (line 112).
  • there has been described herein methods and systems to allow users to access information from any location in the world via any suitable network access device.
  • the user can access up-to-date information, such as, news updates, designated city weather, traffic conditions, stock quotes, and stock market indicators.
  • the system also allows the user to perform various transactions (i.e., order flowers, place orders from restaurants, place buy or sell orders for stocks, obtain bank account balances, obtain telephone numbers, receive directions to destinations, etc.)

Abstract

A voice browser to process a markup language document. A voice browser includes a network fetcher unit to retrieve information from a destination of an information source. A parser unit is communicatively coupled to the network fetcher to parse the retrieved information based on predetermined syntax. The parser unit generates a tree structure representing the hierarchy of the retrieved information. An interpreter unit and a state machine are also used. The method includes the steps of retrieving and parsing a markup language document to determine at least one user input, determining whether the user input corresponds to a predetermined grammar, and using the predetermined grammar when the user input corresponds to the predetermined grammar. The method of determining a grammar is based upon phonetic rules and pronunciation. The grammar is sent to a speech recognition engine and compared to a user input.

Description

VOICE BROWSER FOR INTERACTIVE SERVICES AND METHODS
THEREOF
Notice of Copyright
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights and similar rights whatsoever.
Field of the Invention
The present invention generally relates to information retrieval, and more particularly, to methods and systems to allow a user to access information from an information source.
Background of the Invention
On-line electronic information services are being increasingly utilized by individuals having personal computers to retrieve various types of information. Typically, a user having a personal computer equipped with a modem dials into a service provider, such as an Internet gateway, an on-line service (such as America Online, CompuServe, or Prodigy), or an electronic bulletin board to download data representative of the information desired by the user. The information from the service provider is typically downloaded in real-time (i.e., the information is downloaded contemporaneously with a request for the information). Examples of information downloaded in this manner include electronic versions of newspapers, books (i.e., an encyclopedia), articles, financial information, etc. The information can include both text and graphics in any of these examples.
Brief Description of the Drawings
The invention is pointed out with particularity in the appended claims. However, other features of the invention will become more apparent and the invention will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of an embodiment of a system in accordance with the present invention;
FIG. 2 is a flow diagram of a method of retrieving information from an information source;
FIG. 3 is an exemplary block diagram of another embodiment of a system in accordance with the present invention;
FIG. 4 is a block diagram of a voice browser of the system of FIG. 3;
FIGS. 5a-5c are flow diagrams of a routine carried out by the voice browser of FIG. 4;
FIG. 6 is an exemplary markup language document;
FIG. 7 is a diagrammatic illustration of a hierarchical structure of the markup language document of FIG. 6;
FIG. 8 is an exemplary state diagram of a markup language document; and
FIG. 9 is another exemplary state diagram of an exemplary application of a markup language document.
Detailed Description of the Preferred Embodiments
Before explaining the present embodiments in detail, it should be understood that the invention is not limited in its application or use to the details of construction and arrangement of parts illustrated in the accompanying drawings and description. It will be recognized that the illustrative embodiments of the invention may be implemented or incorporated in other embodiments, variations and modifications, and may be practiced or carried out in various ways. Furthermore, unless otherwise indicated, the terms and expressions employed herein have been chosen for the purpose of describing the illustrative embodiments of the present invention for the convenience of the reader and are not for the purpose of limitation.
Referring now to the drawings, and more particularly to FIG. 1, a block diagram of a system 100 is illustrated to enable a user to access information. The system 100 generally includes one or more network access apparatus 102 (one being shown), an electronic network 104, and one or more information sources or content providers 106 (one being shown).
The electronic network 104 is connected to the network access apparatus 102 via a line 108, and the electronic network 104 is connected to the information source 106 via a line 110. The lines 108 and 110 can include, but are not limited to, a telephone line or link, an ISDN line, a coaxial line, a cable television line, a fiber optic line, a computer network line, a digital subscriber line, or the like. Alternatively, the network access apparatus 102 and the information source 106 can wirelessly communicate with the electronic network. For example, the electronic network 104 can provide information to the network access apparatus 102 by a satellite communication system, a wireline communication system, or a wireless communication system.
The system 100 enables users to access information from any location in the world via any suitable network access device. The users can include, but are not limited to, cellular subscribers, wireline subscribers, paging subscribers, satellite subscribers, mobile or portable phone subscribers, trunked radio subscribers, computer network subscribers (i.e., internet subscribers, intranet subscribers, etc.), branch office users, and the like.
The users can preferably access information from the information source 106 using voice inputs or commands. For example, the users can access up-to-date information, such as news updates, designated city weather, traffic conditions, stock quotes, calendar information, user information, address information, and stock market indicators. The system also allows the users to perform various transactions (i.e., order flowers, place orders from restaurants, place buy and sell stock orders, obtain bank account balances, obtain telephone numbers, receive directions to various destinations, etc.).
As shown in FIG. 1, a user utilizes the network access apparatus 102 of the system 100 to communicate and/or connect with the electronic network 104. The electronic network 104 retrieves information from the information source 106 based upon speech commands or DTMF tones from the user. The information is preferably stored in a database or storage device (not shown) of the information source 106. The information source 106 can include one or more server computers (not shown). The information source can be integrated into the electronic network 104 or can be remote from the electronic network (i.e., at a content provider's facilities). It will also be recognized that the network access apparatus 102, the electronic network 104, and the information source 106 can be integrated in a single system or device.
The information of the information source 106 can be accessed over any suitable communication medium. The information source 106 can be identified by an electronic address using at least a portion of a URL (Uniform Resource Locator), a URN (Uniform Resource Name), an IP (Internet Protocol) address, an electronic mail address, a device address (i.e., a pager number), a direct point to point connection, a memory address, etc. It is noted that a URL can include: a protocol, a domain name, a path, and a filename. URL protocols include: "file:" for accessing a file stored on a local storage medium; "ftp:" for accessing a file from an FTP (file transfer protocol) server; "http:" for accessing an HTML (hypertext markup language) document; "gopher:" for accessing a Gopher server; "mailto:" for sending an e-mail message; "news:" for linking to a Usenet newsgroup; "telnet:" for opening a telnet session; and "wais:" for accessing a WAIS server.
Once the electronic network 104 of the system 100 receives the information from the information source 106, the electronic network sends the information to the network access apparatus 102. The electronic network 104 can include an open, wide area network such as the Internet, the World Wide Web (WWW), and/or an on-line service. The electronic network 104 can also include, but is not limited to, an intranet, an extranet, a local area network, a telephone network (i.e., a public switched telephone network), a cellular telephone network, a personal communication system (PCS) network, a television network (i.e., a cable television system), a paging network (i.e., a local paging network, a regional paging network, a national or a global paging network), an email system, a wireless data network (i.e., a satellite data network or a local wireless data network), and/or a telecommunication node.
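By way of illustration only, the four URL components noted above (a protocol, a domain name, a path, and a filename) can be separated with a short Python sketch. The function name `split_url` and the address `www.example.com` are hypothetical and are not part of the described system.

```python
from urllib.parse import urlparse
import posixpath


def split_url(url: str) -> dict:
    """Split a URL into the four components named in the description."""
    parts = urlparse(url)
    return {
        "protocol": parts.scheme,                     # e.g. "http"
        "domain": parts.hostname,                     # e.g. "www.example.com"
        "path": posixpath.dirname(parts.path),        # directory portion
        "filename": posixpath.basename(parts.path),   # last path segment
    }


# Hypothetical address used only for illustration.
example = split_url("http://www.example.com/weather/chicago.html")
```

In this sketch the path and filename are obtained by splitting the URL path at its last slash, which is one reasonable reading of the components listed above.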
The network access apparatus 102 of the system 100 allows the user to access (i.e., view and/or hear) the information retrieved from the information source. The network access apparatus can provide the information to the user as machine readable data, human readable data, audio or speech communications, textual information, graphical or image data, etc. The network access apparatus can have a variety of forms, including but not limited to, a telephone, a mobile phone, an office phone, a home phone, a pay phone, a paging unit, a radio unit, a web phone, a personal information manager (PIM), a personal digital assistant (PDA), a general purpose computer, a network television, an Internet television, an Internet telephone, a portable wireless device, a workstation, or any other suitable communication device. It is contemplated that the network access device can be integrated with the electronic network. For example, the network access device, the electronic network, and/or the information source can reside in a personal computer.
The network access apparatus 102 may also include a voice or web browser, such as a Netscape Navigator® web browser, a Microsoft Internet Explorer® web browser, a Mosaic® web browser, etc. It is also contemplated that the network access apparatus 102 can include an optical scanner or bar code reader to read machine readable data, magnetic data, optical data, or the like, and transmit the data to the electronic network 104. For example, the network access apparatus could read or scan a bar code and then provide the scanned data to the electronic network 104 to access the information from the information source (i.e., a menu of a restaurant, banking information, a web page, weather information, etc.).
FIG. 2 illustrates a flow diagram of a method of retrieving information from a destination or database of the information source 106. At block 150, a user calls into the electronic network 104 from a network access apparatus. After the electronic network answers the incoming call at block 152, the electronic network can attempt to verify that the user is a subscriber of the system and/or the type of network access apparatus the user is calling from. For example, the system may read and decode the automatic number identification (ANI) or caller line identification (CLI) of the call and then determine whether the CLI of the call is found in a stored ANI or CLI list of subscribers. The system may also identify the user by detecting a unique speech pattern from the user (i.e., speaker verification) or a PIN entered using voice commands or DTMF tones.
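The subscriber check described above can be sketched, under stated assumptions, as a simple membership test in Python. The stored list `SUBSCRIBER_CLI_LIST`, the ten-digit normalization, and the telephone numbers are all hypothetical; the description does not specify how the ANI or CLI list is stored or compared.

```python
# Hypothetical stored ANI/CLI list of subscribers (ten-digit numbers).
SUBSCRIBER_CLI_LIST = {"3125550100", "8475550123"}


def normalize_cli(cli: str) -> str:
    """Keep only digits and retain the last ten (an assumed convention)."""
    digits = "".join(ch for ch in cli if ch.isdigit())
    return digits[-10:]


def is_subscriber(cli: str, subscribers=SUBSCRIBER_CLI_LIST) -> bool:
    """Return True when the decoded CLI is found in the stored list."""
    return normalize_cli(cli) in subscribers
```

For example, `is_subscriber("+1 (312) 555-0100")` would be true for the hypothetical list above, while a number absent from the list would fall through to another form of identification such as a PIN or speaker verification.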
After the electronic network answers the call, the electronic network provides a prompt or announcement to the caller at block 154 (i.e., "Hi. This is your personal agent. How may I help you"). The electronic network can also set grammars (i.e., vocabulary) and personalities (i.e., male or female voices) for the call. The electronic network can load the grammars and personalities based upon the CLI, the network access apparatus, or the identity of the user. For example, the grammars and personalities can be set or loaded depending upon the type of device (i.e., a wireless phone), the gender of the caller (i.e., male or female), the type of language (i.e., English, Spanish, etc.), and the accent of the caller (i.e., a New York accent, a southern accent, an English accent, etc.). It is also contemplated that the personalities and grammars may be changed by the user or changed by the electronic network based upon the speech communications detected by the electronic network.
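One possible way to load grammars and personalities for a call, as described above, is a table keyed on attributes of the caller. The following Python sketch assumes a hypothetical table `PROFILE_SETTINGS` keyed on device type and language; the grammar and personality names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSettings:
    grammar: str       # vocabulary to load for the call
    personality: str   # voice used for prompts


# Hypothetical defaults and per-profile overrides.
DEFAULT_SETTINGS = CallSettings("english_general", "female_voice")
PROFILE_SETTINGS = {
    ("wireless", "en"): CallSettings("english_wireless", "female_voice"),
    ("wireless", "es"): CallSettings("spanish_wireless", "male_voice"),
}


def select_settings(device_type: str, language: str) -> CallSettings:
    """Pick grammar and personality from the caller's device and language."""
    return PROFILE_SETTINGS.get((device_type, language), DEFAULT_SETTINGS)
```

A caller whose attributes are not in the table simply receives the default settings, which could later be changed by the user or by the network as the passage contemplates.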
At block 156, the electronic network waits for an input or command from the user that corresponds to a destination of the information source desired by the user. The input can be audio commands (i.e., speech) or DTMF tones. After the electronic network receives the input from the user, the electronic network establishes a connection or a link to the information source at block 158. The electronic network preferably determines an electronic address of the information source (i.e., URL, a URN, an IP address, or an electronic mail address) based upon the inputs from the user (i.e., speech or DTMF tones). The electronic address can be retrieved from a database using a look-up operation based upon at least a portion of the input.
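The look-up operation described above, in which an electronic address is retrieved based upon at least a portion of the input, might be sketched as follows. The table `ADDRESS_TABLE`, its keys, and the `www.example.com` addresses are hypothetical, and the input is assumed to have already been converted to text or to a DTMF digit string.

```python
# Hypothetical look-up table mapping input keywords (or DTMF digits)
# to electronic addresses of information sources.
ADDRESS_TABLE = {
    "weather": "http://www.example.com/weather",
    "traffic": "http://www.example.com/traffic",
    "7": "http://www.example.com/stocks",  # assumed DTMF shortcut
}


def lookup_address(user_input: str, table=ADDRESS_TABLE):
    """Return the first address whose key matches a portion of the input."""
    for token in user_input.lower().split():
        address = table.get(token.strip(".,?!"))
        if address is not None:
            return address
    return None  # no destination recognized
```

Matching on individual tokens is only one reading of "at least a portion of the input"; a deployed system could equally match on phrases or recognition results.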
At block 160, the electronic network retrieves at least a portion of the information from the destination of the information source. The electronic network processes the information and then provides an output to the user based upon the retrieved information at block 162. The output can include a speech communication, textual information, and/or graphical information. For example, the electronic network can provide a speech communication using text-to-speech technology or human recorded speech. The process then proceeds to block 164 or block 154 as described above. It will be recognized that the above described method can be carried out by a computer.
Referring now to FIG. 3, an exemplary block diagram of an embodiment of a system 200 to enable a user to access information is shown. The system 200 enables a user to access information from any location in the world via a suitable communication device. The system 200 can provide access to yellow pages, directions, traffic, addresses, movies, concerts, airline information, weather information, news reports, financial information, flowers, personal data, calendar data, address data, gifts, books, etc. The user can also perform a series of transactions without having to terminate the original call to the system. For example, the user can access a news update and obtain weather information, all without having to dial additional numbers or terminate the original call. The system 200 also enables application developers to build applications for interactive speech applications using a markup language, such as VoxML™ voice markup language developed by Motorola, Inc.
The system 200 generally includes one or more communication devices or network access apparatus 201, 202, 203 and 204 (four being shown), an electronic network 206, and one or more information sources, such as content providers 208 and 209 (two being shown) and markup language servers. The user can retrieve the information from the information sources using speech commands or DTMF tones. The user can access the electronic network 206 by dialing a single direct access telephone number (i.e., a foreign exchange number, a local number, or a toll-free number or PBX) from the communication device 202. The user can also access the electronic network 206 from the communication device 204 via the internet, from the communication device 203 via a paging network 211, and from the communication device 201 via a local area network (LAN), a wide area network (WAN), or an email connection.
The communication devices can include, but are not limited to, landline or wireline devices (i.e., home phones, work phones, computers, facsimile machines, pay phones), wireless devices (i.e., mobile phones, trunked radios, handheld devices, PIMs, PDAs, etc.), network access devices (i.e. computers), pagers, etc. The communication devices can include a microphone, a speaker, and/or a display.
As shown in FIG. 3, the electronic network 206 of the system 200 includes a telecommunication network 210 and a communication node 212. The telecommunication network 210 is preferably connected to the communication node 212 via a high-speed data link, such as a T1 telephone line, a local area network (LAN), or a wide area network (WAN). The telecommunication network 210 preferably includes a public switched telephone network (PSTN) 214 and a carrier network 216. The telecommunication network 210 can also include international or local exchange networks, cable television networks, interexchange carrier networks (IXC) or long distance carrier networks, cellular networks (i.e., mobile switching centers (MSC)), PBXs, satellite systems, and other switching centers such as conventional or trunked radio systems (not shown), etc. The PSTN 214 of the telecommunication network 210 can include various types of communication equipment or apparatus, such as ATM networks, Fiber Distributed Data Interface (FDDI) networks, T1 lines, cable television networks and the like. The carrier network 216 of the telecommunication network 210 generally includes a telephone switching system or central office 218. It will be recognized that the carrier network 216 can be any suitable system that can route calls to the communication node 212, and the telephone switching system 218 can be any suitable wireline or wireless switching system.
The communication node 212 of the system 200 is preferably configured to receive and process incoming calls from the carrier network 216 and the internet 220, such as the WWW. The communication node can receive and process pages from the paging network 211 and can also receive and process messages (i.e., emails) from the LAN, WAN or email connection 213. When a user dials into the electronic network 206 from the communication device 202, the carrier network 216 routes the incoming call from the PSTN 214 to the communication node 212 over one or more telephone lines or trunks. The incoming calls preferably enter the carrier network 216 through one or more "888" or "800" INWATS trunk lines, local exchange trunk lines, or long distance trunk lines. It is also contemplated that the incoming calls can be received from a cable network, a cellular system, or any other suitable system.
The communication node 212 answers the incoming call from the carrier network 216 and retrieves an appropriate announcement (i.e., a welcome greeting) from a database, server, or browser. The node 212 then plays the announcement to the caller. In response to audio inputs from the user, the communication node 212 retrieves information from a destination or database of one or more of the information sources, such as the content providers 208 and 209 or the markup language servers. After the communication node 212 receives the information, the communication node provides a response to the user based upon the retrieved information.
The node 212 can provide various dialog voice personalities (i.e., a female voice, a male voice, etc.) and can implement various grammars (i.e., vocabulary) to detect and respond to the audio inputs from the user. In addition, the communication node can automatically select various speech recognition models (i.e., an English model, a Spanish model, an English accent model, etc.) based upon a user profile, the user's communication device, and/or the user's speech patterns. The communication node 212 can also allow the user to select a particular speech recognition model.
When a user accesses the electronic network 206 from a communication device registered with the system (i.e., a user's home phone, work phone, cellular phone, etc.), the communication node 212 can by-pass a user screening option and automatically identify the user (or the type of the user's communication device) through the use of automatic number identification (ANI) or caller line identification (CLI). After the communication node verifies the call, the node provides a greeting to the user (i.e., "Hi, this is your personal agent, Maya. Welcome Bob. How may I help you?"). The communication node then enters into a dialogue with the user, and the user can select a variety of information offered by the communication node.
When the user accesses the electronic network 206 from a communication device not registered with the system (i.e., a payphone, a phone of a non-subscriber, etc.), the node answers the call and prompts the user to enter his or her name and/or a personal identification number (PIN) using speech commands or DTMF tones. The node can also utilize speaker verification to identify a particular speech pattern of the user. If the node authorizes the user to access the system, the node provides a personal greeting to the user (i.e., "Hi, this is your personal agent, Maya. Welcome Ann. How may I help you?"). The node then enters into a dialogue with the user, and the user can select various information offered by the node. If the name and/or PIN of the user cannot be recognized or verified by the node, the user will be routed to a customer service representative.
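The two access cases described above (a registered device that by-passes screening, and an unregistered device that requires a name and/or PIN) can be summarized in a brief Python sketch. The tables `REGISTERED_DEVICES` and `PIN_TABLE`, the telephone number, the PIN value, and the return string for the customer service route are hypothetical; speaker verification is omitted for brevity.

```python
# Hypothetical subscriber data: CLI of registered devices and PINs.
REGISTERED_DEVICES = {"3125550100": "Bob"}
PIN_TABLE = {"4321": "Ann"}

CUSTOMER_SERVICE = "route_to_customer_service"


def handle_call(cli: str, pin: str = None) -> str:
    """Greet an identified user, or route an unverified caller to a person."""
    # Registered device: by-pass screening and identify the user by ANI/CLI.
    user = REGISTERED_DEVICES.get(cli)
    # Unregistered device: fall back to the PIN supplied by speech or DTMF.
    if user is None and pin is not None:
        user = PIN_TABLE.get(pin)
    if user is None:
        return CUSTOMER_SERVICE
    return (f"Hi, this is your personal agent, Maya. "
            f"Welcome {user}. How may I help you?")
```

With the hypothetical data above, a call from the registered number greets Bob directly, a pay phone caller who enters the PIN is greeted as Ann, and any other caller is routed to a customer service representative.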
As shown in FIG. 3, the communication node 212 preferably includes a telephone switch 230, a voice or audio recognition (VRU) client 232, a voice recognition (VRU) server 234, a controller or call control unit 236, an Operation and Maintenance Office (OAM) or a billing server unit 238, a local area network (LAN) 240, an application server unit 242, a database server unit 244, a gateway server or router firewall server 246, a voice over internet protocol (VOIP) unit 248, a voice browser 250, a markup language server 251, and a paging server 252. Although the communication node 212 is shown as being constructed with various types of independent and separate units or devices, the communication node 212 can be implemented by one or more integrated circuits, microprocessors, microcontrollers, or computers which may be programmed to execute the operations or functions equivalent to those performed by the device or units shown. It will also be recognized that the communication node 212 can be carried out in the form of hardware components and circuit designs, software or computer programming, or a combination thereof.
The communication node 212 can be located in various geographic locations throughout the world or the United States (i.e., Chicago, Illinois). The communication node 212 can be operated by one or more carriers (i.e., Sprint PCS, Qwest Communications, MCI, etc.) or independent service providers, such as, for example, Motorola, Inc. The communication node 212 can be co-located or integrated with the carrier network 216 (i.e., an integral part of the network) or can be located at a remote site from the carrier network 216. It is also contemplated that the communication node 212 may be integrated into a communication device, such as a wireline or wireless phone, a radio device, a personal computer, a PDA, a PIM, etc. In this arrangement, the communication device can be programmed to connect or link directly into an information source. The communication node 212 can also be configured as a standalone system to allow users to dial directly into the communication node via a toll free number or a direct access number. In addition, the communication node 212 may comprise a telephony switch (i.e., a PBX or Centrex unit), an enterprise network, or a local area network. In this configuration, the system 200 can be implemented to automatically connect a user to the communication node 212 when the user picks up a communication device, such as the phone.
When the telephone switch 230 of the communication node 212 receives an incoming call from the carrier network 216, the call control unit 236 sets up a connection in the switch 230 to the VRU client 232. The communication node 212 then enters into a dialog with the user regarding various services and functions. The VRU client 232 preferably generates pre-recorded voice announcements and/or messages to prompt the user to provide inputs to the communication node using speech commands or DTMF tones.
In response to the inputs from the user, the node 212 retrieves information from a destination of one of the information sources and provides outputs to the user based upon the information.
The telephone switch 230 of the telecommunication node 212 is preferably connected to the VRU client 232, the VOIP unit 248, and the LAN 240. The telephone switch 230 receives incoming calls from the carrier switch 216. The telephone switch 230 also receives incoming calls from the communication device 204 routed over the internet 220 via the VOIP unit 248. The switch 230 also receives messages and pages from the communication devices 201 and 203, respectively. The telephone switch 230 is preferably a digital cross-connect switch, Model No. LNX, available from Excel Switching Corporation, 255 Independence Drive, Hyannis, MA 02601. It will be recognized that the telephone switch 230 can be any suitable telephone switch.
The VRU client 232 of the communication node 212 is preferably connected to the VRU server 234 and the LAN 240. The VRU client 232 processes speech communications, DTMF tones, pages, and messages (i.e., emails) from the user. Upon receiving speech communications from the user, the VRU client 232 routes the speech communications to the VRU server 234. When the VRU client 232 detects DTMF tones, the VRU client 232 sends a command to the call control unit 236. It will be recognized that the VRU client 232 can be integrated with the VRU server. The VRU client 232 preferably comprises a computer, such as a Windows NT compatible computer with hardware capable of connecting individual telephone lines directly to the switch 230. The VRU client preferably includes a microprocessor, random access memory, read-only memory, a T1 or ISDN interface board, and one or more voice communication processing boards (not shown). The voice communication processing boards of the VRU client 232 are preferably Dialogic boards, Model No. Antares, available from Dialogic Corporation, 1515 Route 10, Parsippany, N.J. 07054. The voice communication boards may include a voice recognition engine having a vocabulary for detecting a speech pattern (i.e., a key word or phrase). The voice recognition engine is preferably a RecServer software package, available from Nuance Communications, 1380 Willow Road, Menlo Park, California 94025.
The VRU client 232 can also include an echo canceler (not shown) to reduce or cancel text-to-speech or playback echoes transmitted from the PSTN 214 due to hybrid impedance mismatches. The echo canceler is preferably included in an Antares Board Support Package, available from Dialogic.
The call control unit 236 of the communication node 212 is preferably connected to the LAN 240. The call control unit 236 sets up the telephone switch 230 to connect incoming calls to the VRU client 232. The call control unit also sets up incoming calls or pages into the node 212 over the internet 220 and pages and messages sent from the communication devices 201 and 203 via the paging network 211 and email system 213. The call control unit 236 preferably comprises a computer, such as a Windows NT compatible computer.
The LAN 240 of the communication node 212 allows the various components and devices of the node 212 to communicate with each other via a twisted pair, a fiber optic cable, a coaxial cable, or the like. The LAN 240 may use Ethernet, Token Ring, or other suitable types of protocols. The LAN 240 is preferably a 100 Megabit per second Ethernet switch, available from Cisco Systems, San Jose, California. It will be recognized that the LAN 240 can comprise any suitable network system, and the communication node 212 may include a plurality of LANs.
The VRU server 234 of the communication node 212 is connected to the VRU client 232 and the LAN 240. The VRU server 234 receives speech communications from the user via the VRU client 232. The VRU server 234 processes the speech communications and compares the speech communications against a vocabulary or grammar stored in the database server unit 244 or a memory device. The VRU server 234 provides output signals, representing the result of the speech processing, to the LAN 240. The LAN 240 routes the output signal to the call control unit 236, the application server 242, and/or the voice browser 250. The communication node 212 then performs a specific function associated with the output signals.
The VRU server 234 preferably includes a text-to-speech (TTS) unit 252, an automatic speech recognition (ASR) unit 254, and a speech-to-text (STT) unit 256. The TTS unit 252 of the VRU server 234 receives textual data or information (i.e., e-mail, web pages, documents, files, etc.) from the application server unit 242, the database server unit 244, the call control unit 236, the gateway server 246, and the voice browser 250. The TTS unit 252 processes the textual data and converts the data to voice data or information.
The TTS unit 252 can provide data to the VRU client 232 which reads or plays the data to the user. For example, when the user requests information (i.e., news updates, stock information, traffic conditions, etc.), the communication node 212 retrieves the desired data (i.e., textual information) from a destination of the one or more of the information sources and converts the data via the TTS unit 252 into a response.
The response is then sent to the VRU client 232. The VRU client processes the response and reads an audio message to the user based upon the response. It is contemplated that the VRU server 234 can read the audio message to the user using human recorded speech or synthesized speech. The TTS unit 252 is preferably a TTS 2000 software package, available from Lernout and Hauspie Speech Product NV, 52 Third Avenue, Burlington, Mass. 01803.
The ASR unit 254 of the VRU server 234 provides speaker independent automatic speech recognition of speech inputs or communications from the user. It is contemplated that the ASR unit 254 can include speaker dependent speech recognition. The ASR unit 254 processes the speech inputs from the user to determine whether a word or a speech pattern matches any of the grammars or vocabulary stored in the database server unit 244 or downloaded from the voice browser. When the ASR unit 254 identifies a selected speech pattern of the speech inputs, the ASR unit 254 sends an output signal to implement the specific function associated with the recognized voice pattern. The ASR unit 254 is preferably a speaker independent speech recognition software package, Model No. RecServer, available from Nuance Communications. It is contemplated that the ASR unit 254 can be any suitable speech recognition unit to detect voice communications from a user.
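The grammar matching performed by the ASR unit 254 can be illustrated, in greatly simplified form, by the Python sketch below. It operates on already-transcribed text rather than on audio, which is an assumption made only to keep the example self-contained; the grammar table `GRAMMAR` and the function names it maps to are hypothetical.

```python
import re

# Hypothetical grammar: each phrase maps to the function it triggers.
GRAMMAR = {
    "weather": "get_weather",
    "stock quotes": "get_stock_quotes",
    "news updates": "get_news",
}


def match_grammar(utterance: str, grammar=GRAMMAR):
    """Return the function associated with the first grammar phrase found."""
    words = re.findall(r"[a-z']+", utterance.lower())
    padded = " " + " ".join(words) + " "
    # Try longer phrases first so multi-word entries are not shadowed.
    for phrase in sorted(grammar, key=len, reverse=True):
        if f" {phrase} " in padded:  # whole-word match only
            return grammar[phrase]
    return None  # no grammar entry matched
```

The returned name stands in for the output signal described above; when nothing matches, the caller could be re-prompted.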
The STT unit 256 of the VRU server 234 receives speech inputs or communications from the user and converts the speech inputs to textual information (i.e., a text message). The textual information can be sent or routed to the communication devices 201, 202, 203 and 204, the content providers 208 and 209, the markup language servers, the voice browser, and the application server 242. The STT unit 256 is preferably a Naturally Speaking software package, available from Dragon Systems, 320 Nevada Street, Newton, MA 02160-9803.
The VOIP unit 248 of the telecommunication node 212 is preferably connected to the telephone switch 230 and the LAN 240. The VOIP unit 248 allows a user to access the node 212 via the internet 220 using voice commands. The VOIP unit 248 can receive VOIP protocols (i.e., H.323 protocols) transmitted over the internet 220 and can convert the VOIP protocols to speech information or data. The speech information can then be read to the user via the VRU client 232. The VOIP unit 248 can also receive speech inputs or communications from the user and convert the speech inputs to a VOIP protocol that can be transmitted over the internet 220. The VOIP unit 248 is preferably a Voice Net software package, available from Dialogic Corporation. It will be recognized that the VOIP device can be incorporated into a communication device.
The telecommunication node 212 also includes a detection unit 260. The detection unit 260 is preferably a phrase or key word spotter unit to detect incoming audio inputs or communications or DTMF tones from the user. The detection unit 260 is preferably incorporated into the switch 230, but can be incorporated into the VRU client 232, the carrier switch 216, or the VRU server 234. The detection unit 260 is preferably included in a RecServer software package, available from Nuance Communications.
The detection unit 260 records the audio inputs from the user and compares the audio inputs to the vocabulary or grammar stored in the database server unit 244. The detection unit 260 continuously monitors the user's audio inputs for a key phrase or word after the user is connected to the node 212. When the key phrase or word is detected by the detection unit 260, the VRU client 232 plays a pre-recorded message to the user. The VRU client 232 then responds to the audio inputs provided by the user.
The billing server unit 238 of the communication node 212 is preferably connected to the LAN 240. The billing server unit 238 can record data about the use of the communication node by a user (i.e., length of calls, features accessed by the user, etc.). Upon completion of a call by a user, the call control unit 236 sends data to the billing server unit 238. The data can be subsequently processed by the billing server unit in order to prepare customer bills. The billing server unit 238 can use the ANI or CLI of the communication device to properly bill the user. The billing server unit 238 preferably comprises a Windows NT compatible computer.
The gateway server unit 246 of the communication node 212 is preferably connected to the LAN 240 and the internet 220. The gateway server unit 246 provides access to the content provider 208 and the markup language server 257 via the internet 220. The gateway unit 246 also allows users to access the communication node 212 from the communication device 204 via the internet 220. The gateway unit 246 can further function as a firewall to restrict access to the communication node 212 to authorized users. The gateway unit 246 is preferably a Cisco Router, available from Cisco Systems.
The database server unit 244 of the communication node 212 is preferably connected to the LAN 240. The database server unit 244 preferably includes a plurality of storage areas to store data relating to users, speech vocabularies, dialogs, personalities, user entered data, and other information. Preferably, the database server unit 244 stores a personal file or address book. The personal address book can contain information required for the operation of the system, including user reference numbers, personal access codes, personal account information, contacts' addresses, phone numbers, etc. The database server unit 244 is preferably a computer, such as a Windows NT compatible computer.
The application server 242 of the communication node 212 is preferably connected to the LAN 240 and the content provider 209. The application server 242 allows the communication node 212 to access information from a destination of the information sources, such as the content providers and markup language servers . For example, the application server can retrieve information (i.e., weather reports, stock information, traffic reports, restaurants, flower shops, banks, etc.) from a destination of the information sources. The application server 242 processes the retrieved information and provides the information to the VRU server 234 and the voice browser 250. The VRU server 234 can provide an audio announcement to the user based upon the information using text-to-speech synthesizing or human recorded voice. The application server 242 can also send tasks or requests (i.e., transactional information) received from the user to the information sources (i.e., a request to place an order for a pizza). The application server 242 can further receive user inputs from the VRU server 234 based upon a speech recognition output. The application server is preferably a computer, such as an NT Windows compatible computer.
The markup language server 251 of the communication node 212 is preferably connected to the LAN 240. The markup language server 251 can include a database, scripts, and markup language documents or pages. The markup language server 251 is preferably a computer, such as a Windows NT compatible computer. It will also be recognized that the markup language server 251 can be an internet server (i.e., a Sun Microsystems server).
The paging server 252 of the communication node 212 is preferably connected to the LAN 240 and the paging network 211. The paging server 252 routes pages between the LAN 240 and the paging network. The paging server 252 is preferably a computer, such as an NT compatible computer.
The voice browser 250 of the system 200 is preferably connected to the LAN 240. The voice browser 250 preferably receives information from the information sources, such as the content provider 209 via the application server 242, the markup language servers 251 and 257, the database 244, and the content provider 208. In response to voice inputs from the user or DTMF tones, the voice browser 250 generates a content request (i.e., an electronic address) to navigate to a destination of one or more of the information sources. The content request can use at least a portion of a URL, a URN, an IP address, a page request, or an email. After the voice browser is connected to an information source, the voice browser preferably uses a TCP/IP connection to pass requests to the information source. The information source responds to the requests, sending at least a portion of the requested information, represented in electronic form, to the voice browser. The information can be stored in a database of the information source and can include text content, markup language documents or pages, non-text content, dialogs, audio sample data, recognition grammars, etc. The voice browser then parses and interprets the information as further described below. It will be recognized that the voice browser can be integrated into the communication devices 201, 202, 203, and 204.
As shown in FIG. 3, the content provider 209 is connected to the application server 242 of the communication node 212, and the content provider 208 is connected to the gateway server 246 of the communication node 212 via the internet 220. The content providers can store various content information, such as news, weather, traffic conditions, etc. The content providers 208 and 209 can include a server to operate web pages or documents in the form of a markup language. The content providers 208 and 209 can also include a database, scripts, and/or markup language documents or pages. The scripts can include images, audio, grammars, computer programs, etc. The content providers execute suitable server software to send requested information to the voice browser.
Referring now to FIG. 4, a block diagram of the voice browser 250 of the communication node 212 is illustrated. The voice browser 250 generally includes a network fetcher unit 300, a parser unit 302, an interpreter unit 304, and a state machine unit 306. Although the voice browser is shown as being constructed with various types of independent and separate units or devices, it will be recognized that the voice browser 250 can be carried out in the form of hardware components and circuit designs, software or computer programming, or a combination thereof.
The network fetcher 300 of the voice browser 250 is connected to the parser 302 and the interpreter 304. The network fetcher 300 is also connected to the LAN 240 of the communication node 212. The network fetcher unit 300 retrieves information, including markup language documents, audio samples and grammars, from the information sources. The parser unit 302 of the voice browser 250 is connected to the network fetcher unit 300 and the state machine unit 306. The parser unit 302 receives the information from the network fetcher unit 300 and parses the information according to the syntax rules of the markup language as further described below (i.e., extensible markup language syntax). The parser unit 302 generates a tree or hierarchical structure representing the markup language that is stored in memory of the state machine unit 306. A tree structure of an exemplary markup language document is shown in FIG. 7. The following text defines the syntax and grammar that the parser unit of the voice browser utilizes to build a tree structure of the markup language document.
<!ELEMENT dialog (step | class)*>
<!ATTLIST dialog bargein (Y|N) "Y">
<!ELEMENT step (prompt | input | help | error | cancel | ack)*>
<!ATTLIST step name ID #REQUIRED parent IDREF #IMPLIED bargein (Y|N) "Y" cost CDATA #IMPLIED>
<!ELEMENT class (prompt | help | error | cancel | ack)*>
<!ATTLIST class name ID #REQUIRED parent IDREF #IMPLIED bargein (Y|N) "Y" cost CDATA #IMPLIED>
<!ELEMENT prompt (#PCDATA | options | value | emp | break | pros | audio)*>
<!ELEMENT emp (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST emp level (strong | moderate | none | reduced) "moderate">
<!ELEMENT pros (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST pros rate CDATA #IMPLIED vol CDATA #IMPLIED pitch CDATA #IMPLIED range CDATA #IMPLIED>
<!ELEMENT help (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST help ordinal CDATA #IMPLIED reprompt (Y|N) "N" next CDATA #IMPLIED nextmethod (get | post) "get">
<!ELEMENT error (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST error type NMTOKENS "ALL" ordinal CDATA #IMPLIED reprompt (Y|N) "N" next CDATA #IMPLIED nextmethod (get | post) "get">
<!ELEMENT cancel (#PCDATA | value | emp | break | pros | audio)*>
<!ATTLIST cancel next CDATA #REQUIRED nextmethod (get | post) "get">
<!ELEMENT audio EMPTY>
<!ATTLIST audio src CDATA #REQUIRED>
<!ELEMENT ack (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST ack confirm NMTOKEN "YORN" background (Y|N) "N" reprompt (Y|N) "N">
<!ELEMENT input (option | response | rename | switch | case)*>
<!ATTLIST input type (none | optionlist | record | grammar | profile | hidden | yorn | digits | number | time | date | money | phone) #REQUIRED name ID #IMPLIED next CDATA #IMPLIED nextmethod (get | post) "get" timeout CDATA #IMPLIED min CDATA #IMPLIED max CDATA #IMPLIED profname NMTOKEN #IMPLIED subtype NMTOKEN #IMPLIED src CDATA #IMPLIED value CDATA #IMPLIED msecs CDATA #IMPLIED storage (file | request) #REQUIRED format CDATA #IMPLIED>
<!ELEMENT switch (case | switch)*>
<!ATTLIST switch field NMTOKEN #REQUIRED>
<!ELEMENT response (switch)*>
<!ATTLIST response next CDATA #IMPLIED nextmethod (get | post) "get" fields NMTOKENS #REQUIRED>
<!ELEMENT rename EMPTY>
<!ATTLIST rename varname NMTOKEN #REQUIRED recname NMTOKEN #REQUIRED>
<!ELEMENT case EMPTY>
<!ATTLIST case value CDATA #REQUIRED next CDATA #REQUIRED nextmethod (get | post) "get">
<!ELEMENT value EMPTY>
<!ATTLIST value name NMTOKEN #REQUIRED>
<!ELEMENT break EMPTY>
<!ATTLIST break msecs CDATA #IMPLIED size (none | small | medium | large) "medium">
<!ELEMENT options EMPTY>
<!ELEMENT or EMPTY>
<!ELEMENT option (#PCDATA | value | or)*>
<!ATTLIST option value CDATA #IMPLIED next CDATA #IMPLIED nextmethod (get | post) "get">
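The tree-building role of the parser unit can be illustrated with a minimal Python sketch. It is not the parser of the described system: it assumes a general-purpose XML parser (the standard library's xml.etree.ElementTree) stands in for the parser unit, it does not validate against the declarations above, and the sample page, build_step_table, and outline are hypothetical names introduced only for illustration. The upper-case tag spelling follows the example documents in this description.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, modeled on the example documents in the text.
PAGE = """
<DIALOG>
  <STEP NAME="init">
    <PROMPT>Please select a soft drink.</PROMPT>
    <INPUT TYPE="optionlist" NAME="drink">
      <OPTION NEXT="#confirm">coke</OPTION>
      <OPTION NEXT="#confirm">pepsi</OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="confirm">
    <PROMPT>You ordered a <VALUE NAME="drink"/>.</PROMPT>
  </STEP>
</DIALOG>
"""


def build_step_table(page_text):
    """Parse a page and index its STEP elements by NAME (hypothetical helper)."""
    root = ET.fromstring(page_text.strip())
    if root.tag != "DIALOG":
        raise ValueError("root element must be DIALOG")
    return {step.get("NAME"): step for step in root.findall("STEP")}


def outline(element, depth=0):
    """Return the subtree as indented tag names, one list entry per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling build_step_table(PAGE) yields a table keyed by step name, and outline shows the parent/child nesting that a tree structure such as the one in FIG. 7 represents.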
Referring again to FIG. 4, the interpreter unit 304 of the voice browser 250 is connected to the state machine unit 306 and the network fetcher unit 300. The interpreter unit 304 is also connected to the LAN. The interpreter unit 304 carries out a dialog with the user based upon the tree structure representing a markup language document. The interpreter unit sends data to the TTS 252. The interpreter unit 304 can also receive data based upon inputs from the user via a VRU server and can send outputs to the information source based upon the user inputs. The interpreter unit 304 can transition from state to state (i.e., step to step) within a tree structure (i.e., a dialog) of a markup language document or can transition to a new tree structure within the same dialog or another dialog. The interpreter unit determines the next state or step based upon the structure of the dialog and the inputs from the user. When the interpreter unit transitions to a new dialog or page, the address of the new dialog or page is then sent to the network fetcher.
The state machine 306 of the voice browser 250 is connected to the parser unit 302 and the interpreter unit 304. The state machine 306 stores the tree structure of the markup language and maintains the current state or step that the voice browser is executing.
FIGS. 5a-5c illustrate a flow diagram of a software routine executed by the voice browser 250. The software routine allows interactive voice applications. At block 400, the voice browser 250 determines an initial address (i.e., a URL) and a step element or name. The voice browser then fetches the contents (i.e., a markup language document) of the current address from the information sources (i.e., content providers and markup language servers) at block 402. After the voice browser fetches the contents, the voice browser processes the contents and builds a local step table (i.e., a tree structure) at block 404.
At block 406, a prompt can be played to the user via the TTS unit of the system 200 for the current element. The voice browser then waits for an input from the user (i.e., speech or DTMF tones). At block 408, the voice browser can collect input from the user for the current step element. FIG. 5c shows an exemplary flow diagram of a routine that is executed by the voice browser to determine the grammar for speech recognition. At block 502, the voice browser determines whether a pre-determined grammar exists for the user input and the markup language. For example, the voice browser determines whether the grammar for the user input is found in a predetermined or pre-existing grammar stored in a database or contained in the markup language. If the grammar is found, the voice browser sends the grammar to the VRU server at block 504. At block 506, the VRU server compares the user input to the grammar to recognize the user input. After the VRU server recognizes the user input, the process proceeds to block 410 (see FIG. 5a) as described below. If a pre-existing grammar is not found at block
502, the voice browser dynamically generates the grammar for the user input. At block 508, the voice browser looks up the pronunciations for the user input in a dictionary. The dictionary can be stored in a database of the system or stored on an external database (i.e., the voice browser can fetch a dictionary from the processor or from the internet).
At block 510, the voice browser generates the grammar for the user inputs based upon the pronunciations from the dictionary and phonetic rules. A software routine available from Nuance Communications, Model No. RecServer, can be used to generate the grammar. At block 512, the grammar is sent to the VRU server. The voice browser then attempts to match the grammar to the user input at block 506.
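The grammar selection of FIG. 5c can be sketched in Python as follows. This is only an illustration under stated assumptions: select_grammar, stored_grammars, and dictionary are hypothetical names, a grammar is modeled as a plain mapping from phrase to per-word pronunciations, and the phonetic rules applied at block 510 are omitted.

```python
def select_grammar(expected_phrases, stored_grammars, dictionary):
    """Return (grammar, source) for the phrases a step expects.

    stored_grammars maps a frozenset of phrases to a ready-made grammar;
    dictionary maps a word to a pronunciation string. Both are hypothetical
    stand-ins for the database and dictionary described in the text.
    """
    key = frozenset(expected_phrases)
    if key in stored_grammars:                 # block 502: pre-existing grammar found
        return stored_grammars[key], "stored"  # block 504: send it to the VRU server
    grammar = {}
    for phrase in expected_phrases:            # blocks 508-510: look up pronunciations
        # Words missing from the dictionary are kept as "?" placeholders.
        grammar[phrase] = [dictionary.get(word, "?") for word in phrase.split()]
    return grammar, "generated"                # block 512: send the generated grammar
```

In either branch the returned grammar would then be handed to the recognizer for matching against the user input (block 506).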
After the voice browser detects or collects an input from the user at block 408, the voice browser determines whether there is an error at block 410. If the voice browser is having difficulty recognizing inputs from the user or detects a recognition error, a timeout error, etc., an appropriate error message is played to the user at block 414. For example, if the voice browser detected too much speech from the user or the recognition is too slow, a prompt is played (i.e., "Sorry, I didn't understand you") to the user via the VRU server. If the voice browser receives unexpected DTMF tones, a prompt is played (i.e., "I heard tones. Please speak your response") to the user via the VRU server. If the voice browser does not detect any speech from the user, a prompt is read to the user (i.e., "I am having difficulty hearing you").
At block 416, the voice browser determines whether a re-prompt was specified in the error response or element. If a re-prompt is to be played to the user at block 416, the process proceeds to block 406 as described above. If a re-prompt is not to be played to the user at block 416, the voice browser determines whether there is a next step element specified in the error response at block 420. If another step element is specified in the error response at block 420, the process proceeds to block 402 as described above. If another step element is not specified in the error response at block 420, the process proceeds to block 422.
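The branch taken after an error message can be summarized in a short Python sketch. The function name after_error and its parameters are hypothetical; the returned integers are simply the block numbers of FIGS. 5a-5c named in the paragraph above.

```python
def after_error(reprompt, next_step):
    """Pick the flow-diagram block that follows an error message (block 414).

    reprompt: True if the error response specifies a re-prompt.
    next_step: name or address of the next step element, or None if absent.
    """
    if reprompt:               # block 416: re-prompt specified
        return 406             # play the current prompt again
    if next_step is not None:  # block 420: next step element specified
        return 402             # fetch the specified step
    return 422                 # neither was specified
```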
If the voice browser does not detect a recognition error at block 410, the voice browser determines whether the user requested help at block 412. If the user requested help, an appropriate help response is played to the user (i.e., "please enter or speak your pin") at block 424.
At block 425, the voice browser determines whether a re-prompt was specified in the help response or step. If a re-prompt is specified in the help response at block 425, the process proceeds to block 406 as described above. If a re-prompt is not specified in the help response at block 425, the voice browser determines whether a next step element is specified in the help response at block 426. If another step element is specified in the help response at block 426, the process proceeds to block 402 as described above. If another step element is not specified in the help response at block 426, the process proceeds to block 428. At block 430, the voice browser determines whether a cancel request has been indicated by the user. If the voice browser detects a cancel request from the user at block 430, an appropriate cancel message is played to the user at block 434 (i.e., "Do you wish to exit and return to the Main Menu?").
At block 436, the voice browser then determines whether a next step element is specified in the cancel response or element. If another step element is specified in the cancel response at block 436, the process proceeds to block 448. If another step element is not specified in the cancel response at block 436, the process proceeds to block 422.
If a cancel request was not detected at block 430, the voice browser determines the next step element at block 432. At block 440, the voice browser determines whether there is an acknowledgement specified in the next step element. If there is no acknowledgement specified in the step element at block 440, the voice browser sets the current step element to the next step element at block 442 and then determines whether the next step element is within the same page at block 444.
If the next step element is within the same page as the current step element at block 444, the process proceeds to block 446. If the next step element is not within the same page as the current page at block 444, the process proceeds to block 448.
If an acknowledgement is specified in the next step element at block 440, an acknowledgement response is played to the user at block 450. The voice browser then determines whether a confirmation is specified in the information (i.e., a markup language document) at block 452. If a confirmation is not specified in the information at block 452, the process proceeds to block 442 as described above. If a confirmation is specified at block 452, the voice browser determines whether the response from the user was recognized at block 454 and then determines whether the response is affirmative at block 456. If the voice browser receives an affirmative response at block 456, the process proceeds to block 442 as described above. If the voice browser does not receive an affirmative response from the user at block 456, the process proceeds to block 448.
The following text describes an exemplary markup language processed by the voice browser of the communication node 212. The markup language preferably includes text, recorded sound samples, navigational controls, and input controls for voice applications as further described below. The markup language enables system designers or developers of service or content providers to create application programs for instructing the voice browser to provide a desired user interactive voice service. The markup language also enables designers to dynamically customize their content. For example, designers can provide up-to-date news, weather, traffic, etc.
The markup language can be designed to express flow of control, state management, and the content of information flow between the communication node 212 and the user. The structure of the language can be designed specifically for voice applications and the markup language is preferably designed and delivered in units of dialog.
The markup language can include elements that describe the structure of a document or page, provide pronunciation of words and phrases, and place markers in the text to control interactive voice services. The markup language also provides elements that control phrasing, emphasis, pitch, speaking rate, and other characteristics. The markup language documents are preferably stored on databases of the information sources, such as the content providers 208 and 209 and the markup language servers 251 and 257.
FIG. 6 illustrates an exemplary markup language document that the voice browser of the communication node can process. The markup language document has a hierarchical structure, in which every element (except the dialog element) is contained by another element. Elements contained within another element are defined to be children, or lower elements, of the tree. FIG. 7 illustrates a tree structure of the markup language document of FIG. 6. As shown in FIG. 6, the markup language document includes tags, denoted by <> symbols, with the actual element between the brackets. The markup language includes start tags ("< >") and end tags ("</ >"). A start tag begins a markup element and an end tag ends the corresponding markup element. For example, in the markup language document as shown in FIG. 6, the DIALOG element (<dialog>) on line 2 begins a markup language document or page, and the dialog element (</dialog>) on line 26 indicates the markup language document has ended. The elements often have attributes which are assigned values as further described below.
The DIALOG element and STEP elements of a markup language document provide the basic structure of the document. The DIALOG element defines the scope of the markup language document, and all other elements are contained by the DIALOG element. The STEP elements define states within a DIALOG element (i.e., the STEP element defines an application state). For example, an application state can include initial prompts, help messages, error messages, or cleanup and exit procedures.
The DIALOG element and the associated STEP elements of a markup language document define a state machine that represents an interactive dialogue between the voice browser and the user. When the voice browser interprets the markup language document, the voice browser will navigate through the DIALOG element to different STEP elements as a result of the user's responses.
The following example illustrates an exemplary markup language document that the voice browser of the communication node can process. The example has one DIALOG element and two STEP elements.
<?XML VERSION="1.0"?>
<DIALOG>
<STEP NAME="init">
<PROMPT> Please select a soft drink. </PROMPT>
<HELP> Your choices are coke, pepsi, 7 up, or root beer. </HELP>
<INPUT TYPE="optionlist" NAME="drink">
<OPTION NEXT="#confirm"> coke </OPTION>
<OPTION NEXT="#confirm"> pepsi </OPTION>
<OPTION NEXT="#confirm"> 7 up </OPTION>
<OPTION NEXT="#confirm"> root beer </OPTION>
</INPUT>
</STEP>
<STEP NAME="confirm">
<PROMPT> You ordered a <VALUE NAME="drink"/>. </PROMPT>
</STEP>
</DIALOG>
When the above markup language document is interpreted by the voice browser, the voice browser initially executes the STEP element called "init".
First, the user will hear the text contained by the PROMPT element (i.e., "Please select a soft drink."). If the user responds "help" before making a selection, the user would hear the text contained within the HELP element (i.e., "Your choices are coke, pepsi, 7 up, or root beer."). After the user makes a selection, the voice browser will execute the STEP element named "confirm", which will read back the user's selection and then exit the application. It is noted that the STEP elements in a markup language document are executed based on the user's responses, not on the order of the STEP elements within the source file. Although the definition of the "init" STEP element appears before the definition of the "confirm" STEP element, the order in which they are defined has no impact on the order in which the voice browser navigates through them.
The following text describes the markup language elements, their attributes, and their syntax.
The DIALOG element of the markup language (i.e., <DIALOG [BARGEIN="value"]> markup language document </DIALOG>) is the fundamental element of the markup language. The DIALOG element includes a BARGEIN attribute. The value of the BARGEIN attribute can be "Y" or "N". The BARGEIN attribute allows the DIALOG element to be interrupted at any time based upon a predetermined response from the user (i.e., wake up). The DIALOG element defines the basic unit of context within an application, and typically, there is one DIALOG element per address (i.e., URL). Each DIALOG element contains one STEP element named "init". The execution of the DIALOG element begins with the STEP named "init". The following example of a markup language document or page contains the DIALOG element.
<DIALOG>
<STEP NAME="init">
<PROMPT> Welcome to VoxML™ voice markup language. </PROMPT>
</STEP> </DIALOG>
In the example above, the DIALOG element contains a single STEP element named "init". The STEP element has a single PROMPT element that will be read to the user via the text-to-speech unit 252. Since there is no INPUT element defined in the STEP element, the markup language application will terminate immediately after the PROMPT element is read.
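The execution rules described so far (start at the STEP named "init", play its PROMPT, follow the NEXT attribute of the matching OPTION, and terminate when a STEP has no INPUT element) can be sketched in Python. This is an assumption-laden illustration, not the voice browser's routine: run_dialog and render are hypothetical names, scripted text replies stand in for speech recognition, and only optionlist inputs with local "#step" targets are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical page modeled on the soft drink example in the text.
PAGE = """
<DIALOG>
  <STEP NAME="init">
    <PROMPT>Please select a soft drink.</PROMPT>
    <INPUT TYPE="optionlist" NAME="drink">
      <OPTION NEXT="#confirm">coke</OPTION>
      <OPTION NEXT="#confirm">root beer</OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="confirm">
    <PROMPT>You ordered a <VALUE NAME="drink"/>.</PROMPT>
  </STEP>
</DIALOG>
"""


def render(prompt, variables):
    """Flatten a PROMPT to text, substituting VALUE elements from variables."""
    parts = [prompt.text or ""]
    for child in prompt:
        if child.tag == "VALUE":
            parts.append(variables.get(child.get("NAME"), ""))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def run_dialog(page_text, answers):
    """Walk a page from the "init" step and return the prompts played."""
    root = ET.fromstring(page_text.strip())
    steps = {s.get("NAME"): s for s in root.findall("STEP")}
    variables, played = {}, []
    replies = iter(answers)
    current = "init"
    while current is not None:
        step = steps[current]
        prompt = step.find("PROMPT")
        if prompt is not None:
            played.append(render(prompt, variables))
        user_input = step.find("INPUT")
        if user_input is None:
            break  # no INPUT element: the application terminates
        reply = next(replies, None)
        current = None  # stop unless an OPTION matches the reply
        for option in user_input.findall("OPTION"):
            if (option.text or "").strip() == reply:
                variables[user_input.get("NAME")] = reply
                current = option.get("NEXT").lstrip("#")
                break
    return played
```

For a page with a single "init" step and no INPUT element, run_dialog returns after the first prompt, which mirrors the immediate termination described above.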
The STEP element of the markup language (i.e., <STEP NAME="value" [PARENT="value"] [BARGEIN="value"] [COST="value"]> text </STEP>) defines a state in a markup language document or page. The STEP element is contained by a DIALOG element. The STEP element includes a NAME attribute, a PARENT attribute, a BARGEIN attribute, and a COST attribute. The value of the NAME and PARENT attributes can be an identifier (i.e., a pointer or a variable name), the value of the BARGEIN attribute can be "Y" or "N", and the value of the COST attribute can be an integer.
The STEP element typically has an associated PROMPT element and INPUT element that define the application state. The following example illustrates the use of the STEP element in a markup language document.
<STEP NAME="askpython" PARENT="tvrating">
<PROMPT> Please rate Monty Python's Flying Circus on a scale of 1 to 10. </PROMPT>
<INPUT NAME="python" TYPE="number" NEXT="#drwho"/>
</STEP>
The example shown above illustrates a STEP element that collects the user's opinion on one of several public television shows. The STEP element uses the PARENT attribute to share a common set of help and error elements with other TV-show-rating STEP elements. For example, the parent named by the PARENT attribute can contain a HELP element explaining what a rating of 1, 5, and 10 would mean, and a common error message can remind the user that a numeric rating is expected.
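One way to picture the sharing of help and error elements through the PARENT attribute is the following Python sketch. The lookup order (the step's own elements first, then its parent chain) is an assumption rather than something stated in the text, and find_inherited, the sample page, and the contents of the "tvrating" CLASS element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: a CLASS named "tvrating" holds the shared HELP element.
PAGE = """
<DIALOG>
  <CLASS NAME="tvrating">
    <HELP>Say a number from 1 to 10.</HELP>
  </CLASS>
  <STEP NAME="askpython" PARENT="tvrating">
    <PROMPT>Please rate Monty Python's Flying Circus.</PROMPT>
    <INPUT NAME="python" TYPE="number" NEXT="#drwho"/>
  </STEP>
</DIALOG>
"""


def find_inherited(root, name, tag):
    """Find child element `tag` on the named STEP or CLASS, else on its PARENT chain."""
    by_name = {e.get("NAME"): e for e in root if e.tag in ("STEP", "CLASS")}
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)  # guards against a PARENT cycle
        node = by_name.get(name)
        if node is None:
            return None
        found = node.find(tag)
        if found is not None:
            return found
        name = node.get("PARENT")
    return None
```

With this page, a help request in the "askpython" step would resolve to the HELP text held by "tvrating", while a lookup for an element defined nowhere in the chain returns None.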
The PROMPT element of the markup language (i.e., <PROMPT> text </PROMPT>) is used to define content (i.e., text or an audio file) that is to be presented to the user. Typically, the PROMPT element will contain text and several markup elements (i.e., the BREAK or EMP elements as described below) that are read to the user via the text-to-speech unit.
The PROMPT element can be contained within a STEP or a CLASS element. The following example illustrates the use of the PROMPT element in a markup language document or page.
<STEP NAME="init">
<PROMPT> How old are you? </PROMPT>
<INPUT TYPE="number" NAME="age" NEXT="#weight"/>
</STEP>
In the example shown above, the text "How old are you?" will be played to the user via the text-to-speech unit, and then the voice browser will wait for the user to say his or her age.
The INPUT element of the markup language is used to define a valid user input within each STEP element. The INPUT element is contained within a STEP element. The INPUT element of the markup language includes an INPUT attribute. The value of the INPUT attribute can be a DATE input, a DIGITS input, a FORM input, a GRAMMAR input, a HIDDEN input, a MONEY input, a NONE input, a NUMBER input, an OPTIONLIST input, a PHONE input, a PROFILE input, a RECORD input, a TIME input, or a YORN input.
The DATE input of the INPUT attribute of the markup language (i.e., <INPUT TYPE="DATE" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />) is used to collect a calendar date from the user. The DATE input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be the next STEP address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post (i.e., an input into a Java Script program or a markup language server), and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The following example illustrates the use of the DATE input in a markup language document.
<STEP NAME="init">
<PROMPT> What is your date of birth? </PROMPT>
<INPUT TYPE="date" NAME="dob" NEXT="#soc"/>
</STEP>
In the example above, the DATE input is used to gather the user's birthday, store it in a variable "dob", and then go to the STEP element named "soc". The DATE input makes use of an input grammar to interpret the user's response and store that response in a standard format. The DATE input grammar can interpret dates expressed in several different formats. A fully defined date, such as "next Friday, July 10th, 1998", is stored as "07101998|July|10|1998|Friday|next". If the date cannot be determined by the user's response, the ambiguous parts of the response will be omitted from the data. The response "July 4th" is stored as "????????|July|4|||", "Tomorrow" becomes "????????|||||tomorrow", "The 15th" is stored as "????????||15|||", and "Monday" becomes "????????||||Monday|".
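A script that receives a stored DATE value could split it as in the following Python sketch. The field names are an assumption inferred from the examples above (a compact eight-character date followed by month, day, year, weekday, and a relative modifier such as "next" or "tomorrow"); the text does not name the fields, and parse_date_record is a hypothetical helper.

```python
# Assumed names for the five fields that follow the compact date.
FIELDS = ("month", "day", "year", "weekday", "modifier")


def parse_date_record(record):
    """Split a stored DATE value into named parts; omitted parts become None."""
    parts = record.split("|")
    if len(parts) != 6:
        raise ValueError("expected 6 pipe-separated fields")
    # A compact date made only of "?" marks means the date was not determined.
    compact = None if set(parts[0]) == {"?"} else parts[0]
    result = {"compact": compact}
    for key, value in zip(FIELDS, parts[1:]):
        result[key] = value or None
    return result
```

For instance, the record for "July 4th" would yield a month and a day, with the compact date, year, weekday, and modifier all left as None.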
The DIGITS input of the INPUT attribute of the markup language (i.e., <INPUT TYPE="DIGITS" NAME="value" NEXT= "value " [NEXTMETHOD="value " ] [TIMEOUT="value " ] [MIN="value" ] [MAX="value" ] />) is used to collect a series of digits from the user. The DIGITS input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, a TIMEOUT attribute, a MIN attribute, and a MAX attribute. The value of the NAME attribute can be an identifier, the value of the NEXT attribute can be a next step address (i.e., a URL), the value of the NEXTMETHOD attribute can be a get and a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds. The value of the MIN and MAX attributes can be minimum and maximum integer values, respectively.
The following example illustrates the use of the DIGITS input in a markup language document or page.
<STEP NAME="init">
<PROMPT> Please say your pin now. </PROMPT>
<INPUT TYPE="digits" NAME="pin" NEXT="#doit"/>
</STEP>
In the example above, the DIGITS input is used to collect digits from the user, store the number in a variable named "pin", and then go to the STEP named "doit". If the user were to speak, "four five six", in response to the PROMPT element, the value "456" would be stored in the variable "pin". The DIGITS input can collect the digits 0 (zero) through 9 (nine), but not other numbers like 20 (twenty). To collect numbers greater than nine (e.g., 20 (twenty) or 400 (four hundred)), the NUMBER input can be used as further described below.
The FORM input of the INPUT element of the markup language (i.e., <INPUT TYPE="FORM" NAME="value" METHOD="value" ACTION="value" TIMEOUT="value" />) is used to collect input from the user, convert the input to text using the speech to text unit, and send the text to the markup language server. The FORM input includes a NAME attribute, a NEXT attribute, a METHOD attribute, an ACTION attribute and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL, pointer or memory address). The value of the METHOD attribute can be a get or a post, and the value of the ACTION attribute is a pointer to a script that processes the input on the server. The value of the TIMEOUT attribute can be a number represented in milliseconds.
The FORM input makes use of the speech to text unit to convert user input to text. The user input is then sent to the markup language server in a standard HTML <FORM> text format to be processed by a script on the server. If the user said "John Smith", the text string "john smith" would be sent to the server at the address indicated by the ACTION attribute, using the method indicated by the METHOD attribute, in a <FORM> format. The following is an example of the use of the FORM input in a markup language document.
<STEP NAME="order form">
<PROMPT> What would you like to order? </PROMPT>
<INPUT TYPE="form" NAME="order" NEXT="#next order" METHOD="post" ACTION="http://www.test.com/cgi-bin/post-query" TIMEOUT="200" />
</STEP>
In the example shown above, the FORM input is used to collect an order input from the user, store the user input converted to text in the variable named "order", go to the next step named "next order", post the text to the address "http://www.test.com/cgi-bin/post-query", and use a timeout value of 200 milliseconds.
The GRAMMAR input of the INPUT element of the markup language (i.e., <INPUT TYPE="GRAMMAR" SRC="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />, <INPUT TYPE="GRAMMAR" SRC="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] > RENAME elements </INPUT>, or <INPUT TYPE="GRAMMAR" SRC="value" [TIMEOUT="value"] [NEXT="value" [NEXTMETHOD="value"]] > RESPONSE elements </INPUT>) is used to specify an input grammar when interpreting the user's responses. The GRAMMAR input includes a SRC attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the SRC attribute can be a grammar address (i.e., a URL), and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The following example illustrates the use of the GRAMMAR input in a markup language document.
<STEP NAME="init">
<PROMPT> Say the month and year in which the credit card expires. </PROMPT>
<INPUT TYPE="GRAMMAR" SRC="gram://.SomeGrammar/month/year" NEXT="#stepNineteen" />
</STEP>
The above example illustrates the use of the GRAMMAR input to apply a predetermined grammar that collects a month and year from the user, store the interpreted values in variables named "month" and "year", and then go to the step named "stepNineteen".
The HIDDEN input of the INPUT element of the markup language (i.e., <INPUT TYPE="HIDDEN" NAME="value" VALUE="value"/>) is used to store a value in a variable. The HIDDEN input includes a NAME attribute and a VALUE attribute. The value of the NAME attribute can be an identifier, and the value of the VALUE attribute can be a literal value.
The following example illustrates the use of the HIDDEN input in a markup language document.
<STEP NAME="init">
<PROMPT> Login sequence complete. Are you ready to place your order? </PROMPT>
<INPUT TYPE="hidden" NAME="firstname" VALUE="Bill"/>
<INPUT TYPE="hidden" NAME="lastname" VALUE="Clinton"/>
<INPUT TYPE="hidden" NAME="favorite" VALUE="fries"/>
<INPUT TYPE="optionlist">
<OPTION NEXT="#order"> yes </OPTION>
<OPTION NEXT="#wait"> not yet </OPTION>
</INPUT>
</STEP>
In the example shown above, the HIDDEN input is used to create variables and assign values to those variables. In this example, the user has completed the login sequence and certain information is stored in variables as soon as the user's identity has been established. This information could then be used later in the application without requiring another access into the database.
The MONEY input of the INPUT element of the markup language (i.e., <INPUT TYPE="MONEY" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />) is used to collect monetary amounts from the user. The MONEY input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The MONEY input makes use of an input grammar to interpret the user's response and store that response in a standard format. The input grammar is able to interpret various ways to express monetary amounts. The data is preferably stored in integer format, in terms of cents. "Five cents" is stored as "5", "five dollars" is stored as "500", and "a thousand" is stored as "100000". In the case where the units are ambiguous, the grammar assumes dollars, so that "a thousand" is stored as if the user had said "a thousand dollars".
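The cents-based storage rule can be sketched in a few lines of Python. This is a minimal illustration, not the grammar itself: it assumes the spoken words have already been recognized as a numeric amount plus an optional unit, and the function name money_to_cents is hypothetical.

```python
from decimal import Decimal
from typing import Optional, Union


def money_to_cents(amount: Union[int, str], unit: Optional[str] = None) -> str:
    """Return the stored MONEY value: an integer number of cents, as text."""
    # When the unit is missing (ambiguous), dollars are assumed, as described above.
    normalized = (unit or "dollars").lower()
    value = Decimal(str(amount))
    if normalized in ("cent", "cents"):
        cents = value
    elif normalized in ("dollar", "dollars"):
        cents = value * 100
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    return str(int(cents))
```

Calling money_to_cents(1000) with no unit yields the same stored value as "a thousand dollars".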
The following example illustrates the use of the MONEY input in a markup language document.
<STEP NAME="init">
<PROMPT> How much would you like to deposit? </PROMPT>
<INPUT TYPE="money" NAME="dep" NEXT="#deposit"/>
</STEP>
In the example shown above, the MONEY input is used to collect the amount of money that the user would like to deposit in his account, store that amount in a variable named "dep", and then go to the STEP named "deposit".
The NONE input of the INPUT element of the markup language (i.e., <INPUT TYPE="NONE" NEXT="value" [NEXTMETHOD="value"] />) is used to specify the next location at which the voice browser continues execution when no response is collected from the user. The NONE input includes a NEXT attribute and a NEXTMETHOD attribute. The value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be a get or a post. The following example illustrates the use of the NONE input in a markup language document.
<STEP NAME="init">
<PROMPT> Welcome to the system. </PROMPT>
<INPUT TYPE="none" NEXT="#mainmenu"/> </STEP>
In the example shown above, the NONE input is used to jump to another STEP element in this dialog without waiting for any user response. In this example, the user would hear the phrase "Welcome to the system" followed immediately by the prompt of the main menu.
The NUMBER input of the INPUT element of the markup language (i.e., <INPUT TYPE="NUMBER" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />) is used to collect numbers from the user. The NUMBER input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The following example illustrates the use of the
NUMBER input in a markup language document or page.
<STEP NAME="init">
<PROMPT> Please say your age now. </PROMPT> <INPUT TYPE="number" NAME="age" NEXT="#doit"/> </STEP>
In the example shown above, the NUMBER input is used to collect numbers from the user, store the number in a variable named "age", and then go to the STEP element named "doit". If the user were to say, "eighteen", in response to the PROMPT element, the value "18" would be stored in the variable "age". The NUMBER input will collect numbers like 20 (i.e. twenty), but only one number per input. To collect a series of digits like "four five six" (i.e. "456"), the DIGITS input can be used as described above.
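The difference between the DIGITS and NUMBER inputs can be illustrated with a small Python sketch of DIGITS-style collection. It is only an approximation of what an input grammar might do: the word table and the name digits_value are assumptions, and a word such as "twenty" is rejected because it is a number rather than a single digit.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def digits_value(utterance: str) -> str:
    """Concatenate spoken single digits into a digit string."""
    collected = []
    for word in utterance.lower().replace(",", " ").split():
        if word not in DIGIT_WORDS:
            # Words like "twenty" belong to the NUMBER input, not DIGITS.
            raise ValueError(f"not a single digit: {word!r}")
        collected.append(DIGIT_WORDS[word])
    return "".join(collected)
```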
The OPTIONLIST input of the INPUT element of the markup language (i.e., <INPUT TYPE="OPTIONLIST" [NAME="value"] [TIMEOUT="value"] [NEXT="value" [NEXTMETHOD="value"]] > OPTION elements </INPUT>) is used to specify a list of options from which the user can select. The OPTIONLIST input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step URL. The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The OPTIONLIST input is used in conjunction with the OPTION element, which defines the specific user responses and the behavior associated with each OPTION element. The following example illustrates the use of the OPTIONLIST element in a markup language document.
<STEP NAME="init">
<PROMPT> What would you like to drink? </PROMPT>
<INPUT TYPE="optionlist">
<OPTION NEXT="#coke"> coke </OPTION>
<OPTION NEXT="#coke"> coca-cola </OPTION>
<OPTION NEXT="#pepsi"> pepsi </OPTION>
<OPTION NEXT="#rc"> r c </OPTION>
</INPUT>
</STEP>
In the example shown above, the voice browser will go to a different STEP element or state depending on which cola the user selects. If the user said "coke" or "coca-cola", the voice browser would go to the STEP element named "coke".
The PHONE input of the INPUT element of the markup language (i.e., <INPUT TYPE="PHONE" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />) is used to collect telephone numbers from the user. The PHONE input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The PHONE input makes use of an input grammar to interpret the user's response and store that response in a standard format. The phone number is interpreted as a string of digits and stored in a variable. If a user said "One, eight zero zero, seven five nine, eight eight eight eight", the response would be stored as "18007598888". The following is an example of the use of the PHONE input in a markup language document.
<STEP NAME="phone">
<PROMPT> What is your phone number? </PROMPT>
<INPUT TYPE="phone" NAME="ph" NEXT="#fax" />
</STEP>
In the example shown above, the PHONE input is used to collect a telephone number from the user, store the number in the variable named "ph", and go to the STEP named "fax".
The PROFILE input of the INPUT element of the markup language (i.e., <INPUT TYPE="PROFILE" NAME="value" PROFNAME="value" [SUBTYPE="value"] />) is used to collect the user's profile information (i.e., first name, last name, mailing address, email address, and notification address). The user profile information is stored in the database 244 of the system.
The PROFILE input includes a NAME attribute, a PROFNAME attribute, and a SUBTYPE attribute. The value of the NAME attribute can be an identifier, the value of the PROFNAME attribute can be a profile element name (string), and the value of the SUBTYPE attribute can be a profile element subtype (string). The following example illustrates the use of the PROFILE input in a markup language document.
<STEP NAME="getinfo">
<INPUT TYPE="profile" NAME="firstname" PROFNAME="N" SUBTYPE="first" />
<PROMPT> Hello, <VALUE NAME="firstname" />. Please say your pin. </PROMPT>
<INPUT TYPE="digits" NAME="pin" NEXT="#verify"/>
</STEP>
In the example above, the PROFILE input is used to retrieve the user's first name and store the string in a variable named "firstname". The string containing the name is then inserted into the PROMPT element using a VALUE element as further described below. When using the PROFILE input, more than one INPUT element can be included in the same STEP element because the PROFILE input is not an interactive INPUT element. Each STEP element contains only one INPUT element that accepts a response from the user.
The following table lists the valid combinations of profile names and their associated subtypes.

Profile Name   Subtype        Description
ADR            POSTAL         postal address
               PARCEL         parcel address
               HOME           home address
               WORK           work address
               DOM            domestic address
               INTL           international address
BDAY           none           birthday
EMAIL          none           primary email address
               NOTIFICATION   notification email address
FN             none           formatted name
GEO            none           geographic location (longitude; latitude)
KEY            none           public encryption key
LABEL          none           mailing label
MAILER         none           email program used
N              FIRST          first name
               LAST           last name
               MIDDLE         middle name
               PREFIX         prefix (e.g. Mr., Mrs., Dr.)
               SUFFIX         suffix (e.g. Jr., D.D.S., M.D.)
ORG            none           organization
ROLE           none           job role or position
TEL            HOME           home telephone number
               WORK           work telephone number
               MSG            voicemail telephone number
               VOICE          voice call telephone number
               FAX            fax call telephone number
               CELL           cellular telephone number
               PREF           preferred telephone number
TITLE          none           job title
TZ             none           time zone
UID            none           globally unique id
URL            none           URL of home page
VERSION        none           version of vCard
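The table can also be expressed as a lookup structure for checking PROFNAME and SUBTYPE pairs before a document is served. The Python sketch below is an illustration under assumptions: the names VALID_SUBTYPES and is_valid_profile are hypothetical, None stands for the "none" subtype, and matching is treated as case-insensitive because the examples in this description write the same names in both upper and lower case.

```python
from typing import Optional

# None represents the "none" subtype from the table.
VALID_SUBTYPES = {
    "ADR": {"POSTAL", "PARCEL", "HOME", "WORK", "DOM", "INTL"},
    "BDAY": {None},
    "EMAIL": {None, "NOTIFICATION"},
    "FN": {None},
    "GEO": {None},
    "KEY": {None},
    "LABEL": {None},
    "MAILER": {None},
    "N": {"FIRST", "LAST", "MIDDLE", "PREFIX", "SUFFIX"},
    "ORG": {None},
    "ROLE": {None},
    "TEL": {"HOME", "WORK", "MSG", "VOICE", "FAX", "CELL", "PREF"},
    "TITLE": {None},
    "TZ": {None},
    "UID": {None},
    "URL": {None},
    "VERSION": {None},
}


def is_valid_profile(profname: str, subtype: Optional[str] = None) -> bool:
    """Check a PROFNAME/SUBTYPE pair against the table (case-insensitive by assumption)."""
    allowed = VALID_SUBTYPES.get(profname.upper())
    if allowed is None:
        return False
    key = subtype.upper() if subtype else None
    return key in allowed
```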
The notification address shown above can be used to send a user urgent or timely information (e.g., sending information to a pager). The format of the notification address is preferably that of an email address provided by the user when his or her subscription is activated. The user's notification address would be stored in a variable named "n_addr". The application could then use this email address to send a message to the user. To retrieve the notification address from the voice browser, the PROFILE input can be used in a markup language document in the following manner:
<INPUT TYPE="profile" NAME="n_addr" PROFNAME="email" SUBTYPE="notification"/>
The RECORD input of the INPUT element of the markup language (i.e., <INPUT TYPE="RECORD" TIMEOUT="value" STORAGE="value" [FORMAT="value"] [NAME="value"] NEXT="value" [NEXTMETHOD="value"] />) is used to record an audio sample and to store that audio sample in a specified location. The RECORD input includes a TIMEOUT attribute, a FORMAT attribute, a NAME attribute, a STORAGE attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the TIMEOUT attribute can be the maximum record time represented in milliseconds, the value of the FORMAT attribute can be a recorded audio format (audio/wav), the value of the NAME attribute can be an identifier, the value of the STORAGE attribute can be a file or a request, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be a get, a post, or a put.
The following two examples illustrate the RECORD input in a markup language document.
<STEP NAME="init">
<PROMPT> Please say your first and last name. </PROMPT>
<INPUT TYPE="record" TIMEOUT="7000" NAME="theName" STORAGE="REQUEST" NEXT="http://wavhost/acceptwav.asp" NEXTMETHOD="POST" />
</STEP>
In the example shown above, the RECORD input is used to record a seven second audio sample, and then "POST" that sample to the remote machine named "wavhost". The response to the "POST" has to be a dialog which continues the execution of the application.
<STEP NAME="init">
<PROMPT> Please say your first and last name. </PROMPT>
<INPUT TYPE="record" TIMEOUT="7000" NAME="theName" STORAGE="FILE" NEXT="#reccomplete" NEXTMETHOD="GET" />
</STEP>
In the example shown above, the RECORD input is used to record another seven second audio sample. However, the sample is stored in a file, instead of sent in the HTTP request as it was in the previous example. The name of the file is chosen by the voice browser automatically and is stored in a variable named "theName". After storing the audio sample in the file, the voice browser will continue execution at the URL specified by the NEXT attribute. In contrast to the previous example, the value of the variable "theName" will be the name of the audio file. In the earlier example (where the audio sample was transmitted via the HTTP request), the value of the variable "theName" would be null.
The TIME input of the INPUT element of the markup language (i.e., <INPUT TYPE="TIME" NAME="value" NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"] />) is used to collect a time of day from the user. The TIME input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The TIME input makes use of an input grammar to interpret the user's response and to store that response in a standard format. This grammar will interpret responses of various forms, including both 12-hour and 24-hour conventions. "Four oh three PM" becomes "403P". Note that "P" is appended to the time. Likewise, "Ten fifteen in the morning" becomes "1015A". "Noon" is stored as "1200P", and "Midnight" is stored as "1200A".
Military time, such as, "Thirteen hundred hours" becomes
"100P". If the user does not specify the morning or evening, no indication is stored in the variable (i.e., "Four o'clock" is stored as "400").
The following example illustrates the TIME input in a markup language document.
<STEP NAME="init">
<PROMPT> What time would you like your wakeup call? </PROMPT>
<INPUT TYPE="time" NAME="wakeup" NEXT="#record"/> </STEP>
In the example shown above, the TIME input is used to collect a time of day from the user, store that data in the variable named "wakeup", and then go to the STEP element named "record".
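The stored TIME format described above can be sketched in Python as follows. This is a hedged illustration rather than the grammar itself: it assumes the spoken response has already been reduced to an hour and a minute, and the function names encode_time and encode_24h are hypothetical.

```python
from typing import Optional


def encode_time(hour: int, minute: int, suffix: Optional[str] = None) -> str:
    """Encode a 12-hour time; suffix is "A", "P", or None when unspecified."""
    if not 1 <= hour <= 12 or not 0 <= minute <= 59:
        raise ValueError("hour must be 1-12 and minute 0-59")
    if suffix not in (None, "A", "P"):
        raise ValueError("suffix must be 'A', 'P', or None")
    # No indicator is appended when the user gave neither morning nor evening.
    return f"{hour}{minute:02d}{suffix or ''}"


def encode_24h(hour: int, minute: int) -> str:
    """Encode a 24-hour ("military") time in the same stored format."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0-23")
    suffix = "P" if hour >= 12 else "A"
    hour12 = hour % 12 or 12  # 0 and 12 both map to 12
    return encode_time(hour12, minute, suffix)
```

Under these assumptions, noon and midnight fall out of the 24-hour path as "1200P" and "1200A" respectively, matching the values given above.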
The YORN input of the INPUT element of the markup language (i.e., <INPUT TYPE="YORN" NAME="value" [TIMEOUT="value"] NEXT="value" [NEXTMETHOD="value"] />, or <INPUT TYPE="YORN" [NAME="value"] [TIMEOUT="value"] [NEXT="value" [NEXTMETHOD="value"]] > CASE elements </INPUT>) is used to collect "yes" or "no" responses from the user. The YORN input includes a NAME attribute, a NEXT attribute, a NEXTMETHOD attribute, and a TIMEOUT attribute. The value of the NAME attribute can be an identifier, and the value of the NEXT attribute can be a next step address (i.e., a URL). The value of the NEXTMETHOD attribute can be a get or a post, and the value of the TIMEOUT attribute can be a number represented in milliseconds.
The YORN input maps a variety of affirmative and negative responses to the values "Y" and "N": it stores the value "Y" for affirmative responses and the value "N" for negative responses. Affirmative and negative responses are determined using an input grammar that maps various user responses to the appropriate result.
The following example illustrates the use of the YORN input in a markup language document.
<STEP NAME="ask">
<PROMPT> Fire the missiles now? </PROMPT>
<INPUT TYPE="YORN" NAME="fire" NEXT="#confirm"/>
</STEP>
In the example shown above, the YORN input is used to collect a "yes" or "no" response from the user, store that response into a variable named "fire", and then go to the STEP named "confirm".
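The mapping performed by the YORN input can be pictured as a simple lookup. In the Python sketch below, the phrase lists are purely illustrative assumptions (the actual input grammar is not enumerated in this description), and the name yorn_value is hypothetical.

```python
from typing import Optional

# Hypothetical phrase lists; the real grammar may accept other responses.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "correct"}
NEGATIVE = {"no", "nope", "nah", "incorrect"}


def yorn_value(utterance: str) -> Optional[str]:
    """Map a recognized response to "Y" or "N"; None when it matches neither."""
    phrase = utterance.strip().lower()
    if phrase in AFFIRMATIVE:
        return "Y"
    if phrase in NEGATIVE:
        return "N"
    return None
```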
The OPTION element of the markup language (i.e., <OPTION [NEXT="value" [NEXTMETHOD="value"]] [VALUE="value"] > text </OPTION>) is used to define the type of response expected from the user in a STEP element or state. The OPTION element includes a VALUE attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the VALUE attribute can be a literal value, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be a get or a post. The OPTION element can exist only within the INPUT element, and then only when using the OPTIONLIST input.
The following two examples illustrate the use of the OPTION element in a markup language document.
<INPUT NAME="choice" TYPE="optionlist">
<OPTION NEXT="#doit" VALUE="1"> one </OPTION>
<OPTION NEXT="#doit" VALUE="2"> two </OPTION>
</INPUT>
The example shown above illustrates the use of the OPTION element within the INPUT element. In this example, the first OPTION element would be executed when the user responded with "one", and the second OPTION element would be executed when the user responded with "two". If the user said "one", the value of the variable named "choice" would be "1", because of the use of the VALUE attribute. Because the NEXT attributes for both of the OPTION elements in this OPTIONLIST input are the same, the voice browser would proceed to the STEP element named "doit" when either "one" or "two" was recognized.
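The selection behavior described above can be sketched with Python's standard XML parser. This is a minimal illustration, not the voice browser's implementation: the name select_option is hypothetical, and falling back to the option text when no VALUE attribute is present is an assumption that this description does not state.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<INPUT NAME="choice" TYPE="optionlist">
  <OPTION NEXT="#doit" VALUE="1"> one </OPTION>
  <OPTION NEXT="#doit" VALUE="2"> two </OPTION>
</INPUT>
"""


def select_option(markup: str, utterance: str):
    """Return (variable name, stored value, next step) for a recognized response."""
    root = ET.fromstring(markup)
    spoken = utterance.strip().lower()
    for option in root.iter("OPTION"):
        text = (option.text or "").strip()
        if text.lower() == spoken:
            # Assumption: without a VALUE attribute, the option text is stored.
            return root.get("NAME"), option.get("VALUE", text), option.get("NEXT")
    return None
```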
<INPUT TYPE="optionlist">
<OPTION NEXT="http://localhost/vml/weather.asp"> weather </OPTION>
<OPTION NEXT="http://localhost/vml/news.asp"> news </OPTION>
<OPTION NEXT="http://localhost/vml/traffic.asp"> traffic </OPTION>
</INPUT>
The example shown above illustrates the use of the OPTION element to select one of three applications. Note that the URLs used in the NEXT attributes are full HTTP URLs, and that unlike the previous example, each OPTION element has a unique NEXT attribute.
The OPTIONS element of the markup language (i.e., <OPTIONS/>) describes the type of input expected within a given STEP element. The OPTIONS element can be used in HELP elements to present the user with a complete list of valid responses. The OPTIONS element can be used anywhere that text is read to the user. The OPTIONS element can be contained by a PROMPT, EMP, PROS, HELP, ERROR, or ACK element.
The following example illustrates the use of the OPTIONS element in a markup language document.
<CLASS NAME="helpful">
<HELP> Your choices are: <OPTIONS/> </HELP>
</CLASS>
The example shown above illustrates how the OPTIONS element can be used to construct a "helpful" class. Any STEP elements that directly or indirectly name "helpful" as a PARENT element respond to a request for help (i.e., "help") by speaking the message, in which the OPTIONS element expands to a description of what can be said by the user at this point in the dialog.
The ACK element of the markup language (i.e., <ACK [CONFIRM="value"] [BACKGROUND="value"] [REPROMPT="value"] > text </ACK>) is used to acknowledge the transition between STEP elements, usually as a result of a user response. The ACK element includes a CONFIRM attribute, a BACKGROUND attribute, and a REPROMPT attribute. The value of the BACKGROUND and REPROMPT attributes can be "Y" or "N", and the value of the CONFIRM attribute can be a YORN input as described above. The ACK element can be contained within a STEP element or a CLASS element as further described below. The following is an example of a markup language document containing the ACK element.
<STEP NAME="card_type">
<PROMPT> What type of credit card do you have? </PROMPT>
<INPUT NAME="type" TYPE="optionlist">
<OPTION NEXT="#exp"> visa </OPTION>
<OPTION NEXT="#exp"> mastercard </OPTION>
<OPTION NEXT="#exp"> discover </OPTION>
</INPUT>
<ACK CONFIRM="YORN" REPROMPT="Y"> I thought you said <VALUE NAME="type"/> <BREAK/> Is that correct? </ACK>
</STEP>
In the example above, the ACK element is used to confirm the user's choice of credit card. When this element is interpreted by the voice browser, the PROMPT element is read to the user using text-to-speech unit 252. The system waits until the user responds with "visa", "mastercard", or "discover" and then asks the user to confirm that the type of card was recognized correctly. If the user answers "yes" to the ACK element, the voice browser will proceed to the STEP element named "exp". If the user answers "no" to the ACK element, the text of the PROMPT element will be read again, and the user will be allowed to make his or her choice again. The voice browser then re-enters or executes the STEP element again.
The AUDIO element of the markup language (i.e., <AUDIO SRC="value" />) specifies an audio file that should be played. The AUDIO element includes a SRC attribute. The value of the SRC attribute can be an audio file URL. The AUDIO element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
The following markup language contains the AUDIO element.
<PROMPT>
At the tone, the time will be 11:59 pm.
<AUDIO SRC="http://localhost/sounds/beep.wav" />
</PROMPT>
In the example above, the AUDIO element is included in a PROMPT element. When interpreted by the voice browser, a prompt (i.e., "At the tone, the time will be 11:59 pm.") will be played to the user, and the WAV file "beep.wav" will be played to the user as specified by the AUDIO element.
The BREAK element of the markup language (i.e., <BREAK [MSECS="value" | SIZE="value"] />) is used to insert a pause into content or information to be played to the user. The BREAK element includes a MSECS attribute and a SIZE attribute. The value of the MSECS attribute can be a number represented in milliseconds, and the value of the SIZE attribute can be none, small, medium, or large. The BREAK element can be used when text or an audio sample is to be played to the user. The BREAK element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element. The following markup language contains the BREAK element.
<PROMPT>
Welcome to Earth. <BREAK MSECS="250"/> How may I help you?
</PROMPT>
In the example above, the BREAK element is used with a MSECS attribute, inside a PROMPT element. When interpreted by the voice browser, a prompt (i.e., "Welcome to Earth.") is read to the user. The system will then pause for 250 milliseconds, and play "How may I help you?".
Alternatively, the SIZE attribute (i.e., "small", "medium", and "large") of the BREAK element can be used to control the duration of the pause instead of specifying the number of milliseconds as shown below.
<PROMPT>
Welcome to Earth. <BREAK SIZE="medium"/> How may I help you?
</PROMPT>
The OR element of the markup language (i.e., <OR/>) is used to define alternate recognition results in an OPTION element. The OR element is interpreted as a logical OR, and is used to associate multiple recognition results with a single NEXT attribute. The following example illustrates the use of the OR element in a markup language document.
<INPUT TYPE="optionlist">
<OPTION NEXT="#coke_chosen"> coke <OR/> coca-cola </OPTION>
<OPTION NEXT="#pepsi_chosen"> pepsi </OPTION>
</INPUT>
The example shown above illustrates the use of the OR element within an OPTION element. As shown above, the user may respond with either "coke" or "coca-cola", and the voice browser will proceed to the STEP named "coke_chosen".
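One way to see how the OR element associates several recognition results with one NEXT attribute is to split the text of an OPTION element at each <OR/> marker. The Python sketch below uses the standard XML parser and is illustrative only; the helper names option_phrases and next_step are hypothetical.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<INPUT TYPE="optionlist">
  <OPTION NEXT="#coke_chosen"> coke <OR/> coca-cola </OPTION>
  <OPTION NEXT="#pepsi_chosen"> pepsi </OPTION>
</INPUT>
"""


def option_phrases(option):
    """List the alternative phrases of an OPTION element, split at <OR/>."""
    # The text before the first <OR/> is option.text; each later phrase is the
    # tail text that follows an <OR/> child element.
    parts = [option.text] + [child.tail for child in option if child.tag == "OR"]
    return [part.strip().lower() for part in parts if part and part.strip()]


def next_step(markup: str, utterance: str):
    """Return the NEXT target of the first OPTION that lists the utterance."""
    spoken = utterance.strip().lower()
    for option in ET.fromstring(markup).iter("OPTION"):
        if spoken in option_phrases(option):
            return option.get("NEXT")
    return None
```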
The CANCEL element of the markup language (i.e., <CANCEL NEXT="value" [NEXTMETHOD="value"] /> or <CANCEL NEXT="value" [NEXTMETHOD="value"] > text </CANCEL>) is used to define the behavior of the application in response to a user's request to cancel the current PROMPT element. The CANCEL element includes a NEXT attribute and a NEXTMETHOD attribute. The value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be a get or a post.
The CANCEL element can be invoked through a variety of phrases. For example, the user may say only the word "cancel", or the user may say "I would like to cancel, please." The CANCEL element can be contained within a STEP element or a CLASS element. When the voice browser detects "cancel" from the user, the voice browser responds based upon the use of the CANCEL element in the markup language document. If no CANCEL element is associated with a given STEP element, the current prompt will be interrupted (if it is playing), and the voice browser will stay in the same application state and then process any interactive inputs.
The following example illustrates a markup language document containing the CANCEL element.
<STEP NAME="report">
<CANCEL NEXT="#traffic menu"/>
<PROMPT> Traffic conditions for Chicago, Illinois, Monday, May 18. Heavy congestion on ... </PROMPT>
<INPUT TYPE="optionlist">
<OPTION NEXT="#report"> repeat </OPTION>
<OPTION NEXT="#choose"> new city </OPTION>
</INPUT>
</STEP>
The example above illustrates the use of the CANCEL element to specify that when the user says "cancel", the voice browser proceeds to the STEP element named "traffic menu", instead of the default behavior, which would be to stop the PROMPT element from playing and wait for a user response. The user can also interrupt the PROMPT element by speaking a valid OPTION element. In this example, the user could interrupt the PROMPT element and get the traffic conditions for a different city by saying "new city".
The CASE element of the markup language (i.e., <CASE VALUE="value" NEXT="value" [NEXTMETHOD="value"] />) is used to define the flow of control of the application, based on the values of internal markup language variables. The CASE element includes a VALUE attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the VALUE attribute can be a literal value, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be a get or a post. The CASE element can be contained by a SWITCH element, or by an INPUT element when using an input type of the INPUT element that collects a single value (i.e., DATE, DIGITS, MONEY, PHONE, TIME, YORN).
The following example illustrates a markup language document containing a CASE element.
<SWITCH FIELD="pizza">
<CASE VALUE="pepperoni" NEXT="#p_pizza"/>
<CASE VALUE="sausage" NEXT="#s_pizza"/>
<CASE VALUE="veggie" NEXT="#v_pizza"/>
</SWITCH>
In the example above, the markup language shows the use of the CASE element within the SWITCH element. In this example, the CASE elements are used to direct the voice browser to different URLs based on the value of the markup language variable "pizza".
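The control flow defined by the CASE elements amounts to a lookup of the current variable value. The Python sketch below is illustrative only: it assumes the SWITCH element names its variable in an attribute spelled FIELD, that variables are held in a plain dictionary, and that None is returned when no CASE matches; the name resolve_switch is hypothetical.

```python
import xml.etree.ElementTree as ET

SWITCH_MARKUP = """
<SWITCH FIELD="pizza">
  <CASE VALUE="pepperoni" NEXT="#p_pizza"/>
  <CASE VALUE="sausage" NEXT="#s_pizza"/>
  <CASE VALUE="veggie" NEXT="#v_pizza"/>
</SWITCH>
"""


def resolve_switch(markup: str, variables: dict):
    """Return the NEXT target of the CASE whose VALUE equals the variable."""
    switch = ET.fromstring(markup)
    current = variables.get(switch.get("FIELD"))
    for case in switch.iter("CASE"):
        if case.get("VALUE") == current:
            return case.get("NEXT")
    return None  # assumption: no match means no transition is selected
```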
The CLASS element of the markup language (i.e., <CLASS NAME="value" [PARENT="value"] [BARGEIN="value"] [COST="value"] > text </CLASS>) is used to define a set of elements that are to be reused within the content of a dialog. For example, application developers can define a set of elements once, and then use them several times. The CLASS element includes a NAME attribute, a PARENT attribute, a BARGEIN attribute, and a COST attribute. The value of the NAME and the PARENT attributes can be an identifier. The value of the BARGEIN attribute can be "Y" or "N", and the value of the COST attribute can be an integer number.
The CLASS element can be used to define the default behavior of an ERROR element, a HELP element, and a CANCEL element, within a given DIALOG element. The CLASS element can be contained by a DIALOG element. The following example shows a markup language document containing the CLASS element.
<CLASS NAME="simple">
<HELP> Your choices are <OPTIONS/> </HELP>
<ERROR> I did not understand what you said. Valid responses are <OPTIONS/> </ERROR>
</CLASS>
<STEP NAME="beverage" PARENT="simple">
<PROMPT> Please choose a drink. </PROMPT>
<INPUT NAME="drink" TYPE="optionlist">
<OPTION NEXT="#food"> coke </OPTION>
<OPTION NEXT="#food"> pepsi </OPTION>
</INPUT>
</STEP>
<STEP NAME="food" PARENT="simple">
<PROMPT> Please choose a meal. </PROMPT>
<INPUT NAME="meal" TYPE="optionlist">
<OPTION NEXT="#deliver"> pizza </OPTION>
<OPTION NEXT="#deliver"> tacos </OPTION>
</INPUT>
</STEP>
In the example above, the markup language document illustrates the use of the CLASS element to define a HELP element and an ERROR element that will be used in several steps within this DIALOG element. The markup language also illustrates the use of the PARENT attribute in the STEP element to refer to the CLASS element, and therefore inherit the behaviors defined within it. When interpreted by the voice browser, each STEP element will behave as if the HELP and ERROR elements that are defined in the CLASS element were defined explicitly in the steps themselves.
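The inheritance described above can be sketched in a few lines of Python. This is only an illustrative model, not the voice browser's implementation: the `ClassDef` and `Step` types and the `resolve_handler` function are hypothetical names, and the rule that a handler defined directly in a step takes precedence over an inherited one is an assumption that the text does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClassDef:
    """Hypothetical model of a CLASS element: a named set of default handlers."""
    name: str
    handlers: Dict[str, str] = field(default_factory=dict)
    parent: Optional[str] = None  # a CLASS may itself carry a PARENT attribute


@dataclass
class Step:
    """Hypothetical model of a STEP element with an optional PARENT class."""
    name: str
    handlers: Dict[str, str] = field(default_factory=dict)
    parent: Optional[str] = None


def resolve_handler(step: Step, kind: str, classes: Dict[str, ClassDef]) -> Optional[str]:
    """Return the text for `kind` (e.g. "HELP"), looking in the step first
    (assumed precedence) and then walking up the PARENT chain."""
    if kind in step.handlers:
        return step.handlers[kind]
    seen = set()
    parent = step.parent
    while parent is not None and parent not in seen:
        seen.add(parent)  # guard against accidental PARENT cycles
        cls = classes.get(parent)
        if cls is None:
            return None
        if kind in cls.handlers:
            return cls.handlers[kind]
        parent = cls.parent
    return None


classes = {
    "simple": ClassDef(
        "simple",
        {
            "HELP": "Your choices are <OPTIONS/>",
            "ERROR": "I did not understand what you said. Valid responses are <OPTIONS/>",
        },
    )
}
beverage = Step("beverage", parent="simple")
print(resolve_handler(beverage, "HELP", classes))
```

Under these assumptions, both the "beverage" and "food" steps resolve HELP and ERROR to the text defined once in the "simple" class.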
The EMP element of the markup language (i.e., <EMP [LEVEL="value"]> text </EMP>) is used to identify content within text that will be read to the user where emphasis is to be applied. The EMP element includes a LEVEL attribute. The value of the LEVEL attribute can be none, reduced, moderate, or strong. The EMP element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element. The following example of a markup language document contains the EMP element.
<PROMPT>
This example is
<EMP LEVEL="strong"> really </EMP> simple.
</PROMPT>
In the above example, the EMP element is used to apply "strong" emphasis to the word "really" in the PROMPT element. The actual effect on the speech output is determined by the text-to-speech (TTS) software of the system. To achieve a specific emphatic effect, the PROS element, as further described below, can be used instead of the EMP element.
The ERROR element of the markup language (i.e., <ERROR [TYPE="value"] [ORDINAL="value"] [REPROMPT="value"] [NEXT="value" [NEXTMETHOD="value"]]> text </ERROR>) is used to define the behavior of the application in response to an error. The ERROR element includes a TYPE attribute, an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the TYPE attribute can be all, nomatch, nospeech, toolittle, toomuch, noauth, or badnext. The value of the ORDINAL attribute can be an integer number, the value of the REPROMPT attribute can be "Y" or "N", the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
If the application developer does not define the behavior of an ERROR element for a given STEP element, the default behavior will be used. The default behavior for the ERROR element is to play the phrase "An error has occurred.", remain in the current STEP element, replay the PROMPT element, and wait for the user to respond. The ERROR element can be contained within a STEP or a CLASS element.
The following example illustrates the use of the ERROR element in a markup language document.
1 <STEP NAME="errors">
2 <ERROR TYPE="nomatch"> First error message.
3 I did not understand what you said. </ERROR>
4 <ERROR TYPE="nomatch" ORDINAL="2">
5 Second error message.
6 I did not understand what you said. </ERROR>
7 <PROMPT> This step tests error messages.
8 Say 'oops' twice. Then say 'done' to
9 choose another test. </PROMPT>
10 <INPUT TYPE="OPTIONLIST">
11 <OPTION NEXT="#end"> done </OPTION>
12 </INPUT>
13 </STEP>
In the example above, the ERROR element is used to define the application's behavior in response to an error. On line 2, the error message is defined to be used the first time an error of type "nomatch" occurs in this STEP element. On line 4, the error message is to be used the second and all subsequent times an error of type "nomatch" occurs in this STEP. The ORDINAL attribute of the ERROR element of the markup language determines which message will be used in the case of repeated errors within the same STEP element. The voice browser can choose an error message based on the following algorithm. If the error has occurred three times, the voice browser will look for an ERROR element with an ORDINAL attribute of "3". If no such ERROR element has been defined, the voice browser will look for an ERROR element with an ORDINAL attribute of "2", and then "1", and then an ERROR element with no ORDINAL attribute defined. Thus, if the ERROR element is defined with the ORDINAL attribute of "6" in the STEP element shown above, and the same error occurred six times in a row, the user would hear the first error message one time, then the second error message four times, and finally the error message with ORDINAL attribute of "6".
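The ORDINAL fallback just described can be sketched as follows. The function name `pick_message` and the dictionary layout (the key `None` standing for an element with no ORDINAL attribute) are illustrative assumptions, and the text of the sixth message is invented for the example.

```python
from typing import Dict, Optional


def pick_message(count: int, by_ordinal: Dict[Optional[int], str]) -> Optional[str]:
    """Choose the message for the `count`-th occurrence of the same error.

    Look for ORDINAL == count, then count - 1, ..., down to 1, and finally
    fall back to the element defined without an ORDINAL attribute (key None).
    """
    for ordinal in range(count, 0, -1):
        if ordinal in by_ordinal:
            return by_ordinal[ordinal]
    return by_ordinal.get(None)


# The STEP from the example, plus a hypothetical ERROR element with ORDINAL="6".
messages = {
    None: "First error message.",
    2: "Second error message.",
    6: "Sixth error message.",  # invented text, for illustration only
}
heard = [pick_message(n, messages) for n in range(1, 7)]
print(heard)
```

With these inputs the caller hears the first message once, the second message four times, and then the message with ORDINAL "6", which matches the sequence described in the text. The same lookup applies to the ORDINAL attribute of the HELP element.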
The HELP element of the markup language (i.e., <HELP [ORDINAL="value"] [REPROMPT="value"] [NEXT="value" [NEXTMETHOD="value"]]> text </HELP>) is used to define the behavior of the application when the user asks for help. The HELP element includes an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the ORDINAL attribute can be an integer number, and the value of the REPROMPT attribute can be "Y" or "N". The value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
The HELP element, like the CANCEL element, can be detected through a variety of phrases. The user may say only the word "help", or the user may say "I would like help, please." In either case, the HELP element will be interpreted. The HELP element can be contained within a STEP element or a CLASS element.
When the voice browser detects "help" from the user, the voice browser responds based upon the use of the HELP element in the markup language document. If no HELP element is associated with a given STEP, the current prompt will be interrupted (if it is playing), the user will hear "No help is available.", and the voice browser will stay in the same application state and process any interactive inputs.
The following example illustrates the use of the HELP element in a markup language document.
1 <STEP NAME="helps">
2 <HELP REPROMPT="Y"> First help message.
3 You should hear the prompt again. </HELP>
4 <HELP ORDINAL="2"> Second help message.
5 You should not hear the prompt now. </HELP>
6 <PROMPT> This step tests help prompts.
7 Say 'help' twice. Then say 'done' to
8 choose another test. </PROMPT>
9 <INPUT TYPE="OPTIONLIST">
10 <OPTION NEXT="#end"> done </OPTION>
11 </INPUT>
12 </STEP>
In the example above, the HELP element is used to define the application's behavior in response to the user input "help". On line 2, the help message is defined to be used the first time the user says "help". On line 4, the help message is defined to be used the second and all subsequent times the user says "help". It should also be noted that through the use of the REPROMPT attribute, the prompt will be repeated after the first help message, but it will not be repeated after the second help message.
The ORDINAL attribute of the HELP element of the markup language determines which message will be used in the case of repeated utterances of "help" within the same STEP element. The voice browser will choose a help message based on the following algorithm. If the user has said "help" three times, the voice browser will look for a HELP element with an ORDINAL attribute of "3". If no such HELP element has been defined, the voice browser will look for a HELP element with an ORDINAL attribute of "2", and then "1", and then a HELP element with no ORDINAL attribute defined. Thus, if a HELP element is defined with an ORDINAL attribute of "6" in the STEP element shown above, and the user said "help" six times in a row, the user would hear the first help message one time, then the second help message four times, and finally the help message with the ORDINAL attribute of "6".
The PROS element of the markup language (i.e., <PROS [RATE="value"] [VOL="value"] [PITCH="value"] [RANGE="value"]> text </PROS>) is used to control the prosody of the content presented to the user via PROMPT, HELP, ERROR, CANCEL, and ACK elements. Prosody affects certain qualities of the text-to-speech presentation, including rate of speech, pitch, range, and volume. The PROS element includes a RATE attribute, a VOL attribute, a PITCH attribute, and a RANGE attribute. The value of the RATE attribute can be an integer number representing words per minute, and the value of the VOL attribute can be an integer number representing volume of speech. The value of the PITCH attribute can be an integer number representing pitch in hertz, and the value of the RANGE attribute can be an integer number representing range in hertz. The PROS element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
The following example illustrates the use of the PROS element.
<PROMPT> Let me tell you a secret:
  <PROS VOL="0.5"> I ate the apple. </PROS>
</PROMPT>
In the example shown above, the phrase "I ate the apple" is spoken with one half of the normal volume.
The RENAME element of the markup language (i.e., <RENAME RECNAME="value" VARNAME="value" />) is used to rename recognition slots in grammars, such that the resulting variable name can be different from the name of the recognition slot defined in the grammar. The RENAME element includes a VARNAME attribute and a RECNAME attribute. The value of the VARNAME and the RECNAME attributes can be identifiers. The RENAME element can exist only within the INPUT element, and then only when using the GRAMMAR input type.
The following example illustrates the use of the RENAME element in a markup language document.
<INPUT TYPE="GRAMMAR"
  SRC="http://www.foo.com/mygram.grm"
  NEXT="http://www.fancyquotes.com/vmlstocks.asp">
  <RENAME VARNAME="sym" RECNAME="symbol"/>
  <RENAME VARNAME="detail" RECNAME="quotetype"/>
</INPUT>
In the example shown above, the RENAME element is used to account for differences in the variable names collected from a grammar and those expected by another script. In particular, a grammar from foo.com is used to provide input to an application hosted by fancyquotes.com. Because, in this example, the grammar and script have been developed independently, the RENAME element is used to help connect the grammar and the stock-quoting application.
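The effect of the RENAME elements can be sketched as a simple key mapping. The helper name `apply_renames` and the slot values are hypothetical and used only for illustration; slots without a RENAME entry are assumed to keep their grammar names.

```python
from typing import Dict


def apply_renames(slots: Dict[str, str], renames: Dict[str, str]) -> Dict[str, str]:
    """Map recognition-slot names (RECNAME) to variable names (VARNAME).

    Slots with no RENAME entry keep their original name (an assumption).
    """
    return {renames.get(name, name): value for name, value in slots.items()}


# RECNAME -> VARNAME, as in the example above.
renames = {"symbol": "sym", "quotetype": "detail"}

# Hypothetical values filled in by the grammar from foo.com.
slots = {"symbol": "MOT", "quotetype": "full"}

variables = apply_renames(slots, renames)
print(variables)
```

The script at fancyquotes.com would then receive variables named "sym" and "detail" rather than the grammar's own slot names.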
The RESPONSE element of the markup language (i.e., <RESPONSE FIELDS="value" [NEXT="value" [NEXTMETHOD="value"]] /> or <RESPONSE FIELDS="value" [NEXT="value" [NEXTMETHOD="value"]]> SWITCH elements </RESPONSE>) is used to define the behavior of an application in response to different combinations of recognition slots. The RESPONSE element includes a FIELDS attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the FIELDS attribute can be a list of identifiers, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.
The RESPONSE element enables application developers to define a different NEXT attribute depending on which of the grammar's slots were filled. The RESPONSE element can exist only within an INPUT element, and then only when using an input type of grammar.
The following example illustrates the RESPONSE element in a markup language document.
<INPUT TYPE="GRAMMAR"
  SRC="gram://.Banking/action/amt/fromacct/toacct"
  NEXT="#notenoughfields">
  <RESPONSE FIELDS="action,amt,fromacct,toacct" NEXT="#doit"/>
  <RESPONSE FIELDS="action,amt,fromacct" NEXT="#asktoacct"/>
  <RESPONSE FIELDS="action,amt,toacct" NEXT="#askfromacct"/>
  <RESPONSE FIELDS="action,amt" NEXT="#askaccts"/>
  <RESPONSE FIELDS="action" NEXT="#askamtaccts"/>
</INPUT>
The example shown above illustrates the use of the RESPONSE element where the user specifies less than all the possible variables available in the grammar. Using the RESPONSE element, the application can arrange to collect the information not already filled in by prior steps. In particular, this example transfers to the "askaccts" STEP element if neither the source nor destination account is specified (i.e., the user said "transfer 500 dollars"), but it transfers to the "askfromacct" STEP element if the user said what account to transfer to, but did not specify a source account (i.e., if the user had said "transfer 100 dollars to savings"). The next URL of the INPUT element is used when the user's response does not match any of the defined responses.
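A minimal sketch of this selection follows. It assumes that a RESPONSE element matches only when the set of filled slots is exactly the set listed in its FIELDS attribute; the text does not spell out the matching rule, so this is one plausible reading. The function name `next_step` is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable


def next_step(
    filled: Iterable[str],
    responses: Dict[FrozenSet[str], str],
    default_next: str,
) -> str:
    """Return the NEXT address of the RESPONSE whose FIELDS equal the filled
    slots (assumed exact-set match), else the NEXT of the enclosing INPUT."""
    return responses.get(frozenset(filled), default_next)


# The RESPONSE elements from the banking example, keyed by their FIELDS.
responses = {
    frozenset({"action", "amt", "fromacct", "toacct"}): "#doit",
    frozenset({"action", "amt", "fromacct"}): "#asktoacct",
    frozenset({"action", "amt", "toacct"}): "#askfromacct",
    frozenset({"action", "amt"}): "#askaccts",
    frozenset({"action"}): "#askamtaccts",
}

# "transfer 500 dollars": neither account was specified.
print(next_step(["action", "amt"], responses, "#notenoughfields"))
# "transfer 100 dollars to savings": destination given, source missing.
print(next_step(["action", "amt", "toacct"], responses, "#notenoughfields"))
```

Any combination of slots that no RESPONSE lists, such as an amount with no action, falls through to the NEXT attribute of the INPUT element.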
The SWITCH element of the markup language (i.e., <SWITCH FIELD="value"> vml </SWITCH>) is used to define the application behavior dependent on the value of a specified recognition slot. The SWITCH element includes a FIELD attribute. The value of the FIELD attribute can be an identifier. The SWITCH element is used in conjunction with the CASE element. The SWITCH element can exist only within the INPUT element, and then only when using the grammar input type.
The following example illustrates the use of the SWITCH element in a markup language document.
<INPUT TYPE="GRAMMAR"
  SRC="gram://.Banking/action/amount/fromacct/toacct">
  <SWITCH FIELD="action">
    <CASE VALUE="transfer" NEXT="#transfer"/>
    <CASE VALUE="balance" NEXT="#balance"/>
    <CASE VALUE="activity">
      <SWITCH FIELD="fromacct">
        <CASE VALUE="checking" NEXT="#chxact"/>
        <CASE VALUE="savings" NEXT="#savact"/>
      </SWITCH>
    </CASE>
  </SWITCH>
</INPUT>
In the example shown above, the SWITCH element is used to determine the next STEP element to execute in response to a banking request. In this example, the grammar may fill in some or all of the variables (i.e., "action", "amount", "fromacct", and "toacct"). If the user asks for a transfer or balance action, the next STEP element to execute is the transfer or balance step. If the user asks for a report of account activity, a second SWITCH element determines the next STEP element based on the account type for which a report is being requested (assumed to be available in the "fromacct" variable).
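The nested SWITCH and CASE evaluation can be modeled with a small recursive lookup. The `Switch` type and the `resolve` function are hypothetical names, and returning no target when a slot is unfilled or no CASE matches is an assumption; the text does not say what the voice browser does in that situation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Switch:
    """Hypothetical model of a SWITCH element: the slot it tests, and its
    CASE children keyed by VALUE. A case holds a NEXT address or a nested SWITCH."""
    slot: str
    cases: Dict[str, Union[str, "Switch"]]


def resolve(switch: Switch, variables: Dict[str, str]) -> Optional[str]:
    """Follow CASE elements until a NEXT address is found.

    Returns None when the slot is unfilled or no CASE matches (assumed)."""
    value = variables.get(switch.slot)
    if value is None:
        return None
    target = switch.cases.get(value)
    if isinstance(target, Switch):
        return resolve(target, variables)
    return target


banking = Switch(
    "action",
    {
        "transfer": "#transfer",
        "balance": "#balance",
        "activity": Switch("fromacct", {"checking": "#chxact", "savings": "#savact"}),
    },
)

print(resolve(banking, {"action": "activity", "fromacct": "savings"}))
```

A request for account activity on the savings account therefore reaches the inner SWITCH and resolves to "#savact", while a transfer or balance request is resolved by the outer SWITCH alone.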
The VALUE element of the markup language (i.e., <VALUE NAME="value" />) is used to present the value of a variable to the user via the text-to-speech unit. The VALUE element includes a NAME attribute. The value of the NAME attribute can be an identifier. The VALUE element can be used anywhere that text is read to the user. The VALUE element can be contained by a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.
The following example illustrates the use of the VALUE element in a markup language document.
<STEP NAME="thanks">
  <PROMPT> Thanks for your responses. I'll record that <VALUE NAME="first"/> is your favorite and that <VALUE NAME="second"/> is your second choice. </PROMPT>
  <INPUT TYPE="NONE" NEXT="/recordresults.asp"/>
</STEP>
The example shown above illustrates the use of the VALUE element to read the user's selections back to the user. As shown above, the value of the variable named "first" would be inserted into the PROMPT element, and the value of the variable named "second" would be inserted into the PROMPT element.
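The substitution performed for the VALUE element can be sketched with a regular expression. This is a simplification for illustration: a real voice browser would work on the parsed tree rather than on raw text, the name `render_prompt` is hypothetical, and replacing an undefined variable with an empty string is an assumption. The variable values below are invented.

```python
import re
from typing import Dict

# Matches <VALUE NAME="..."/> with optional whitespace before the closing slash.
VALUE_TAG = re.compile(r'<VALUE\s+NAME="([^"]+)"\s*/>')


def render_prompt(text: str, variables: Dict[str, str]) -> str:
    """Replace each VALUE element with the value of the named variable.

    Undefined variables are rendered as an empty string (an assumption)."""
    return VALUE_TAG.sub(lambda m: variables.get(m.group(1), ""), text)


prompt = (
    'I\'ll record that <VALUE NAME="first"/> is your favorite '
    'and that <VALUE NAME="second"/> is your second choice.'
)
spoken = render_prompt(prompt, {"first": "pizza", "second": "tacos"})
print(spoken)
```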
The COST attribute of the STEP element of the markup language is used to charge a user for various services. The COST attribute can be used in the definition of one or more STEP or CLASS elements. The value of the COST attribute is the integer number of credits the user is to be charged for viewing the content. For example, to charge 10 credits for listening to a particular STEP element, a provider might write the following markup language:
<STEP NAME="premiumContent" COST="10"> ... premium content goes here ... </STEP>
If a content provider wishes to maintain a record of subscriber charges, the content provider need only request identifying data for the user using the PROFILE input type as in:
<INPUT TYPE="PROFILE" PROFNAME="UID" NAME="subID"/>
Using the resulting value and examining the SUB_CHARGE query-string parameter at each page request, the content provider can maintain records on a per-subscriber basis.
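One way a content provider might keep such records is sketched below. Several details are assumptions not given in the text: that the subscriber identifier arrives in the same query string under the name "subID" (the NAME used in the INPUT element above), that SUB_CHARGE carries an integer number of credits for that request, and that requests lacking either parameter are ignored. The function name `record_charge` and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict
from urllib.parse import parse_qs


def record_charge(ledger: DefaultDict[str, int], query_string: str) -> None:
    """Add the SUB_CHARGE of one page request to the subscriber's running total.

    Requests missing either parameter are skipped (an assumption)."""
    params = parse_qs(query_string)
    sub_id = params.get("subID", [None])[0]
    charge = params.get("SUB_CHARGE", [None])[0]
    if sub_id is None or charge is None:
        return
    ledger[sub_id] += int(charge)


ledger: DefaultDict[str, int] = defaultdict(int)
record_charge(ledger, "subID=alice&SUB_CHARGE=10")  # hypothetical subscriber IDs
record_charge(ledger, "subID=alice&SUB_CHARGE=5")
record_charge(ledger, "subID=bob")  # no charge reported, so nothing is recorded
print(dict(ledger))
```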
The following text describes a weather application 500 that can be executed by the system 200 of FIG. 3. FIG. 8 shows an exemplary state diagram of the weather application containing states that prompt the user for input in order to access the weather database. After speaking the current or forecast weather information, the application expects the user to say a city name or the word "exit" to return to the main welcome prompt. The user can select to hear the forecast after the current weather conditions prompt. It will be recognized that the application could be designed to address errors, help and cancel requests properly.
The markup language set forth below is a static version of the weather application. The initial state or welcome prompt is within the first step, init (lines 11-20). The user can respond with a choice of "weather", "market", "news" or "exit". Once the application detects the user's response of "weather", the next step, weather (lines 21-29), begins. The prompt queries the user for a city name. Valid choices are "London", "New York", and "Chicago". The steps called london_current, london_forecast, newyork_current, newyork_forecast, chicago_current, and chicago_forecast provide weather information prompts for each city. It is noted that the market and news steps are just placeholders in the example (lines 111 and 115).
<?XML VERSION="1.0"?>
<!-- -->
<!-- (c) 1998 Motorola Inc. -->
<!-- weather.vml -->
<!-- -->
<DIALOG>
<CLASS NAME="help_top">
  <HELP>You are at the top level menu. For weather information, say weather.</HELP>
</CLASS>
<STEP NAME="init" PARENT="help_top">
  <PROMPT>Welcome to Genie. <BREAK SIZE="large"/> How may I help you?</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#weather">weather</OPTION>
    <OPTION NEXT="#market">market</OPTION>
    <OPTION NEXT="#news">news</OPTION>
    <OPTION NEXT="#bye">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="weather" PARENT="help_top">
  <PROMPT>What city?</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
    <OPTION NEXT="#chicago_current">chicago</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<CLASS NAME="help_generic">
  <HELP>Your choices are <OPTIONS/>.</HELP>
</CLASS>
<STEP NAME="london_current" PARENT="help_generic">
  <PROMPT>It is currently 46 degrees in London, with rain.
    <BREAK SIZE="large"/>
    To hear the 3 day forecast for London, say forecast, or say another city name, such as Chicago or New York.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#london_forecast">forecast</OPTION>
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
    <OPTION NEXT="#chicago_current">chicago</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="london_forecast" PARENT="help_generic">
  <PROMPT>London forecast for Tuesday. Showers. High of 50. Low of 44. Wednesday. Partly cloudy. High of 39. Low of 35.
    <BREAK SIZE="large"/>
    Choose a city, or say exit to return to the main menu.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
    <OPTION NEXT="#chicago_current">chicago</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="chicago_current" PARENT="help_generic">
  <PROMPT>It is currently 31 degrees in Chicago, with snow.
    <BREAK SIZE="large"/>
    To hear the 3 day forecast for Chicago, say forecast, or say another city name, such as London or New York.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#chicago_forecast">forecast</OPTION>
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
    <OPTION NEXT="#chicago_current">chicago</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="chicago_forecast" PARENT="help_generic">
  <PROMPT>Chicago forecast for Tuesday. Flurries. High of 27. Low of 22. Wednesday. Snow showers. High of 27. Low of 12.
    <BREAK SIZE="large"/>
    Choose a city, or say exit to return to the main menu.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
    <OPTION NEXT="#chicago_current">chicago</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="newyork_current" PARENT="help_generic">
  <PROMPT>It is currently 39 degrees in New York City, with cloudy skies.
    <BREAK SIZE="large"/>
    To hear the 3 day forecast for New York, say forecast, or say another city name, such as London or New York.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#newyork_forecast">forecast</OPTION>
<OPTION NEXT="#london_">london</OPTION> <OPTION NEXT="#newyork">new york</OPTION> <OPTION NEXT="#chicago">chicago</OPTION>
<OPTION NEXT="#init">exit</OPTION> </INPUT> </STEP>
<STEP NAME="newyork_forecast" PARENT="help_generic">
  <PROMPT>New York City forecast for Tuesday. Windy. High of 48. Low of 43. Wednesday. Rain. High of 43. Low of 28.
    <BREAK SIZE="large"/>
    Choose a city, or say exit to return to the main menu.</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#london_current">london</OPTION>
    <OPTION NEXT="#newyork_current">new york</OPTION>
<OPTION NEXT="#chicago. ">chicago</OPTION> <OPTION NEXT="#init">exit</OPTION> </INPUT> </STEP>
<STEP NAME="market">
  <PROMPT>Market update is currently not supported.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#init"/>
</STEP>
<STEP NAME="news">
  <PROMPT>News update is currently not supported.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#init"/>
</STEP>
<STEP NAME="bye" PARENT="help_top">
  <PROMPT>Thanks for using Genie. Goodbye.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#exit"/>
</STEP>
</DIALOG>
FIG. 9 illustrates the same state diagram for the weather application as shown in FIG. 8 with labels for each dialog boundary. The initial dialog, dialog1, contains the user prompts for welcome and city name. Dialog1 also controls the prompts for transitioning to hear a city's current or forecast weather and returning to the main menu. Dialog2 handles access of the weather database for the current conditions of the city specified by the user, and the information is read to the user. Dialog2 then returns control to dialog1 to get the user's next request. Similarly, dialog3 handles access of the weather database for the forecast of the city requested and speaks the information. It returns control to dialog1 to get the next user input.
The markup language set forth below illustrates an example of the weather application corresponding to the dialog boundaries as presented in the state diagram of FIG. 9. The application is implemented with Active Server Pages using VBScript. It consists of three files called dialog1.asp, dialog2.asp, and dialog3.asp, each corresponding to the appropriate dialog.
For dialog1, there are two help message types, help_top and help_dialog1 (lines 16 and 29). The first step, init, is at line 19. The weather step follows at line 32. Valid city names are those from the citylist table (line 36) of the weather database. Lines 7 and 8 accomplish the database connection via ADO. Line 38 is the start of a loop for creating an option list of all possible city responses. If the user chooses a city, control goes to the step getcurrentweather in dialog2, as shown at line 40. In this case, the city name is also passed to dialog2 via the variable CITY at line 34. The last major step in dialog1 is nextcommand, which can be referenced by dialog2 or dialog3. It prompts the user for a city name or the word forecast. Similar to the weather step, nextcommand uses a loop to create the option list (line 53). If the user responds with a city name, the step getcurrentweather in dialog2 is called. If the user responds with the word forecast, the step getforecastweather is called instead.
Dialog2 contains a single step, getcurrentweather. The step first reads the city name into the local variable strCity (line 95). A database query tries to find a match in the weather database for the city (lines 97 and 98). If there is no weather information found for the city, the application will speak a message (line 101) and proceed to the init step in dialog1 (line 110). Otherwise, the application will speak the current weather information for the city (line 105) and switch to the nextcommand step in dialog1 (line 112).
Dialog3 is similar to dialog2. It contains a single step, getforecastweather. The database query is identical to the one in dialog2. If there is weather information available for the city, the application will speak the weather forecast (line 105); otherwise a notification message is spoken (line 101). Dialog3 relinquishes control back to dialog1 with either the init step (line 110) or the nextcommand step (line 112).
<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsCities
Private strCity, SQLQuery
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>
<?XML VERSION="1.0"?>
<!-- -->
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog1.asp -->
<!-- -->
<DIALOG>
<CLASS NAME="help_top">
  <HELP>You are at the top level menu. For weather information, say weather.</HELP>
</CLASS>
<STEP NAME="init" PARENT="help_top">
  <PROMPT>Welcome to Genie.<BREAK SIZE="large"/> How may I help you?</PROMPT>
  <INPUT TYPE="OPTIONLIST">
    <OPTION NEXT="#weather">weather</OPTION>
    <OPTION NEXT="#market">market</OPTION>
    <OPTION NEXT="#news">news</OPTION>
    <OPTION NEXT="#bye">exit</OPTION>
  </INPUT>
</STEP>
<CLASS NAME="help_dialog1">
  <HELP>Your choices are <OPTIONS/>.</HELP>
</CLASS>
<STEP NAME="weather" PARENT="help_dialog1">
  <PROMPT>What city?</PROMPT>
  <INPUT TYPE="optionlist" NAME="CITY">
    <% ' Get all city names. %>
    <% SQLQuery = "SELECT * FROM CityList" %>
    <% Set rsCities = objConnection.Execute(SQLQuery) %>
    <% Do Until rsCities.EOF %>
      <% ' Create an OPTION element for each city. %>
      <OPTION NEXT="dialog2.asp#getcurrentweather" VALUE="<%= rsCities("City") %>"><%= rsCities("City") %></OPTION>
      <% rsCities.MoveNext %>
    <% Loop %>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="nextcommand" PARENT="help_dialog1">
  <% strCity = Request.QueryString("CITY") %>
  <PROMPT>To hear the 3 day forecast for <%=strCity%>, say forecast, or say another city name.</PROMPT>
  <INPUT TYPE="optionlist" NAME="CITY">
    <% ' Get all city names. %>
    <% SQLQuery = "SELECT * FROM CityList" %>
    <% Set rsCities = objConnection.Execute(SQLQuery) %>
    <% Do Until rsCities.EOF %>
      <% ' Create an OPTION element for each city. %>
      <OPTION NEXT="dialog2.asp#getcurrentweather" VALUE="<%= rsCities("City") %>"><%= rsCities("City") %></OPTION>
      <% rsCities.MoveNext %>
    <% Loop %>
    <OPTION NEXT="dialog3.asp#getforecastweather" VALUE="<%= strCity %>">forecast</OPTION>
    <OPTION NEXT="#init">exit</OPTION>
  </INPUT>
</STEP>
<STEP NAME="market">
  <PROMPT>Market update is currently not supported.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#init"/>
</STEP>
<STEP NAME="news">
  <PROMPT>News update is currently not supported.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#init"/>
</STEP>
<STEP NAME="bye" PARENT="help_top">
  <PROMPT>Thanks for using Genie. Goodbye.</PROMPT>
  <INPUT TYPE="NONE" NEXT="#exit"/>
</STEP>
</DIALOG>
<!-- End of dialog1.asp -->
<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsWeather, SQLQuery
Private strCity, Valid
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>
<?XML VERSION="1.0"?>
<!-- -->
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog2.asp -->
<!-- -->
<DIALOG>
<CLASS NAME="help_dialog2">
  <HELP>Your choices are <OPTIONS/>.</HELP>
</CLASS>
<STEP NAME="getcurrentweather">
  <% strCity = Request.QueryString("CITY") %>
  <% Valid = "TRUE" %>
  <% SQLQuery = "SELECT * FROM WDB WHERE (City='" & strCity & "')" %>
  <% Set rsWeather = objConnection.Execute(SQLQuery) %>
  <% If rsWeather.EOF Then %>
    <% Valid = "FALSE" %>
    <PROMPT>Sorry, <BREAK/> There are no current weather conditions available for <%=strCity%>.<BREAK/></PROMPT>
  <% Else %>
    <% ' Speak current weather information %>
    <PROMPT><%=rsWeather("Current")%></PROMPT>
  <% End If %>
  <INPUT TYPE="Hidden" NAME="CITY" VALUE="<%=strCity%>"></INPUT>
  <% If (Valid = "FALSE") Then %>
    <INPUT TYPE="none" NEXT="dialog1.asp#init"></INPUT>
  <% Else %>
    <INPUT TYPE="none" NEXT="dialog1.asp#nextcommand"></INPUT>
  <% End If %>
</STEP>
</DIALOG>
<!-- End of dialog2.asp -->
<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsWeather, SQLQuery
Private strCity, Valid
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>
<?XML VERSION="1.0"?>
<!-- -->
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog3.asp -->
<!-- -->
<DIALOG>
<CLASS NAME="help_dialog3">
  <HELP>Your choices are <OPTIONS/>.</HELP>
</CLASS>
<STEP NAME="getforecastweather">
  <% strCity = Request.QueryString("CITY") %>
  <% Valid = "TRUE" %>
  <% SQLQuery = "SELECT * FROM WDB WHERE (City='" & strCity & "')" %>
  <% Set rsWeather = objConnection.Execute(SQLQuery) %>
  <% If rsWeather.EOF Then %>
    <% Valid = "FALSE" %>
    <PROMPT>Sorry, <BREAK/> There is no forecast weather available for <%=strCity%>.<BREAK/></PROMPT>
  <% Else %>
    <% ' Speak forecast weather information %>
    <PROMPT><%=rsWeather("Forecast")%></PROMPT>
  <% End If %>
  <INPUT TYPE="Hidden" NAME="CITY" VALUE="<%=strCity%>"></INPUT>
  <% If (Valid = "FALSE") Then %>
    <INPUT TYPE="none" NEXT="dialog1.asp#init"></INPUT>
  <% Else %>
    <INPUT TYPE="none" NEXT="dialog1.asp#nextcommand"></INPUT>
  <% End If %>
</STEP>
</DIALOG>
<!-- End of dialog3.asp -->
Accordingly, there have been described herein methods and systems to allow users to access information from any location in the world via any suitable network access device. The user can access up-to-date information, such as news updates, designated city weather, traffic conditions, stock quotes, and stock market indicators. The system also allows the user to perform various transactions (i.e., order flowers, place orders from restaurants, place buy or sell orders for stocks, obtain bank account balances, obtain telephone numbers, receive directions to destinations, etc.).
It will be apparent to those skilled in the art that the disclosed embodiment may be modified in numerous ways and may assume many embodiments other than the preferred form specifically set out and described above. Accordingly, it is intended by the appended claims to cover all modifications of the invention which fall within the true spirit and scope of the invention.

Claims

What is claimed is:
1. A voice browser to process a markup language comprising: a network fetcher unit to retrieve information from a destination of an information source; a parser unit, communicatively coupled to the network fetcher, to parse the retrieved information based upon a predetermined syntax, the parser unit generating a tree structure representing the hierarchy of the retrieved information; an interpreter unit, communicatively coupled to the network fetcher unit, to process the markup language; and a state machine communicatively coupled to the interpreter unit and the parser unit.
2. The voice browser of claim 1, wherein the state machine executes a state sequence based upon the tree structure and an input from a user.
3. The voice browser of claim 1, wherein the state machine maintains the voice browser in a current state based upon the tree structure.
4. A method of generating a grammar for a voice browser comprising the steps of: retrieving a markup language file; parsing the markup language file to determine at least one user input; determining whether the at least one user input corresponds to a predetermined grammar; using the predetermined grammar when the at least one user input corresponds to the predetermined grammar; determining a grammar based upon predefined phonetic rules and pronunciation when the at least one user input is not found in the predetermined grammar; sending the grammar to a speech recognition engine; and comparing the grammar to a user input.
5. A method of processing a markup language comprising the steps of: fetching the markup language document from an information source; parsing the markup language document to identify elements and attributes of the markup language document; generating a tree representation to identify the hierarchy of the elements and attributes of the markup language document; and identifying syntactic information associated with the markup language; interpreting at least one markup element in the markup language document; prompting the user to enter audio input; and processing the audio input in accordance with the markup language.
6. The method of claim 5, further comprising the step of generating a grammar based upon predetermined phonetic rules.
7. The method of claim 5, further comprising the step of comparing the grammar to an electronic representation of the audio input.
8. A method of processing a markup language document comprising the steps of: determining an electronic address for a destination; retrieving the markup language document from the destination; parsing the markup language document to identify at least one tag in the markup language document; generating a hierarchical tree structure having a plurality of elements; interpreting the tree structure to identify a first step element; playing an announcement to the user when a prompt element is identified in the tree structure; collecting input from the user when an input element is identified in the tree structure; and traversing the tree structure to identify a second step element.
9. The method of claim 8, wherein the step of traversing the tree structure is based upon one of a user input and the markup language document.
10. The method of claim 8, wherein the destination is associated with one of a database and an internet server file.
11. The method of claim 8, wherein the electronic address comprises one of a URL, an email address, a file address, a memory address, a pointer, and a variable.
12. A method of browsing an electronic markup language document comprising the steps of: receiving the markup language document; identifying one of a plurality of hierarchical elements within the electronic markup language document as an initial element containing an indicated starting point; extracting at least one attribute of a step element; extracting data from a prompt element; generating a voice communication to be read to the user based upon the data; extracting at least one option associated with a user input from an input element; and comparing a grammar to the user input.
13. The method of claim 12, further comprising the step of transitioning to a second step based upon one of the user input and an attribute of the input element.
14. A method of processing a markup language document comprising the steps of: receiving the markup language document from a content provider; identifying a prompt element having an announcement; transforming the announcement of the prompt element into a phonetic representation of the announcement; and synthesizing the phonetic representation of the announcement to generate an audio output.
15. A computer comprising: an audio unit to receive an audio input from a user and to provide an audio output to the user; a voice browser, communicatively coupled to the audio unit, to process a markup language document and to receive the audio input from the user; and a markup language server, communicatively coupled to the audio unit, to store the markup language document and to provide the markup language document to the voice browser.
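For illustration only, the tree-walking steps recited in claim 8 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the element names (DIALOG, STEP, PROMPT, INPUT, OPTION), the NAME and NEXT attributes, the sample document, and the function run_dialog are all hypothetical. Playing an announcement is simulated by recording the prompt text, and collected speech is simulated by a list of already-recognized strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<DIALOG>
  <STEP NAME="init">
    <PROMPT>Say weather or news.</PROMPT>
    <INPUT>
      <OPTION NEXT="weather">weather</OPTION>
      <OPTION NEXT="news">news</OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="weather">
    <PROMPT>It is sunny.</PROMPT>
  </STEP>
  <STEP NAME="news">
    <PROMPT>No news today.</PROMPT>
  </STEP>
</DIALOG>
"""


def run_dialog(markup, user_inputs):
    """Walk the step elements of a parsed document; return the prompts played."""
    root = ET.fromstring(markup)                 # parse into a hierarchical tree
    steps = {s.get("NAME"): s for s in root.iter("STEP")}
    inputs = iter(user_inputs)
    played = []
    step = root.find("STEP")                     # first step element
    while step is not None:
        prompt = step.find("PROMPT")
        if prompt is not None:
            played.append(prompt.text.strip())   # stand-in for playing audio
        input_el = step.find("INPUT")
        if input_el is None:
            break                                # no input element: dialog ends
        heard = next(inputs, None)               # stand-in for recognized speech
        options = {
            o.text.strip().lower(): o.get("NEXT")
            for o in input_el.iter("OPTION")
        }
        next_name = options.get(heard.lower()) if heard else None
        step = steps.get(next_name)              # unmatched input ends the loop
    return played
```

Calling run_dialog(DOC, ["weather"]) visits the first step, matches the simulated input against its options, and then traverses to the second step named by the matching option. A real browser would hand the options to a speech recognition engine as a grammar rather than compare strings directly.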
PCT/US1999/016776 1998-07-24 1999-07-23 Voice browser for interactive services and methods thereof WO2000005708A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP99937440A EP1099213A4 (en) 1998-07-24 1999-07-23 Voice browser for interactive services and methods thereof
AU52278/99A AU5227899A (en) 1998-07-24 1999-07-23 Voice browser for interactive services and methods thereof

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US9413198P 1998-07-24 1998-07-24
US9403298P 1998-07-24 1998-07-24
US60/094,032 1998-07-24
US60/094,131 1998-07-24
US09/165,487 US6269336B1 (en) 1998-07-24 1998-10-02 Voice browser for interactive services and methods thereof
US09/165,487 1998-10-02

Publications (1)

Publication Number Publication Date
WO2000005708A1 WO2000005708A1 (en) 2000-02-03

Family

ID=27377630

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/016776 WO2000005708A1 (en) 1998-07-24 1999-07-23 Voice browser for interactive services and methods thereof

Country Status (4)

Country Link
US (2) US6269336B1 (en)
EP (1) EP1099213A4 (en)
AU (1) AU5227899A (en)
WO (1) WO2000005708A1 (en)

Families Citing this family (477)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7387253B1 (en) 1996-09-03 2008-06-17 Hand Held Products, Inc. Optical reader system comprising local host processor and optical reader
US8910876B2 (en) 1994-05-25 2014-12-16 Marshall Feature Recognition, Llc Method and apparatus for accessing electronic data via a familiar printed medium
US6164534A (en) * 1996-04-04 2000-12-26 Rathus; Spencer A. Method and apparatus for accessing electronic data via a familiar printed medium
US7712668B2 (en) 1994-05-25 2010-05-11 Marshall Feature Recognition, Llc Method and apparatus for accessing electronic data via a familiar printed medium
US6866196B1 (en) * 1994-05-25 2005-03-15 Spencer A. Rathus Method and apparatus for accessing electronic data via a familiar printed medium
US8261993B2 (en) 1994-05-25 2012-09-11 Marshall Feature Recognition, Llc Method and apparatus for accessing electronic data via a familiar printed medium
US6775264B1 (en) 1997-03-03 2004-08-10 Webley Systems, Inc. Computer, internet and telecommunications based network
US6144938A (en) * 1998-05-01 2000-11-07 Sun Microsystems, Inc. Voice user interface with personality
FR2779597B1 (en) * 1998-06-03 2000-09-22 France Telecom APPARATUS FOR QUERYING A SERVER CENTER
US20110082705A1 (en) * 1998-06-16 2011-04-07 Paul Kobylevsky Remote Prescription Refill System
US6269336B1 (en) * 1998-07-24 2001-07-31 Motorola, Inc. Voice browser for interactive services and methods thereof
US6343116B1 (en) * 1998-09-21 2002-01-29 Microsoft Corporation Computer telephony application programming interface
US7251315B1 (en) * 1998-09-21 2007-07-31 Microsoft Corporation Speech processing for telephony API
JP4067276B2 (en) * 1998-09-22 2008-03-26 ノキア コーポレイション Method and system for configuring a speech recognition system
US9037451B2 (en) * 1998-09-25 2015-05-19 Rpx Corporation Systems and methods for multiple mode voice and data communications using intelligently bridged TDM and packet buses and methods for implementing language capabilities using the same
US6385583B1 (en) * 1998-10-02 2002-05-07 Motorola, Inc. Markup language for interactive services and methods thereof
CA2345665C (en) * 1998-10-02 2011-02-08 International Business Machines Corporation Conversational computing via conversational virtual machine
US7283973B1 (en) * 1998-10-07 2007-10-16 Logic Tree Corporation Multi-modal voice-enabled content access and delivery system
US6941273B1 (en) * 1998-10-07 2005-09-06 Masoud Loghmani Telephony-data application interface apparatus and method for multi-modal access to data applications
US6163794A (en) * 1998-10-23 2000-12-19 General Magic Network system extensible by users
US6807254B1 (en) * 1998-11-06 2004-10-19 Nms Communications Method and system for interactive messaging
US7082397B2 (en) * 1998-12-01 2006-07-25 Nuance Communications, Inc. System for and method of creating and browsing a voice web
US7263489B2 (en) 1998-12-01 2007-08-28 Nuance Communications, Inc. Detection of characteristics of human-machine interactions for dialog customization and analysis
US6707891B1 (en) * 1998-12-28 2004-03-16 Nms Communications Method and system for voice electronic mail
US6635089B1 (en) * 1999-01-13 2003-10-21 International Business Machines Corporation Method for producing composite XML document object model trees using dynamic data retrievals
US6804333B1 (en) 1999-01-28 2004-10-12 International Business Machines Corporation Dynamically reconfigurable distributed interactive voice response system
US7966078B2 (en) * 1999-02-01 2011-06-21 Steven Hoffberg Network media appliance system and method
US6223165B1 (en) 1999-03-22 2001-04-24 Keen.Com, Incorporated Method and apparatus to connect consumer to expert
US8321411B2 (en) 1999-03-23 2012-11-27 Microstrategy, Incorporated System and method for management of an automatic OLAP report broadcast system
US20050261907A1 (en) 1999-04-12 2005-11-24 Ben Franklin Patent Holding Llc Voice integration platform
US6408272B1 (en) * 1999-04-12 2002-06-18 General Magic, Inc. Distributed voice user interface
JP2002543676A (en) * 1999-04-26 2002-12-17 ノキア モービル フォーンズ リミテッド Wireless terminal for browsing the Internet
GB2349545A (en) * 1999-04-26 2000-11-01 Nokia Mobile Phones Ltd Terminal for providing an application using a browser
US6535506B1 (en) * 1999-05-11 2003-03-18 Click Interconnect, Inc. Method and apparatus for establishing communications with a remote node on a switched network based on hypertext calling received from a packet network
US6434527B1 (en) * 1999-05-17 2002-08-13 Microsoft Corporation Signalling and controlling the status of an automatic speech recognition system for use in handsfree conversational dialogue
US9208213B2 (en) 1999-05-28 2015-12-08 Microstrategy, Incorporated System and method for network user interface OLAP report formatting
US8607138B2 (en) 1999-05-28 2013-12-10 Microstrategy, Incorporated System and method for OLAP report generation with spreadsheet report within the network user interface
IL131135A0 (en) * 1999-07-27 2001-01-28 Electric Lighthouse Software L A method and system for electronic mail
US7457397B1 (en) * 1999-08-24 2008-11-25 Microstrategy, Inc. Voice page directory system in a voice page creation and delivery system
US6792086B1 (en) * 1999-08-24 2004-09-14 Microstrategy, Inc. Voice network access provider system and method
US6952800B1 (en) * 1999-09-03 2005-10-04 Cisco Technology, Inc. Arrangement for controlling and logging voice enabled web applications using extensible markup language documents
US6912691B1 (en) * 1999-09-03 2005-06-28 Cisco Technology, Inc. Delivering voice portal services using an XML voice-enabled web server
US8448059B1 (en) * 1999-09-03 2013-05-21 Cisco Technology, Inc. Apparatus and method for providing browser audio control for voice enabled web applications
US6901431B1 (en) * 1999-09-03 2005-05-31 Cisco Technology, Inc. Application server providing personalized voice enabled web application services using extensible markup language documents
US6711618B1 (en) 1999-09-03 2004-03-23 Cisco Technology, Inc. Apparatus and method for providing server state and attribute management for voice enabled web applications
US6766298B1 (en) * 1999-09-03 2004-07-20 Cisco Technology, Inc. Application server configured for dynamically generating web pages for voice enabled web applications
US6578000B1 (en) * 1999-09-03 2003-06-10 Cisco Technology, Inc. Browser-based arrangement for developing voice enabled web applications using extensible markup language documents
US6738803B1 (en) * 1999-09-03 2004-05-18 Cisco Technology, Inc. Proxy browser providing voice enabled web application audio control for telephony devices
US6847999B1 (en) * 1999-09-03 2005-01-25 Cisco Technology, Inc. Application server for self-documenting voice enabled web applications defined using extensible markup language documents
US7457279B1 (en) 1999-09-10 2008-11-25 Vertical Communications Acquisition Corp. Method, system, and computer program product for managing routing servers and services
US7123608B1 (en) * 1999-09-10 2006-10-17 Array Telecom Corporation Method, system, and computer program product for managing database servers and service
WO2001018679A2 (en) 1999-09-10 2001-03-15 Everypath, Inc. Method for converting two-dimensional data into a canonical representation
US6964012B1 (en) 1999-09-13 2005-11-08 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through personalized broadcasts
US6836537B1 (en) 1999-09-13 2004-12-28 Microstrategy Incorporated System and method for real-time, personalized, dynamic, interactive voice services for information related to existing travel schedule
US6829334B1 (en) * 1999-09-13 2004-12-07 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with telephone-based service utilization and control
US7266181B1 (en) * 1999-09-13 2007-09-04 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized dynamic and interactive voice services with integrated inbound and outbound voice services
US6873693B1 (en) 1999-09-13 2005-03-29 Microstrategy, Incorporated System and method for real-time, personalized, dynamic, interactive voice services for entertainment-related information
US7127403B1 (en) * 1999-09-13 2006-10-24 Microstrategy, Inc. System and method for personalizing an interactive voice broadcast of a voice service based on particulars of a request
US8130918B1 (en) 1999-09-13 2012-03-06 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with closed loop transaction processing
US6940953B1 (en) * 1999-09-13 2005-09-06 Microstrategy, Inc. System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services including module for generating and formatting voice services
US7330815B1 (en) 1999-10-04 2008-02-12 Globalenglish Corporation Method and system for network-based speech recognition
US7143042B1 (en) * 1999-10-04 2006-11-28 Nuance Communications Tool for graphically defining dialog flows and for establishing operational links between speech applications and hypermedia content in an interactive voice response environment
US7685252B1 (en) * 1999-10-12 2010-03-23 International Business Machines Corporation Methods and systems for multi-modal browsing and implementation of a conversational markup language
US6807574B1 (en) * 1999-10-22 2004-10-19 Tellme Networks, Inc. Method and apparatus for content personalization over a telephone interface
US7941481B1 (en) 1999-10-22 2011-05-10 Tellme Networks, Inc. Updating an electronic phonebook over electronic communication networks
US6510411B1 (en) 1999-10-29 2003-01-21 Unisys Corporation Task oriented dialog model and manager
US7050977B1 (en) 1999-11-12 2006-05-23 Phoenix Solutions, Inc. Speech-enabled server for internet website and method
US6633846B1 (en) 1999-11-12 2003-10-14 Phoenix Solutions, Inc. Distributed realtime speech recognition system
US6665640B1 (en) 1999-11-12 2003-12-16 Phoenix Solutions, Inc. Interactive speech based learning/training system formulating search queries based on natural language parsing of recognized user queries
US6615172B1 (en) 1999-11-12 2003-09-02 Phoenix Solutions, Inc. Intelligent query engine for processing voice based queries
US9076448B2 (en) 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US7725307B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Query engine for processing voice based queries including semantic decoding
US6640210B1 (en) * 1999-11-12 2003-10-28 Frederick Anthony Schaefer Customer service operation using wav files
US7392185B2 (en) 1999-11-12 2008-06-24 Phoenix Solutions, Inc. Speech based learning/training system using semantic decoding
US7716077B1 (en) 1999-11-22 2010-05-11 Accenture Global Services Gmbh Scheduling and planning maintenance and service in a network-based supply chain environment
US8032409B1 (en) 1999-11-22 2011-10-04 Accenture Global Services Limited Enhanced visibility during installation management in a network-based supply chain environment
US8271336B2 (en) 1999-11-22 2012-09-18 Accenture Global Services Gmbh Increased visibility during order management in a network-based supply chain environment
US7130807B1 (en) * 1999-11-22 2006-10-31 Accenture Llp Technology sharing during demand and supply planning in a network-based supply chain environment
US7124101B1 (en) 1999-11-22 2006-10-17 Accenture Llp Asset tracking in a network-based supply chain environment
US6349132B1 (en) * 1999-12-16 2002-02-19 Talk2 Technology, Inc. Voice interface for electronic documents
US8271287B1 (en) * 2000-01-14 2012-09-18 Alcatel Lucent Voice command remote control system
US6760697B1 (en) * 2000-01-25 2004-07-06 Minds And Technology, Inc. Centralized processing of digital speech data originated at the network clients of a set of servers
US6721705B2 (en) 2000-02-04 2004-04-13 Webley Systems, Inc. Robust voice browser system and voice activated device controller
US7516190B2 (en) 2000-02-04 2009-04-07 Parus Holdings, Inc. Personal voice-based information retrieval system
WO2001059759A1 (en) * 2000-02-10 2001-08-16 Randolphrand.Com Llp Recorder adapted to interface with internet browser
US6633311B1 (en) * 2000-02-18 2003-10-14 Hewlett-Packard Company, L.P. E-service to manage and export contact information
US6889213B1 (en) * 2000-02-18 2005-05-03 Hewlett-Packard Development Company, L.P. E-service to manage contact information with privacy levels
US7054487B2 (en) * 2000-02-18 2006-05-30 Anoto Ip Lic Handelsbolag Controlling and electronic device
CN1279730C (en) * 2000-02-21 2006-10-11 株式会社Ntt都科摩 Information distribution method, information distribution system and information distribution server
WO2001065808A2 (en) * 2000-02-28 2001-09-07 Iperia, Inc. Apparatus and method for telephony service interface
US6757362B1 (en) * 2000-03-06 2004-06-29 Avaya Technology Corp. Personal virtual assistant
US6999448B1 (en) * 2000-03-14 2006-02-14 Avaya Technology Corp. Internet protocol standards-based multi-media messaging
AU2001244906A1 (en) * 2000-03-17 2001-09-24 Susanna Merenyi On line oral text reader system
US6510417B1 (en) * 2000-03-21 2003-01-21 America Online, Inc. System and method for voice access to internet-based information
US7213027B1 (en) 2000-03-21 2007-05-01 Aol Llc System and method for the transformation and canonicalization of semantically structured data
US8131555B1 (en) * 2000-03-21 2012-03-06 Aol Inc. System and method for funneling user responses in an internet voice portal system to determine a desired item or service
US6662163B1 (en) * 2000-03-30 2003-12-09 Voxware, Inc. System and method for programming portable devices from a remote computer system
US6883015B1 (en) * 2000-03-30 2005-04-19 Cisco Technology, Inc. Apparatus and method for providing server state and attribute management for multiple-threaded voice enabled web applications
US7415537B1 (en) 2000-04-07 2008-08-19 International Business Machines Corporation Conversational portal for providing conversational browsing and multimedia broadcast on demand
US6560576B1 (en) * 2000-04-25 2003-05-06 Nuance Communications Method and apparatus for providing active help to a user of a voice-enabled application
JP2001306601A (en) * 2000-04-27 2001-11-02 Canon Inc Device and method for document processing and storage medium stored with program thereof
US7050993B1 (en) * 2000-04-27 2006-05-23 Nokia Corporation Advanced service redirector for personal computer
US6973617B1 (en) * 2000-05-24 2005-12-06 Cisco Technology, Inc. Apparatus and method for contacting a customer support line on customer's behalf and having a customer support representative contact the customer
US6801793B1 (en) * 2000-06-02 2004-10-05 Nokia Corporation Systems and methods for presenting and/or converting messages
US6438575B1 (en) * 2000-06-07 2002-08-20 Clickmarks, Inc. System, method, and article of manufacture for wireless enablement of the world wide web using a wireless gateway
US7219136B1 (en) * 2000-06-12 2007-05-15 Cisco Technology, Inc. Apparatus and methods for providing network-based information suitable for audio output
US6662157B1 (en) * 2000-06-19 2003-12-09 International Business Machines Corporation Speech recognition system for database access through the use of data domain overloading of grammars
US6654722B1 (en) * 2000-06-19 2003-11-25 International Business Machines Corporation Voice over IP protocol based speech system
US7117215B1 (en) 2001-06-07 2006-10-03 Informatica Corporation Method and apparatus for transporting data for data warehousing applications that incorporates analytic data interface
JP2002023777A (en) * 2000-06-26 2002-01-25 Internatl Business Mach Corp <Ibm> Voice synthesizing system, voice synthesizing method, server, storage medium, program transmitting device, voice synthetic data storage medium and voice outputting equipment
US7653748B2 (en) * 2000-08-10 2010-01-26 Simplexity, Llc Systems, methods and computer program products for integrating advertising within web content
AU2001283579A1 (en) * 2000-08-21 2002-03-04 Yahoo, Inc. Method and system of interpreting and presenting web content using a voice browser
US6631350B1 (en) * 2000-08-28 2003-10-07 International Business Machines Corporation Device-independent speech audio system for linking a speech driven application to specific audio input and output devices
US6856958B2 (en) * 2000-09-05 2005-02-15 Lucent Technologies Inc. Methods and apparatus for text to speech processing using language independent prosody markup
US7110963B2 (en) * 2000-09-07 2006-09-19 Manuel Negreiro Point-of-sale customer order system utilizing an unobtrusive transmitter/receiver and voice recognition software
US6580786B1 (en) 2000-09-11 2003-06-17 Yahoo! Inc. Message store architecture
US6556563B1 (en) 2000-09-11 2003-04-29 Yahoo! Inc. Intelligent voice bridging
US7095733B1 (en) * 2000-09-11 2006-08-22 Yahoo! Inc. Voice integrated VOIP system
US6567419B1 (en) 2000-09-11 2003-05-20 Yahoo! Inc. Intelligent voice converter
US6785651B1 (en) * 2000-09-14 2004-08-31 Microsoft Corporation Method and apparatus for performing plan-based dialog
US7406657B1 (en) * 2000-09-22 2008-07-29 International Business Machines Corporation Audible presentation and verbal interaction of HTML-like form constructs
US7240006B1 (en) * 2000-09-27 2007-07-03 International Business Machines Corporation Explicitly registering markup based on verbal commands and exploiting audio context
US7454346B1 (en) * 2000-10-04 2008-11-18 Cisco Technology, Inc. Apparatus and methods for converting textual information to audio-based output
CA2425844A1 (en) * 2000-10-16 2002-04-25 Eliza Corporation Method of and system for providing adaptive respondent training in a speech recognition application
US6636590B1 (en) * 2000-10-30 2003-10-21 Ingenio, Inc. Apparatus and method for specifying and obtaining services through voice commands
US7483983B1 (en) 2000-11-13 2009-01-27 Telecommunication Systems, Inc. Method and system for deploying content to wireless devices
US8135589B1 (en) 2000-11-30 2012-03-13 Google Inc. Performing speech recognition over a network and using speech recognition results
US6823306B2 (en) * 2000-11-30 2004-11-23 Telesector Resources Group, Inc. Methods and apparatus for generating, updating and distributing speech recognition models
US6915262B2 (en) 2000-11-30 2005-07-05 Telesector Resources Group, Inc. Methods and apparatus for performing speech recognition and using speech recognition results
GB0029576D0 (en) * 2000-12-02 2001-01-17 Hewlett Packard Co Voice site personality setting
US7487440B2 (en) * 2000-12-04 2009-02-03 International Business Machines Corporation Reusable voiceXML dialog components, subdialogs and beans
US7016847B1 (en) * 2000-12-08 2006-03-21 Ben Franklin Patent Holdings L.L.C. Open architecture for a voice user interface
US7170979B1 (en) * 2000-12-08 2007-01-30 Ben Franklin Patent Holding Llc System for embedding programming language content in voiceXML
GB0030330D0 (en) * 2000-12-13 2001-01-24 Hewlett Packard Co Idiom handling in voice service systems
US6678354B1 (en) * 2000-12-14 2004-01-13 Unisys Corporation System and method for determining number of voice processing engines capable of support on a data processing system
US20020091527A1 (en) * 2001-01-08 2002-07-11 Shyue-Chin Shiau Distributed speech recognition server system for mobile internet/intranet communication
US7289623B2 (en) 2001-01-16 2007-10-30 Utbk, Inc. System and method for an online speaker patch-through
US6940820B2 (en) * 2001-01-19 2005-09-06 General Instrument Corporation Voice-aided diagnostic for voice over internet protocol (VOIP) based device
US6845356B1 (en) * 2001-01-31 2005-01-18 International Business Machines Corporation Processing dual tone multi-frequency signals for use with a natural language understanding system
US6964023B2 (en) * 2001-02-05 2005-11-08 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
US7200142B1 (en) 2001-02-08 2007-04-03 Logic Tree Corporation System for providing multi-phased, multi-modal access to content through voice and data devices
US6948129B1 (en) 2001-02-08 2005-09-20 Masoud S Loghmani Multi-modal, multi-path user interface for simultaneous access to internet data over multiple media
US8000320B2 (en) * 2001-02-08 2011-08-16 Logic Tree Corporation System for providing multi-phased, multi-modal access to content through voice and data devices
US20020193997A1 (en) * 2001-03-09 2002-12-19 Fitzpatrick John E. System, method and computer program product for dynamic billing using tags in a speech recognition framework
US20020133402A1 (en) * 2001-03-13 2002-09-19 Scott Faber Apparatus and method for recruiting, communicating with, and paying participants of interactive advertising
US6832196B2 (en) * 2001-03-30 2004-12-14 International Business Machines Corporation Speech driven data selection in a voice-enabled program
US20020156895A1 (en) * 2001-04-20 2002-10-24 Brown Michael T. System and method for sharing contact information
US20030023439A1 (en) * 2001-05-02 2003-01-30 Gregory Ciurpita Method and apparatus for automatic recognition of long sequences of spoken digits
US20050028085A1 (en) * 2001-05-04 2005-02-03 Irwin James S. Dynamic generation of voice application information from a web server
US7409349B2 (en) * 2001-05-04 2008-08-05 Microsoft Corporation Servers for web enabled speech recognition
US7610547B2 (en) * 2001-05-04 2009-10-27 Microsoft Corporation Markup language extensions for web enabled recognition
US7506022B2 (en) * 2001-05-04 2009-03-17 Microsoft Corporation Web enabled recognition architecture
US20030083882A1 (en) * 2001-05-14 2003-05-01 Schemers Iii Roland J. Method and apparatus for incorporating application logic into a voice responsive system
US7111787B2 (en) * 2001-05-15 2006-09-26 Hand Held Products, Inc. Multimode image capturing and decoding optical reader
US7366712B2 (en) * 2001-05-31 2008-04-29 Intel Corporation Information retrieval center gateway
US6601762B2 (en) * 2001-06-15 2003-08-05 Koninklijke Philips Electronics N.V. Point-of-sale (POS) voice authentication transaction system
US7162643B1 (en) 2001-06-15 2007-01-09 Informatica Corporation Method and system for providing transfer of analytic application data over a network
US7174006B2 (en) * 2001-06-18 2007-02-06 Nms Communications Corporation Method and system of VoiceXML interpreting
US6941268B2 (en) * 2001-06-21 2005-09-06 Tellme Networks, Inc. Handling of speech recognition in a declarative markup language
US7609829B2 (en) * 2001-07-03 2009-10-27 Apptera, Inc. Multi-platform capable inference engine and universal grammar language adapter for intelligent voice application execution
US20030007609A1 (en) * 2001-07-03 2003-01-09 Yuen Michael S. Method and apparatus for development, deployment, and maintenance of a voice software application for distribution to one or more consumers
US7720842B2 (en) 2001-07-16 2010-05-18 Informatica Corporation Value-chained queries in analytic applications
US20030023431A1 (en) * 2001-07-26 2003-01-30 Marc Neuberger Method and system for augmenting grammars in distributed voice browsing
US7133899B2 (en) * 2001-07-31 2006-11-07 Cingular Wireless Ii, Llc Method and apparatus for providing interactive text messages during a voice call
GB0121160D0 (en) * 2001-08-31 2001-10-24 Mitel Knowledge Corp Split browser
US6704403B2 (en) 2001-09-05 2004-03-09 Ingenio, Inc. Apparatus and method for ensuring a real-time connection between users and selected service provider using voice mail
US20030046710A1 (en) * 2001-09-05 2003-03-06 Moore John F. Multi-media communication system for the disabled and others
US20030055649A1 (en) * 2001-09-17 2003-03-20 Bin Xu Methods for accessing information on personal computers using voice through landline or wireless phones
US7191233B2 (en) 2001-09-17 2007-03-13 Telecommunication Systems, Inc. System for automated, mid-session, user-directed, device-to-device session transfer system
DE10147341B4 (en) * 2001-09-26 2005-05-19 Voiceobjects Ag Method and device for constructing a dialog control implemented in a computer system from dialog objects and associated computer system for carrying out a dialog control
US7752266B2 (en) 2001-10-11 2010-07-06 Ebay Inc. System and method to facilitate translation of communications between entities over a network
US8229753B2 (en) * 2001-10-21 2012-07-24 Microsoft Corporation Web server controls for web enabled recognition and/or audible prompting
US7711570B2 (en) 2001-10-21 2010-05-04 Microsoft Corporation Application abstraction with dialog purpose
US20030078775A1 (en) * 2001-10-22 2003-04-24 Scott Plude System for wireless delivery of content and applications
DE10200855A1 (en) * 2001-10-29 2003-05-08 Siemens Ag Method and system for the dynamic generation of announcement content
US7187762B2 (en) * 2001-11-15 2007-03-06 International Business Machines Corporation Conferencing additional callers into an established voice browsing session
US7483832B2 (en) * 2001-12-10 2009-01-27 At&T Intellectual Property I, L.P. Method and system for customizing voice translation of text to speech
US20060069567A1 (en) * 2001-12-10 2006-03-30 Tischer Steven N Methods, systems, and products for translating text to speech
US7580850B2 (en) 2001-12-14 2009-08-25 Utbk, Inc. Apparatus and method for online advice customer relationship management
US7206744B2 (en) * 2001-12-14 2007-04-17 Sbc Technology Resources, Inc. Voice review of privacy policy in a mobile environment
GB2387927B (en) * 2001-12-20 2005-07-13 Canon Kk Control apparatus
US7937439B2 (en) 2001-12-27 2011-05-03 Utbk, Inc. Apparatus and method for scheduling live advice communication with a selected service provider
WO2003058938A1 (en) * 2001-12-28 2003-07-17 V-Enable, Inc. Information retrieval system including voice browser and data conversion server
US20030125953A1 (en) * 2001-12-28 2003-07-03 Dipanshu Sharma Information retrieval system including voice browser and data conversion server
US20040030554A1 (en) * 2002-01-09 2004-02-12 Samya Boxberger-Oberoi System and method for providing locale-specific interpretation of text data
US20030145062A1 (en) * 2002-01-14 2003-07-31 Dipanshu Sharma Data conversion server for voice browsing system
WO2003063137A1 (en) * 2002-01-22 2003-07-31 V-Enable, Inc. Multi-modal information delivery system
US20030139929A1 (en) * 2002-01-24 2003-07-24 Liang He Data transmission system and method for DSR application over GPRS
US7062444B2 (en) * 2002-01-24 2006-06-13 Intel Corporation Architecture for DSR client and server development platform
US7324942B1 (en) * 2002-01-29 2008-01-29 Microstrategy, Incorporated System and method for interactive voice services using markup language with N-best filter element
US6820077B2 (en) 2002-02-22 2004-11-16 Informatica Corporation Method and system for navigating a large amount of data
US7917581B2 (en) 2002-04-02 2011-03-29 Verizon Business Global Llc Call completion via instant communications client
US8856236B2 (en) 2002-04-02 2014-10-07 Verizon Patent And Licensing Inc. Messaging response system
AU2003223408A1 (en) * 2002-04-02 2003-10-20 Worldcom, Inc. Communications gateway with messaging communications interface
WO2003091827A2 (en) * 2002-04-26 2003-11-06 Fluency Voice Technology Limited A system and method for creating voice applications
GB2388286A (en) * 2002-05-01 2003-11-05 Seiko Epson Corp Enhanced speech data for use in a text to speech system
US20030216923A1 (en) * 2002-05-15 2003-11-20 Gilmore Jeffrey A. Dynamic content generation for voice messages
US20030214523A1 (en) * 2002-05-16 2003-11-20 Kuansan Wang Method and apparatus for decoding ambiguous input using anti-entities
US7546382B2 (en) 2002-05-28 2009-06-09 International Business Machines Corporation Methods and systems for authoring of mixed-initiative multi-modal interactions and related browsing mechanisms
US20030225622A1 (en) * 2002-05-28 2003-12-04 Doan William T. Method and system for entering orders of customers
US7212615B2 (en) * 2002-05-31 2007-05-01 Scott Wolmuth Criteria based marketing for telephone directory assistance
US7398209B2 (en) * 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US20050149331A1 (en) * 2002-06-14 2005-07-07 Ehrilich Steven C. Method and system for developing speech applications
US7502730B2 (en) * 2002-06-14 2009-03-10 Microsoft Corporation Method and apparatus for federated understanding
US20030235183A1 (en) * 2002-06-21 2003-12-25 Net2Phone, Inc. Packetized voice system and method
US7693720B2 (en) 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
US6876727B2 (en) 2002-07-24 2005-04-05 Sbc Properties, Lp Voice over IP method for developing interactive voice response system
US7216287B2 (en) * 2002-08-02 2007-05-08 International Business Machines Corporation Personal voice portal service
US7249019B2 (en) * 2002-08-06 2007-07-24 Sri International Method and apparatus for providing an integrated speech recognition and natural language understanding for a dialog system
US20040034532A1 (en) * 2002-08-16 2004-02-19 Sugata Mukhopadhyay Filter architecture for rapid enablement of voice access to data repositories
US7602704B2 (en) * 2002-08-20 2009-10-13 Cisco Technology, Inc. System and method for providing fault tolerant IP services
US20040037399A1 (en) * 2002-08-21 2004-02-26 Siemens Information And Communication Mobile, Llc System and method for transferring phone numbers during a voice call
US20040128136A1 (en) * 2002-09-20 2004-07-01 Irani Pourang Polad Internet voice browser
US7426535B2 (en) * 2002-10-08 2008-09-16 Telecommunication Systems, Inc. Coordination of data received from one or more sources over one or more channels into a single context
US7254541B2 (en) * 2002-10-30 2007-08-07 Hewlett-Packard Development Company, L.P. Systems and methods for providing users with information in audible form
US7136804B2 (en) * 2002-10-30 2006-11-14 Hewlett-Packard Development Company, L.P. Systems and methods for providing users with information in audible form
JP4173718B2 (en) * 2002-10-31 2008-10-29 富士通株式会社 Window switching device and window switching program
US7421389B2 (en) * 2002-11-13 2008-09-02 At&T Knowledge Ventures, L.P. System and method for remote speech recognition
US20040095754A1 (en) * 2002-11-19 2004-05-20 Yuan-Shun Hsu Candle lamp
US7571100B2 (en) * 2002-12-03 2009-08-04 Speechworks International, Inc. Speech recognition and speaker verification using distributed speech processing
US7177817B1 (en) 2002-12-12 2007-02-13 Tuvox Incorporated Automatic generation of voice content for a voice response system
US6834265B2 (en) * 2002-12-13 2004-12-21 Motorola, Inc. Method and apparatus for selective speech recognition
US7197331B2 (en) * 2002-12-30 2007-03-27 Motorola, Inc. Method and apparatus for selective distributed speech recognition
DE10304229A1 (en) * 2003-01-28 2004-08-05 Deutsche Telekom Ag Communication system, communication terminal and device for recognizing faulty text messages
US7783475B2 (en) * 2003-01-31 2010-08-24 Comverse, Inc. Menu-based, speech actuated system with speak-ahead capability
US7395505B1 (en) * 2003-03-17 2008-07-01 Tuvox, Inc. Graphical user interface for creating content for a voice-user interface
JP2004302300A (en) * 2003-03-31 2004-10-28 Canon Inc Information processing method
US7698435B1 (en) 2003-04-15 2010-04-13 Sprint Spectrum L.P. Distributed interactive media system and method
US7260535B2 (en) * 2003-04-28 2007-08-21 Microsoft Corporation Web server controls for web enabled recognition and/or audible prompting for call controls
US20040230637A1 (en) * 2003-04-29 2004-11-18 Microsoft Corporation Application controls for speech enabled recognition
US7366777B2 (en) * 2003-05-15 2008-04-29 Sap Aktiengesellschaft Web application router
US20040254787A1 (en) * 2003-06-12 2004-12-16 Shah Sheetal R. System and method for distributed speech recognition with a cache feature
US7698183B2 (en) * 2003-06-18 2010-04-13 Utbk, Inc. Method and apparatus for prioritizing a listing of information providers
US7243072B2 (en) * 2003-06-27 2007-07-10 Motorola, Inc. Providing assistance to a subscriber device over a network
US20040266418A1 (en) * 2003-06-27 2004-12-30 Motorola, Inc. Method and apparatus for controlling an electronic device
DE10330263B3 (en) * 2003-07-04 2005-03-03 Lisa Dräxlmaier GmbH Device for extracting or inserting a fuse
US7886009B2 (en) 2003-08-22 2011-02-08 Utbk, Inc. Gate keeper
US8311835B2 (en) * 2003-08-29 2012-11-13 Microsoft Corporation Assisted multi-modal dialogue
US7490286B2 (en) * 2003-09-25 2009-02-10 International Business Machines Corporation Help option enhancement for interactive voice response systems
US7120235B2 (en) * 2003-10-06 2006-10-10 Ingenio, Inc. Method and apparatus to provide pay-per-call performance based advertising
US7424442B2 (en) 2004-05-04 2008-09-09 Utbk, Inc. Method and apparatus to allocate and recycle telephone numbers in a call-tracking system
US7428497B2 (en) 2003-10-06 2008-09-23 Utbk, Inc. Methods and apparatuses for pay-per-call advertising in mobile/wireless applications
US9202220B2 (en) * 2003-10-06 2015-12-01 Yellowpages.Com Llc Methods and apparatuses to provide application programming interface for retrieving pay per call advertisements
US7366683B2 (en) 2003-10-06 2008-04-29 Utbk, Inc. Methods and apparatuses for offline selection of pay-per-call advertisers
US8121898B2 (en) 2003-10-06 2012-02-21 Utbk, Inc. Methods and apparatuses for geographic area selections in pay-per-call advertisement
US8024224B2 (en) 2004-03-10 2011-09-20 Utbk, Inc. Method and apparatus to provide pay-per-call advertising and billing
US8837698B2 (en) * 2003-10-06 2014-09-16 Yp Interactive Llc Systems and methods to collect information just in time for connecting people for real time communications
US8027878B2 (en) 2003-10-06 2011-09-27 Utbk, Inc. Method and apparatus to compensate demand partners in a pay-per-call performance based advertising system
US9984377B2 (en) 2003-10-06 2018-05-29 Yellowpages.Com Llc System and method for providing advertisement
US8140389B2 (en) 2003-10-06 2012-03-20 Utbk, Inc. Methods and apparatuses for pay for deal advertisements
US7421458B1 (en) 2003-10-16 2008-09-02 Informatica Corporation Querying, versioning, and dynamic deployment of database objects
US20050163136A1 (en) * 2003-11-17 2005-07-28 Leo Chiu Multi-tenant self-service VXML portal
US7697673B2 (en) * 2003-11-17 2010-04-13 Apptera Inc. System for advertisement selection, placement and delivery within a multiple-tenant voice interaction service system
US8799001B2 (en) * 2003-11-17 2014-08-05 Nuance Communications, Inc. Method and system for defining standard catch styles for speech application code generation
US7756256B1 (en) * 2003-11-26 2010-07-13 Openwave Systems Inc. Unified and best messaging systems for communication devices
US20050119892A1 (en) 2003-12-02 2005-06-02 International Business Machines Corporation Method and arrangement for managing grammar options in a graphical callflow builder
US7254590B2 (en) * 2003-12-03 2007-08-07 Informatica Corporation Set-oriented real-time data processing based on transaction boundaries
US20050144015A1 (en) * 2003-12-08 2005-06-30 International Business Machines Corporation Automatic identification of optimal audio segments for speech applications
US20050125236A1 (en) * 2003-12-08 2005-06-09 International Business Machines Corporation Automatic capture of intonation cues in audio segments for speech applications
US9378187B2 (en) * 2003-12-11 2016-06-28 International Business Machines Corporation Creating a presentation document
US7162692B2 (en) * 2003-12-11 2007-01-09 International Business Machines Corporation Differential dynamic content delivery
US20050132274A1 (en) * 2003-12-11 2005-06-16 International Business Machines Corporation Creating a presentation document
US7487451B2 (en) * 2003-12-11 2009-02-03 International Business Machines Corporation Creating a voice response grammar from a presentation grammar
US20050132273A1 (en) * 2003-12-11 2005-06-16 International Business Machines Corporation Amending a session document during a presentation
US20050132271A1 (en) * 2003-12-11 2005-06-16 International Business Machines Corporation Creating a session document from a presentation document
US7634412B2 (en) * 2003-12-11 2009-12-15 Nuance Communications, Inc. Creating a voice response grammar from a user grammar
US20050129196A1 (en) * 2003-12-15 2005-06-16 International Business Machines Corporation Voice document with embedded tags
US8160883B2 (en) * 2004-01-10 2012-04-17 Microsoft Corporation Focus tracking in dialogs
US7552055B2 (en) 2004-01-10 2009-06-23 Microsoft Corporation Dialog component re-use in recognition systems
US7890848B2 (en) 2004-01-13 2011-02-15 International Business Machines Corporation Differential dynamic content delivery with alternative content presentation
US8499232B2 (en) * 2004-01-13 2013-07-30 International Business Machines Corporation Differential dynamic content delivery with a participant alterable session copy of a user profile
US7571380B2 (en) 2004-01-13 2009-08-04 International Business Machines Corporation Differential dynamic content delivery with a presenter-alterable session copy of a user profile
US7430707B2 (en) 2004-01-13 2008-09-30 International Business Machines Corporation Differential dynamic content delivery with device controlling action
US7542971B2 (en) * 2004-02-02 2009-06-02 Fuji Xerox Co., Ltd. Systems and methods for collaborative note-taking
FR2868036B1 (en) * 2004-03-24 2006-06-02 Eca Societe Par Actions Simpli DEVICE FOR LAUNCHING AND RECOVERING A SUBMERSIBLE VEHICLE
US8027458B1 (en) 2004-04-06 2011-09-27 Tuvox, Inc. Voice response system with live agent assisted information selection and machine playback
US7647227B1 (en) 2004-04-06 2010-01-12 Tuvox, Inc. Machine assisted speech generation for a conversational voice response system
US7519683B2 (en) 2004-04-26 2009-04-14 International Business Machines Corporation Dynamic media content for collaborators with client locations in dynamic client contexts
US7827239B2 (en) * 2004-04-26 2010-11-02 International Business Machines Corporation Dynamic media content for collaborators with client environment information in dynamic client contexts
US20050271186A1 (en) * 2004-06-02 2005-12-08 Audiopoint, Inc. System, method and computer program product for interactive voice notification
US7308083B2 (en) * 2004-06-30 2007-12-11 Glenayre Electronics, Inc. Message durability and retrieval in a geographically distributed voice messaging system
US7921362B2 (en) * 2004-07-08 2011-04-05 International Business Machines Corporation Differential dynamic delivery of presentation previews
US8185814B2 (en) * 2004-07-08 2012-05-22 International Business Machines Corporation Differential dynamic delivery of content according to user expressions of interest
US7487208B2 (en) 2004-07-08 2009-02-03 International Business Machines Corporation Differential dynamic content delivery to alternate display device locations
US7519904B2 (en) * 2004-07-08 2009-04-14 International Business Machines Corporation Differential dynamic delivery of content to users not in attendance at a presentation
US9167087B2 (en) 2004-07-13 2015-10-20 International Business Machines Corporation Dynamic media content for collaborators including disparate location representations
US7426538B2 (en) 2004-07-13 2008-09-16 International Business Machines Corporation Dynamic media content for collaborators with VOIP support for client communications
US7624016B2 (en) * 2004-07-23 2009-11-24 Microsoft Corporation Method and apparatus for robustly locating user barge-ins in voice-activated command systems
US7912206B2 (en) * 2004-07-30 2011-03-22 Miller John S Technique for providing a personalized electronic messaging service through an information assistance provider
US7580837B2 (en) 2004-08-12 2009-08-25 At&T Intellectual Property I, L.P. System and method for targeted tuning module of a speech recognition system
US8396712B2 (en) 2004-08-26 2013-03-12 West Corporation Method and system to generate finite state grammars using sample phrases
US20080154601A1 (en) * 2004-09-29 2008-06-26 Microsoft Corporation Method and system for providing menu and other services for an information processing system using a telephone or other audio interface
US8494855B1 (en) 2004-10-06 2013-07-23 West Interactive Corporation Ii Method, system, and computer readable medium for comparing phonetic similarity of return words to resolve ambiguities during voice recognition
WO2006058340A2 (en) * 2004-11-29 2006-06-01 Jingle Networks, Inc. Telephone search supported by response location advertising
US7242751B2 (en) 2004-12-06 2007-07-10 Sbc Knowledge Ventures, L.P. System and method for speech recognition-enabled automatic call routing
US7627638B1 (en) * 2004-12-20 2009-12-01 Google Inc. Verbal labels for electronic messages
EP1679867A1 (en) * 2005-01-06 2006-07-12 Orange SA Customisation of VoiceXML Application
US7751551B2 (en) 2005-01-10 2010-07-06 At&T Intellectual Property I, L.P. System and method for speech-enabled call routing
US9202219B2 (en) 2005-02-16 2015-12-01 Yellowpages.Com Llc System and method to merge pay-for-performance advertising models
US8538768B2 (en) 2005-02-16 2013-09-17 Ingenio Llc Methods and apparatuses for delivery of advice to mobile/wireless devices
US8934614B2 (en) * 2005-02-25 2015-01-13 YP Interactive LLC Systems and methods for dynamic pay for performance advertisements
US7979308B2 (en) 2005-03-03 2011-07-12 Utbk, Inc. Methods and apparatuses for sorting lists for presentation
US7505569B2 (en) * 2005-03-18 2009-03-17 International Business Machines Corporation Diagnosing voice application issues of an operational environment
US7933399B2 (en) * 2005-03-22 2011-04-26 At&T Intellectual Property I, L.P. System and method for utilizing virtual agents in an interactive voice response application
US7475340B2 (en) * 2005-03-24 2009-01-06 International Business Machines Corporation Differential dynamic content delivery with indications of interest from non-participants
US7493556B2 (en) * 2005-03-31 2009-02-17 International Business Machines Corporation Differential dynamic content delivery with a session document recreated in dependence upon an interest of an identified user participant
US7657020B2 (en) 2005-06-03 2010-02-02 At&T Intellectual Property I, Lp Call routing system and method of using the same
US8396715B2 (en) * 2005-06-28 2013-03-12 Microsoft Corporation Confidence threshold tuning
US7321856B1 (en) 2005-08-03 2008-01-22 Microsoft Corporation Handling of speech recognition in a declarative markup language
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7899160B2 (en) * 2005-08-24 2011-03-01 Verizon Business Global Llc Method and system for providing configurable application processing in support of dynamic human interaction flow
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
EP1934971A4 (en) 2005-08-31 2010-10-27 Voicebox Technologies Inc Dynamic speech sharpening
US8599832B2 (en) 2005-09-28 2013-12-03 Ingenio Llc Methods and apparatuses to connect people for real time communications via voice over internet protocol (VOIP)
US8761154B2 (en) 2005-09-28 2014-06-24 Ebbe Altberg Methods and apparatuses to access advertisements through voice over internet protocol (VoIP) applications
US8023624B2 (en) * 2005-11-07 2011-09-20 Ack Ventures Holdings, Llc Service interfacing for telephony
US20070129950A1 (en) * 2005-12-05 2007-06-07 Kyoung Hyun Park Speech act-based voice XML dialogue apparatus for controlling dialogue flow and method thereof
US20070133511A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Composite services delivery utilizing lightweight messaging
US10332071B2 (en) * 2005-12-08 2019-06-25 International Business Machines Corporation Solution for adding context to a text exchange modality during interactions with a composite services application
US8189563B2 (en) * 2005-12-08 2012-05-29 International Business Machines Corporation View coordination for callers in a composite services enablement environment
US8005934B2 (en) * 2005-12-08 2011-08-23 International Business Machines Corporation Channel presence in a composite services enablement environment
US20070133509A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Initiating voice access to a session from a visual access channel to the session in a composite services delivery system
US20070136449A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Update notification for peer views in a composite services delivery environment
US7809838B2 (en) * 2005-12-08 2010-10-05 International Business Machines Corporation Managing concurrent data updates in a composite services delivery system
US7877486B2 (en) * 2005-12-08 2011-01-25 International Business Machines Corporation Auto-establishment of a voice channel of access to a session for a composite service from a visual channel of access to the session for the composite service
US20070133512A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Composite services enablement of visual navigation into a call center
US20070136793A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Secure access to a common session in a composite services delivery environment
US7818432B2 (en) * 2005-12-08 2010-10-19 International Business Machines Corporation Seamless reflection of model updates in a visual page for a visual channel in a composite services delivery system
US20070132834A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Speech disambiguation in a composite services enablement environment
US20070147355A1 (en) * 2005-12-08 2007-06-28 International Business Machines Corporation Composite services generation tool
US7890635B2 (en) * 2005-12-08 2011-02-15 International Business Machines Corporation Selective view synchronization for composite services delivery
US20070133773A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Composite services delivery
US20070133769A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Voice navigation of a visual view for a session in a composite services enablement environment
US20070136421A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Synchronized view state for composite services delivery
US7792971B2 (en) * 2005-12-08 2010-09-07 International Business Machines Corporation Visual channel refresh rate control for composite services delivery
US8259923B2 (en) * 2007-02-28 2012-09-04 International Business Machines Corporation Implementing a contact center using open standards and non-proprietary components
US11093898B2 (en) 2005-12-08 2021-08-17 International Business Machines Corporation Solution for adding context to a text exchange modality during interactions with a composite services application
US7827288B2 (en) * 2005-12-08 2010-11-02 International Business Machines Corporation Model autocompletion for composite services synchronization
US9197479B2 (en) 2006-01-10 2015-11-24 Yellowpages.Com Llc Systems and methods to manage a queue of people requesting real time communication connections
US8681778B2 (en) 2006-01-10 2014-03-25 Ingenio Llc Systems and methods to manage privilege to speak
US8125931B2 (en) 2006-01-10 2012-02-28 Utbk, Inc. Systems and methods to provide availability indication
US7720091B2 (en) 2006-01-10 2010-05-18 Utbk, Inc. Systems and methods to arrange call back
US20070180365A1 (en) * 2006-01-27 2007-08-02 Ashok Mitter Khosla Automated process and system for converting a flowchart into a speech mark-up language
DE102006004442A1 (en) * 2006-01-31 2007-08-09 Siemens Ag Apparatus and method for providing a voice browser functionality
US20070203875A1 (en) * 2006-02-24 2007-08-30 Intervoice Limited Partnership System and method for retrieving files from a file server using file attributes
US20070203874A1 (en) * 2006-02-24 2007-08-30 Intervoice Limited Partnership System and method for managing files on a file server using embedded metadata and a search engine
US20070203927A1 (en) * 2006-02-24 2007-08-30 Intervoice Limited Partnership System and method for defining and inserting metadata attributes in files
US20070201631A1 (en) * 2006-02-24 2007-08-30 Intervoice Limited Partnership System and method for defining, synthesizing and retrieving variable field utterances from a file server
US8311836B2 (en) * 2006-03-13 2012-11-13 Nuance Communications, Inc. Dynamic help including available speech commands from content contained within speech grammars
US20070291923A1 (en) * 2006-06-19 2007-12-20 Amy Hsieh Method and apparatus for the purchase, sale and facilitation of voice over internet protocol (VoIP) consultations
US20080037720A1 (en) * 2006-07-27 2008-02-14 Speechphone, Llc Voice Activated Communication Using Automatically Updated Address Books
US8732279B2 (en) 2006-08-18 2014-05-20 Cisco Technology, Inc. Secure network deployment
US8639782B2 (en) 2006-08-23 2014-01-28 Ebay, Inc. Method and system for sharing metadata between interfaces
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
KR100814641B1 (en) * 2006-10-23 2008-03-18 성균관대학교산학협력단 User driven voice service system and method thereof
US9317855B2 (en) 2006-10-24 2016-04-19 Yellowpages.Com Llc Systems and methods to provide voice connections via local telephone numbers
US8355913B2 (en) * 2006-11-03 2013-01-15 Nokia Corporation Speech recognition with adjustable timeout period
US8226416B2 (en) * 2006-12-08 2012-07-24 Sri International Method and apparatus for reading education
US8594305B2 (en) * 2006-12-22 2013-11-26 International Business Machines Corporation Enhancing contact centers with dialog contracts
US8451825B2 (en) 2007-02-22 2013-05-28 Utbk, Llc Systems and methods to confirm initiation of a callback
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8738393B2 (en) * 2007-02-27 2014-05-27 Telemanager Technologies, Inc. System and method for targeted healthcare messaging
US20080208628A1 (en) * 2007-02-27 2008-08-28 Telemanager Technologies, Inc. System and Method for Targeted Healthcare Messaging
US9247056B2 (en) * 2007-02-28 2016-01-26 International Business Machines Corporation Identifying contact center agents based upon biometric characteristics of an agent's speech
US9055150B2 (en) * 2007-02-28 2015-06-09 International Business Machines Corporation Skills based routing in a standards based contact center using a presence server and expertise specific watchers
US20080205625A1 (en) * 2007-02-28 2008-08-28 International Business Machines Corporation Extending a standardized presence document to include contact center specific elements
US20080263460A1 (en) * 2007-04-20 2008-10-23 Utbk, Inc. Methods and Systems to Connect People for Virtual Meeting in Virtual Reality
US20080262910A1 (en) * 2007-04-20 2008-10-23 Utbk, Inc. Methods and Systems to Connect People via Virtual Reality for Real Time Communications
US9277019B2 (en) 2007-06-18 2016-03-01 Yellowpages.Com Llc Systems and methods to provide communication references to connect people for real time communications
US9311420B2 (en) * 2007-06-20 2016-04-12 International Business Machines Corporation Customizing web 2.0 application behavior based on relationships between a content creator and a content requester
US20080319757A1 (en) * 2007-06-20 2008-12-25 International Business Machines Corporation Speech processing system based upon a representational state transfer (rest) architecture that uses web 2.0 concepts for speech resource interfaces
US9264483B2 (en) 2007-07-18 2016-02-16 Hammond Development International, Inc. Method and system for enabling a communication device to remotely execute an application
US8635069B2 (en) 2007-08-16 2014-01-21 Crimson Corporation Scripting support for data identifiers, voice recognition and speech in a telnet session
US8838476B2 (en) * 2007-09-07 2014-09-16 Yp Interactive Llc Systems and methods to provide information and connect people for real time communications
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8386260B2 (en) * 2007-12-31 2013-02-26 Motorola Mobility Llc Methods and apparatus for implementing distributed multi-modal applications
US8370160B2 (en) * 2007-12-31 2013-02-05 Motorola Mobility Llc Methods and apparatus for implementing distributed multi-modal applications
EP2266269B1 (en) 2008-04-02 2019-01-02 Twilio Inc. System and method for processing telephony sessions
US8837465B2 (en) 2008-04-02 2014-09-16 Twilio, Inc. System and method for processing telephony sessions
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US20090306983A1 (en) * 2008-06-09 2009-12-10 Microsoft Corporation User access and update of personal health records in a computerized health data store via voice inputs
CN102227904A (en) 2008-10-01 2011-10-26 特维里奥公司 Telephony web event system and method
US8605885B1 (en) 2008-10-23 2013-12-10 Next It Corporation Automated assistant for customer service representatives
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
EP2404412B1 (en) 2009-03-02 2019-05-01 Twilio Inc. Method and system for a multitenancy telephone network
US8509415B2 (en) 2009-03-02 2013-08-13 Twilio, Inc. Method and system for a multitenancy telephony network
US8811578B2 (en) * 2009-03-23 2014-08-19 Telemanager Technologies, Inc. System and method for providing local interactive voice response services
US8290780B2 (en) * 2009-06-24 2012-10-16 International Business Machines Corporation Dynamically extending the speech prompts of a multimodal application
JP5380543B2 (en) * 2009-09-25 2014-01-08 株式会社東芝 Spoken dialogue apparatus and program
JP5192468B2 (en) * 2009-09-29 2013-05-08 株式会社エヌ・ティ・ティ・ドコモ Data processing apparatus and program
US9210275B2 (en) 2009-10-07 2015-12-08 Twilio, Inc. System and method for running a multi-module telephony application
US8582737B2 (en) * 2009-10-07 2013-11-12 Twilio, Inc. System and method for running a multi-module telephony application
US20110083179A1 (en) * 2009-10-07 2011-04-07 Jeffrey Lawson System and method for mitigating a denial of service attack using cloud computing
US8996384B2 (en) * 2009-10-30 2015-03-31 Vocollect, Inc. Transforming components of a web page to voice prompts
US9502025B2 (en) 2009-11-10 2016-11-22 Voicebox Technologies Corporation System and method for providing a natural language content dedication service
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
EP2526657B1 (en) * 2010-01-19 2019-02-20 Twilio Inc. Method and system for preserving telephony session state
US9338064B2 (en) 2010-06-23 2016-05-10 Twilio, Inc. System and method for managing a computing cluster
US9459925B2 (en) 2010-06-23 2016-10-04 Twilio, Inc. System and method for managing a computing cluster
US8416923B2 (en) 2010-06-23 2013-04-09 Twilio, Inc. Method for providing clean endpoint addresses
US9590849B2 (en) 2010-06-23 2017-03-07 Twilio, Inc. System and method for managing a computing cluster
US20120208495A1 (en) 2010-06-23 2012-08-16 Twilio, Inc. System and method for monitoring account usage on a platform
US9459926B2 (en) 2010-06-23 2016-10-04 Twilio, Inc. System and method for managing a computing cluster
US8838707B2 (en) 2010-06-25 2014-09-16 Twilio, Inc. System and method for enabling real-time eventing
US8649268B2 (en) 2011-02-04 2014-02-11 Twilio, Inc. Method for processing telephony sessions of a network
US9398622B2 (en) 2011-05-23 2016-07-19 Twilio, Inc. System and method for connecting a communication to a client
US9648006B2 (en) 2011-05-23 2017-05-09 Twilio, Inc. System and method for communicating with a client application
US20140044123A1 (en) 2011-05-23 2014-02-13 Twilio, Inc. System and method for real time communicating with a client application
EP2759123B1 (en) 2011-09-21 2018-08-15 Twilio, Inc. System and method for authorizing and connecting application developers and users
US10182147B2 (en) 2011-09-21 2019-01-15 Twilio Inc. System and method for determining and communicating presence information
US8595016B2 (en) 2011-12-23 2013-11-26 Angle, Llc Accessing content using a source-specific content-adaptable dialogue
US9495227B2 (en) 2012-02-10 2016-11-15 Twilio, Inc. System and method for managing concurrent events
US9240941B2 (en) 2012-05-09 2016-01-19 Twilio, Inc. System and method for managing media in a distributed communication network
US9602586B2 (en) 2012-05-09 2017-03-21 Twilio, Inc. System and method for managing media in a distributed communication network
US20130304928A1 (en) 2012-05-09 2013-11-14 Twilio, Inc. System and method for managing latency in a distributed telephony network
US9247062B2 (en) 2012-06-19 2016-01-26 Twilio, Inc. System and method for queuing a communication session
US8737962B2 (en) 2012-07-24 2014-05-27 Twilio, Inc. Method and system for preventing illicit use of a telephony platform
US8738051B2 (en) 2012-07-26 2014-05-27 Twilio, Inc. Method and system for controlling message routing
US20140088971A1 (en) * 2012-08-20 2014-03-27 Michael D. Metcalf System And Method For Voice Operated Communication Assistance
US8938053B2 (en) 2012-10-15 2015-01-20 Twilio, Inc. System and method for triggering on platform usage
US8948356B2 (en) 2012-10-15 2015-02-03 Twilio, Inc. System and method for routing communications
US9691377B2 (en) 2013-07-23 2017-06-27 Google Technology Holdings LLC Method and device for voice recognition training
US9253254B2 (en) 2013-01-14 2016-02-02 Twilio, Inc. System and method for offering a multi-partner delegated platform
US9275638B2 (en) 2013-03-12 2016-03-01 Google Technology Holdings LLC Method and apparatus for training a voice recognition model database
US9282124B2 (en) 2013-03-14 2016-03-08 Twilio, Inc. System and method for integrating session initiation protocol communication in a telecommunications platform
US9001666B2 (en) 2013-03-15 2015-04-07 Twilio, Inc. System and method for improving routing in a distributed communication platform
US9225840B2 (en) 2013-06-19 2015-12-29 Twilio, Inc. System and method for providing a communication endpoint information service
US9160696B2 (en) 2013-06-19 2015-10-13 Twilio, Inc. System for transforming media resource into destination device compatible messaging format
US9338280B2 (en) 2013-06-19 2016-05-10 Twilio, Inc. System and method for managing telephony endpoint inventory
GB2518128B (en) * 2013-06-20 2021-02-10 Nokia Technologies Oy Charging rechargeable apparatus
US9483328B2 (en) 2013-07-19 2016-11-01 Twilio, Inc. System and method for delivering application content
US9548047B2 (en) 2013-07-31 2017-01-17 Google Technology Holdings LLC Method and apparatus for evaluating trigger phrase enrollment
US9338018B2 (en) 2013-09-17 2016-05-10 Twilio, Inc. System and method for pricing communication of a telecommunication platform
US9274858B2 (en) 2013-09-17 2016-03-01 Twilio, Inc. System and method for tagging and tracking events of an application platform
US9137127B2 (en) 2013-09-17 2015-09-15 Twilio, Inc. System and method for providing communication platform metadata
US9325624B2 (en) 2013-11-12 2016-04-26 Twilio, Inc. System and method for enabling dynamic multi-modal communication
US9553799B2 (en) 2013-11-12 2017-01-24 Twilio, Inc. System and method for client communication in a distributed telephony network
US9344573B2 (en) 2014-03-14 2016-05-17 Twilio, Inc. System and method for a work distribution service
US9226217B2 (en) 2014-04-17 2015-12-29 Twilio, Inc. System and method for enabling multi-modal communication
US9516101B2 (en) 2014-07-07 2016-12-06 Twilio, Inc. System and method for collecting feedback in a multi-tenant communication platform
US9246694B1 (en) 2014-07-07 2016-01-26 Twilio, Inc. System and method for managing conferencing in a distributed communication network
US9251371B2 (en) 2014-07-07 2016-02-02 Twilio, Inc. Method and system for applying data retention policies in a computing platform
US9774687B2 (en) 2014-07-07 2017-09-26 Twilio, Inc. System and method for managing media and signaling in a communication platform
US10033797B1 (en) 2014-08-20 2018-07-24 Ivanti, Inc. Terminal emulation over HTML
WO2016044290A1 (en) 2014-09-16 2016-03-24 Kennewick Michael R Voice commerce
WO2016044321A1 (en) 2014-09-16 2016-03-24 Min Tang Integration of domain information into state transitions of a finite state transducer for natural language processing
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US9749428B2 (en) 2014-10-21 2017-08-29 Twilio, Inc. System and method for providing a network discovery service platform
CN107077315B (en) * 2014-11-11 2020-05-12 瑞典爱立信有限公司 System and method for selecting speech to be used during communication with a user
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US9477975B2 (en) 2015-02-03 2016-10-25 Twilio, Inc. System and method for a media intelligence platform
US10419891B2 (en) 2015-05-14 2019-09-17 Twilio, Inc. System and method for communicating through multiple endpoints
US9948703B2 (en) 2015-05-14 2018-04-17 Twilio, Inc. System and method for signaling through data storage
US10659349B2 (en) 2016-02-04 2020-05-19 Twilio Inc. Systems and methods for providing secure network exchanged for a multitenant virtual private cloud
US10650046B2 (en) 2016-02-05 2020-05-12 Sas Institute Inc. Many task computing with distributed file system
US10642896B2 (en) 2016-02-05 2020-05-05 Sas Institute Inc. Handling of data sets during execution of task routines of multiple languages
US10795935B2 (en) 2016-02-05 2020-10-06 Sas Institute Inc. Automated generation of job flow definitions
US10686902B2 (en) 2016-05-23 2020-06-16 Twilio Inc. System and method for a multi-channel notification service
US10063713B2 (en) 2016-05-23 2018-08-28 Twilio Inc. System and method for programmatic device connectivity
US11100278B2 (en) 2016-07-28 2021-08-24 Ivanti, Inc. Systems and methods for presentation of a terminal application screen
WO2018023106A1 (en) 2016-07-29 2018-02-01 Erik SWART System and method of disambiguating natural language processing requests
USD898059S1 (en) 2017-02-06 2020-10-06 Sas Institute Inc. Display screen or portion thereof with graphical user interface
USD898060S1 (en) 2017-06-05 2020-10-06 Sas Institute Inc. Display screen or portion thereof with graphical user interface
US10503498B2 (en) 2017-11-16 2019-12-10 Sas Institute Inc. Scalable cloud-based time series analysis
US11259871B2 (en) 2018-04-26 2022-03-01 Vektor Medical, Inc. Identify ablation pattern for use in an ablation
US11065060B2 (en) 2018-04-26 2021-07-20 Vektor Medical, Inc. Identify ablation pattern for use in an ablation
US11576624B2 (en) 2018-04-26 2023-02-14 Vektor Medical, Inc. Generating approximations of cardiograms from different source configurations
US10860754B2 (en) 2018-04-26 2020-12-08 Vektor Medical, Inc. Calibration of simulated cardiograms
JP7153973B2 (en) 2018-11-13 2022-10-17 ベクトル メディカル インコーポレイテッド Dilation of images with source locations
US10595736B1 (en) 2019-06-10 2020-03-24 Vektor Medical, Inc. Heart graphic display system
US10709347B1 (en) 2019-06-10 2020-07-14 Vektor Medical, Inc. Heart graphic display system
US11338131B1 (en) 2021-05-05 2022-05-24 Vektor Medical, Inc. Guiding implantation of an energy delivery component in a body
US20230038493A1 (en) 2021-08-09 2023-02-09 Vektor Medical, Inc. Tissue state graphic display system
US11534224B1 (en) 2021-12-02 2022-12-27 Vektor Medical, Inc. Interactive ablation workflow system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5799063A (en) * 1996-08-15 1998-08-25 Talk Web Inc. Communication system and method of providing access to pre-recorded audio messages via the Internet
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5649117A (en) 1994-06-03 1997-07-15 Midwest Payment Systems System and method for paying bills and other obligations including selective payor and payee controls
US5655008A (en) 1995-06-07 1997-08-05 Dart, Inc. System and method for performing a variety of transactions having distributed decision-making capability
US5860073A (en) * 1995-07-17 1999-01-12 Microsoft Corporation Style sheets for publishing system
WO1997023973A1 (en) * 1995-12-22 1997-07-03 Rutgers University Method and system for audio access to information in a wide area computer network
US5953392A (en) * 1996-03-01 1999-09-14 Netphonic Communications, Inc. Method and apparatus for telephonically accessing and navigating the internet
GB2317070A (en) * 1996-09-07 1998-03-11 Ibm Voice processing/internet system
US5899975A (en) * 1997-04-03 1999-05-04 Sun Microsystems, Inc. Style sheets for speech-based presentation of web pages
US5877766A (en) * 1997-08-15 1999-03-02 International Business Machines Corporation Multi-node user interface component and method thereof for use in accessing a plurality of linked records
US6269336B1 (en) * 1998-07-24 2001-07-31 Motorola, Inc. Voice browser for interactive services and methods thereof
US6385583B1 (en) * 1998-10-02 2002-05-07 Motorola, Inc. Markup language for interactive services and methods thereof
US6240391B1 (en) * 1999-05-25 2001-05-29 Lucent Technologies Inc. Method and apparatus for assembling and presenting structured voicemail messages
US6349132B1 (en) * 1999-12-16 2002-02-19 Talk2 Technology, Inc. Voice interface for electronic documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1099213A4 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653545B1 (en) 1999-06-11 2010-01-26 Telstra Corporation Limited Method of developing an interactive system
AU777441B2 (en) * 1999-06-11 2004-10-14 Telstra Corporation Limited A method of developing an interactive system
WO2000078022A1 (en) * 1999-06-11 2000-12-21 Telstra New Wave Pty Ltd A method of developing an interactive system
GB2366010A (en) * 2000-03-23 2002-02-27 Canon Kk Machine interface including mark-up instructions and a word probability search
US6832197B2 (en) 2000-03-23 2004-12-14 Canon Kabushiki Kaisha Machine interface
GB2366010B (en) * 2000-03-23 2004-11-17 Canon Kk Machine interface
EP1139335A3 (en) * 2000-03-31 2001-12-05 Canon Kabushiki Kaisha Voice browser system
US7251602B2 (en) 2000-03-31 2007-07-31 Canon Kabushiki Kaisha Voice browser system
WO2002005264A1 (en) * 2000-07-07 2002-01-17 Siemens Aktiengesellschaft Voice-controlled system and method for voice input and voice recognition
WO2002007075A1 (en) * 2000-07-19 2002-01-24 Siemens Aktiengesellschaft Electronic calling card
US7062297B2 (en) 2000-07-21 2006-06-13 Telefonaktiebolaget L M Ericsson (Publ) Method and system for accessing a network using voice recognition
GB2373697B (en) * 2000-11-29 2005-01-12 Hewlett Packard Co Locality-Dependent presentation
GB2373697A (en) * 2000-11-29 2002-09-25 Hewlett Packard Co Locality-dependent presentation
WO2002044887A3 (en) * 2000-12-01 2003-04-24 Univ Columbia A method and system for voice activating web pages
US7640163B2 (en) 2000-12-01 2009-12-29 The Trustees Of Columbia University In The City Of New York Method and system for voice activating web pages
EP1881685A1 (en) * 2000-12-01 2008-01-23 The Trustees Of Columbia University In The City Of New York A method and system for voice activating web pages
EP1211861A1 (en) * 2000-12-04 2002-06-05 Alcatel Browser environment for accessing local and remote services on a phone
WO2002046959A3 (en) * 2000-12-08 2003-09-04 Koninkl Philips Electronics Nv Distributed speech recognition for internet access
WO2002046959A2 (en) * 2000-12-08 2002-06-13 Koninklijke Philips Electronics N.V. Distributed speech recognition for internet access
DE10064661A1 (en) * 2000-12-22 2002-07-11 Siemens Ag Communication arrangement and method for communication systems with interactive speech function
WO2002073599A1 (en) * 2001-03-12 2002-09-19 Mediavoice S.R.L. Method for enabling the voice interaction with a web page
WO2003039122A1 (en) * 2001-10-29 2003-05-08 Siemens Aktiengesellschaft Method and system for dynamic generation of announcement contents
US7212971B2 (en) 2001-12-20 2007-05-01 Canon Kabushiki Kaisha Control apparatus for enabling a user to communicate by speech with a processor-controlled apparatus
US7664649B2 (en) 2001-12-20 2010-02-16 Canon Kabushiki Kaisha Control apparatus, method and computer readable memory medium for enabling a user to communicate by speech with a processor-controlled apparatus
US7149287B1 (en) 2002-01-17 2006-12-12 Snowshore Networks, Inc. Universal voice browser framework
US7712031B2 (en) 2002-07-24 2010-05-04 Telstra Corporation Limited System and process for developing a voice application
US8046227B2 (en) 2002-09-06 2011-10-25 Telstra Corporation Limited Development system for a dialog system
US7917363B2 (en) 2003-02-11 2011-03-29 Telstra Corporation Limited System for predicting speech recognition accuracy and development for a dialog system
US8296129B2 (en) 2003-04-29 2012-10-23 Telstra Corporation Limited System and process for grammatical inference
US9202467B2 (en) 2003-06-06 2015-12-01 The Trustees Of Columbia University In The City Of New York System and method for voice activating web pages
EP3246828A1 (en) * 2016-05-19 2017-11-22 Palo Alto Research Center Incorporated Natural language web browser
US11599709B2 (en) 2016-05-19 2023-03-07 Palo Alto Research Center Incorporated Natural language web browser

Also Published As

Publication number Publication date
US6269336B1 (en) 2001-07-31
US6493673B1 (en) 2002-12-10
EP1099213A1 (en) 2001-05-16
EP1099213A4 (en) 2004-09-08
AU5227899A (en) 2000-02-14

Similar Documents

Publication Publication Date Title
US6539359B1 (en) Markup language for interactive services and methods thereof
US6269336B1 (en) Voice browser for interactive services and methods thereof
US20020006126A1 (en) Methods and systems for accessing information from an information source
WO2000005643A1 (en) Markup language for interactive services and methods thereof
US6751296B1 (en) System and method for creating a transaction usage record
US6996227B2 (en) Systems and methods for storing information associated with a subscriber
US6668046B1 (en) Method and system for generating a user's telecommunications bill
US6725256B1 (en) System and method for creating an e-mail usage record
US6668043B2 (en) Systems and methods for transmitting and receiving text data via a communication device
US20020118800A1 (en) Telecommunication systems and methods therefor
US7197461B1 (en) System and method for voice-enabled input for use in the creation and automatic deployment of personalized, dynamic, and interactive voice services
US6583716B2 (en) System and method for providing location-relevant services using stored location information
US7143147B1 (en) Method and apparatus for accessing a wide area network
US20030147518A1 (en) Methods and apparatus to deliver caller identification information
US20040203660A1 (en) Method of assisting a user placed on-hold
US20030185375A1 (en) Call transfer system and method
WO2001069422A2 (en) Multimodal information services
EP1323285A2 (en) Variable automated response system
US6570969B1 (en) System and method for creating a call usage record
US6700962B1 (en) System and method for creating a call detail record
US20040109543A1 (en) Method of accessing an information source
US20070121814A1 (en) Speech recognition based computer telephony system
US6711246B1 (en) System and method for creating a page usage record
US7512223B1 (en) System and method for locating an end user
EP1101343A1 (en) Telecommunication audio processing systems and methods therefor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 1999937440

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1999937440

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW WIPO information: withdrawn in national office

Ref document number: 1999937440

Country of ref document: EP