US20090290694A1 - Methods and system for creating voice files using a VoiceXML application

Info

Publication number
US20090290694A1
Authority
US
United States
Prior art keywords
audio
audio file
extracted data
voice
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/536,040
Inventor
Senis Busayapongchai
Pichet Chintrakulchai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp
Priority to US12/536,040
Publication of US20090290694A1
Current legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML

Definitions

  • This invention relates generally to methods and systems for creating voice files using a VoiceXML application. More particularly, the present invention relates to methods and systems for automating the assembly or creation of audio files from pre-recorded audio files, audio streams and/or synthesized speech files for presentation to listeners or for use in voice interactive services.
  • the caller may be connected to a voice interactive directory assistance system that may answer “welcome to the directory assistance service-please say the name of the party you wish to reach.”
  • a caller may call a goods provider, such as a department store, and the caller may receive an automated voice interactive answering service such as “if you know the number of the store department you would like to reach, please enter the number now.”
  • Such voice interactive services may be provided by on-the-premises equipment, or a goods/services provider may utilize the voice interactive services of a third party, such as a telecommunications services provider.
  • audio files must be prepared for providing initial contact with the caller and for providing responses to requests by the caller.
  • an audio file such as “welcome to the directory assistance service” must be prepared by the telecommunications services provider for playing to a caller when the caller calls the telecommunications services provider for directory assistance.
  • Users of the recorded audio file such as telecommunications services providers or other goods/services providers, may maintain a number of pre-recorded audio files for providing to listeners, as described above. That is, a pre-recorded audio file such as “welcome to the directory assistance service” may be established by a telecommunications services provider and may be saved for subsequent use.
  • Developers of audio files for use in voice interactive services systems typically create a number of pre-recorded files that may be utilized individually or that may be combined with other pre-recorded audio files to create a desired audio file. For example, because a telecommunications services provider knows that it will need the audio file “welcome to the directory assistance service” a pre-recorded audio file for that statement may be prepared using a number of different age and gender voice talents, such as male youth, male adult, female youth, female adult, etc. Additionally, a number of statement segments may be pre-recorded that may subsequently be combined to create a desired audio file.
  • a pre-recorded phrase such as “welcome to” may be recorded
  • a pre-recorded audio such as “directory assistance” may be recorded
  • a pre-recorded file such as “please say the name” may be recorded
  • a pre-recorded file such as “of the party you wish to reach” may be recorded.
  • If a developer of audio files for use in a voice interactive service desires to create an audio file for the statement “welcome to the directory assistance service-please say the name of the party you wish to reach,” the developer may be required to combine the pre-recorded audio statement segments to create the desired statement.
  • developers of audio files for use in voice interactive services must determine the file location and file name of audio files or audio file statement segments required by the developer.
  • the developer must manually search a database of audio files or audio file statement segments to locate desired audio files.
  • If the developer requires a specific voice talent, such as female adult, the developer must select audio files, listen to the audio files, and then determine whether a selected audio file is appropriate, or determine whether a selected audio file statement segment may be used in association with other audio file statement segments to create a desired audio file.
  • the developer may have to select and listen to a number of statement segments such as “welcome to” and a number of statement segments such as “the directory assistance services” to find audio file statement segments that may be used to create a desired audio file. Because there are many ways to break a desired audio file statement into segments, the task of finding the appropriate way to break the desired phrase and finding suitable recorded audio files for each segment is very tedious, time consuming and error prone. Furthermore, if there are no files that satisfy a particular audio file statement, or if there are insufficient audio file segments to combine to create a desired audio file statement, the developer must create a new audio file or audio file segment.
  • Embodiments of the present invention provide methods and systems for automating the assembly or creation of audio files for providing to listeners or for use in voice interactive services.
  • an audio file developer prepares a voice application script and inserts text associated with a desired audio file statement in the voice application in a location in the script where the developer would ordinarily insert an audio file name of a pre-recorded audio file of the desired audio file statement.
  • a recording manager software program passes the voice application script to an Extensible Markup Language (XML) parser that locates audio file tags in the voice application script associated with audio files or audio file text.
  • the XML parser extracts voice properties for each found audio tag, such as age and gender properties associated with each found audio tag. If no voice properties are found, default properties, such as female adult, are set for the audio file or audio file text associated with the audio file tag.
  • XML Extensible Markup Language
  • the XML parser extracts the text string entered by the developer, and the recording manager software module passes the text string and associated properties in a database query to an audio file recording library database for locating an audio file matching the text string and properties. For example, if the text string comprises “welcome to the directory assistance services,” the text string is passed by the recording manager software module along with the desired properties, such as female adult, in a database query to an audio file recording library to locate an audio file matching the desired text string and properties. If an exact matching audio file with matching voice properties is located, the file may be automatically accepted, or the file may be passed to the developer for review.
  • the file name for the audio file is populated into the voice application script being prepared by the developer so that upon execution of the voice application script, the located audio file will be called by the script for presentation to a user or for use in a voice interactive services system.
  • A determination is made as to whether partial matches for the desired audio file text are found. That is, a determination is made as to whether audio file segments are located that may be combined to provide the desired audio file statement.
  • a first attempt is made to locate audio file segments having the required properties for the desired audio file. If audio file segments are located that may be combined to create the desired audio file having the required properties, a combination of the audio file segments is created and is passed to the developer for review. If audio file segments containing the proper statement segments are found, but not containing the required voice properties, a second combination of audio file segments may be combined and passed to the developer for review.
  • a third combination of the located audio file segments may be prepared and passed to the developer for review.
  • the developer may accept one or more of the audio file segment combinations, and an audio file name associated with the selected combination is populated into the voice application script for subsequent execution for presenting the desired audio file to a listener or for use in a voice interactive services system.
  • a manual process may be followed for obtaining a voice talent having the required voice properties for creating a new voice audio file, or for creating a required voice audio file segment for combining with previously located voice audio file segments for creating an acceptable combination of voice audio file segments.
  • FIG. 1 is a simplified block diagram illustrating components of an exemplary architecture for embodiments of the present invention.
  • FIG. 2 is a simplified block diagram of a computer and associated peripheral and networked devices that provide an exemplary operating environment for the present invention.
  • FIGS. 3, 4, and 5 are flow diagrams illustrating a method for automating the assembly or creation of voice audio files for presentation to listeners or for use in voice interactive services.
  • the present invention is directed to methods and system for automating the creation or assembly of voice audio files for presentation to listeners or for use in voice interactive services.
  • voice application audio files are constructed for presentation to listeners or for use in a voice interactive services system, as briefly described above.
  • voice software applications allow spoken dialogues between users and voice systems.
  • Such a system allows users to converse with the voice system where a user is provided with a voice prompt such as “for service in English, press 1” followed by a response from the user whereby the user may speak a response to the system or select a response mechanically such as by selecting a numeral on a telephone keypad.
  • a computer and associated peripheral and networked devices communicate with a caller via computer telephony interfaces.
  • When a voice request or manual request (selection of a keypad numeral) is received from a caller via a computer telephony interface, a receiving computer locates a responsive voice audio file for presentation to the caller.
  • a software application executed by the computer may obtain the required voice audio file, and the computer may then play or cause to be played the selected voice audio file to the caller.
  • the computer may locate and execute additional voice audio files, or the computer may provide or cause to be provided a service, such as directory assistance services, responsive to the request received from the caller via the voice interactive session.
  • Voice Extensible Markup Language is a standard scripting language widely used for developing voice applications for executing voice audio files according to embodiments of the present invention.
  • Voice application developers may use a variety of text editors, or graphical user interface editors to write VoiceXML applications.
  • a suitable VoiceXML application editor is V-Builder provided by Nuance Company.
  • FIG. 1 is a simplified block diagram illustrating components of an exemplary architecture for embodiments of the present invention.
  • a recording manager 130 is a software application program module designed to assist the developer in automatically managing previously recorded audio files or audio file segments for developing desired VoiceXML applications.
  • the functionality of the recording manager 130 is provided in combination with a VoiceXML text editor module 110 , a VoiceXML parser 120 and local or remote recording library 140 .
  • the recording library 140 may be a local or remotely stored database containing audio files for use in accordance with embodiments of the present invention.
  • the recording manager module 130 works as a post-processor application and is applied to a VoiceXML application after the developer has edited the VoiceXML code for the application.
  • VoiceXML is a scripting language based on the Extensible Markup Language (XML).
  • an audio file name is an attribute specified in an “audio” tag.
  • the audio file name is specified via a uniform resource identifier (“URI”) in a source attribute, but it may also be specified as a variable specified in an expression attribute.
  • URI uniform resource identifier
  • a typical VoiceXML script may be as follows:
  • “hello.wav” may be an audio file which when executed by a computer executing the VoiceXML script plays to a listener the phrase “hello.”
  • voice attributes may be specified such as male adult, male youth, female adult, female youth, etc.
  • the following VoiceXML script specifies a voice gender of “male,” a category of “adult,” and a voice talent named “Tom.”
  • Attributes of male and adult may be utilized to define the voice audio file as male and adult, and the voice talent of “Tom” may be utilized to locate a voice audio file recorded by a live voice talent named “Tom.”
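  • The example scripts referred to above are not reproduced in this text. The following is a minimal Python sketch, not the listing from the application, of what such markup might look like and how its properties could be read with the standard-library `xml.etree.ElementTree` module. The `<voice>` wrapper, its attribute names (`gender`, `category`, `name`), and the helper `describe_audio_tags` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <voice> wrapper and its attribute names are assumptions.
SCRIPT = """
<vxml version="2.0">
  <form>
    <block>
      <audio src="hello.wav">hello</audio>
      <voice gender="male" category="adult" name="Tom">
        <audio src="hello.wav">hello</audio>
      </voice>
    </block>
  </form>
</vxml>
"""

def describe_audio_tags(script):
    """Return one dict per <audio> tag: its src, its text, and any enclosing voice attributes."""
    root = ET.fromstring(script)
    found = []

    def walk(node, voice):
        if node.tag == "voice":
            voice = dict(node.attrib)  # properties apply to everything nested inside
        if node.tag == "audio":
            found.append({
                "src": node.get("src"),
                "text": (node.text or "").strip(),
                "voice": voice,
            })
        for child in node:
            walk(child, voice)

    walk(root, {})
    return found

tags = describe_audio_tags(SCRIPT)
```

  • In this sketch the first audio tag carries no voice properties, while the second inherits the gender, category, and talent name from its enclosing element.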
  • the application developer using the VoiceXML text editor module 110 prepares a VoiceXML script, as set out above.
  • the VoiceXML parser 120 parses the VoiceXML script, searches for “audio” tags and “source” attributes, and extracts the text content, for example “hello,” specified for the located audio tag.
  • the recording manager 130 passes the associated text and audio file properties or attributes, for example male youth, to the recording library 140 via a database query to search for an existing audio file reference matching the desired audio file. If an existing audio file is found in the recording library 140 , the recording manager 130 retrieves the audio file or a combination of audio files that may be combined to create the desired audio file.
  • the located single audio file or combination of audio files are presented to the developer who has the option of allowing the recording manager 130 to automatically populate the VoiceXML script with the audio file name associated with the located audio file or combination of audio files, or the developer may manually verify the recordings by playing the audio files for review. If no matching audio file or combination of audio file segments is found, new recording references may be created.
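  • The database query passed to the recording library 140 may be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the table layout, the file names, and the helpers `open_library` and `find_exact` are hypothetical, and an in-memory SQLite database stands in for the recording library. The default of female adult follows the default voice properties described above.

```python
import sqlite3

DEFAULT_VOICE = {"gender": "female", "category": "adult"}  # default named in the text

def open_library():
    """Build a tiny in-memory stand-in for the recording library 140 (schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE recordings (text TEXT, gender TEXT, category TEXT, filename TEXT)")
    db.executemany(
        "INSERT INTO recordings VALUES (?, ?, ?, ?)",
        [
            ("hello", "male", "adult", "hello_male_adult.wav"),
            ("hello", "female", "adult", "hello_female_adult.wav"),
            ("world", "female", "adult", "world_female_adult.wav"),
        ],
    )
    return db

def find_exact(db, text, voice=None):
    """Return the filename whose text and voice properties both match, or None."""
    props = {**DEFAULT_VOICE, **(voice or {})}  # fill in defaults for missing properties
    row = db.execute(
        "SELECT filename FROM recordings WHERE text = ? AND gender = ? AND category = ?",
        (text.strip().lower(), props["gender"], props["category"]),
    ).fetchone()
    return row[0] if row else None

library = open_library()
```

  • A call such as `find_exact(library, "hello", {"gender": "male"})` looks up the male adult recording, while a call without voice properties falls back to the female adult default.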
  • the recording manager 130 may interface with a VoiceXML graphical user interface editor in which case the recording manager 130 concentrates on parsing the VoiceXML code generated by the VoiceXML graphical user interface editor without having to search through the entire VoiceXML code for individual audio tags.
  • the VoiceXML text editor module 110 may be resident on the developer's computer 204 , described below.
  • the modules 110 , 120 , and 130 may be accessed by the developer from a local or remote server accessible to the developer from the computer 204 .
  • the recording library 140 may be a database of recorded audio files resident at the developer's computer 204 or resident at a local or remote server accessible by the developer via a distributed computing environment such as the Internet.
  • FIG. 2 illustrates the architecture of a suitable computing device and associated peripheral devices for use in implementing the methods and systems of the present invention. While the invention is described in the general context of program modules that execute in conjunction with an application program that runs on an operating system on a personal computer, those skilled in the art will recognize that the invention may also be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, multiprocessor-based or programmable consumer electronics, mini computers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory source devices.
  • the computer architecture shown in FIG. 2 illustrates a conventional server or personal computer 204 , including a central processing unit 216 (“CPU”), a system memory 224 , including a random access memory 226 (“RAM”) and a read-only memory (“ROM”) 228 , and a system bus 222 that couples the memory to the CPU 216 .
  • the computer 204 further includes a mass storage device 234 for storing an operating system 232 suitable for controlling the operation of a networked computer, such as the WINDOWS NT or XP operating systems from MICROSOFT CORPORATION of Redmond, Wash.
  • the mass storage device 234 may also store application programs, such as the computer program 208 , the VoiceXML text editor 110 , the VoiceXML parser 120 and the recording manager 130 .
  • the mass storage device may also include data such as the
  • the mass storage device 234 is connected to the CPU 216 through a mass storage controller (not shown) connected to the bus 222 .
  • the mass storage device 234 and its associated computer-readable media provide non-volatile storage for the computer 204 .
  • computer-readable media can be any available media that can be accessed by the computer 204 .
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
  • the computer 204 may operate in a networked environment using logical connections to remote computers through a network 214 , such as the Internet or a LAN.
  • the computer 204 may connect to the network 214 through a network interface unit 218 connected to the bus 222 . It should be appreciated that the network interface unit 218 may also be utilized to connect to other types of networks and remote computer systems.
  • the computer 204 may also include an input/output controller 220 for receiving and processing input from a number of devices, including a keyboard, mouse, or electronic stylus (not shown in FIG. 2). Similarly, an input/output controller 220 may provide output to a display screen, a printer, or other types of output devices.
  • FIGS. 3, 4, and 5 are flow diagrams illustrating a method for automating the assembly or creation of voice audio files for presentation to listeners or for use in voice interactive services.
  • the method 300 begins at start step 302 and proceeds to step 304 where a VoiceXML script developer creates a VoiceXML script having desired audio tags, such as the illustrative VoiceXML script described above with reference to FIG. 1 .
  • the developer may wish to create a VoiceXML script for playing an announcement to a caller such as “welcome to your telecommunications services provider—for services in English press 1 or say English.”
  • the developer inserts into an audio tag the required text “welcome to your telecommunications services provider—for services in English, press 1 or say English” into the VoiceXML script instead of a specified audio file name.
  • the developer utilizes her VoiceXML text editor 110 or graphical user interface editor for preparation of the VoiceXML script.
  • the recording manager software application 130 passes the VoiceXML script to the XML parser 120 .
  • the XML parser 120 parses the received VoiceXML script to locate any audio tags contained therein.
  • if a voice property such as “male adult” is specified for the located audio tag, as described above with reference to FIG. 1, the XML parser locates the property and extracts the property from the script.
  • the parser 120 extracts the text associated with the located audio tag, for example “welcome to your telecommunications services provider—for services in English, press 1 or say English.”
  • the extracted text strings and extracted voice properties, if any, are passed by the parser 120 to the recording manager 130 .
  • the recording manager 130 passes the extracted text string and voice properties including default voice properties, if required, in a database query to the recording library 140 .
  • a database lookup is performed to determine whether an exact matching audio file with matching voice properties is located in the recording library 140 .
  • a lookup is performed to determine whether partially matching audio files are located in the recording library 140 .
  • the recording manager 130 may pass a number of database queries made up of various combinations of the extracted text and properties. For example, the recording manager 130 may first pass the extracted text string and associated voice properties.
  • the recording manager may pass individual database queries containing each word in the extracted text string, such as “welcome,” “to,” “your,” “telecommunications,” “services,” and so on, to locate individual pre-recorded audio files for each individual word of the extracted text string.
  • a number of combinations of individual words may also be passed to the recording library, such as “telecommunications services provider,” where there is a high probability that a previously recorded audio file exists for the combined words.
  • various combinations of words and voice properties may also be passed by the recording manager 130 to the recording library 140.
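  • One way to enumerate the query text for the full string, for individual words, and for multi-word combinations is sketched below in Python. Restricting the combinations to contiguous word runs, and ordering them longest first, are assumptions of this sketch; the helper name `candidate_phrases` is hypothetical.

```python
def candidate_phrases(text, max_words=None):
    """List every contiguous word run in `text`, longest first, without duplicates."""
    words = text.lower().replace(",", " ").split()
    longest = max_words or len(words)
    seen, phrases = set(), []
    for size in range(min(longest, len(words)), 0, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase not in seen:  # repeated words would otherwise yield repeated queries
                seen.add(phrase)
                phrases.append(phrase)
    return phrases

queries = candidate_phrases("welcome to your telecommunications services provider")
```

  • The first entry is the complete text string, the combination “telecommunications services provider” appears among the three-word runs, and the final entries are the individual words.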
  • a determination is made as to whether any matching audio file references are located in the recording library 140. If no references are found, the method proceeds to step 348 and a manual process for creating a required audio file is followed, as described below.
  • the method proceeds to step 330 and a determination is made as to whether an exact match for the desired text and voice properties is located. If so, the method proceeds to step 332, FIG. 5, and a determination is made as to whether the audio file located in the recording library 140 should be automatically accepted. That is, the voice application developer may decide to automatically accept, without review, any audio file located by the recording manager 130 in the recording library 140 matching the desired text and voice properties. If the developer has designated automatic acceptance, the method proceeds to step 346 and the recording manager 130 populates the VoiceXML script audio tag with the audio file name located in the recording library 140. Accordingly, when the VoiceXML script is subsequently executed, the designated audio file is played.
  • At step 334, the located matching audio file is passed to the developer for review.
  • the developer may play the located audio file via a speaker associated with the developer's computer 204 to determine whether the located audio file meets the developer's requirements. If the developer is satisfied with the located audio file, the method proceeds to step 336 and the developer may accept the located audio file. If so, the method proceeds to step 346 and the audio file name is populated into the VoiceXML script, as described above. If the developer is not satisfied with the located matching audio file, the method proceeds to step 338 for a determination as to whether partially matching references may be combined to provide the developer with an audio file that is more satisfactory to the developer.
  • a matching audio file may have been located as described above having the desired text and the desired voice properties, but upon reviewing the located audio file, the developer may not be satisfied with the voice talent utilized for creation of the previously recorded file. That is, the developer may desire a more youthful voice, or the developer may determine that a voice of a different gender may be more satisfactory for the desired implementation.
  • Referring back to step 330, if no audio files matching the exact text string and required voice properties are located, or if such a file is located but the developer rejects the located file, then the method proceeds to step 338, FIG. 5, and a determination is made as to whether partially matching audio files are located in the recording library 140. That is, a determination is made as to whether audio files matching segments of the text string and associated voice properties are found.
  • an audio file having the desired voice properties may be found which when executed plays “welcome to your telecommunications services provider,” and a second audio file may be located having the desired voice properties such as “male adult,” which when executed plays the phrase “for services in English, press 1 or say English.” If no partially matching audio files are located, the method proceeds to step 348 and a manual development process may be utilized, as described below. If partially matching audio files are located in the recording library 140, the method proceeds to step 340 and a combination of the references is prepared for presentation to the developer.
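  • Determining whether partially matching audio files can be combined to cover the whole text string may be sketched as a segmentation problem. The Python below is a minimal illustration, assuming that the set `available` already holds the phrases for which recordings with acceptable voice properties were located; preferring the combination with the fewest segments is a design choice of this sketch, and the helper name `segment` is hypothetical.

```python
def segment(text, available):
    """Split `text` into the fewest phrases that all appear in `available`.

    Returns a list of phrases, or None when the text cannot be covered.
    """
    words = text.lower().split()
    best = [None] * (len(words) + 1)  # best[i]: fewest-phrase cover of words[:i]
    best[0] = []
    for end in range(1, len(words) + 1):
        for start in range(end):
            phrase = " ".join(words[start:end])
            if best[start] is not None and phrase in available:
                option = best[start] + [phrase]
                if best[end] is None or len(option) < len(best[end]):
                    best[end] = option
    return best[len(words)]

available = {
    "welcome to",
    "your telecommunications services provider",
    "welcome to your telecommunications services provider",
    "for services in english press 1 or say english",
}
combination = segment(
    "welcome to your telecommunications services provider "
    "for services in English press 1 or say English",
    available,
)
```

  • A result of `None` corresponds to the case in which no usable combination exists and a manual development process is required.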
  • a combination of audio file references is presented to the developer for review.
  • the developer may then listen to the combination of audio file references, and the method proceeds to step 344 where the developer may accept or reject the combination of audio file references. If the developer reviews the combination of audio file references and determines that the combination will create a satisfactory audio file, the method proceeds to step 346 .
  • the recording manager 130 populates the VoiceXML script with an audio file name which when executed will play the combined references. For example, the VoiceXML script may be populated with the audio file name comprised of a first audio file plus a second audio file so that when the resulting VoiceXML script is executed, audio file 1 will be played followed by audio file 2 to provide the listener or caller with the desired audio announcement.
  • VoiceXML script is populated with audio file names for subsequent play when the script is executed.
  • Examples of how VoiceXML script may be structured according to the present invention are as follows. If a developer desires a file which when played provides an audio-formatted statement “hello world,” three different VoiceXML script statements may be structured as follows.
  • All three example script statements play “hello world” when executed.
  • Script statements 1 and 2 play two files, namely “hello.wav” and “world.wav.”
  • Script statement 3 plays a single file, namely “hello_world.wav.”
  • File 1 includes a reference pointing to the concatenation of two files, namely “hello.wav” and “world.wav.”
  • File 3 will be replaced by file 1 if the recording library 140 does not have a single file providing “hello world,” but does include two files providing “hello” and “world.”
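  • The script statements themselves are not reproduced in this text. The fallback from a single file to a pair of files may nevertheless be sketched in Python as follows. The markup, the per-word fallback, and the helpers `resolve` and `populate` are assumptions of this sketch and are not the statements of the application.

```python
import xml.etree.ElementTree as ET

def resolve(text, files):
    """Return the file names to play for `text`: one whole-phrase file, else one file per word."""
    key = text.strip().lower()
    if key in files:
        return [files[key]]
    parts = key.split()
    if parts and all(word in files for word in parts):
        return [files[word] for word in parts]
    return []  # nothing usable: a new recording would be needed

def populate(script, files):
    """Replace text-only <audio> tags with <audio src="..."/> tags naming located files."""
    root = ET.fromstring(script)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "audio" or child.get("src"):
                continue  # not an audio tag, or already populated
            names = resolve(child.text or "", files)
            if not names:
                continue  # leave the text in place for manual handling
            index = list(parent).index(child)
            parent.remove(child)
            for offset, name in enumerate(names):
                new_tag = ET.Element("audio", {"src": name})
                new_tag.tail = child.tail if offset == len(names) - 1 else None
                parent.insert(index + offset, new_tag)
    return ET.tostring(root, encoding="unicode")

SCRIPT = "<prompt><audio>hello world</audio></prompt>"
with_single = populate(SCRIPT, {"hello world": "hello_world.wav"})
with_parts = populate(SCRIPT, {"hello": "hello.wav", "world": "world.wav"})
```

  • When the library holds a single recording of the phrase, one audio tag is produced; when it holds only the two word recordings, two consecutive audio tags are produced so that the files play in sequence.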
  • At step 348, a manual development process may be performed by the developer. That is, the developer may decide that a voice talent such as a male adult speaker must be obtained who will record a new audio file that is satisfactory to the developer. Or, the developer may determine that the voice talent is required only to record a new audio file segment for combining with previously recorded audio file segments located in the recording library 140.
  • an audio file name associated with the manually created audio file is populated into the VoiceXML script, as described above with reference to step 346 .
  • At step 346, after an audio file name associated with a single audio file, a combination of audio files, or a newly created audio file is populated into the VoiceXML script, the method proceeds back to step 310, and the XML parser may locate the next audio tag in the VoiceXML script prepared by the developer. The method then proceeds, as described above, for locating an acceptable audio file for association with the next located audio tag. The method ends at 350.
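  • The decisions made for each located audio tag may be summarized in a short Python sketch. The function name `choose_files`, its parameters, and the assumption that a rejected combination leads to the manual process of step 348 are illustrative only; the step numbers in the comments refer to the steps named above.

```python
def choose_files(exact, partial, auto_accept, developer_accepts):
    """Walk the decision steps for one audio tag and return (outcome, file_names).

    exact: file name of an exact match, or None
    partial: list of file names that together cover the text, or []
    auto_accept: True if exact matches are accepted without review
    developer_accepts: callable taking a list of file names, returning True or False
    """
    if exact is not None:
        # Steps 332 to 336: accept automatically, or let the developer review.
        if auto_accept or developer_accepts([exact]):
            return "populate", [exact]      # step 346
    if partial and developer_accepts(partial):
        # Steps 338 to 344: a combination of partial matches was accepted.
        return "populate", list(partial)    # step 346
    return "manual", []                     # step 348
```

  • For example, `choose_files("a.wav", [], True, lambda files: False)` returns the exact match without consulting the developer, because automatic acceptance was designated.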
  • a VoiceXML script developer may populate a script with audio file names located in a repository of previously recorded audio files without the need for manually locating potentially satisfactory audio files one file at a time. Only if the automated system is unable to locate satisfactory previously recorded audio files for use by the developer does the developer utilize a manual process for creating or otherwise obtaining a satisfactory audio file or a combination of audio files. It will be apparent to those skilled in the art that various modifications or variations may be made in the present invention without departing from the scope or spirit of the invention. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein.

Abstract

Methods and systems for audio file insertion in spoken dialog code for use in interactive voice services are provided. The method includes identifying an audio tag in spoken dialog code of an interactive voice system, extracting data associated with the audio tag, generating a database query based on the extracted data, and retrieving at least one audio filename associated with an audio file to be played to a user in the interactive voice system, and replacing the extracted data with the audio filename in the spoken dialog code if the audio file associated with the audio filename matches at least a portion of the extracted data. The system includes a processor and modules for performing the steps of the method.

Description

    PRIORITY INFORMATION
  • This application is a continuation application of U.S. patent application Ser. No. 10/458,532, filed Jun. 10, 2003, the contents of which are incorporated herein by reference in their entirety.
  • FIELD OF THE INVENTION
  • This invention relates generally to methods and systems for creating voice files using a VoiceXML application. More particularly, the present invention relates to methods and systems for automating the assembly or creation of audio files from pre-recorded audio files, audio streams and/or synthesized speech files for presentation to listeners or for use in voice interactive services.
  • BACKGROUND OF THE INVENTION
  • With the advent of modern telecommunications systems, users call a variety of goods and services providers for a number of goods and/or services related issues. Users call their wire line and wireless telecommunication services providers for services such as directory assistance, voice mail services, services maintenance, and the like. Likewise, customers call a variety of vendors for goods and services such as financial services, general information services, and the like. Because of the enormous volume of such calls, many services providers and goods vendors make use of voice interactive services systems for reducing the number of live personnel required to process incoming calls. For example, a caller may call her telecommunications services provider for directory assistance. Rather than connecting the caller to a live operator, the caller may be connected to a voice interactive directory assistance system that may answer “welcome to the directory assistance service, please say the name of the party you wish to reach.” Likewise, a caller may call a goods provider, such as a department store, and the caller may receive an automated voice interactive answering service such as “if you know the number of the store department you would like to reach, please enter the number now.” Such voice interactive services may be provided by on-the-premises equipment, or a goods/services provider may utilize the voice interactive services of a third party, such as a telecommunications services provider.
  • In order to provide such voice interactive services, audio files must be prepared for providing initial contact with the caller and for providing responses to requests by the caller. For example, following from the example described above, an audio file such as “welcome to the directory assistance service” must be prepared by the telecommunications services provider for playing to a caller when the caller calls the telecommunications services provider for directory assistance. Users of the recorded audio file, such as telecommunications services providers or other goods/services providers, may maintain a number of pre-recorded audio files for providing to listeners, as described above. That is, a pre-recorded audio file such as “welcome to the directory assistance service” may be established by a telecommunications services provider and may be saved for subsequent use.
  • Developers of audio files for use in voice interactive services systems typically create a number of pre-recorded files that may be utilized individually or that may be combined with other pre-recorded audio files to create a desired audio file. For example, because a telecommunications services provider knows that it will need the audio file “welcome to the directory assistance service,” a pre-recorded audio file for that statement may be prepared using a number of different age and gender voice talents, such as male youth, male adult, female youth, female adult, etc. Additionally, a number of statement segments may be pre-recorded that may subsequently be combined to create a desired audio file. For example, a pre-recorded phrase such as “welcome to” may be recorded, a pre-recorded audio file such as “directory assistance” may be recorded, a pre-recorded file such as “please say the name” may be recorded, and a pre-recorded file such as “of the party you wish to reach” may be recorded. Subsequently, if a developer of audio files for use in a voice interactive service, as described above, desires to create an audio file for the statement “welcome to the directory assistance service, please say the name of the party you wish to reach,” the developer may be required to combine the pre-recorded audio statement segments to create the desired statement.
  • According to prior art systems, developers of audio files for use in voice interactive services must determine the file location and file name of audio files or audio file statement segments required by the developer. Typically, the developer must manually search a database of audio files or audio file statement segments to locate desired audio files. Unfortunately, because the developer may require a specific voice talent, such as female adult, the developer must select audio files, listen to the audio files, and then determine whether a selected audio file is appropriate, or determine whether a selected audio file statement fragment may be used in association with other audio file statement segments to create a desired audio file. That is, the developer may have to select and listen to a number of statement segments such as “welcome to” and a number of statement segments such as “the directory assistance services” to find audio file statement segments that may be used to create a desired audio file. Because there are many ways to break a desired audio file statement into segments, the task of finding the appropriate way to break the desired phrase and finding suitable recorded audio files for each segment is very tedious, time consuming and error prone. Furthermore, if there are no files that satisfy a particular audio file statement, or if there are insufficient audio file segments to combine to create a desired audio file statement, the developer must create a new audio file or audio file segment.
  • It is with respect to these and other considerations that the present invention has been made.
  • SUMMARY OF THE INVENTION
  • Embodiments of the present invention provide methods and systems for automating the assembly or creation of audio files for providing to listeners or for use in voice interactive services. According to one aspect of the present invention, an audio file developer prepares a voice application script and inserts text associated with a desired audio file statement in the voice application in a location in the script where the developer would ordinarily insert an audio file name of a pre-recorded audio file of the desired audio file statement. A recording manager software program passes the voice application script to an Extensible Markup Language (XML) parser that locates audio file tags in the voice application script associated with audio files or audio file text. The XML parser extracts voice properties for each found audio tag, such as age and gender properties associated with each found audio tag. If no voice properties are found, default properties, such as female adult, are set for the audio file or audio file text associated with the audio file tag.
  • Next, the XML parser extracts the text string entered by the developer, and the recording manager software module passes the text string and associated properties in a database query to an audio file recording library database for locating an audio file matching the text string and properties. For example, if the text string comprises “welcome to the directory assistance services,” the text string is passed by the recording manager software module along with the desired properties, such as female adult, in a database query to an audio file recording library to locate an audio file matching the desired text string and properties. If an exact matching audio file with matching voice properties is located, the file may be automatically accepted, or the file may be passed to the developer for review. If the audio file is accepted by the developer, or if the audio file is automatically accepted, the file name for the audio file is populated into the voice application script being prepared by the developer so that upon execution of the voice application script, the located audio file will be called by the script for presentation to a user or for use in a voice interactive services system.
  • If an exact match for the audio file text and voice properties is not found, a determination is made as to whether partial matches for the desired audio file text are found. That is, a determination is made as to whether audio file segments are located that may be combined to provide the desired audio file statement. According to one aspect of the invention, a first attempt is made to locate audio file segments having the required properties for the desired audio file. If audio file segments are located that may be combined to create the desired audio file having the required properties, a combination of the audio file segments is created and is passed to the developer for review. If audio file segments containing the proper statement segments are found, but not containing the required voice properties, a second combination of audio file segments may be prepared and passed to the developer for review. And, if audio file segments are found that may be combined to only partially create the desired file statement, a third combination of the located audio file segments may be prepared and passed to the developer for review. Once the developer receives and reviews the combined audio file segments, the developer may accept one or more of the audio file segment combinations, and an audio file name associated with the selected combination is populated into the voice application script for subsequent execution for presenting the desired audio file to a listener or for use in a voice interactive services system. If no acceptable audio file is provided to the developer, or if only a partially acceptable audio file is provided to the developer, a manual process may be followed for obtaining a voice talent having the required voice properties for creating a new voice audio file, or for creating a required voice audio file segment for combining with previously located voice audio file segments for creating an acceptable combination of voice audio file segments.
  • These and other features and advantages, which characterize the present invention, will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a simplified block diagram illustrating components of an exemplary architecture for embodiments of the present invention.
  • FIG. 2 is a simplified block diagram of a computer and associated peripheral and networked devices that provide an exemplary operating environment for the present invention.
  • FIGS. 3, 4, and 5 are flow diagrams illustrating a method for automating the assembly or creation of voice audio files for presentation to listeners or for use in voice interactive services.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The following description of embodiments of the present invention is made with reference to the above-described drawings wherein like numerals refer to like parts or components throughout the several figures. The present invention is directed to methods and system for automating the creation or assembly of voice audio files for presentation to listeners or for use in voice interactive services.
  • According to embodiments of the present invention, voice application audio files are constructed for presentation to listeners or for use in a voice interactive services system, as briefly described above. As is known to those skilled in the art, voice software applications allow spoken dialogues between users and voice systems. Such a system allows users to converse with the voice system where a user is provided with a voice prompt such as “for service in English, press 1” followed by a response from the user whereby the user may speak a response to the system or select a response mechanically such as by selecting a numeral on a telephone keypad. In a typical voice interactive system, a computer and associated peripheral and networked devices communicate with a caller via computer telephony interfaces. When a voice request or manual request (selection of a keypad numeral) is received from a caller via a computer telephony interface, a receiving computer locates a responsive voice audio file for presentation to the caller. A software application executed by the computer may obtain the required voice audio file, and the computer may then play, or cause to be played, the selected voice audio file to the caller. Based on the responses to the played voice audio file received from the caller, the computer may locate and execute additional voice audio files, or the computer may provide or cause to be provided a service, such as directory assistance services, responsive to the request received from the caller via the voice interactive session.
  • Voice Extensible Markup Language (VoiceXML) is a standard scripting language widely used for developing voice applications for executing voice audio files according to embodiments of the present invention. Voice application developers may use a variety of text editors, or graphical user interface editors to write VoiceXML applications. According to an embodiment of the present invention, a suitable VoiceXML application editor is V-Builder provided by Nuance Company.
  • FIG. 1 is a simplified block diagram illustrating components of an exemplary architecture for embodiments of the present invention. According to an embodiment of the present invention, a recording manager 130 is a software application program module designed to assist the developer in automatically managing previously recorded audio files or audio file segments for developing desired VoiceXML applications. The functionality of the recording manager 130 is provided in combination with a VoiceXML text editor module 110, a VoiceXML parser 120 and local or remote recording library 140. As should be understood, the recording library 140 may be a local or remotely stored database containing audio files for use in accordance with embodiments of the present invention.
  • According to an embodiment of the present invention, the recording manager module 130 works as a post-processor application and is applied to a VoiceXML application after the developer has edited the VoiceXML code for the application. As is known to those skilled in the art, VoiceXML is a scripting language based on the Extensible Markup Language (XML). In VoiceXML, an audio file name is an attribute specified in an “audio” tag. Typically, the audio file name is specified via a uniform resource identifier (“URI”) in a source attribute, but it may also be specified as a variable specified in an expression attribute. For example, a typical VoiceXML script may be as follows:
  • <audio src="hello.wav">hello</audio>
    <assign name="myclip" expr="hello.wav"/>
    <audio expr="myclip"/>
  • For example, “hello.wav” may be an audio file which, when executed by a computer executing the VoiceXML script, plays to a listener the phrase “hello.” In addition, voice attributes may be specified such as male adult, male youth, female adult, female youth, etc. For example, the following VoiceXML script specifies a voice gender of “male,” a category of “adult,” and a voice talent named “Tom.”

  • <voice gender="male" category="adult" name="tom"/>
  • Attributes of male and adult may be utilized to define the voice audio file as male and adult, and the voice talent of “Tom” may be utilized to locate a voice audio file recorded by a live voice talent named “Tom.”
  • According to an embodiment of the present invention, the application developer using the VoiceXML text editor module 110 prepares a VoiceXML script, as set out above. The VoiceXML parser 120 parses the VoiceXML script and searches for “audio” tags and “source” attributes, and extracts the text content, for example “hello” specified for the located audio tag. For each located audio tag, the recording manager 130 passes the associated text and audio file properties or attributes, for example male youth, to the recording library 140 via a database query to search for an existing audio file reference matching the desired audio file. If an existing audio file is found in the recording library 140, the recording manager 130 retrieves the audio file or a combination of audio files that may be combined to create the desired audio file. The located single audio file or combination of audio files are presented to the developer who has the option of allowing the recording manager 130 to automatically populate the VoiceXML script with the audio file name associated with the located audio file or combination of audio files, or the developer may manually verify the recordings by playing the audio files for review. If no matching audio file or combination of audio file segments is found, new recording references may be created. According to an alternative embodiment, the recording manager 130 may interface with a VoiceXML graphical user interface editor in which case the recording manager 130 concentrates on parsing the VoiceXML code generated by the VoiceXML graphical user interface editor without having to search through the entire VoiceXML code for individual audio tags.
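  • The parsing step described above can be illustrated with a minimal Python sketch. It assumes the script is well-formed XML without a namespace declaration and uses the standard library's ElementTree rather than any particular VoiceXML parser; the sample script and the function name find_audio_tags are hypothetical and are not part of the disclosed system.

```python
import xml.etree.ElementTree as ET

# Hypothetical script fragment; a real VoiceXML document would normally be
# wrapped in <vxml> and <form> elements and carry a namespace declaration.
SCRIPT = """<prompt>
  <audio src="hello.wav">hello</audio>
  <audio>welcome to the directory assistance service</audio>
</prompt>"""

def find_audio_tags(vxml_text):
    """Return (src, text) for every <audio> element, in document order."""
    root = ET.fromstring(vxml_text)
    results = []
    for audio in root.iter("audio"):
        text = (audio.text or "").strip()
        # src is None when the developer supplied only the prompt text.
        results.append((audio.get("src"), text))
    return results

audio_tags = find_audio_tags(SCRIPT)
```

    An audio tag whose src value is None is one for which the developer entered only text, so it is a candidate for a lookup in the recording library.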
  • According to embodiments of the present invention, the VoiceXML text editor module 110, the VoiceXML parser 120, the recording manager 130 and the recording library 140 may be resident on the developer's computer 204, described below. Alternatively, the modules 110, 120, and 130 may be accessed by the developer from a local or remote server accessible to the developer from the computer 204. Likewise, the recording library 140 may be a database of recorded audio files resident at the developer's computer 204 or resident at a local or remote server accessible by the developer via a distributed computing environment such as the Internet.
  • FIG. 2 illustrates the architecture of a suitable computing device and associated peripheral devices for use in implementing the methods and systems of the present invention. While the invention is described in the general context of program modules that execute in conjunction with an application program that runs on an operating system on a personal computer, those skilled in the art will recognize that the invention may also be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, mini computers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
  • The computer architecture shown in FIG. 2 illustrates a conventional server or personal computer 204, including a central processing unit 216 (“CPU”), a system memory 224, including a random access memory 226 (“RAM”) and a read-only memory (“ROM”) 228, and a system bus 222 that couples the memory to the CPU 216. A basic input/output system 220 containing the basic routines that help to transfer information between elements within the computer, such as during startup, is stored in the ROM 228. The computer 204 further includes a mass storage device 234 for storing an operating system 232 suitable for controlling the operation of a networked computer, such as the WINDOWS NT or XP operating systems from MICROSOFT CORPORATION of Redmond, Wash. The mass storage device 234 may also store application programs, such as the computer program 208, the VoiceXML text editor 110, the VoiceXML parser 120 and the recording manager 130. The mass storage device may also include data such as the recording library 140.
  • The mass storage device 234 is connected to the CPU 216 through a mass storage controller (not shown) connected to the bus 222. The mass storage device 234 and its associated computer-readable media, provide non-volatile storage for the computer 204. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available media that can be accessed by the computer 204.
  • By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
  • According to various embodiments of the invention, the computer 204 may operate in a networked environment using logical connections to remote computers through a network 214, such as the Internet or a LAN. The computer 204 may connect to the network 214 through a network interface unit 218 connected to the bus 222. It should be appreciated that the network interface unit 218 may also be utilized to connect to other types of networks and remote computer systems. The computer 204 may also include an input/output controller 220 for receiving and processing input from a number of devices, including a keyboard, mouse, or electronic stylus (not shown in FIG. 2). Similarly, an input/output controller 220 may provide output to a display screen, a printer, or other types of output devices.
  • Having described an illustrative system architecture for embodiments of the present invention with reference to FIG. 1, and having described illustrative operating environments for embodiments of the present invention with reference to FIG. 2, FIGS. 3, 4, and 5 are flow diagrams illustrating a method for automating the assembly or creation of voice audio files for presentation to listeners or for use in voice interactive services. The method 300 begins at start step 302 and proceeds to step 304 where a VoiceXML script developer creates a VoiceXML script having desired audio tags, such as the illustrative VoiceXML script described above with reference to FIG. 1. For example, the developer may wish to create a VoiceXML script for playing an announcement to a caller such as “welcome to your telecommunications services provider, for services in English, press 1 or say English.”
  • At step 306, instead of specifying an audio file name, the developer inserts the required text “welcome to your telecommunications services provider, for services in English, press 1 or say English” into an audio tag of the VoiceXML script. The developer utilizes her VoiceXML text editor 110 or graphical user interface editor for preparation of the VoiceXML script. At step 308, the recording manager software application 130 passes the VoiceXML script to the XML parser 120. At step 310, the XML parser 120 parses the received VoiceXML script to locate any audio tags contained therein.
  • At step 312, a determination is made as to whether any audio tags are located in the VoiceXML script. If no audio tags are located in the VoiceXML script, the method ends at 350. If audio tags are located by the XML parser 120 in the VoiceXML script, the method proceeds to step 314, and the parser 120 extracts the voice properties, if any, associated with the audio tag. For example, if a voice property such as “male adult” is specified for the located audio tag, as described above with reference to FIG. 1, the XML parser locates the property and extracts the property from the script at step 318. If no voice properties are found by the parser 120 for the first located audio tag, the method proceeds to step 320, and default voice properties such as female adult may be set by the recording manager 130 for the associated audio tag.
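  • The property extraction and default handling of steps 314 through 320 might look like the following Python sketch. How a voice tag is associated with an audio tag is not spelled out above, so the sketch assumes, purely for illustration, that the nearest enclosing voice element applies to an audio element; the names DEFAULT_VOICE and voice_properties are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default properties from the description: female adult.
DEFAULT_VOICE = {"gender": "female", "category": "adult"}
VOICE_KEYS = ("gender", "category", "name")

def voice_properties(vxml_text):
    """Return (text, properties) for each <audio> element.

    Assumption: the nearest enclosing <voice> element supplies the
    properties; audio tags outside any <voice> element get the defaults.
    """
    root = ET.fromstring(vxml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    results = []
    for audio in root.iter("audio"):
        props = dict(DEFAULT_VOICE)
        node = audio
        while node in parent:
            node = parent[node]
            if node.tag == "voice":
                props.update({k: v for k, v in node.attrib.items()
                              if k in VOICE_KEYS})
                break
        results.append(((audio.text or "").strip(), props))
    return results
```

    For example, an audio tag nested in a voice tag with gender "male" and category "adult" keeps those values, while a bare audio tag is assigned the female adult defaults.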
  • Referring now to FIG. 4, at step 322, the parser 120 extracts the text associated with the located audio tag, for example “welcome to your telecommunications services provider, for services in English, press 1 or say English.” The extracted text strings and extracted voice properties, if any, are passed by the parser 120 to the recording manager 130. At step 324, the recording manager 130 passes the extracted text string and voice properties including default voice properties, if required, in a database query to the recording library 140. At step 326, a database lookup is performed to determine whether an exact matching audio file with matching voice properties is located in the recording library 140. Additionally, at step 326, a lookup is performed to determine whether partially matching audio files are located in the recording library 140. As should be understood, when the recording manager 130 passes the text string and voice properties to the recording library 140, the recording manager 130 may pass a number of database queries made up of various combinations of the extracted text and properties. For example, the recording manager 130 may first pass the extracted text string and associated voice properties.
  • The recording manager may pass individual database queries containing each word in the extracted text string such as “welcome,” “to,” “your,” “telecommunications,” “services,” and so on to locate individual pre-recorded audio files for each individual word of the extracted text string. As should be understood, a number of combinations of individual words may also be passed to the recording library, such as “telecommunications services provider,” where there is a high probability that a previously recorded audio file exists for the combined words. Likewise, various combinations of words and voice properties may also be passed by the recording manager 130 to the recording library 140. At step 328, a determination is made as to whether any matching audio file references are located in the recording library 140. If no references are found, the method proceeds to step 348 and a manual process for creating a required audio file is followed, as described below.
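  • One way to enumerate the word combinations that could be sent as separate queries is sketched below in Python. The description does not say which combinations are tried or in what order, so the longest-first ordering over contiguous word runs, and the function name candidate_phrases, are assumptions made for illustration.

```python
def candidate_phrases(text):
    """List every contiguous run of words in text, longest runs first.

    The full string comes first, then shorter runs, and finally the
    individual words, so a caller can stop at the longest match.
    """
    words = text.split()
    n = len(words)
    phrases = []
    for length in range(n, 0, -1):
        for start in range(0, n - length + 1):
            phrases.append(" ".join(words[start:start + length]))
    return phrases
```

    For a string of n words this produces n(n+1)/2 candidate queries, which is why a practical system would likely limit the combinations it submits.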
  • If audio file references are found in the recording library 140, the method proceeds to step 330 and a determination is made as to whether an exact match for the desired text and voice properties is located. If so, the method proceeds to step 332, FIG. 5, and a determination is made as to whether the audio file located in the recording library 140 should be automatically accepted. That is, the voice application developer may decide to automatically accept, without review, any audio file located by the recording manager 130 in the recording library 140 matching the desired text and voice properties. If the developer has designated automatic acceptance, the method proceeds to step 346 and the recording manager 130 populates the VoiceXML script audio tag with the audio file name located in the recording library 140. Accordingly, when the VoiceXML script is subsequently executed, the designated audio file is played.
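  • The exact-match lookup might be sketched as follows in Python, using an in-memory SQLite database as a stand-in for the recording library 140. The table name, column names, sample rows, and the function find_exact are hypothetical; the description does not define a schema.

```python
import sqlite3

# Stand-in for the recording library; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recordings "
    "(text TEXT, gender TEXT, category TEXT, file_name TEXT)"
)
conn.executemany(
    "INSERT INTO recordings VALUES (?, ?, ?, ?)",
    [
        ("hello", "female", "adult", "hello_fa.wav"),
        ("hello", "male", "adult", "hello_ma.wav"),
    ],
)

def find_exact(conn, text, gender="female", category="adult"):
    """Return the file name matching text and voice properties, or None."""
    row = conn.execute(
        "SELECT file_name FROM recordings "
        "WHERE text = ? AND gender = ? AND category = ?",
        (text, gender, category),
    ).fetchone()
    return row[0] if row else None
```

    A None result corresponds to the case in which no exact match is found and the method falls through to the partial-match determination.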
  • Referring to step 332, if the developer has not designated automatic acceptance of matching audio files, the method proceeds to step 334 and the located matching audio file is passed to the developer for review. As should be understood, the developer may play the located audio file via a speaker associated with the developer's computer 204 to determine whether the located audio file meets the developer's requirements. If the developer is satisfied with the located audio file, the method proceeds to step 336 and the developer may accept the located audio file. If so, the method proceeds to step 346 and the audio file name is populated into the VoiceXML script, as described above. If the developer is not satisfied with the located matching audio file, the method proceeds to step 338 for a determination as to whether partially matching references may be combined to provide the developer with an audio file that is more satisfactory to the developer. For example, a matching audio file may have been located as described above having the desired text and the desired voice properties, but upon reviewing the located audio file, the developer may not be satisfied with the voice talent utilized for creation of the previously recorded file. That is, the developer may desire a more youthful voice, or the developer may determine that a voice of a different gender may be more satisfactory for the desired implementation.
  • Referring back to step 330, if no audio files matching the exact text string and required voice properties are located, or if such a file is located but the developer rejects the located file, then the method proceeds to step 338, FIG. 5, and a determination is made as to whether partially matching audio files are located in the recording library 140. That is, a determination is made as to whether audio files matching segments of the text string and associated voice properties are found. For example, an audio file having the desired voice properties may be found which when executed plays “welcome to your telecommunications services provider,” and a second audio file may be located having the desired voice properties such as “male adult,” which when executed plays the phrase “for services in English, press 1 or say English.” If no partially matching audio files are located, the method proceeds to step 348 and a manual development process may be utilized, as described below. If partially matching audio files are located in the recording library 140, the method proceeds to step 340 and a combination of the references is prepared for presentation to the developer.
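  • Preparing a combination of partially matching files can be framed as covering the text with recorded segments. The Python sketch below uses a simple dynamic program that prefers the fewest segments; this selection rule, the case-insensitive matching, and the function name segment are assumptions for illustration, since the description does not state how a combination is chosen. The library argument is a hypothetical mapping from recorded phrase to file name for one set of voice properties.

```python
def segment(text, library):
    """Return file names whose phrases concatenate to text, or None.

    library maps a lower-case recorded phrase to its file name.
    Among all full coverings, the one with the fewest files is returned.
    """
    words = text.lower().split()
    n = len(words)
    # best[i] is the shortest list of files covering words[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            phrase = " ".join(words[start:end])
            if best[start] is not None and phrase in library:
                candidate = best[start] + [library[phrase]]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]
```

    A None result means the located segments cannot fully cover the text, which corresponds to the case in which a new recording or segment must be created manually.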
  • At step 342, a combination of audio file references is presented to the developer for review. The developer may then listen to the combination of audio file references, and the method proceeds to step 344 where the developer may accept or reject the combination of audio file references. If the developer reviews the combination of audio file references and determines that the combination will create a satisfactory audio file, the method proceeds to step 346. At step 346, the recording manager 130 populates the VoiceXML script with an audio file name which when executed will play the combined references. For example, the VoiceXML script may be populated with the audio file name comprised of a first audio file plus a second audio file so that when the resulting VoiceXML script is executed, audio file 1 will be played followed by audio file 2 to provide the listener or caller with the desired audio announcement.
  • As described above, once an audio file or a combination of audio files is found to be acceptable, the associated VoiceXML script is populated with audio file names for subsequent play when the script is executed. Examples of how VoiceXML script may be structured according to the present invention are as follows. If a developer desires a file which when played provides an audio-formatted statement “hello world,” three different VoiceXML script statements may be structured as follows.
  • 1. <assign name="myclip" expr="hello.wav + world.wav"/>
       <audio expr="myclip">hello world</audio>
    2. <audio src="hello.wav">hello</audio>
       <audio src="world.wav">world</audio>
    3. <audio src="hello_world.wav">hello world</audio>
  • All three example script statements play “hello world” when executed. Script statements 1 and 2 play two files, namely “hello.wav” and “world.wav.” Script statement 3 plays a single file, namely “hello_world.wav.” Statement 1 includes a reference pointing to the concatenation of the two files “hello.wav” and “world.wav.” Statement 3 will be replaced by statement 1 if the recording library 140 does not have a single file providing “hello world” but does include two files providing “hello” and “world.”
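As an illustration of the second form, the following Python sketch builds one `<audio>` element per located file using the standard `xml.etree.ElementTree` module. The helper name `audio_elements` and the (file name, text) pair layout are hypothetical; the specification does not say how the recording manager generates the markup.

```python
import xml.etree.ElementTree as ET

def audio_elements(segments):
    """Build one <audio> element per (file name, text) pair, in playback order."""
    elements = []
    for file_name, text in segments:
        element = ET.Element("audio", {"src": file_name})
        # The element content is kept as the text of the recorded phrase.
        element.text = text
        elements.append(element)
    return elements

markup = "".join(
    ET.tostring(element, encoding="unicode")
    for element in audio_elements([("hello.wav", "hello"), ("world.wav", "world")])
)
print(markup)
```

Keeping the original text inside each element mirrors the example statements, where the phrase remains in the tag alongside the file reference.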
  • Referring back to step 344, if the developer does not find the combination of located audio file references acceptable or otherwise satisfactory, the method proceeds to step 348, and a manual development process may be performed by the developer. That is, the developer may decide that a voice talent such as a male adult speaker must be obtained who will record a new audio file that is satisfactory to the developer. Or, the developer may determine that the voice talent is required only to record a new audio file segment for combining with previously recorded audio file segments located in the recording library 140. Once the manual process is completed, an audio file name associated with the manually created audio file is populated into the VoiceXML script, as described above with reference to step 346. Referring back to step 346, after an audio file name associated with a single audio file, a combination of audio files, or a newly created audio file is populated into the VoiceXML script, the method proceeds back to step 310, and the XML parser may locate the next audio tag in the VoiceXML script prepared by the developer. The method then proceeds, as described above, for locating an acceptable audio file for association with the next located audio tag. The method ends at 350.
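The tag-by-tag loop described above might be sketched as follows. This is a simplified illustration under stated assumptions: the script is a namespace-free XML fragment, the library is a plain dictionary from lowercase text to file name, only exact matches are handled, and developer review is omitted. The function name `populate_script` is hypothetical.

```python
import xml.etree.ElementTree as ET

def populate_script(script: str, library: dict):
    """Set src on each <audio> tag whose text is in the library.

    Returns the updated markup and the texts that still need manual work.
    """
    root = ET.fromstring(script)
    unresolved = []
    for tag in root.iter("audio"):
        text = (tag.text or "").strip()
        file_name = library.get(text.lower())
        if file_name is not None:
            tag.set("src", file_name)
        else:
            # No recording found; a manual process (step 348) would handle this text.
            unresolved.append(text)
    return ET.tostring(root, encoding="unicode"), unresolved
```

Texts returned in the second value correspond to audio tags for which a voice talent would need to record new material before the script is complete.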
  • As described herein, methods and systems for automating assembly or creation of voice audio files for presentation to listeners or for use in voice interactive services are provided. Advantageously, a VoiceXML script developer may populate a script with audio file names located in a repository of previously recorded audio files without the need for manually locating potentially satisfactory audio files one file at a time. Only if the automated system is unable to locate satisfactory previously recorded audio files for use by the developer does the developer utilize a manual process for creating or otherwise obtaining a satisfactory audio file or a combination of audio files. It will be apparent to those skilled in the art that various modifications or variations may be made in the present invention without departing from the scope or spirit of the invention. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein.

Claims (20)

1. A method of audio file insertion in voice application code, the method comprising performing, via a processor, the steps:
identifying an audio tag in spoken dialog code of an interactive voice system;
extracting data associated with the audio tag;
generating a database query based on the extracted data, and retrieving at least one audio filename associated with an audio file to be played to a user in the interactive voice system; and
replacing the extracted data with the audio filename in the spoken dialog code if the audio file associated with the audio filename matches at least a portion of the extracted data.
2. The method of claim 1, further comprising identifying a plurality of audio tags, wherein the extracting, generating, and replacing steps are iteratively performed for each audio tag of the plurality of audio tags in the spoken dialog code.
3. The method of claim 1, wherein the extracted data comprises text to be spoken by the voice application in the interactive voice system.
4. The method of claim 3, wherein the extracted data further comprises voice parameters associated with a gender and age group of a speaker.
5. The method of claim 4, further comprising selecting default voice parameters if no voice parameters are associated with the audio tag.
6. The method of claim 1, further comprising replacing the extracted data with a plurality of audio filenames associated with audio files selected and ordered to match the text.
7. The method of claim 1, wherein the spoken dialog code comprises Voice Extensible Markup Language (VoiceXML) code.
8. The method of claim 1, further comprising replacing the extracted data with audio filenames only if the corresponding audio file is located in the database.
9. A method of operating an interactive voice system, the method comprising:
receiving, via a processor, user input;
generating, via a processor, a response to the user input by performing the steps:
identifying an audio tag in spoken dialog code of an interactive voice system;
extracting data associated with the audio tag;
generating a database query based on the extracted data, and retrieving at least one audio filename associated with an audio file to be played to a user in the interactive voice system;
replacing the extracted data with the audio filename in the spoken dialog code if the audio file associated with the audio filename matches at least a portion of the extracted data; and
interpreting the spoken dialog code and speaking the response to a user.
10. The method of claim 9, further comprising identifying a plurality of audio tags, wherein the extracting, generating, and replacing steps are iteratively performed for each audio tag of the plurality of audio tags in the spoken dialog code.
11. The method of claim 9, wherein the extracted data comprises text to be spoken by the voice application in the interactive voice system.
12. The method of claim 11, wherein the extracted data further comprises voice parameters associated with a gender and age group of a speaker.
13. The method of claim 12, further comprising selecting default voice parameters if no voice parameters are associated with the audio tag.
14. The method of claim 9, further comprising replacing the extracted data with a plurality of audio filenames associated with audio files selected and ordered to match the text.
15. The method of claim 9, wherein the spoken dialog code comprises Voice Extensible Markup Language (VoiceXML) code.
16. The method of claim 9, further comprising replacing the extracted data with audio filenames only if the corresponding audio file is located in the database.
17. A system of audio file insertion in voice application code, the system comprising:
a processor;
a module configured to control the processor to identify an audio tag in spoken dialog code of an interactive voice system;
a module configured to control the processor to extract data associated with the audio tag;
a module configured to control the processor to generate a database query based on the extracted data, and retrieve at least one audio filename associated with an audio file to be played to a user in the interactive voice system; and
a module configured to control the processor to replace the extracted data with the audio filename in the spoken dialog code if the audio file associated with the audio filename matches at least a portion of the extracted data.
18. The system of claim 17, further comprising a module configured to control the processor to identify a plurality of audio tags, and for each audio tag of the plurality of audio tags, perform the steps:
extracting data associated with the audio tag;
generating a database query based on the extracted data, and retrieving at least one audio filename associated with an audio file to be played to a user in the interactive voice system; and
replacing the extracted data with the audio filename in the spoken dialog code if the audio file associated with the audio filename matches at least a portion of the extracted data.
19. The system of claim 17, wherein the extracted data comprises text to be spoken by the voice application in the interactive voice system.
20. The system of claim 19, wherein the extracted data further comprises voice parameters associated with a gender and age group of a speaker.
US12/536,040 2003-06-10 2009-08-05 Methods and system for creating voice files using a voicexml application Abandoned US20090290694A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/536,040 US20090290694A1 (en) 2003-06-10 2009-08-05 Methods and system for creating voice files using a voicexml application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/458,532 US7577568B2 (en) 2003-06-10 2003-06-10 Methods and system for creating voice files using a VoiceXML application
US12/536,040 US20090290694A1 (en) 2003-06-10 2009-08-05 Methods and system for creating voice files using a voicexml application

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/458,532 Continuation US7577568B2 (en) 2003-06-10 2003-06-10 Methods and system for creating voice files using a VoiceXML application

Publications (1)

Publication Number Publication Date
US20090290694A1 (en) 2009-11-26

Family

ID=33510599

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/458,532 Expired - Fee Related US7577568B2 (en) 2003-06-10 2003-06-10 Methods and system for creating voice files using a VoiceXML application
US12/536,040 Abandoned US20090290694A1 (en) 2003-06-10 2009-08-05 Methods and system for creating voice files using a voicexml application

Country Status (1)

Country Link
US (2) US7577568B2 (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070118801A1 (en) * 2005-11-23 2007-05-24 Vizzme, Inc. Generation and playback of multimedia presentations
CN103945272A (en) * 2013-01-23 2014-07-23 腾讯科技(北京)有限公司 Video interaction method, apparatus and system
US11170051B2 (en) * 2017-08-30 2021-11-09 Fujitsu Limited Information processing device, information processing method, and dialog control system

Also Published As

Publication number Publication date
US20040254792A1 (en) 2004-12-16
US7577568B2 (en) 2009-08-18

Similar Documents

Publication Publication Date Title
US7577568B2 (en) Methods and system for creating voice files using a VoiceXML application
US6832196B2 (en) Speech driven data selection in a voice-enabled program
US10171660B2 (en) System and method for indexing automated telephone systems
US7609829B2 (en) Multi-platform capable inference engine and universal grammar language adapter for intelligent voice application execution
US8625773B2 (en) System and method for analyzing automatic speech recognition performance data
US7242752B2 (en) Behavioral adaptation engine for discerning behavioral characteristics of callers interacting with an VXML-compliant voice application
US7487095B2 (en) Method and apparatus for managing user conversations
US6839671B2 (en) Learning of dialogue states and language model of spoken information system
US8000973B2 (en) Management of conversations
US6400806B1 (en) System and method for providing and using universally accessible voice and speech data files
US7415415B2 (en) Computer generated prompting
US20110106527A1 (en) Method and Apparatus for Adapting a Voice Extensible Markup Language-enabled Voice System for Natural Speech Recognition and System Response
US7260530B2 (en) Enhanced go-back feature system and method for use in a voice portal
US20070203708A1 (en) System and method for providing transcription services using a speech server in an interactive voice response system
US20070143100A1 (en) Method & system for creation of a disambiguation system
CN110244941B (en) Task development method and device, electronic equipment and computer readable storage medium
US20080154590A1 (en) Automated speech recognition application testing
US20220093103A1 (en) Method, system, and computer-readable recording medium for managing text transcript and memo for audio file
US20030055651A1 (en) System, method and computer program product for extended element types to enhance operational characteristics in a voice portal
US7895037B2 (en) Method and system for trimming audio files
US20200204679A1 (en) System for processing voice responses using a natural language processing engine
US20070201631A1 (en) System and method for defining, synthesizing and retrieving variable field utterances from a file server
US6947969B2 (en) System and method for accessing voice messaging system data
US20070168192A1 (en) Method and system of bookmarking and retrieving electronic documents
KR20210114328A (en) Method for managing information of voice call recording and computer program for the same

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION