WO2001035235A1

WO2001035235A1 - System and method for accessing web content using limited display devices

Info

Publication number: WO2001035235A1
Application number: PCT/US2000/030749
Authority: WO
Inventors: Garry Chinn; Benedict R. Dugan; Roger E. Hagen; Michael R. Sexton
Original assignee: Vocal Point, Inc.
Priority date: 1999-11-09
Filing date: 2000-11-08
Publication date: 2001-05-17
Also published as: AU1758501A

Abstract

A computer system is provided for allowing a user of a limited display device (34) to browse content available from a data network. The system includes an interface (32) which receives a request for the content from the user via the limited display device. A processor, coupled to the interface, retrieves a conventional markup language document containing the content from the data network. The processor converts the conventional markup language document into a navigation tree (40) which provides a semantic, hierarchical structure for the content.

Description

SYSTEM AND METHOD FOR ACCESSING WEB CONTENT USING LIMITED DISPLAY DEVICES

TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to data networks and, in particular, to a system and method for accessing web content using limited display devices.

BACKGROUND OF THE INVENTION

The Internet, and the closely related application known as the "World Wide Web," have made an abundance of information (e.g., public news, stock quotes, product information, etc.) readily available to anyone with a desktop computer running a conventional web browser program, such as Microsoft's Internet Explorer or Netscape's Communicator. Such a web browser provides an interface between the Internet or other data networks and the desktop computer, allowing a desktop computer user to view information at one or more web pages. Web pages are supported by documents formatted in a conventional markup language such as Hyper-Text Markup Language (HTML) and Extensible Markup Language (XML). Although these markup languages are suitable for presenting information on a desktop computer, they are generally not well suited for new, emerging devices (such as, for example, cellular telephones, smart telephones, wireless personal digital assistants (PDAs), and like devices with limited display capability) through which Internet information could potentially be delivered. Furthermore, neither conventional web browsers nor conventional markup languages support or allow users to readily access information from the Internet using voice commands or commands from limited display devices.

Efforts have been made to address such problems. For example, voice-enabling languages, such as Voice Extensible Markup Language (VoiceXML), have been developed. Unlike the conventional markup languages of the Internet (e.g., HTML and XML), VoiceXML enables the delivery of information via voice commands or commands from limited display devices. However, any information which is desirably delivered with VoiceXML must be separately constructed in that language, apart from the conventional markup languages. Because most websites on the Internet do not provide separate VoiceXML capability, much of the information on the Internet is still largely unavailable to people without desktop computers or via voice commands.

SUMMARY OF THE INVENTION

According to an embodiment of the present invention, a computer system is provided for allowing a user of a limited display device to browse content available from a data network. The system includes an interface that receives a request for the content from the user via the limited display device. A processor, coupled to the interface, retrieves a conventional markup language document containing the content from the data network. The processor converts the conventional markup language document into a navigation tree that provides a semantic, hierarchical structure for the content.

According to another embodiment of the present invention, a method performed on a computer is provided for allowing a user of a limited display device to browse content available from a data network. The method includes the following: receiving a request for the content from the user via the limited display device; retrieving a conventional markup language document containing the content from the data network; and converting the conventional markup language document into a navigation tree which provides a semantic, hierarchical structure for the content. This structure is suitable for presenting content in audible form, and thus, is appropriate for an environment with limited capacity for display.

According to yet another embodiment of the present invention, a computer system for allowing a user of a limited display device to browse content available from a data network includes a markup language parser. The markup language parser receives a conventional markup language document in response to a request for the content from the user via the limited display device. Furthermore, the markup language parser generates a document tree from the conventional markup language document. A style sheet parser receives a style sheet document in response to the request, and generates a style tree from the style sheet document. The style tree comprises a plurality of style sheet rules. A tree converter, which is in communication with the markup language parser and the style sheet parser, converts the document tree into a navigation tree using the style sheet tree rules. The navigation tree provides a semantic, hierarchical structure for the content.
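The document-tree step performed by the markup language parser can be illustrated with a minimal sketch. It assumes Python's standard `html.parser` module and markup with explicit closing tags; the names `DocNode`, `DocumentTreeParser`, and `parse_document` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class DocNode:
    """One node of a document tree: an element, or a run of text ("#text")."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class DocumentTreeParser(HTMLParser):
    """Hypothetical helper that builds a DocNode tree while parsing markup."""

    def __init__(self):
        super().__init__()
        self.root = DocNode("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = DocNode(tag, dict(attrs))
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Unwind to the nearest open element with this tag, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1].children.append(DocNode("#text", text=data.strip()))


def parse_document(markup: str) -> DocNode:
    """Parse a markup string and return the root of its document tree."""
    parser = DocumentTreeParser()
    parser.feed(markup)
    parser.close()
    return parser.root
```

Each element becomes a node whose children preserve document order, which is the shape a tree converter would later walk when applying style sheet rules.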

According to yet another embodiment of the present invention, a method performed on a computer for allowing a user of a limited display device to browse content available from a data network includes: receiving a conventional markup language document and a style sheet document in response to a request for the content from the user via the limited display device; generating a document tree from the conventional markup language document; generating a style tree from the style sheet document, the style tree comprising a plurality of style sheet rules; and converting the document tree into a navigation tree using the style sheet tree rules, the navigation tree providing a semantic, hierarchical structure for the content.

According to yet another embodiment of the present invention, a computer system for allowing a user of a limited display device to browse content available from a data network includes a gateway module. The gateway module is operable to receive a spoken request for the content from the user via the limited display device, and to recognize the spoken request. A browser module, in communication with the gateway module, is operable to retrieve a conventional markup language document and a style sheet document from the data network in response to the spoken request. The conventional markup language document contains the content; the style sheet document contains metadata. The browser module is operable to generate a navigation tree using the conventional markup language document and the style sheet document. The navigation tree provides a semantic, hierarchical structure for the content. The gateway module and the browser module cooperate to enable the user to browse the content using the navigation tree and to output speech conveying the content to the user via the limited display device.

According to still yet another embodiment of the present invention, a method performed on a computer for allowing a user of a limited display device to browse content available from a data network includes: receiving a spoken request for the content from the user via the limited display device; recognizing the spoken request; retrieving a conventional markup language document and a style sheet document from the data network in response to the spoken request, the conventional markup language document containing the content, the style sheet document containing metadata; generating a navigation tree using the conventional markup language document and the style sheet document, the navigation tree providing a semantic, hierarchical structure for the content; enabling the user to browse the content using the navigation tree; and outputting speech conveying the content to the user via the limited display device.

A technical advantage of the present invention includes providing a system and method for accessing or browsing content available from a data network (e.g., the Internet) using voice commands, for example, from any telephone, wireless personal digital assistant, or other device with limited display capability. This system and method for voice browsing navigates through the content and delivers the same, for example, in the form of generated speech. The system and method can voice-enable any content currently formatted in a conventional, Internet-accessible markup language (e.g., HTML and XML), thus offering an unparalleled experience for users.

To accomplish this, in one embodiment, the system and method of the present invention build or generate navigation trees using the conventional, Internet-accessible markup language documents supporting current websites. A navigation tree organizes the content of a web page into an outline or hierarchical structure that takes into account the meaning of the content, and thus can be used for semantic retrieval of the content. As such, a navigation tree supports voice-based browsing of web pages by users. For documents formatted in various conventional markup languages, respective default style sheet (e.g., xCSS) documents may be provided for use in generating the navigation trees. Each style sheet document may contain metadata, such as declarative statements (rules) and procedural statements. For each conventional markup language document, the system and method may construct a document tree comprising a number of nodes. The rules or declarative statements contained in a suitable style sheet document are used to modify the document tree, for example, by adding or modifying attributes at each node of the document tree, deleting unnecessary nodes, filtering other nodes, etc. If procedural statements are present in the style sheet document, the system and method may apply these procedures directly to construct the navigation tree. If there are no such procedural statements, the system and method may apply a simple mapping procedure to convert the document tree into the navigation tree.

Other aspects and advantages of the present invention will become apparent from the following descriptions and accompanying drawings.
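The rule-application and simple-mapping steps described in this summary can be sketched as follows. This is an illustration under stated assumptions, not the xCSS mechanism itself: the `Rule` type, its exact-tag `selector`, the `prompt` attribute, and the choice to map nodes with children to routing nodes and leaves to content nodes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocNode:
    """A document tree node (hypothetical shape for this sketch)."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class Rule:
    """Hypothetical declarative rule: matches a tag, then sets attributes or deletes."""
    selector: str
    set_attrs: dict = field(default_factory=dict)
    delete: bool = False


def apply_rules(node: DocNode, rules: list) -> Optional[DocNode]:
    """Return a modified copy of the subtree, or None if a rule deletes the node."""
    attrs = dict(node.attrs)
    for rule in rules:
        if rule.selector == node.tag:
            if rule.delete:
                return None
            attrs.update(rule.set_attrs)
    styled = (apply_rules(child, rules) for child in node.children)
    kept = [child for child in styled if child is not None]
    return DocNode(node.tag, node.text, attrs, kept)


def map_to_navigation(node: DocNode) -> dict:
    """Assumed simple mapping: inner nodes become routing nodes, leaves content nodes."""
    if node.children:
        return {
            "kind": "routing",
            "prompt": node.attrs.get("prompt", node.tag),
            "options": [map_to_navigation(child) for child in node.children],
        }
    return {"kind": "content", "content": node.text}
```

Returning a modified copy rather than editing in place keeps the original document tree available, which would matter if several style sheets were tried against the same page.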

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and for further features and advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1A illustrates an exemplary environment in which a voice browsing system, according to an embodiment of the present invention, may operate;

FIG. 1B illustrates another exemplary environment in which a voice browsing system, according to an embodiment of the present invention, may operate;

FIG. 2 is a block diagram of a voice browsing system, according to an embodiment of the present invention;

FIG. 3 is a block diagram of a navigation tree builder component, according to an embodiment of the present invention;

FIG. 4 is a block diagram of a tree converter, according to an embodiment of the present invention;

FIG. 5 illustrates an exemplary document tree;

FIG. 6 illustrates an exemplary navigation tree, according to an embodiment of the present invention;

FIG. 7 illustrates a computer-based system which is an exemplary hardware implementation for the voice browsing system;

FIG. 8 is a flow diagram of an exemplary method for browsing content with voice commands, according to an embodiment of the present invention;

FIG. 9 is a flow diagram of an exemplary method for generating a navigation tree, according to an embodiment of the present invention;

FIG. 10 is a flow diagram of an exemplary method for applying style sheet rules to a document tree, according to an embodiment of the present invention;

FIG. 11 is a flow diagram of an exemplary method for applying heuristic rules to a document tree, according to an embodiment of the present invention; and

FIG. 12 is a flow diagram of an exemplary method for mapping a document tree into a navigation tree, according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The preferred embodiments of the present invention and their advantages are best understood by referring to FIGS. 1-12 of the drawings. Like numerals are used for like and corresponding parts of the various drawings.

Turning first to the nomenclature of the specification, the detailed description which follows is represented largely in terms of processes and symbolic representations of operations performed by conventional computer components, such as a local or remote central processing unit (CPU) or processor associated with a general purpose computer system, memory storage devices for the processor, and connected local or remote pixel-oriented display devices. These operations include the manipulation of data bits by the processor and the maintenance of these bits within data structures resident in one or more of the memory storage devices. Such data structures impose a physical organization upon the collection of data bits stored within computer memory and represent specific electrical or magnetic elements. These symbolic representations are the means used by those skilled in the art of computer programming and computer construction to most effectively convey teachings and discoveries to others skilled in the art.

For purposes of this discussion, a process, method, routine, or sub-routine is generally considered to be a sequence of computer-executed steps leading to a desired result. These steps generally require manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It is conventional for those skilled in the art to refer to these signals as bits, values, elements, symbols, characters, text, terms, numbers, records, files, or the like. It should be kept in mind, however, that these and some other terms should be associated with appropriate physical quantities for computer operations, and that these terms are merely conventional labels applied to physical quantities that exist within and during operation of the computer.

It should also be understood that manipulations within the computer are often referred to in terms such as adding, comparing, moving, searching, or the like, which are often associated with manual operations performed by a human operator. It must be understood that no involvement of the human operator may be necessary, or even desirable, in the present invention. The operations described herein are machine operations performed in conjunction with the human operator or user that interacts with the computer or computers.

In addition, it should be understood that the programs, processes, methods, and the like, described herein are but an exemplary implementation of the present invention and are not related, or limited, to any particular computer, apparatus, or computer language. Rather, various types of general purpose computing machines or devices may be used with programs constructed in accordance with the teachings described herein. Similarly, it may prove advantageous to construct a specialized apparatus to perform the method steps described herein by way of dedicated computer systems with hard-wired logic or programs stored in non-volatile memory, such as read-only memory (ROM).

Exemplary Environment

FIG. 1A illustrates an exemplary environment in which a voice browsing system 10, according to an embodiment of the present invention, may operate. In this environment, one or more content providers 12 may provide content to any number of interested users. Each content provider can be an entity which operates or maintains a portal or any other website through which content can be delivered. Each portal or website, which can be supported by a suitable computer system or web server, may include one or more web pages at which content is made available. Each website or web page can be identified by a respective uniform resource locator (URL).

Content can be any data or information that is presentable (visually, audibly, or otherwise) to users. Thus, content can include written text, images, graphics, animation, video, music, voice, and the like, or any combination thereof. Content can be stored in digital form, such as, for example, a text file, an image file, an audio file, a video file, etc. This content can be included in one or more web pages of the respective portal or website maintained by each content provider 12.

These web pages can be supported by documents formatted in a conventional, Internet-accessible markup language, such as, for example, Hyper-Text Markup Language (HTML) and Extensible Markup Language (XML). HTML and XML are markup language standards set by the World Wide Web Consortium (W3C) for Internet-accessible documents. In general, conventional markup languages provide formatting and structure for content that is to be presented visually. That is, conventional markup languages describe the way that content should be displayed, for example, by specifying that text should appear in boldface, at which location a particular image should appear, etc. In markup languages, tags are added or embedded within content to describe how the content should be formatted and displayed. A conventional, Internet-accessible markup language document can be the source page for any browser on a computer.

Along with the content, each content provider 12 may also maintain metadata that can be used to guide the construction of a semantic representation for the content. Metadata may include, for example, declarative statements (rules) and procedural statements. This metadata can be contained in one or more style sheet documents, which are essentially templates that apply formatting and style information to the elements of a web page. A style sheet document can be, for example, an extended Cascading Style Sheet (xCSS) document. In one embodiment, a separate default style sheet document may be provided for each conventional markup language (e.g., HTML or XML). As an alternative to style sheets, metadata can be contained in documents formatted in a suitable descriptive language such as Resource Description Framework. Using style sheet documents (or other appropriate documents), auxiliary metadata can be applied to a web page supported by a conventional markup language document.

One or more data networks, such as the Internet 14, can be used to deliver content. Internet 14 is an interconnection of computer clients and servers located throughout the world and exchanging information according to Transmission Control Protocol/Internet Protocol (TCP/IP), Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX), AppleTalk, or another suitable protocol. Internet 14 supports the distributed application known as the "World Wide Web." As described herein, web servers maintain websites, each comprising one or more web pages at which information is made available for viewing. Each website or web page may be supported by documents formatted in any suitable conventional markup language (e.g., HTML or XML). Clients may locally execute a conventional web browser program. A conventional web browser is a computer program that allows a user to exchange information with the World Wide Web. Any of a variety of conventional web browsers are available, such as NETSCAPE NAVIGATOR from Netscape Communications Corp., INTERNET EXPLORER from Microsoft Corporation, and others that allow convenient access and navigation of the Internet 14. Information may be communicated from a web server to a client using a suitable protocol, such as, for example, Hypertext Transfer Protocol (HTTP) or File Transfer Protocol (FTP).

A service provider 16 is connected to Internet 14. As used herein, the terms "connected," "coupled," or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements; such connection or coupling can be physical or logical. Service provider 16 may operate a computer system that appears as a client on Internet 14 to retrieve content and other information from content providers 12.

In general, service provider 16 can be an entity that delivers services to one or more users. These services may include telephony and voice services, including plain old telephone service (POTS), digital services, cellular service, wireless service, pager service, etc. To support the delivery of services, service provider 16 may maintain a system for communicating over a suitable communication network, such as, for example, a telecommunications network. Such a telecommunications network allows communication via a telecommunications line, such as an analog telephone line, a digital T1 line, a digital T3 line, or an OC3 telephony feed. The telecommunications network may include a public switched telephone network (PSTN) and/or a private system (e.g., cellular system) implemented with a number of switches, wire lines, fiber-optic cable, land-based transmission towers, space-based satellite transponders, etc. In one embodiment, the telecommunications network may include any other suitable communication system, such as a specialized mobile radio (SMR) system. As such, the telecommunications network may support a variety of communications, including, but not limited to, local telephony, toll (i.e., long distance), and wireless (e.g., analog cellular system, digital cellular system, Personal Communication System (PCS), Cellular Digital Packet Data (CDPD), ARDIS, RAM Mobile Data, Metricom Ricochet, paging, and Enhanced Specialized Mobile Radio (ESMR)). The telecommunications network may utilize various calling protocols (e.g., Inband, Integrated Services Digital Network (ISDN) and Signaling System No. 7 (SS7) call protocols) and other suitable protocols (e.g., Enhanced Throughput Cellular (ETC), Enhanced Cellular Control (EC²), MNP10, MNP10-EC, Throughput Accelerator (TXCEL), Mobile Data Link Protocol, etc.). Transmissions over the telecommunications network system may be analog or digital. Transmission may also include one or more infrared links (e.g., IRDA).

One or more limited display devices 18 may be coupled to the network maintained by service provider 16. Each limited display device 18 may comprise a communication device with limited capability for visual display. Thus, a limited display device 18 can be, for example, a wired telephone, a wireless telephone, a smart phone, a wireless personal digital assistant (PDA), or an Internet television. Each limited display device 18 supports communication by a respective user, for example, in the form of speech, voice, or other audible information. Limited display devices 18 may also support dual tone multi-frequency (DTMF) signals.

Voice browsing system 10, as depicted in FIG. 1A, may be incorporated into a system maintained by service provider 16. Voice browsing system 10 is a computer-based system which generally functions to allow users with limited display devices 18 to browse content provided by one or more content providers 12 using, for example, spoken/voice commands or requests. In response to these commands or requests, voice browsing system 10, acting as a client, interacts with content providers 12 via Internet 14 to retrieve the desired content. Then, voice browsing system 10 delivers the desired content in the form of audible information to the limited display devices 18. To accomplish this, in one embodiment, voice browsing system 10 constructs or generates navigation trees using style sheet documents to supply metadata to conventional markup language (e.g., HTML or XML) documents. Navigation trees are semantic representations of web pages that serve as interactive menu dialogs to support voice-based search by users. Each navigation tree may comprise a number of content nodes and routing nodes. Content nodes contain content from a web page that can be delivered to a user. Routing nodes implement options that can be selected to move to other nodes. For example, routing nodes may provide prompts for directing the user to content at content nodes. Thus, routing nodes link the content of a web page in a meaningful way. Navigation trees are described in more detail herein.
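The two node types described above can be modeled with a small data structure. The following is a minimal sketch; the class names, the `label` field, and the wording of the spoken prompt are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    """Holds content from a web page that can be delivered to the user."""
    label: str
    content: str


@dataclass
class RoutingNode:
    """Offers options that the user can select to move to other nodes."""
    label: str
    options: list = field(default_factory=list)

    def prompt(self) -> str:
        # Text that a text-to-speech component could read aloud as a menu.
        names = ", ".join(option.label for option in self.options)
        return f"{self.label}. Options: {names}."

    def select(self, choice: str):
        # Case-insensitive match of a recognized utterance against option labels.
        wanted = choice.strip().lower()
        for option in self.options:
            if option.label.lower() == wanted:
                return option
        return None
```

For example, a routing node labeled "Home" with options "News" and "Weather" would produce the prompt "Home. Options: News, Weather." and return the weather content node when the recognized request is "weather".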

Voice browsing system 10 thus provides a technical advantage. A voice-based browser is crucial for users having limited display devices 18 since a visual browser is inappropriate for, or simply cannot work with, such devices.

Furthermore, voice browsing system 10 leverages the existing content infrastructure (i.e., documents formatted in conventional markup languages, such as HTML or XML) maintained by content providers 12. That is, the existing content infrastructure can serve as an easy-to-administer, single source for interaction by both complete computer systems (e.g., desktop computer) and limited display devices 18 (e.g., wireless telephones or wireless PDAs). As such, content providers 12 are not required to recreate their content in other formats, deploy new markup languages (e.g., VoiceXML), or implement additional application programming interfaces (APIs) into their back-end systems to support other formats and markup languages.

Another Exemplary Environment

FIG. 1B illustrates another exemplary environment within which a voice browsing system 10, according to an embodiment of the present invention, can operate. In this environment, voice browsing system 10 may be implemented within the system of a content provider 12. Content provider 12 can be substantially similar to that previously described with reference to FIG. 1A. That is, content provider 12 can be an entity which operates or maintains a portal or any other website through which content can be delivered. Such content can be included in one or more web pages of the respective portal or website maintained by content provider 12. Each web page can be supported by documents formatted in a conventional markup language, such as Hyper-Text Markup Language (HTML) or Extensible Markup Language (XML).

Along with the conventional markup language documents, content provider 12 may also maintain one or more style sheet (e.g., extended Cascading Style Sheet (xCSS)) documents containing metadata that can be used to guide the construction of a semantic representation for the content.

A network 20 is coupled to content provider 12. Network 20 can be any suitable network for communicating data and information. This network can be a telecommunications or other network, as described with reference to FIG. 1A, supporting telephony and voice services, including plain old telephone service (POTS), digital services, cellular service, wireless service, pager service, etc. A number of limited display devices 18 are coupled to network 20. These limited display devices 18 can be substantially similar to those described with reference to FIG. 1A. That is, each limited display device 18 may comprise a communication device with limited capability for visual display, such as, for example, a wired telephone, a wireless telephone, a smart phone, or a wireless personal digital assistant (PDA). Each limited display device 18 supports communication by a respective user, for example, in the form of speech, voice, or other audible information.

In operation for this environment, voice browsing system 10 again generally functions to allow users with limited display devices 18 to browse content provided by one or more content providers 12 using, for example, spoken/voice commands or requests. In this environment, however, because voice browsing system 10 is incorporated at content provider 12, content provider 12 may directly receive, process, and respond to these spoken/voice commands or requests from users. For each command/request, voice browsing system 10 retrieves the desired content and other information at content provider 12. The content can be in the form of markup language (e.g., HTML or XML) documents, and the other information may include metadata in the form of style sheet (e.g., xCSS) documents. Voice browsing system 10 may construct or generate navigation trees using the style sheet documents to supply metadata to the conventional markup language documents. These navigation trees then serve as interactive menu dialogs to support voice-based search by users.

Voice Browsing System

FIG. 2 is a block diagram of a voice browsing system 10, according to an embodiment of the present invention. In general, voice browsing system 10 allows a user of a limited display device 18 to browse the content available from any one or more content providers 12 using spoken/voice commands or requests. As depicted, voice browsing system 10 includes a gateway module 30 and a browser module 32. Gateway module 30 generally functions as a gateway to translate data/information between one type of network/computer system and another, thereby acting as an interface. In the context of the present invention, gateway module 30 translates data/information between a network supporting limited display devices 18 (e.g., a telecommunications network) and the computer-based system of voice browsing system 10. For the network supporting the limited display devices, data/information can be in the form of speech or voice.

The functionality of gateway module 30 can be performed by one or more suitable processors, such as a main-frame, a file server, a work station, or other suitable data processing facility supported by memory (either internal or external), running appropriate software, and operating under the control of any suitable operating system (OS), such as MS-DOS, Macintosh OS, Windows NT, Windows 95, OS/2, Unix, Lynix, Xenix, and the like. Gateway module 30, as shown, comprises a computer telephony interface (CTI)/personal digital assistant (PDA) component 34, an automated speech recognition (ASR) component 36, and a text-to-speech (TTS) component 38. Each of these components 34, 36, and 38 may comprise one or more programs which, when executed, perform the functionality described herein.

CTI/PDA component 34 generally functions to support communication between voice browsing system 10 and limited display devices. CTI/PDA component 34 may comprise one or more application programming interfaces (API) for communicating in any protocol suitable for public switched telephone network (PSTN), cellular telephone network, smart phones, pager devices, and wireless personal digital assistant (PDA) devices. These protocols may include hypertext transfer protocol (HTTP), which supports PDA devices, and PSTN protocol, which supports cellular telephones.

Automated speech recognition component 36 generally functions to recognize speech/voice commands and requests issued by users into respective limited display devices 18. Automated speech recognition component 36 may convert the spoken commands/requests into a text format. Automated speech recognition component 36 can be implemented with automatic speech recognition software commercially available, for example, from the following companies: Nuance Corporation of Menlo Park, CA; Applied Language Technologies, Inc. of Boston, MA; Dragon Systems of Newton, MA; and PureSpeech, Inc. of Cambridge, MA. Such commercially available software typically can be modified for particular applications, such as a computer telephony application.

Text-to-speech component 38 generally functions to output speech or vocalized messages to users having a limited display device 18. This speech can be generated from content that has been retrieved from a content provider 12 and reformatted within voice browsing system 10, as described herein. Text-to-speech component 38 synthesizes human speech by "speaking" text, such as that which can be part of the content. Software for implementing text-to-speech component 38 is commercially available, for example, from the following companies: Lernout & Hauspie of Ieper, Belgium; AcuVoice, Inc. of San Jose, CA; Centigram Communications Corporation of San Jose, CA; Digital Equipment Corporation (DEC) of Maynard, MA; Lucent Technologies of Murray Hill, NJ; and Entropic Research Laboratory, Inc. of Washington, D.C.

Browser module 32, coupled to gateway module 30, functions to provide access to web pages (of any one or more content providers 12) using Internet protocols and controls navigation of the same. Browser module 32 may organize the content of any web page into a structure that is suitable for browsing by a user using a limited display device 18. Afterwards, browser module 32 allows a user to browse such structure, for example, using voice or speech commands/requests.

The functionality of browser module 32 can be performed by one or more suitable processors, such as a main-frame, a file server, a work station, or other suitable data processing facility supported by memory (either internal or external), running appropriate software, and operating under the control of any suitable operating system (OS), such as MS-DOS, Macintosh OS, Windows NT, Windows 95, OS/2, Unix, Lynix, Xenix, and the like. Such processors can be the same as or separate from those which perform the functionality of gateway module 30. As depicted, browser module 32 comprises a navigation tree builder component 40 and a navigation agent component 42. Each of these components 40 and 42 may comprise one or more programs which, when executed, perform the functionality described herein.

Navigation tree builder component 40 may receive conventional, Internet-accessible markup language (e.g., XML or HTML) documents and associated style sheet (e.g., xCSS) documents from one or more content providers 12. Using these markup language and style sheet documents, navigation tree builder component 40 generates navigation trees that are semantic representations of web pages. In general, each navigation tree provides a hierarchical menu by which users can readily navigate the content of a conventional markup language document. Each navigation tree may include a number of nodes, each of which can be either a content node or a routing node. A content node comprises content that can be delivered to a user. A routing node may implement a prompt for directing the user to other nodes, for example, to obtain the content at a specific content node.
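By way of illustration only, the two kinds of nodes described above might be modeled as in the following Python sketch. The class and field names (ContentNode, RoutingNode, label, prompt, children) are assumptions made for this example and are not taken from Appendix A.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ContentNode:
    """Holds content that can be delivered to a user (hypothetical name)."""
    label: str
    text: str


@dataclass
class RoutingNode:
    """Holds a prompt and the nodes to which it can direct the user (hypothetical name)."""
    label: str
    prompt: str
    children: List[Union["RoutingNode", ContentNode]] = field(default_factory=list)

    def options(self) -> List[str]:
        # The labels of the child nodes are the choices offered to the user.
        return [child.label for child in self.children]


# A tiny hierarchical menu: one routing node leading to two content nodes.
root = RoutingNode(
    label="home",
    prompt="Say news or weather.",
    children=[
        ContentNode(label="news", text="Markets closed higher today."),
        ContentNode(label="weather", text="Sunny, with a high of 70."),
    ],
)
```

In this sketch a routing node carries only a prompt and its outgoing choices, while all deliverable text lives in content nodes, which mirrors the division of roles described above.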

Navigation agent component 42 generally functions to support the navigation of navigation trees once they have been generated by navigation tree builder component 40. Navigation agent component 42 may act as an interface between browser module 32 and gateway module 30 to coordinate the movement along nodes of a navigation tree in response to any commands and requests received from users.

In exemplary operation, a user may communicate with voice browsing system 10 to obtain content from content providers 12. To do this, the user, via limited display device 18, places a call which initiates communication with voice browsing system 10, as supported by CTI/PDA component 34 of gateway module 30. The user then issues a spoken command or request for content, which is recognized or interpreted by automatic speech recognition component 36. In response to the recognized command/request, browser module 32 accesses a web page containing the desired content (at a website or portal operated by a content provider 12) via Internet 14 or other communication network. Browser module 32 retrieves one or more conventional markup language and associated style sheet documents from the content provider. Using these markup language and style sheet documents, navigation tree builder component 40 creates one or more navigation trees. The user may interact with voice browsing system 10, as supported by navigation agent component 42, to navigate along the nodes of the navigation trees. During navigation, gateway module 30 may convert the content at various nodes of the navigation trees into audible speech that is issued to the user, thereby delivering the desired content. Browser module 32 may generate and support the navigation of additional navigation trees in the event that any other command/request from the user invokes another web page of the same or a different content provider 12. When a user has obtained all desired content, the user may terminate the call, for example, by hanging up.

Navigation Tree Builder Component

FIG. 3 is a block diagram of a navigation tree builder component 40, according to an embodiment of the present invention. Navigation tree builder component 40 generally functions to construct navigation trees 50 which can be used to readily and orderly provide the content of respective web pages to a user via a limited display device 18. As depicted, navigation tree builder 40 comprises a markup language parser 52, a style sheet parser 54, and a tree converter 56. Each of markup language parser 52, style sheet parser 54, and tree converter 56 may comprise one or more programs which, when executed, perform the functionality described herein.

Markup language parser 52 receives conventional, Internet-accessible markup language (e.g., HTML or XML) documents 58 from a content provider 12. Conventional markup languages describe how content should be structured, formatted, or displayed. To accomplish this, conventional markup languages may embed tags to specify spans, frames, paragraphs, ordered lists, unordered lists, headlines, tables, table rows, objects, and the like, for organizing content. Each markup language document 58 may serve as the source for a web page. Markup language parser 52 parses the content contained within a markup language document 58 in order to generate a document tree 60. In particular, markup language parser 52 can map each markup language document into a respective document tree 60.

Each document tree 60 is a basic data representation of content. An exemplary document tree 60 is illustrated in FIG. 5. Document tree 60 organizes the content of a web page based on, or according to, the formatting tags of a conventional markup language. The document tree is a graphic representation of an HTML document. A typical document tree 60 includes a number of document tree nodes. As depicted, these document tree nodes include an HTML designation (HTML), a header (<HEAD>) and a body (<BODY>), a title (<TITLE>), metadata (<META>), one or more headlines (<H1>, <H2>), list items (<LI>), an unordered list (<UL>), and a paragraph (<P>). The nodes of a document tree may comprise content and formatting information. For example, each node of the document tree may correspond to either an HTML markup tag or plain text. The content of a markup element appears as its child in the document tree. For example, the header (<HEAD>) may have content in the form of the phrase "About Our Organization" along with formatting information which specifies that the content should be presented as a header on the web page.
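As a minimal sketch of how a markup language document might be mapped into such a document tree, the following Python example uses the standard-library html.parser.HTMLParser as a stand-in for markup language parser 52. The names DocNode, DocumentTreeParser, and outline are hypothetical, and the short list of void tags is an assumption made to keep the example small.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; assumed subset for this sketch.
VOID_TAGS = {"meta", "br", "img", "hr", "link", "input"}


class DocNode:
    """One document tree node: a markup tag or a run of plain text."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []


class DocumentTreeParser(HTMLParser):
    """Builds a tree in which the content of an element appears as its child."""

    def __init__(self):
        super().__init__()
        self.root = DocNode("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = DocNode(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(DocNode("#text", data.strip()))


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.text if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


parser = DocumentTreeParser()
parser.feed(
    "<html><head><title>About</title></head>"
    "<body><h1>About Our Organization</h1><p>Welcome.</p></body></html>"
)
tree_lines = outline(parser.root)
```

Joining tree_lines with newlines shows the nesting, with each text run appearing as a child of the tag that encloses it.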

Document tree 60 is designed for presenting a number of content elements simultaneously. That is, the organization of web page content according to the formatting tags of conventional markup language documents is appropriate, for example, for a visual display in which textual information can be presented at once in the form of headers, lines, paragraphs, tables, arrays, lists, and the like, along with images, graphics, animation, etc. However, the structure of a document tree 60 is not particularly well-suited for presenting content serially, for example, as would be required for an audio presentation in which only a single element of content can be presented at a given moment. Specifically, in an audio context, the formatting information of a document tree 60 does not provide meaningful connections or links for the content of a web page. For example, formatting information specifying that content should be displayed as a header does not translate well for an audio presentation of the content. In addition, much of the formatting information of a document tree 60 does not constitute meaningful content which may be of interest to a user. For example, the nodes for header (<HEAD>) and body (<BODY>) are not intrinsically interesting. In fact, the header (<HEAD>)--comprising title (<TITLE>) and metadata (<META>)--does not generally contain information that should be presented directly to the user.

Style sheet parser 54 receives one or more style sheet (e.g., xCSS) documents 62. Style sheet documents 62 provide templates for applying style information to the elements of various web pages supported by respective conventional markup language documents 58. Each style sheet document 62 may supply or provide metadata for the web pages. For example, using the metadata from a style sheet document 62, audio prompts can be added to a standard web page. This metadata can also be used to guide the construction of a semantic representation of a web page. The metadata may comprise or specify rules which can be applied to a document tree 60. Style sheet parser 54 parses the metadata from a style sheet document 62 to generate a style tree 64. Each style tree 64 may be associated with a particular document tree 60 according to the association between the respective style sheet documents 62 and conventional markup language documents 58. A style tree 64 organizes the rules (specified in metadata) into a structure by which they can be efficiently applied to a document tree 60. A tree structure for the rules is useful because the application of rules can be a hierarchical process. That is, some rules are logically applied only after other rules have been applied.

Tree converter 56, which is in communication with markup language parser 52 and style sheet parser 54, receives the document trees 60 and style trees 64 therefrom. Using the document trees 60 and style trees 64, tree converter 56 generates navigation trees 50. Among other things, tree converter 56 may apply the rules of a style tree 64 to the nodes of a document tree 60 when generating a navigation tree 50. Furthermore, tree converter 56 may apply other rules (heuristic rules) to each document tree, and thereafter, may map various nodes of the document tree into nodes of a navigation tree 50.

A navigation tree 50 organizes content of a conventional markup language document 58 into a hierarchical or outline structure. With the hierarchical structure, the various elements of content are separated into various levels (e.g., parts, sub-parts, sub-sub-parts, etc.). Appropriate mechanisms are provided to allow movement from one level to another and across the levels. The hierarchical arrangement of a navigation tree 50 is suitable for presenting content sequentially, and thus can be used for "semantic" retrieval of the content at a web page. As such, the navigation tree 50 can serve as an index that is suitable for browsing content using voice commands. An exemplary navigation tree 50 is illustrated in FIG. 6.

A navigation tree 50 is, in general, made up of routing nodes and content nodes. Content nodes may comprise content that can be delivered to a user. Content nodes can be of various types, such as, for example, general content nodes, table nodes, and form nodes. Table nodes present a table of information. Form nodes can be used to assist in the filling out of respective forms. Routing nodes are unique to navigation trees 50 and are generated according to rules applied by tree converter 56. Routing nodes can be used to move between nodes. The routing nodes are interconnected by directed arcs (edges or links). These directed arcs are used to construct the hierarchical relationship between the various nodes in the navigation tree 50. That is, these arcs specify allowable navigation traversal paths to move from one node to another. In FIG. 6, for example, an unordered list node (<UL>) is a routing node for moving to list nodes (<L1>, <L2>). The options for other nodes may be explicitly included in the routing node.

Content nodes are not reachable by tree traversal operations. The data found in content nodes must be accessed through a parent routing node called a group node. The group node organizes content nodes into a single presentational unit. The group node can be used for organizing multi-media content. For example, rather than present text and links as disjointed content, a group node can be used to organize a collection of text, audio wave files, and URI links together such as the following:

For more information about <A href="http:///www.vocalpoint.com/sound.av">vocalpoint</A>, send email to: <A href=info@vocalpoint.com>info@vocalpoint.com</A>.

As such, routing nodes provide the nexus or connection between content nodes, and thus provide meaningful links for the content of a web page. In this way, routing nodes support or provide a semantic, hierarchical relationship for web page content in a navigation tree 50. An exemplary object-oriented implementation for routing and content nodes of a navigation tree is provided in attached Appendix A.

In one embodiment, a navigation tree 50 can be used to define a finite state machine. In particular, various nodes of the navigation tree may correspond to states in the finite state machine. Navigation agent component 42 may use the navigation tree to directly define the finite state machine. The finite state machine can be used by navigation agent 42 of browser module 32 to move throughout the hierarchical structure. At any current state/node, a user can advance to another state/node.
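The correspondence between nodes and states can be sketched as follows. This Python example is illustrative only: the nested-dictionary tree format, the function names build_transitions and step, and the "back" command are assumptions, and node names are assumed to be unique within the tree.

```python
def build_transitions(tree):
    """Map (state, command) -> next state from a nested-dict navigation tree."""
    transitions = {}

    def walk(name, children, parent):
        if parent is not None:
            # Assumed convenience command for returning to the parent node.
            transitions[(name, "back")] = parent
        for child_name, grandchildren in children.items():
            # Each directed arc becomes one allowed transition.
            transitions[(name, child_name)] = child_name
            walk(child_name, grandchildren, name)

    for root_name, children in tree.items():
        walk(root_name, children, None)
    return transitions


def step(transitions, state, command):
    # Unrecognized commands leave the machine in its current state.
    return transitions.get((state, command), state)


nav_tree = {"home": {"news": {"sports": {}, "business": {}}, "weather": {}}}
fsm = build_transitions(nav_tree)

state = "home"
for command in ["news", "sports", "back", "back", "weather"]:
    state = step(fsm, state, command)
```

Because the only transitions are those derived from the arcs of the tree, a command that does not match an arc out of the current node (for example, "sports" while at "home") does not change the state.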

Tree Converter

FIG. 4 is a block diagram of a tree converter 56, according to an embodiment of the present invention. Tree converter 56 generally functions to convert document trees 60 into navigation trees 50, for example, using style trees 64. As depicted, tree converter 56 comprises a style sheet engine 68, a heuristic engine 70, and a mapping engine 72. Each of style sheet engine 68, heuristic engine 70, and mapping engine 72 may comprise one or more programs which, when executed, perform the functionality described herein.

Style sheet engine 68 generally functions to apply style sheet rules to a document tree 60. Application of style sheet rules can be done on a rule-by-rule basis to all applicable nodes of the document tree 60. These style sheet rules can be part of the metadata of a style sheet document 62. Each style sheet rule can be a rule generally available in a suitable style sheet language of style sheet document 62. In one embodiment, these style sheet rules may include, for example, clipping, pruning, filtering, and converting. In a clipping operation, a node of a document tree is marked as special so that the node will not be deleted or removed by other operations. Clipping may be performed for content that is important and suitable for audio presentation (e.g., text which can be "read" to a user). In a pruning operation, a node of a document tree is eliminated or removed. Pruning may be performed for content that is not suitable for delivery via speech or audio. This can include visual information (e.g., images or animation) at a web page. Other content that can be pruned may be advertisements and legal disclaimers at each web page. In a filtering operation, auxiliary information is added at a node. This auxiliary information can be, for example, labels, prompts, etc. In a conversion operation, a node is changed from one type into another type. For example, some content in a conventional markup language document can be in the form of a table for presenting information in a grid-like fashion. In a conversion, such table may be converted into a routing node in a navigation tree to facilitate movement among nodes and to provide options or choices.

As depicted, style sheet engine 68 comprises a selector module 74 and a rule applicator module 76. In general, selector module 74 functions to select or identify various nodes in a document tree 60 to which the rules may be applied to modify the tree. After various nodes of a particular document tree 60 have been selected by selector module 74, rule applicator module 76 generally functions to apply the various style tree rules (e.g., clipping, pruning, filtering, or converting) to the selected nodes as appropriate in order to modify the tree.
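The division of labor between a selector and a rule applicator can be sketched as shown below. This is a simplified Python illustration, not the style sheet language itself: the dictionary-based node format, the tag-only selector, and the (tag, operation, argument) rule tuples are all assumptions made for the example.

```python
def select(node, tag):
    """Selector: return every node in the tree whose tag matches."""
    found = [node] if node["tag"] == tag else []
    for child in node.get("children", []):
        found.extend(select(child, tag))
    return found


def apply_rule(root, rule):
    """Rule applicator: apply one rule to all nodes chosen by the selector."""
    tag, operation, argument = rule
    for node in select(root, tag):
        if operation == "clip":
            node["clipped"] = True        # protect the node from removal
        elif operation == "prune" and not node.get("clipped"):
            node["pruned"] = True         # mark the node for removal
        elif operation == "filter":
            node["prompt"] = argument     # add auxiliary information
        elif operation == "convert":
            node["kind"] = argument       # change the node's type


def sweep(node):
    """Drop nodes marked as pruned once all rules have been applied."""
    kept = [c for c in node.get("children", []) if not c.get("pruned")]
    for child in kept:
        sweep(child)
    node["children"] = kept


doc = {"tag": "body", "children": [
    {"tag": "img", "children": []},
    {"tag": "p", "text": "Hello", "children": []},
    {"tag": "table", "children": []},
]}
rules = [
    ("p", "clip", None),
    ("img", "prune", None),
    ("p", "prune", None),                 # has no effect: the node is clipped
    ("table", "convert", "routing"),
    ("table", "filter", "Choose a row."),
]
for rule in rules:                        # rule-by-rule application
    apply_rule(doc, rule)
sweep(doc)
remaining = [child["tag"] for child in doc["children"]]
```

In this sketch the image node is removed, the clipped paragraph survives a later pruning rule, and the table is retyped as a routing node carrying a prompt.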

Heuristic engine 70 is in communication with style sheet engine 68. Heuristic engine 70 generally functions to apply one or more heuristic rules to the document tree 60 as modified by style sheet engine 68. In one embodiment, these heuristic rules may be applied on a node-by-node basis to various nodes of document tree 60. Each heuristic rule comprises a rule which may be applied to a document tree according to a heuristic technique. A heuristic technique is a problem-solving technique in which the most appropriate solution of several found by alternative methods is selected at successive stages of a problem-solving process for use in the next step of the process. In the context of the present invention, the problem-solving process involves the process of converting a document tree 60 into a navigation tree 50. In this process, heuristic rules are selectively applied to a document tree after the application of style sheet rules and before a final mapping into navigation tree 50 (as described below).

In one embodiment, heuristic rules may include, for example, converting paragraph breaks and line breaks into space breaks (white space), exploiting image alternate tags, deleting decorative nodes, merging content and links, and building outlines from headlines and ordered lists. The operation for converting paragraph breaks and line breaks into space breaks is done to eliminate unnecessary formatting in the textual content at a node while maintaining suitable delineations between elements of text (e.g., words) so that the elements are not concatenated. The operation for exploiting image alternative tags identifies and uses any image alternative tags that may be part of the content contained at a particular node. An image alternative tag is associated with a particular image and points to corresponding text that describes the image. Image alternative tags are generally designed for the convenience of users who are visually impaired so that alternative text is provided for the particular image. The operation for deleting decorative nodes eliminates content that is not useful in a navigation tree 50. For example, a node in the document tree 60 consisting of only an image file may be considered to be a decorative node since the image itself cannot be presented to a user in the form of speech or audio, and no alternative text is provided. The operation for merging content and links eliminates the formatting for a link (e.g., a hypertext link) so that the text for the link is read continuously as part of the content delivered to a user. The operation for building or generating outlines from headlines and ordered lists is performed to create the hierarchical structure of the navigation tree 50. A headline--which can be, for example, a heading for a section of a web page--is identified by suitable tags within a conventional markup language document. In a visually displayed web page, multiple headlines may be provided for a user's convenience. These headlines may be considered alternatives or options for the user's attention. An ordered list is a listing of various items, which in some cases, can be options. Heuristic engine 70 may arrange or organize headlines and ordered lists so that the underlying content is presented in the form of an outline.
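Three of the heuristic operations described above can be sketched in Python as follows. The function names, the dictionary used to represent an image node, and the (level, title) representation of headlines are assumptions for this illustration only.

```python
import re


def collapse_breaks(text):
    """Replace runs of line breaks, paragraph breaks, and spaces with one space."""
    return re.sub(r"\s+", " ", text).strip()


def image_to_text(image_node):
    """Use the image's alternate text if present; None marks a decorative image."""
    alt = image_node.get("alt", "").strip()
    return alt or None


def build_outline(headlines):
    """Nest (level, title) pairs so that deeper headlines become children."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in headlines:
        # Close any open entries at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        entry = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(entry)
        stack.append(entry)
    return root


text = collapse_breaks("Line one\n\nline   two\n")
logo = image_to_text({"src": "logo.gif", "alt": "Company logo"})
spacer = image_to_text({"src": "spacer.gif"})

outline_root = build_outline(
    [(1, "News"), (2, "World"), (2, "Sports"), (1, "Weather")]
)
top_level = [entry["title"] for entry in outline_root["children"]]
under_news = [entry["title"] for entry in outline_root["children"][0]["children"]]
```

Here an image without alternate text yields None, which a caller could treat as the signal to delete the node as decorative, and the headline levels alone determine the nesting of the outline.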

Mapping engine 72 is in communication with heuristic engine 70. In general, mapping engine 72 performs a mapping function that changes certain elements in a modified document tree 60 into appropriate nodes for a navigation tree 50. Mapping engine 72 may operate on a node-by-node basis to provide such mapping function. In one embodiment, the content at a node in document tree 60 is mapped to create a content node in the navigation tree 50. Ordered lists, unordered lists, and table rows are mapped into suitable routing nodes of the navigation tree 50. Any table in document tree 60 may be mapped to create a table node in the navigation tree 50. A form in a document tree 60 can be mapped to create a form node in the navigation tree 50. A form may comprise a number of fields which can be filled in by a user to collect information. Form elements in the document tree 60 can be mapped into a form handling node in navigation tree 50. Form elements provide a standard interface for collecting input from the user and sending that information to a Web server.
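A node-by-node mapping of this kind might look like the following Python sketch. The dictionary node format and the use of lowercase HTML tag names (ol, ul, tr, table, form) to identify element types are assumptions made for the example.

```python
# Elements that become routing nodes, per the mapping described above.
ROUTING_TAGS = {"ol", "ul", "tr"}


def map_node(doc_node):
    """Map one document tree node (and its children) to a navigation node."""
    tag = doc_node["tag"]
    if tag in ROUTING_TAGS:
        kind = "routing"
    elif tag == "table":
        kind = "table"
    elif tag == "form":
        kind = "form"
    else:
        kind = "content"

    nav_node = {"kind": kind, "source": tag}
    if "text" in doc_node:
        nav_node["text"] = doc_node["text"]
    children = [map_node(child) for child in doc_node.get("children", [])]
    if children:
        nav_node["children"] = children
    return nav_node


doc_tree = {"tag": "ul", "children": [
    {"tag": "li", "text": "Headlines"},
    {"tag": "form", "children": [{"tag": "input", "text": "Name"}]},
]}
nav_tree = map_node(doc_tree)
kinds = [child["kind"] for child in nav_tree["children"]]
```

The sketch treats any element not otherwise listed as a source of a content node, which is a simplification of the embodiment described above.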

Computer-Based System

FIG. 7 illustrates a computer-based system 80 which is an exemplary hardware implementation for voice browsing system 10. In general, computer-based system 80 may include, among other things, a number of processing facilities, storage facilities, and work stations. As depicted, computer-based system 80 comprises a router/firewall 82, a load balancer 84, an Internet accessible network 86, an automated speech recognition (ASR)/text-to-speech (TTS) network 88, a telephony network 90, a database server 92, and a resource manager 94.

Computer-based system 80 may be deployed as a cluster of networked servers. Other clusters of similarly configured servers may be used to provide redundant processing resources for fault recovery. In one embodiment, each server may comprise a rack-mounted Intel Pentium processing system running Windows NT, Linux OS, UNIX, or any other suitable operating system. For purposes of the present invention, the primary processing servers are included in Internet accessible network 86, automated speech recognition (ASR)/text-to-speech (TTS) network 88, and telephony network 90. In particular, Internet accessible network 86 comprises one or more Internet access platform (IAP) servers. Each IAP server implements the browser functionality that retrieves and parses conventional markup language documents supporting web pages. Each IAP server builds the navigation trees 50 (which are the semantic representations of the web pages) and generates the navigation dialog with users. Telephony network 90 comprises one or more computer telephony interface (CTI) servers. Each CTI server connects the cluster to the telephone network which handles all call processing. ASR/TTS network 88 comprises one or more automatic speech recognition (ASR) servers and text-to-speech (TTS) servers. ASR and TTS servers are used to interface the text-based input/output of the IAP servers with the CTI servers. Each TTS server can also play digital audio data.

Load balancer 84 and resource manager 94 may cooperate to balance the computational load throughout computer-based system 80 and provide fault recovery. For example, when a CTI server receives an incoming call, resource manager 94 assigns resources (e.g., ASR server, TTS server, and/or IAP server) to handle the call. Resource manager 94 periodically monitors the status of each call and, in the event of a server failure, new servers can be dynamically assigned to replace failed components. Load balancer 84 provides load balancing to maximize resource utilization, reducing hardware and operating costs.

Computer-based system 80 may have a modular architecture. An advantage of this modular architecture is flexibility. Any of these core servers--i.e., IAP servers, CTI servers, ASR servers, and TTS servers--can be rapidly upgraded, ensuring that voice browsing system 10 always incorporates the most up-to-date technologies.

Method For Browsing Content With Voice Commands

FIG. 8 is a flow diagram of an exemplary method 100 for browsing content with voice commands, according to an embodiment of the present invention. Method 100 may correspond to an aspect of operation of voice browsing system 10.

Method 100 begins at step 102 where voice browsing system 10 receives at gateway module 30 a call from a user, for example, via a limited display device 18. In the call, the user may convey or issue a command or request. Such command or request can be in the form of voice or speech, and may pertain to particular content 15. This content 15 may be contained in a web page at a website or portal maintained by a content provider 12. Automatic speech recognition (ASR) component 36 of gateway module 30 operates on the voice/speech to recognize the user's command or request for content 15. Gateway module 30 forwards the request to browser module 32.

At step 104, in response to the request, voice browsing system 10 initiates a web browsing session for this interaction with the user. At step 106, browser module 32 loads or fetches a markup language document 58 supporting the web page that contains the desired content 15. This markup language document can be, for example, an HTML or an XML document. Browser module 32 may also load or retrieve one or more style sheet documents 62 which are associated with the markup language document 58. At step 108, browser module 32 adds an identifier (e.g., a uniform resource locator (URL)) for the web page to a list maintained within voice browsing system 10. This is done so that voice browsing system 10 can keep track of each web page from which it has retrieved content; thus, at least some of the operations which voice browsing system 10 performs for any given web page in response to an initial request do not need to be repeated in response to future requests relating to the same web page.
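One way such a list could avoid repeated work is sketched below in Python. The PageRegistry class, its method names, and the choice to store a previously built navigation tree against each URL are assumptions for illustration; the description above states only that an identifier for each web page is kept in a list.

```python
class PageRegistry:
    """Tracks visited pages by URL and reuses work already done for them."""

    def __init__(self, build_tree):
        self._build_tree = build_tree   # hypothetical tree-building callable
        self._trees = {}                # URL -> previously built result

    def visited(self):
        return list(self._trees)

    def get(self, url):
        # Build only on the first request for a given URL.
        if url not in self._trees:
            self._trees[url] = self._build_tree(url)
        return self._trees[url]


build_calls = []


def fake_builder(url):
    # Stand-in for fetching and converting a page; records each invocation.
    build_calls.append(url)
    return {"root": url}


registry = PageRegistry(fake_builder)
first = registry.get("http://example.com/news")
second = registry.get("http://example.com/news")
```

In this sketch a second request for the same URL returns the stored result without invoking the builder again.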

At step 110, navigation tree builder component 40 of browser module 32 builds a navigation tree 50 for the target web page. In one embodiment, to accomplish this, navigation tree builder component 40 may generate a document tree 60 from the conventional markup language document 58 and a style tree 64 from the style sheet document 62. The document tree 60 is then converted into the navigation tree 50, in part, using the style tree 64. The navigation tree 50 provides a semantic representation of the content contained in the target web page that is suitable for voice or audio commands. The navigation tree 50 comprises a plurality of nodes. Each such node may contain a portion of the content of the target web page or may provide prompts for directing the user to content. As such, navigation tree 50 enables a user to readily browse the content 15 of the web page with voice commands and requests.

At step 112, navigation agent component 42 of browser module 32 begins browsing of the content by starting at a root node of the navigation tree 50. The root node may comprise a number of different options from which a user can select, for example, to obtain content or to move to another node. To present these various options of the root node to the user, text-to-speech (TTS) component 38 of gateway module 30 may generate speech for the options, which is then delivered to the user via limited display device 18.

The user may then select one of the presented options, for example, by issuing a voice command which is recognized by automatic speech recognition component 36. At step 114, browser module 32 moves to or "visits" the node of navigation tree 50 which is related to the user's selection.

At step 116, navigation agent component 42 determines whether the visited node is a routing node. A routing node is a node which may comprise a plurality of options from which the user may select in order to navigate through the navigation tree 50 in order to get to other nodes. If it is determined that the visited node is a routing node, then at step 118 browser module 32 generates various prompts based upon the options available in the routing node. At step 120, text-to-speech component 38 plays the prompts to a user, for example, via limited display device 18. At step 122, gateway module 30 collects user input in response to the prompts played by text-to-speech component 38. This user input may specify or select among the various options offered by the prompts. At step 124, browser module 32 sets the current node to that node of navigation tree 50 which coincides with the user's choice. Method 100 then moves to step 114, where browser module 32 visits the node of the user's choice.

On the other hand, if it is determined at step 116 that the current node is not a routing node, then at step 126 browser module 32 determines whether the current node is a form node. A form node is a node that relates to a respective form for collecting information or data. Such form may comprise a number of fields that can be filled out by a user. An exemplary form can be an order form which may be filled out in order to complete an electronic transaction via the website or portal associated with content provider 12.

If it is determined at step 126 that the current node is a form node, then method 100 moves to step 128 where browser module 32 and gateway module 30 cooperate to generate a dialog for filling out the respective form. The dialog may involve a series of questions that can be presented to a user by text-to-speech component 38. That is, in the dialog, text-to-speech component 38 may issue a number of prompts asking the user for input to fill in various fields of the form. At step 130, gateway module 30 collects input from the user. One or more voice macros can be used to facilitate input collection. Voice macros map complex input to a simple voice command, increasing the convenience of data collection and improving the performance of a speech recognition task. For example, a voice macro can be created to map a user's credit card number to the phrase "my credit card." Then, when prompted at a form, the user may enter his or her credit card number simply by saying "my credit card". At step 132, browser module 32 determines whether the form has been completed. If the form has not been completed, then method 100 returns to step 128, where the voice browsing system 10 continues to generate a dialog for filling out the form. In this case, prompts are played only for those fields that have not yet been filled out in the form. Steps 128 through 132 are repeated until the form is completed. When it is determined at step 132 that the form has been completed, then at step 134 voice browsing system 10 may submit the completed form to content provider 12 for further processing. The content provider 12 may then, for example, initiate completion of an electronic transaction, for example, by directing that a particular good be shipped to the user.
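The loop of steps 128 through 132, together with voice macro expansion, can be sketched as follows. In this Python illustration, a list of strings stands in for recognized speech, an empty string stands in for an unanswered prompt, and the function name fill_form, the prompt wording, and the placeholder card number are assumptions.

```python
def fill_form(fields, utterances, macros):
    """Prompt for each unfilled field until the form is complete.

    utterances: recognized user responses, in order (stand-in for ASR output).
    macros: voice macros mapping a short phrase to the input it represents.
    """
    form = {name: None for name in fields}
    prompts = []                          # stand-in for text-to-speech output
    responses = iter(utterances)
    while any(value is None for value in form.values()):
        for name in fields:
            if form[name] is not None:
                continue                  # prompt only for unfilled fields
            prompts.append(f"Please say your {name}.")
            utterance = next(responses, None)
            if utterance is None:
                return form, prompts      # no more input: stop with a partial form
            if utterance:
                # Expand a voice macro if one matches; otherwise use the input as is.
                form[name] = macros.get(utterance, utterance)
    return form, prompts


macros = {"my credit card": "0000-0000-0000-0000"}   # placeholder value
form, prompts = fill_form(
    fields=["name", "card number"],
    utterances=["Ada", "", "my credit card"],
    macros=macros,
)
```

In this run the card number field is prompted twice, because the first prompt went unanswered, while the already completed name field is not prompted again.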

Referring again to step 126, if it is determined that the current node is not a form node, then at step 136 voice browsing system 10 determines whether the current node is a content node. A content node generally comprises information or content that can be presented to a user. If the current node is a content node, then at step 138 voice browsing system 10 plays the content to the user, for example, using text-to-speech component 38. Afterwards, method 100 returns to step 114, where another node is visited.

Otherwise, if it is determined at step 136 that the current node is not a content node, then at step 140 voice browsing system 10 determines whether the current node is a help node. A help node may comprise instructions to assist or guide the user during an interactive voice browsing session. If it is determined that the current node is a help node, then at step 142 voice browsing system 10 plays the content of the help node in order to assist or guide the user. Afterwards, method 100 returns to step 114, where another node is visited.

On the other hand, if it is determined at step 140 that the current node is not a help node, then at step 144 voice browsing system 10 determines whether the current node is unknown to the system. A node may be unknown to the system if the command or request received from the user is indecipherable or unknown. If the current node is unknown, then voice browsing system 10 may deliver an appropriate message or prompt notifying the user of that fact. At step 146 voice browsing system 10 computes the next page to be presented to the user. This can be done to inform the user that the current selection or request is not appropriate. After the next page has been computed, method 100 moves to step 106, where the conventional markup language document 58 supporting the computed next page is retrieved.

Otherwise, if it is determined at step 144 that the current node is not an unknown node, then at step 148 voice browsing system 10 determines whether the current interactive session with the user should be ended. This can be done if, for example, a predetermined time has elapsed in which the user has not responded or, alternatively, if the user has actively taken action to end the session, such as, for example, hanging up. If it is determined that the current session should not be ended, then method 100 returns to step 114, where another node is visited. Otherwise, if it is determined that the session should be ended, then method 100 ends.
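The node-type decisions of steps 136 through 148 can be pictured as a simple dispatch. The following Python sketch is illustrative only: the Node class, the handle_node function, the string action names, and the wording of the "not understood" prompt are assumptions made for this example and do not appear in the described system.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str        # "content", "help", "unknown", or any other node type
    text: str = ""   # speakable content, if any


def handle_node(node, speak, session_should_end):
    """Return the next action for one visited node (steps 136 to 148)."""
    if node.kind == "content":        # step 136
        speak(node.text)              # step 138: play the content
        return "visit_next"           # back to step 114
    if node.kind == "help":           # step 140
        speak(node.text)              # step 142: play the help text
        return "visit_next"
    if node.kind == "unknown":        # step 144
        speak("Sorry, that request was not understood.")  # hypothetical prompt
        return "compute_next_page"    # step 146, then retrieval at step 106
    if session_should_end():          # step 148: timeout or hang-up
        return "end"
    return "visit_next"
```

Here speak stands in for text-to-speech component 38 and session_should_end for the timeout or hang-up check; both are passed in as plain callables so the dispatch can be exercised without any telephony hardware.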

Various steps in method 100 may be repeated throughout an interactive session with a user to generate navigation trees 50 and allow a user to obtain content and to move throughout the nodes of each navigation tree 50. Thus, a user is able to browse the content available at the web pages of a website or portal maintained by content provider 12 using voice commands or speech commands. This can be done with the existing infrastructure of conventional markup language documents of the website.

Accordingly, content provider 12 is not required to set up and maintain a separate site in order to provide access and content to users.

Method For Generating a Navigation Tree

FIG. 9 is a flow diagram of an exemplary method 200 for generating a navigation tree 50, according to an embodiment of the present invention. Method 200 may correspond to the operation of navigation tree builder component 40 of browser module 32.

Method 200 begins at step 202 where navigation tree builder component 40 receives a conventional markup language document 58 from a content provider 12. The conventional markup language document, which may support a respective web page, may comprise content 15 and formatting for the same. At step 204, markup language parser 52 parses the elements of the received markup language document 58. For example, content 15 in the markup language document 58 may be separated from formatting tags. At step 206, markup language parser 52 generates a document tree 60 using the parsed elements of the conventional markup language document 58. At step 208, navigation tree builder component 40 receives a style sheet document 62 from the same content provider 12. This style sheet document 62 may be associated with the received conventional markup language document 58. The style sheet document 62 provides metadata, such as declarative statements (rules) and procedural statements. At step 210, style sheet parser 54 parses the style sheet document 62 to generate a style tree 64.

Tree converter 56 receives the document tree 60 and the style tree 64 from markup language parser 52 and style sheet parser 54, respectively. At step 212, tree converter 56 generates a navigation tree 50 using the document tree 60 and the style tree 64. In one embodiment, among other things, tree converter 56 may apply style sheet rules and heuristic rules to the document tree 60, and map elements of the document tree 60 into nodes of the navigation tree 50. Afterwards, method 200 ends.
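As a rough illustration of the flow of method 200, the following Python sketch parses a markup document into a document tree, parses a style sheet into a rule table, and combines the two. It is a minimal sketch under stated assumptions: the names DocTreeParser, parse_style_sheet, convert, and build_navigation_tree are hypothetical, and the one-rule-per-block style syntax (for example "img { action: prune }") is a toy stand-in invented for this example, not the actual style sheet format.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DocTreeParser(HTMLParser):
    """Steps 204 to 206: separate content from tags and build a document tree."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "text": "", "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:          # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1]["text"] += data


def parse_style_sheet(text):
    """Step 210: turn 'selector { property: value }' blocks into a rule table."""
    rules = {}
    pattern = r"(\w+)\s*\{\s*([\w-]+)\s*:\s*([\w-]+)\s*;?\s*\}"
    for selector, prop, value in re.findall(pattern, text):
        rules.setdefault(selector.lower(), {})[prop] = value
    return rules


def convert(doc_node, rules):
    """Step 212: drop pruned elements and keep the rest as navigation nodes."""
    children = [convert(child, rules) for child in doc_node["children"]
                if rules.get(child["tag"], {}).get("action") != "prune"]
    return {"label": doc_node["text"].strip(), "children": children}


def build_navigation_tree(markup, style_text):
    parser = DocTreeParser()
    parser.feed(markup)
    parser.close()
    return convert(parser.root, parse_style_sheet(style_text))
```

For instance, build_navigation_tree("<ul><li>News</li><li>Sports</li></ul><img src='logo.gif'>", "img { action: prune }") keeps the list and its two items and drops the image.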

Method For Applying Style Sheet Rules To a Document Tree

FIG. 10 is a flow diagram of an exemplary method 300 for applying style sheet rules to a document tree 60, according to an embodiment of the present invention. Method 300 may correspond to the operation of style sheet engine 68 in tree converter 56 of voice browsing system 10. In general, style sheet engine 68 selects various nodes of a document tree 60 and applies style sheet rules to these nodes as part of the process of converting the document tree 60 into a navigation tree 50.

Method 300 begins at step 302, where selector module 74 of style sheet engine 68 selects various nodes of a document tree 60 for clipping. As used herein, clipping may comprise saving the various selected nodes so that these nodes will remain or stay intact during the transition from document tree 60 into navigation tree 50. Nodes are clipped if they are sufficiently important. At step 304, rule applicator module 76 clips the selected nodes.

At step 306, selector module 74 selects various nodes of the document tree 60 for pruning. As used herein, pruning may comprise eliminating or removing certain nodes from the document tree 60. For example, nodes are desirably pruned if they have content (e.g., image or animation files) that is not suitable for audio presentation. At step 308, rule applicator module 76 prunes the selected nodes. At step 310, selector module 74 of style sheet engine 68 selects certain nodes of the document tree for filtering. As used herein, filtering may comprise adding data or information to the document tree 60 during the conversion into a navigation tree 50. This can be done, for example, to add information for a prompt or label at a node. At step 312, rule applicator module 76 filters the selected nodes. At step 314, selector module 74 selects certain nodes of document tree 60 for conversion. For example, a node in a document tree having content arranged in a table format can be converted into a routing node for the navigation tree. At step 316, rule applicator module 76 converts the selected nodes. Afterwards, method 300 ends.
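The four rule types of method 300 (clip, prune, filter, convert) might be applied in a single recursive pass, as in the Python sketch below. This is not the style sheet engine itself: the apply_style_rules function, the dictionary-based rule table keyed by tag name, and the choice that a clipped node is protected from pruning are all assumptions made for illustration.

```python
def apply_style_rules(node, rules):
    """Apply clip, prune, filter, and convert rules to a document (sub)tree.

    rules maps a tag name to a dict with optional keys:
      "clip" (bool), "prune" (bool), "prompt" (str), "convert_to" (str).
    Returns the rewritten node, or None if the node was pruned.
    """
    rule = rules.get(node["tag"], {})

    # Steps 306 to 308: prune unsuitable nodes, unless they were clipped.
    if rule.get("prune") and not rule.get("clip"):
        return None

    out = dict(node, children=[])
    if rule.get("clip"):                       # steps 302 to 304
        out["clipped"] = True
    if "prompt" in rule:                       # steps 310 to 312: filtering adds data
        out["prompt"] = rule["prompt"]
    if "convert_to" in rule:                   # steps 314 to 316
        out["node_type"] = rule["convert_to"]

    for child in node.get("children", []):
        kept = apply_style_rules(child, rules)
        if kept is not None:
            out["children"].append(kept)
    return out
```

With a rule table such as {"img": {"prune": True}, "table": {"convert_to": "route", "prompt": "Choose a row"}}, image nodes disappear from the result while table nodes are marked for conversion into routing nodes and gain a prompt.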

Method For Applying Heuristic Rules To a Document Tree

FIG. 11 is a flow diagram of an exemplary method 400 for applying heuristic rules to a document tree 60, according to an embodiment of the present invention. In one embodiment, method 400 may correspond to the operation of heuristic engine 70 in tree converter 56 of voice browsing system 10. These heuristic rules can be learned by heuristic engine 70 during the operation of voice browsing system 10. Each of the heuristic rules can be applied separately to various nodes of the document tree 60. Application of heuristic rules can be done on a node-by-node basis during the transformation of a document tree 60 into a navigation tree 50.

Method 400 begins at step 402, where heuristic engine 70 selects a node of document tree 60. At step 404, heuristic engine 70 may convert page and line breaks in the content contained at such node into white space. This is done to eliminate unnecessary formatting and yet not concatenate content (e.g., text). At step 406, heuristic engine 70 exploits image alternative tags within the content of a web page. These image alternative tags generally point to content which is provided as an alternative to images in a web page. This content can be in the form of text which is read or spoken to a user with a visual impairment (e.g., blind). Since this alternative content is appropriate for delivery by speech or audio, heuristic engine 70 exploits the image alternative tags.

At step 408, if the node is decorative, heuristic engine 70 deletes such node from the document tree 60. In one embodiment, nodes may be considered to be decorative if they do not provide any useful function in a navigation tree 50. For example, a content node consisting of only an image file may be considered to be decorative since the image cannot be presented to a user in the form of speech or audio.

At step 410, heuristic engine 70 merges together content and associated links at the node in order to provide a continuous flow of data to a user. Otherwise, the internal links would act as disruptive breaks during the delivery of content to users. At step 412, heuristic engine 70 builds outlines of headlines and ordered lists in the document tree.
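Some of the per-node heuristics above lend themselves to a short sketch. The Python function below covers only steps 404 through 408 (break conversion, image alternative text, and removal of decorative nodes); link merging and outline building are omitted. The apply_heuristics name, the dictionary node layout, and the test that treats an image without alternative text as decorative are assumptions for this example.

```python
import re


def apply_heuristics(node):
    """Return a cleaned copy of one document-tree node, or None if decorative."""
    node = dict(node)

    # Step 404: page and line breaks become spaces so words do not run together.
    text = re.sub(r"[\r\n\f]+", " ", node.get("text", ""))

    # Step 406: use the image alternative text as speakable content.
    if node.get("tag") == "img" and node.get("alt"):
        text = node["alt"]

    text = re.sub(r"\s+", " ", text).strip()

    # Step 408: an image with nothing speakable is treated as decorative.
    if node.get("tag") == "img" and not text:
        return None

    node["text"] = text
    return node


def apply_to_all(nodes):
    """Visit every node once and keep the non-decorative results."""
    cleaned = (apply_heuristics(n) for n in nodes)
    return [n for n in cleaned if n is not None]
```

The apply_to_all helper mirrors the node-by-node loop of the method by visiting every node once and discarding the ones judged decorative.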

After all applicable heuristic rules have been applied to the current node, then at step 414 heuristic engine 70 determines whether there are any other nodes in the document tree 60 which should be processed. If there are additional nodes, then method 400 returns to step 402, where the next node is selected. Steps 402 through 414 are repeated until the heuristic rules are applied to all nodes of the document tree 60. When it is determined at step 414 that there are no other nodes in the document tree, method 400 ends.

Method For Mapping a Document Tree Into a Navigation Tree

FIG. 12 is a flow diagram of an exemplary method 500 for mapping a document tree 60 into a navigation tree 50, according to an embodiment of the present invention. Method 500 may correspond to the operation of mapping engine 72 in tree converter 56 of navigation tree builder component 40. Method 500 may be performed on a node-by-node basis during the transformation of a document tree 60 into a navigation tree 50.

Method 500 begins at step 502, where mapping engine 72 selects a node of the document tree 60. At step 504, mapping engine 72 determines whether the selected node contains content. If the selected node contains content, then at step 506 mapping engine 72 creates a content node in the navigation tree 50. A content node of the navigation tree 50 comprises content that can be presented or played to a user, for example, in the form of speech or audio, during navigation of the navigation tree 50. Afterwards, method 500 returns to step 502, where the next node in the document tree is selected.

Otherwise, if it is determined at step 504 that the selected node does not contain content, then at step 508 mapping engine 72 determines whether the selected node contains an ordered list, an unordered list, or a table row (TR). If the currently selected node comprises an ordered list, an unordered list, or a TR, then at step 510 mapping engine 72 creates a suitable routing node for the navigation tree 50. Such routing node may comprise a plurality of options which can be selected in the alternative to move to another node in the navigation tree 50. Afterwards, method 500 returns to step 502, where the next node is selected. On the other hand, if it is determined at step 508 that the currently selected node does not contain any of an ordered list, an unordered list, or a TR, then at step 512 mapping engine 72 determines whether the currently selected node of the document tree is a node for a table.
If it is determined at step 512 that the node is a table node, then at step 514 mapping engine 72 creates a suitable table node for the navigation tree 50. A table node in the navigation tree 50 is used to hold an array of information. A table node in navigation tree 50 can be a routing node. Afterwards, method 500 returns to step 502, where the next node is selected.

Alternatively, if it is determined at step 512 that the currently selected node is not a table node, then at step 516 mapping engine 72 determines whether the node of the document tree 60 contains a form. Such form may have a number of fields which can be filled out in order to collect information from a user. If it is determined that the current node of the document tree 60 contains a form, then at step 518 mapping engine 72 creates an appropriate form node for the navigation tree 50. A form node may comprise a plurality of prompts which assist a user in filling out fields. Afterwards, method 500 returns to step 502, where the next node is selected. Otherwise, if it is determined at step 516 that the current node does not contain a form, then at step 520 mapping engine 72 determines whether there are form elements at the node. Form elements can be used to collect input from a user. The information is then sent to be processed by a Web server. If there are form elements at the node, then at step 522 mapping engine 72 maps a form handling node to the form elements. Form handling nodes are provided in navigation tree 50 to collect input. This can be done either with direct input or with voice macros. Afterwards, method 500 returns to step 502, where another node is selected.

On the other hand, if it is determined at step 520 that the current node of the document tree 60 does not contain form elements, then at step 524 mapping engine 72 determines whether there are any more nodes in the document tree 60. If there are other nodes, then method 500 returns to step 502, where the next node is selected. Steps 502 through 524 are repeated until mapping engine 72 has processed all nodes of the document tree 60, for example, to map suitable nodes into navigation tree 50. Thus, when it is determined at step 524 that there are no other nodes in the document tree, method 500 ends.
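The decision chain of steps 504 through 522 amounts to an ordered series of tests on each document-tree node. The Python sketch below shows one way to express it; the map_node, walk, and map_document_tree names, the dictionary node layout, and the specific tag names used to detect form elements ("input", "select", "textarea") are assumptions for this example rather than details of mapping engine 72.

```python
def map_node(doc_node):
    """Pick a navigation node type for one document-tree node (steps 504 to 522)."""
    tag = doc_node.get("tag", "")
    if doc_node.get("text", "").strip():          # step 504: node contains content
        return "content"                          # step 506
    if tag in ("ol", "ul", "tr"):                 # step 508: list or table row
        return "route"                            # step 510
    if tag == "table":                            # step 512
        return "table"                            # step 514
    if tag == "form":                             # step 516
        return "form"                             # step 518
    if tag in ("input", "select", "textarea"):    # step 520: form elements
        return "form-handling"                    # step 522
    return None                                   # nothing to map for this node


def walk(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def map_document_tree(root):
    """Repeat the mapping for every node until none remain (step 524)."""
    mapped = []
    for node in walk(root):
        kind = map_node(node)
        if kind is not None:
            mapped.append((node["tag"], kind))
    return mapped
```

The order of the tests matters: a node that carries text is mapped to a content node before its tag is examined, which follows the order in which the steps are described.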

Although particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications may be made without departing from the present invention in its broader aspects, and therefore, the appended claims are to encompass within their scope all such changes and modifications that fall within the true scope of the present invention.

Appendix A

Classes/Types of Nodes

There are two broad classes of nodes found in a navigation tree: routing nodes and content nodes. Routing nodes can be of different types, including, for example, general routing nodes, group nodes, input nodes, array nodes, and form nodes. Content nodes can also be of different types, including, for example, text and element. The allowable children type for each node can be as follows:

General Routing Node <ROUTE> : Group Node, Routing Node

Group Node <GROUP> : Content Node, Group Node

Input Node <INPUT> : Content

Array Node <ARRAY> : Group Node

Form Node <FORM> : Input Node

Text Node <TEXT>

Element Node <ELEM>

Each of the routing node types can be "visited" by a tree traversal operation, which can be either step navigation or rapid access navigation. General routing nodes (<ROUTE>) permit stepping to their children. Group nodes (<GROUP>) do not permit stepping to their children. Content nodes are the container objects for text and markup elements. Content nodes are not routing nodes and hence are not reachable other than through a routing node. Note that all content nodes should have a group node for a parent. A group node can retrieve data contained in the children content nodes. Element nodes correspond to various generic tags including anchor, formatting, and unknown tags. Element nodes can be implemented either by retaining an original SGML/XML tag or by setting a tag attribute of the <ELEM> markup tag to the SGML/XML tag.
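The allowable-children relationships can be captured in a small lookup table and checked mechanically. The Python sketch below is one possible encoding, not part of the described system: the ALLOWED_CHILDREN table, the validate function, and the dictionary node layout are hypothetical. It also assumes that "Content" under an input node means text or element nodes, and that the routing-node children of a general routing node are other general routing nodes, as stated in the description of general routing nodes.

```python
# Allowed child types per node type; "TEXT" and "ELEM" are the content node types.
ALLOWED_CHILDREN = {
    "ROUTE": {"GROUP", "ROUTE"},
    "GROUP": {"TEXT", "ELEM", "GROUP"},
    "INPUT": {"TEXT", "ELEM"},      # assumption: "Content" means content nodes
    "ARRAY": {"GROUP"},
    "FORM": {"INPUT"},
    "TEXT": set(),                  # content nodes are leaves
    "ELEM": set(),
}


def validate(node):
    """Return the (parent type, child type) pairs that violate the table."""
    errors = []
    for child in node.get("children", []):
        if child["type"] not in ALLOWED_CHILDREN[node["type"]]:
            errors.append((node["type"], child["type"]))
        errors.extend(validate(child))
    return errors
```

A tree in which a text node hangs directly off a general routing node would be reported as ("ROUTE", "TEXT"), reflecting the rule that content nodes should have a group node for a parent.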

Data fields

Every node has a basic set of attributes. These attributes can be used to generate interactive dialogs (e.g., voice commands and speech prompts) with the user.

// Attributes used by style sheet

String class; // class attribute

String id; // id attribute

String style; // style attributes

// Properties best defined in a style sheet

String element; // tag element of node

String node-type; // node type (e.g., Routing)

The "element" attribute stores the name of an SGML/XML element tag before conversion into the navigation tree. The "class" and "id" attributes are labels that can be used to reference the node. The "style" attribute specifies text to be used by the style sheet parser.

Group Node

A group node is a container for text, links, and other markup elements such as scripts or audio objects. A contiguous block of unmarked text, structured text markup, links, and text formatting markup are parsed into a set of content nodes. The group node is a parent that organizes these content nodes into a single presentational unit.

For example, the following HTML line:

Go to <A HREF="http://www.vocalpoint.com">Vocal Point</A>.

could be parsed into the form shown below:

<GROUP>

Go to <A HREF="http://www.vocalpoint.com">Vocal Point</A>.

</GROUP>

This particular group node specifies that the three children nodes "Go to", anchor link "Vocal Point", and "." should be presented as a single unit, not separately.

[Figure: the three children of the group node, labeled Text, Link, Text]

A group node does not allow its children to be visited by a tree traversal operation. Content nodes can only have group nodes for parents. Consequently, content nodes are not directly reachable, but rather can only be accessed from the parent group node.

A group node can sometimes be the child of another content group. In this case, the child group node is also unreachable by tree traversal operations. A special class of group node called an array node must be used to access data in nested group nodes.
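To make the group node example concrete, the following Python sketch parses the HTML line from the example into one group holding a text child, a link child, and a second text child. It uses the standard html.parser module; the GroupBuilder class, the parse_group function, and the tuple layout of the children are hypothetical choices for this illustration.

```python
from html.parser import HTMLParser


class GroupBuilder(HTMLParser):
    """Collect text runs and anchor links as the children of a single group."""

    def __init__(self):
        super().__init__()
        self.children = []
        self._href = None          # set while inside an <A HREF=...> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href is not None:
            self.children.append(("link", text, self._href))
        else:
            self.children.append(("text", text, None))


def parse_group(html_line):
    builder = GroupBuilder()
    builder.feed(html_line)
    builder.close()                # flush any buffered trailing text
    return {"type": "GROUP", "children": builder.children}
```

Calling parse_group on the example line produces three children in order (text, link, text), which a presentation layer could then read out as one unit.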

Input Node

An input node is similar to a group node except for two differences. First, an input node can retrieve and store input from the user. Second, an input node can only be a child of a form node.

General Routing Node

A general routing node is the basic building block for constructing hierarchical menus. General routing nodes serve as way points in the navigation tree to help guide users to content. The children of general routing nodes are other general routing nodes or group nodes. When visited, a general routing node will supply prompt cues describing its children. An exemplary structure for a general routing node and its children is as follows:

Array Node

An array node is used to build a multi-dimensional array representation of content. The HTML <TABLE> tag directly maps to an array node. To build up an array node from a document tree, information is extracted from the children element nodes.
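As an illustration of extracting array data from the children of a table element, the Python sketch below collects cell text row by row into a two-dimensional list. The table_to_array function and the dictionary layout of the document-tree nodes are assumptions for this example; header handling and cells that span rows or columns are left out.

```python
def table_to_array(table_node):
    """Collect cell text from the row and cell children of a table node."""
    rows = []
    for row in table_node.get("children", []):
        if row.get("tag") != "tr":
            continue                       # ignore anything that is not a row
        cells = [cell.get("text", "").strip()
                 for cell in row.get("children", [])
                 if cell.get("tag") in ("td", "th")]
        rows.append(cells)
    return rows
```

Each inner list corresponds to one table row, so an entry can be addressed as array[row][column].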

Form Node

A form node is a parent of an input node. Form nodes collect input information from the user and execute the appropriate script to process the forms. Form nodes also control review and editing of information entered into the form. The HTML <FORM> tag directly maps to a form node.

A Brief Introduction to HML

Hierarchical markup language (HML) is designed to provide a file representation of the navigation tree. HML uses the specification for XML. Content providers may create content files using HML or translation servers can generate HML files from HTML/XML and XCSS documents. HML documents provide efficient representations of navigation trees, thus reducing the computation time needed to parse HTML/XML and XCSS.
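Because HML is XML-based, an in-memory navigation tree could be written out with any XML library. The Python sketch below uses the standard xml.etree.ElementTree module to serialize a small tree of routing and group nodes. It is a sketch only: the to_hml function and the dictionary node layout are hypothetical, the ROUTE and GROUP element names and the descriptor and prompt attributes are taken from the abbreviated DTD in this appendix, and namespace handling is omitted.

```python
import xml.etree.ElementTree as ET


def to_hml(node):
    """Convert one navigation-tree node (a dict) into an XML element."""
    elem = ET.Element(node["type"])            # e.g. "ROUTE" or "GROUP"
    for name in ("descriptor", "prompt"):
        if name in node:
            elem.set(name, node[name])
    if node.get("text"):
        elem.text = node["text"]
    for child in node.get("children", []):
        elem.append(to_hml(child))
    return elem


menu = {
    "type": "ROUTE",
    "prompt": "Say news or sports.",
    "children": [
        {"type": "GROUP", "descriptor": "news", "text": "Top stories today."},
        {"type": "GROUP", "descriptor": "sports", "text": "Latest scores."},
    ],
}
hml_text = ET.tostring(to_hml(menu), encoding="unicode")
```

Parsing hml_text back with ET.fromstring yields a ROUTE element with two GROUP children, so the serialized file can stand in for the navigation tree without re-parsing the original HTML and style sheet.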

Syntax

HML elements use the "hml" namespace. A list of these elements is provided below:

Abbreviated Document Type Definition

XML syntax is described using a document type definition (DTD). An abbreviated, partially complete DTD for HML follows:

Generic Attributes

<!ENTITY % coreattrs
 "id ID # -- document-wide unique id --
  class CDATA # -- space sep. list of classes --
  style %StyleSheet; # -- associated style info --"
>

<!ENTITY % navattrs
 "keys CDATA # -- space sep. list of keys --
  descriptor CDATA # -- short description of node --
  prompt CDATA # -- prompt --
  greeting CDATA # -- greeting --"
>

<!ENTITY % attrs "%coreattrs; %navattrs;">

<!--================ Text Markup ================-->

<!ENTITY % special "A | OBJECT | SCRIPT">

<!ENTITY % inline "#PCDATA | %special;">

Content Group

<!ELEMENT HML:GROUP - - (%inline;)* (GROUP)* -- content group -->

<!ATTLIST
  %attrs;
>

Routing Node

<!ELEMENT HML:ROUTE - - (%inline;)* (GROUP)* (ROUTE)* -- route -->

<!ATTLIST
  %attrs;
>

<!--================ HTML Elements ================-->

<!ELEMENT A - - -- anchor -->

<!ELEMENT OBJECT - - -- object -->

<!ELEMENT SCRIPT - - -- script -->


Claims

WHAT IS CLAIMED IS:

1. A computer system for allowing a user of a limited display device to browse content available from a data network, the system comprising: an interface operable to receive a request for the content from the user via the limited display device; and a processor coupled to the interface, the processor operable to retrieve a conventional markup language document containing the content from the data network, the processor operable to convert the conventional markup language document into a navigation tree which provides a semantic, hierarchical structure for the content.

2. The computer system of Claim 1 wherein the limited display device comprises a wireless telephone, a smart telephone, or a wireless personal digital assistant (PDA).

3. The computer system of Claim 2 wherein the interface comprises a computer telephony interface (CTI)/personal digital assistant (PDA) component operable to support communication with the limited display device.

4. The computer system of Claim 1 wherein the interface comprises an automated speech recognition (ASR) component operable to recognize speech input from a user.

5. The computer system of Claim 1 wherein the interface comprises a text-to-speech component operable to output speech for the content.

6. The computer system of Claim 1 wherein the conventional markup language document comprises a Hypertext Markup Language (HTML) document.

7. The computer system of Claim 1 wherein the conventional markup language document comprises an extensible Markup Language (XML) document.

8. The computer system of Claim 1 wherein the processor is operable to retrieve a style sheet document from the data network, the style sheet document associated with the conventional markup language document and containing metadata which can be applied to the conventional markup language document.

9. The computer system of Claim 8 wherein the style sheet document comprises an extended Cascading Style Sheet (xCSS) document.

10. The computer system of Claim 8 wherein the processor is operable to generate a document tree from the conventional markup language document and to generate a style tree from the style sheet document.

11. The computer system of Claim 1 wherein the navigation tree comprises: a plurality of content nodes comprising the content; and at least one routing node comprising a plurality of options for moving between the nodes.

12. A method performed on a computer for allowing a user of a limited display device to browse content available from a data network, the method comprising: receiving a request for the content from the user via the limited display device; retrieving a conventional markup language document containing the content from the data network; and converting the conventional markup language document into a navigation tree which provides a semantic, hierarchical structure for the content.

13. The method of Claim 12 wherein the request is in the form of speech, the method further comprising recognizing the speech.

14. The method of Claim 12 wherein the conventional markup language document comprises a Hypertext Markup Language (HTML) document.

15. The method of Claim 12 wherein the conventional markup language document comprises an extensible Markup Language (XML) document.

16. The method of Claim 12 further comprising retrieving a style sheet document from the data network, the style sheet document associated with the conventional markup language document and containing metadata which can be applied to the conventional markup language document.

17. The method of Claim 16 wherein the style sheet document comprises an extended Cascading Style Sheet (xCSS) document.

18. The method of Claim 16 wherein converting comprises: generating a document tree from the conventional markup language document; and generating a style tree from the style sheet document.

19. The method of Claim 12 wherein converting comprises: generating a document tree from the conventional markup language document; and applying a plurality of style sheet rules to the document tree.

20. The method of Claim 12 wherein the navigation tree comprises: a plurality of content nodes comprising the content; and at least one routing node comprising a plurality of options for moving between the nodes.

21. The method of Claim 20 further comprising navigating between the content nodes of the navigation tree using the routing node.

22. A computer system for allowing a user of a limited display device to browse content available from a data network, the system comprising: a markup language parser operable to receive a conventional markup language document in response to a request for the content from the user via the limited display device, the markup language parser operable to generate a document tree from the conventional markup language document; a style sheet parser operable to receive a style sheet document in response to the request, the style sheet parser operable to generate a style tree from the style sheet document, the style tree comprising a plurality of style sheet rules; and a tree converter in communication with the markup language parser and the style sheet parser, the tree converter operable to convert the document tree into a navigation tree using the style sheet tree rules, the navigation tree providing a semantic, hierarchical structure for the content.

23. The computer system of Claim 22 wherein the tree converter comprises a style sheet engine operable to apply the style sheet rules to the document tree.

24. The computer system of Claim 22 wherein the tree converter comprises a heuristic engine operable to apply a plurality of heuristic rules to the document tree.

25. The computer system of Claim 22 wherein the conventional markup language document comprises a Hypertext Markup Language (HTML) document.

26. The computer system of Claim 22 wherein the conventional markup language document comprises an extensible Markup Language (XML) document.

27. The computer system of Claim 22 wherein the style sheet document comprises an extended Cascading Style Sheet (xCSS) document.

28. A method performed on a computer for allowing a user of a limited display device to browse content available from a data network, the method comprising: receiving a conventional markup language document and a style sheet document in response to a request for the content from the user via the limited display device; generating a document tree from the conventional markup language document; generating a style tree from the style sheet document, the style tree comprising a plurality of style sheet rules; and converting the document tree into a navigation tree using the style sheet tree rules, the navigation tree providing a semantic, hierarchical structure for the content.

29. The method of Claim 28 wherein the conventional markup language document comprises a Hypertext Markup Language (HTML) document.

30. The method of Claim 28 wherein the conventional markup language document comprises an extensible Markup Language (XML) document.

31. The method of Claim 28 wherein the style sheet document comprises an extended Cascading Style Sheet (xCSS) document.

32. The method of Claim 28 wherein converting comprises applying the style sheet rules to the document tree.

33. The method of Claim 28 wherein converting comprises applying a plurality of heuristic rules to the document tree.

34. A computer system for allowing a user of a limited display device to browse content available from a data network, the system comprising: a gateway module operable to receive a spoken request for the content from the user via the limited display device, the gateway module operable to recognize the spoken request; and a browser module in communication with the gateway module, the browser module operable to retrieve a conventional markup language document and a style sheet document from the data network in response to the spoken request, the conventional markup language document containing the content, the style sheet document containing metadata, the browser module operable to generate a navigation tree using the conventional markup language document and the style sheet document, the navigation tree providing a semantic, hierarchical structure for the content; wherein the gateway module and the browser module are operable to cooperate to enable the user to browse the content using the navigation tree and to output speech conveying the content to the user via the limited display device.

35. The computer system of Claim 34 wherein the gateway module is operable to convert the spoken request into a text format.

36. A method performed on a computer for allowing a user of a limited display device to browse content available from a data network, the method comprising: receiving a spoken request for the content from the user via the limited display device; recognizing the spoken request; retrieving a conventional markup language document and a style sheet document from the data network in response to the spoken request, the conventional markup language document containing the content, the style sheet document containing metadata; generating a navigation tree using the conventional markup language document and the style sheet document, the navigation tree providing a semantic, hierarchical structure for the content; enabling the user to browse the content using the navigation tree; and outputting speech conveying the content to the user via the limited display device.