WO2001019052A2 - Method and apparatus for compressing scripting language content - Google Patents


Publication number
WO2001019052A2
Authority
WO
WIPO (PCT)
Prior art keywords
scripting language
tag
data
codewords
text
Prior art date
Application number
PCT/US2000/040754
Other languages
French (fr)
Other versions
WO2001019052A3 (en)
Inventor
Robert Charles Booth
Original Assignee
General Instrument Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Instrument Corporation
Priority to AU80351/00A (patent AU8035100A)
Priority to EP00971058A (patent EP1279267A2)
Priority to CA002384687A (patent CA2384687A1)
Publication of WO2001019052A2
Publication of WO2001019052A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2355 Processing of additional data, e.g. scrambling of additional data or processing content descriptors involving reformatting operations of additional data, e.g. HTML pages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • The present invention relates to a method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML).
  • HTML: HyperText Markup Language
  • HTML is a system for marking documents to indicate how the document should be displayed, and how various documents should be linked together.
  • HTML has been used extensively to provide documents (e.g., Web pages) on the Internet.
  • The documents are organized into Web spaces, where a Web space includes a home page and links to other documents which may be in the local Web space or in an external Web space. Such links are known as hyperlinks.
  • Documents may include moving images, text, graphical displays, and sound.
  • HTML is a form of Standard Generalized Markup Language (SGML), defined by the International Organization for Standardization (ISO), reference number ISO 8879:1986.
  • SGML: Standard Generalized Markup Language
  • ISO: International Organization for Standardization
  • HTML specifies the grammar and syntax of markup tags which are inserted into a data file to define how the data will be presented (e.g., rendered) when read by a computer program known as a browser.
  • the computer's browser and/or graphics engine processes the data to format a layout for the page so the page can be viewed by the user on a display terminal or device.
  • An SGML document includes three parts.
  • The first part describes the character set, or codes, which are used in the language.
  • The second part defines the document type, and which markup tags are recognized.
  • The third part is known as the document instance and contains the actual text and markup tags.
  • The three parts may be stored in different files.
  • HTML browsers assume that files of different pages contain a common character set and document type, so only the text and markup tags will change for different pages.
  • HTML elements include tags and character entities.
  • Character entities are predefined characters from the ISO Latin-1 alphabet that are not defined in ASCII, and characters used to mark the beginning and end of an HTML element. For example, the character entity "&lt;" designates the character "<" ("less than" sign).
  • HTML tags are enclosed in angled brackets to distinguish them from the page text. The tags may appear alone (as standalone or empty tags), or may appear at the start and end of a field of the page text (as non-empty or container tags).
  • <P> is an empty tag that indicates the start of a new paragraph
  • <I> and </I> are container tags that modify the contained text (e.g., <I>Welcome to my home page</I> indicates the phrase "Welcome to my home page" should be italicized)
  • <I> is the starting tag
  • </I> is the ending tag.
  • Patent and Trademark Office to appear on a browser with special highlighting (such as a special color and/or underlining) that designates the text as a hyperlink.
  • Tags can have secondary attributes, or sub-attributes.
  • The tag <IMG> is an empty tag that designates that an inline image is to be placed in a page.
  • ALIGN is an attribute
  • TOP, MIDDLE and BOTTOM are sub-attributes.
  • HTML tags and attributes are referred to herein generally as HTML "elements". Moreover, the term "attributes" generally encompasses the different levels of sub-attributes.
  • An HTML application is made available to users on the Web by storing the HTML file in a directory that is accessible to a server.
  • A server is typically a Web server which conforms to a Web browser-supported protocol known as Hypertext Transfer Protocol (HTTP).
  • HTML content may be stored at the headend of a subscriber communication network, such as a cable/satellite television network.
  • HTML content may be selected and provided directly by the headend, or the headend may merely act as a conduit in a high speed link between the subscriber and remote Web servers.
  • Servers that conform to other protocols, such as the File Transfer Protocol (FTP) or Gopher, may also be accessed by an HTTP browser by using a proxy server.
  • FTP: File Transfer Protocol
  • A proxy server is a type of gateway that allows a browser using HTTP to communicate with a server that does not understand HTTP, but which uses, e.g., FTP, Gopher or other protocols.
  • The proxy server accepts HTTP requests from the browser and translates them into a format that is suitable for the origin server, such as an FTP request.
  • The proxy server translates FTP replies from the server into HTTP replies so that the browser can understand them.
  • The FTP file itself is not translated.
  • FTP is a high level protocol for transferring files (as is HTTP). The translation occurs at the protocol level. For example, a client browser may send the HTTP request 'GET http://www.myserver.com/somefile.txt HTTP/1.1'. This would be translated at a proxy into an FTP 'GET' request to be forwarded to the FTP origin server.
  • The FTP response from the origin server back to the proxy (which has the requested file attached) is then translated (at the proxy) into an HTTP response that includes the attached file.
  • The file being transferred is not translated or modified.
  • However, in some cases, the browser may indicate that it can decode certain encoding or compression formats.
  • Thus, the proxy may translate (encode or compress) the attached file before it is transmitted to the client.
  • the proxy server can be a program running on the same machine as the browser, or a free-standing machine somewhere in a network that serves many browsers.
  • the headend of a subscriber communication network may provide a proxy server function.
  • HTTP defines a set of rules that servers and browsers follow when communicating with each other.
  • The process begins when a user clicks on an icon in an HTML page which is the anchor of a hyperlink, or the user types in a Uniform Resource Locator (URL).
  • The URL contains a host name that is typically resolved into an IP address via a domain name system (DNS) lookup.
  • DNS: domain name system
  • a connection is then made to the host server using the IP address (and possibly a port number) returned by the DNS lookup.
  • the browser sends a request to retrieve an object from the server, or to post data to an object on the server.
  • the server sends a response to the browser including a status code and the response data.
  • the connection between the browser and server is then closed.
  • the URL is a unique address which identifies virtually all files and resources on the Internet.
  • Due to the flexibility of HTML, and the variety of tags with their attributes and sub-attributes that are supported, the amount of data needed to represent any given Web page can be very large. Accordingly, the processing power available in a user's terminal and browser may not be sufficient to keep up with the flow of data, thereby resulting in undesirable delays in rendering the data on the user's screen, or other problems.
  • The HTML data may be transmitted via a Public Switched Telephone Network (PSTN), via a cable or satellite television network, via a local wireless network, or via a combination of the above, for example.
  • PSTN: Public Switched Telephone Network
  • The base character set for HTML is Latin-1 (ISO 8859-1), which is an eight-bit alphabet with characters for most American and European languages.
  • the 128-character standard ASCII (ISO 646) is a seven-bit subset of Latin-1. For simplicity and compatibility with different browsers, many Web pages include only an ASCII character set.
  • the system should reduce the amount of bandwidth required to communicate HTML data to a browser or other (graphics) rendering engine.
  • the system should be suitable for use with existing networks over which HTML data is communicated.
  • the system should allow a browser that is implemented in a terminal (e.g., set-top box/decoder), in a subscriber television network, to directly process and render the compressed data without decompressing it.
  • the system should reduce the required processing power of a browser in a user terminal in a subscriber television network.
  • the system should provide a consistent and deterministic processing time for all HTML elements and attributes within a given page.
  • the system should be usable on a client/browser side or server side of a network.
  • The system should be usable on a proxy server that interfaces between a client/browser and a server, or other proxy servers.
  • The system should be compatible with networks that communicate HTML data using a digital video communication protocol, such as MPEG-2.
  • The system should be compatible with networks that communicate HTML data using the Transmission Control Protocol/Internet Protocol (TCP/IP).
  • TCP/IP: Transmission Control Protocol/Internet Protocol
  • The system should provide compression for current versions of HTML, as well as derivations thereof and other analogous markup languages.
  • The system should be compatible with other bit level compression techniques.
  • The present invention provides a system having the above and other advantages.
  • the present invention relates to a method and apparatus for compressing scripting language content, such as HTML.
  • Codewords are provided for HTML or other scripting language elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page.
  • the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing.
  • the technique is compatible with other compression techniques to provide even greater compression.
  • the invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal.
  • the invention enables the use of a graphics engine or browser at the subscriber terminal that processes/renders the compressed HTML data directly, without decompressing it, thereby resulting in significant savings in processing time and complexity.
  • a particular method for processing scripting language data includes the step of parsing the HTML data to separate text thereof from scripting language elements thereof.
  • the scripting language elements include tags and their attributes, if any. Respective codewords, such as two-byte codewords, are provided for each different tag.
  • The text is coded, such as with ...
  • the codewords may have reserved bits to designate specific information, such as whether the associated tag is an empty tag or a container tag.
  • The codeword may designate whether the container tag is a starting tag or an ending tag.
  • the codewords may designate whether a tag is a style markup tag or a structural markup tag.
  • the codeword may designate whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
  • a respective codeword may be provided for each different attribute of a tag, including sub-attributes.
  • the codewords may also indicate the number of attributes that are associated with a tag.
  • The compressed scripting language data is communicated from a scripting language content server or headend to a subscriber terminal in a communication network.
  • The compressed scripting language data is parsed to separate the coded text thereof from the codewords thereof.
  • The respective scripting language elements are provided for each corresponding different codeword, and the coded text is decoded to provide decoded text.
  • The scripting language elements are combined with the ...
  • the compressed scripting language data is communicated to a subscriber terminal in a communication network, and processed without recovering the scripting language elements to provide data suitable for display.
  • the codewords are processed directly.
  • an optimal solution would cache (e.g., temporarily store) the compressed data in a proxy server for content that is accessed frequently by subscriber terminals.
  • a corresponding apparatus is also disclosed.
  • FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
  • FIG. 2 illustrates HTML compression in accordance with the present invention.
  • FIG. 3 illustrates HTML decompression in accordance with the present invention.
  • FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
  • While the invention may be implemented in a variety of networks, it is particularly suitable for use in subscriber television networks that allow users (subscribers) to access HTML data, such as on the Internet.
  • The user can access HTML content, such as Web pages, that is delivered via a downstream channel on the network.
  • a variety of techniques can be used to deliver HTML data via cable and satellite television networks.
  • the user is typically provided with an upstream link via a conventional telephone network to enter commands, such as a URL address to request to view a particular Web page.
  • Some cable television networks have an upstream user data channel that can be used for this purpose.
  • the request is received at a headend or other central location, and forwarded to the content server that is designated by the URL.
  • the content returned by the server to the headend is then prepared for transport to the user.
  • The HTML data may be encapsulated in digital MPEG-2 packets that are in-band or out-of-band with programming service data (e.g., television programs, audio, etc.).
  • The HTML data may be carried in the vertical blanking interval (VBI).
  • the invention is compatible with essentially any communication technique for providing the HTML data to the end user.
  • the HTML content is subsequently recovered at the user's terminal and rendered by a browser application or graphics processing engine for viewing on a video monitor, such as a television or computer monitor.
  • the headend may act as a proxy server when interacting with the content server, e.g., when the URL request from the user is in a format that is not compatible with the content server.
  • the proxy server converts the URL request into the necessary format, and converts the content returned by the server into a format that the user's terminal can understand.
  • FIG. 1 shows an example embodiment wherein a network 100 includes a content server 110, a headend 130, and a user terminal 150.
  • The content server 110 is representative of any number of available origin or proxy servers that store HTML data in a computer network such as the Internet.
  • the user terminal 150 is representative of a population of terminals that can receive broadcast signals from a common service provider, such as the headend 130 in a cable/optical fiber or satellite television network.
  • An optional upstream channel 160, such as a conventional telephone link and modems, allows the terminal 150 to communicate directly with content ...
  • a channel 162 is used by the headend 130, e.g., to broadcast programming services from function 136 (such as television programs, weather and stock data, shop at home data and the like) to a subscriber terminal population, including the example terminal 150. HTML content is also communicated to the terminal 150 via the one-way or bi-directional channel 162.
  • The channel 162 may physically be implemented as coaxial cable, a satellite link, optical fiber, a local wireless channel (e.g., multi-point microwave distribution, MMDS), or a telephone link, for example, or a combination thereof.
  • a channel 164 allows the headend 130 and the example content server 110 to communicate with each other.
  • This channel typically is implemented as a telephone link or Ethernet network.
  • the server 110 is generally remote from the headend 130, although it is possible for the headend to store HTML content on a local storage media, such as digital video disc or magnetic tape, or on a hard drive of a file server.
  • Known networking architectures are used to provide the channel 164.
  • When the headend 130 provides the content locally, a limited amount of content is provided.
  • the content may be selected to correspond to the programming services.
  • a graphic may be overlaid with a television program to inform the user that related HTML content is available. For example, during a televised baseball game, the user can be directed to a Web site for baseball scores.
  • The entire local content may be ...
  • Conditional access techniques may be used to provide access to the HTML content on a fee basis.
  • The present invention is suitable for use with any of the above scenarios.
  • the user has some upstream channel (either 160 or 162) to cause selected HTML content to be recovered from the content server 110 and provided to the terminal 150 via the headend 130.
  • The content server 110, headend 130, and terminal 150 are shown with HTML compression functions 112, 132 and 152, respectively, and HTML decompression functions 114, 134 and 154, respectively. Not all of these functions are required, however.
  • the HTML data output from the terminal is generally small. This can vary, however, for example, if the user is sending HTML content to another user, or is authorized to send HTML content to modify the remote server 110.
  • the compression function 152 is used to compress HTML data transmitted from the terminal 150 to the headend 130 or the content server 110.
  • The decompression function 154 is used to decompress compressed HTML data received from the headend 130 or content server 110.
  • the compression function 132 is used to compress HTML data transmitted from the headend 130 to the content server 110 or the terminal 150.
  • the decompression function 134 is used to decompress compressed HTML data received from the content server 110 or the terminal 150.
  • The compression function 112 is used to compress HTML data transmitted from the content server 110 to the headend 130 or the terminal 150.
  • the decompression function 114 is used to decompress compressed HTML data received from the headend 130 or the terminal 150.
  • the terminal 150 includes a user interface 158 for receiving user commands, e.g., via a keyboard or infrared remote control. For example, the user may click on a graphic on the display 170 that is associated with a URL, to initiate the downloading of the corresponding HTML content to the terminal 150.
  • a browser 159 may be a full-featured browser application such as used on a personal computer, or a minimal browser that has only some basic functionality, such as text rendering or limited graphics rendering capabilities. The browser 159 is used in conjunction with the graphics engine 156 for rendering text and images for the display 170 from the HTML content received at the terminal 150.
  • a video decoder 157 may be used for rendering video, associated with the compressed (or uncompressed) scripting language content, for the display 170.
  • The display 170 may be a television screen or a ...
  • the processing power of the terminal 150 will dictate the level of features that can be supported by the browser 159 and the graphics engine 156.
  • the compression functions 112, 132 and 152 can implement an HTML compression scheme as shown in FIG. 2, while the decompression functions 114, 134 and 154 can implement an HTML decompression scheme as shown in FIG. 3.
  • FIG. 2 illustrates HTML compression in accordance with the present invention.
  • the compression function 200 corresponds to the compression functions 112, 132, 152 of FIG. 1.
  • a buffer/parser 210 receives uncompressed HTML data. Note that the HTML data may reference locations where audio, video or graphics data can be found.
  • The text is parsed and provided to a conventional text coding function 215 to provide coded text, e.g., as ASCII data.
  • the HTML elements such as tags, including their attributes, sub-attributes, sub-sub-attributes, if any, and so forth, are parsed and provided to a compression function 220, which optionally has a look-up table 225 that can be implemented using known techniques.
  • The look-up table 225 associates a codeword with each HTML element (tag and attribute).
  • the length of the codeword should be selected based on the number of different tags and attributes that are to be coded.
  • A sixteen-bit codeword (two bytes) is believed to be appropriate to handle the existing tags while also allowing for future growth.
  • One of the sixteen bits can be reserved to designate whether the tag is an empty tag or a container tag. For example, the most significant bit can be selected. For container tags, one or more other reserved bits can also designate whether the tag is a starting tag or ending tag.
  • Reserved bits can likewise distinguish style markup tags (designating bold style, font, quoted text, and so forth) from structural markup tags (designating lists, tables, anchors, and so forth).
  • For a structural markup tag, the codeword can designate whether the tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup, for example.
  • a codeword can also indicate a number of attributes that are associated with each tag.
  • The number of bits reserved for this purpose should correspond to the maximum expected number of attributes. For example, three bits can indicate that there are up to eight attributes associated with a tag.
  • Bits should be reserved in the codeword to designate characteristics of the tag to the extent that this aids in rendering of the HTML data. For example, the designation of starting and ending container tags is useful because it signals to a processor the bounds of the text to be modified. For example, with eight bits of data required for each character (including a letter, number, punctuation ...
  • The fourteen bytes needed to code these elements are reduced to eight bytes, for a savings of six bytes.
  • The amount of savings with the present invention increases for longer elements (e.g., compare "<BLOCKQUOTE>", which is reduced from twelve to two bytes, to "<A>", which is reduced from three to two bytes), and with the number of elements in a page.
  • a codeword is output from the compression function 220 and provided to a combiner 230 to be combined with the coded text in the appropriate sequence to provide compressed HTML data in accordance with the present invention.
  • This data comprises text codes for the text, and codewords from the compression function 220 for the HTML elements.
  • known compression techniques such as the Lempel-Ziv algorithm and Huffman coding, can be used with the compressed HTML data output from the combiner 230, or for the coded text alone or the codewords alone.
  • Associated video/audio data may be compressed using known techniques.
  • FIG. 3 illustrates HTML decompression in accordance with the present invention.
  • the decompression function 300 corresponds to the decompression functions 114, 134, 154 of FIG. 1.
  • The compressed HTML is received at a ...
  • the present invention provides a method and apparatus for compressing scripting language content, such as HTML.
  • Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page.
  • the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing.
  • the technique is compatible with other compression techniques to provide even greater compression.
  • the invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Additionally, the invention allows the use of a graphics engine or browser, e.g. in a subscriber terminal that processes/renders the compressed HTML data (e.g., codewords) directly without decompressing them. This can provide significant savings in processing time and complexity.
  • Each codeword has the same length ...
  • The techniques of the present invention may be implemented using any known hardware, software and/or firmware.
  • LANs: local area networks
  • MANs: metropolitan area networks
  • WANs: wide area networks
  • internets, intranets and the Internet, or combinations thereof
  • The invention is suitable for use in compressing any scripting language content, including HTML or any similar language (e.g., Extensible Markup Language (XML) or Synchronized Multimedia Integration Language (SMIL)).
  • XML: Extensible Markup Language
  • SMIL: Synchronized Multimedia Integration Language
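
The compression and decompression flow outlined in the definitions above (parse the HTML into text and elements, code the text, look up a two-byte codeword for each element, and combine the results in sequence) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the table contents, the codeword values, and the function names `compress` and `decompress` are hypothetical. In particular, the way codewords are told apart from text in the byte stream (text restricted to 7-bit ASCII, every codeword's first byte having its high bit set) is a framing assumption made only for this example.

```python
import re

# Hypothetical look-up table (element -> two-byte codeword). The values are
# illustrative; each has its top bit set so that the first codeword byte can
# never be mistaken for a 7-bit ASCII text byte (a framing assumption).
ELEMENT_TO_CODE = {
    "<P>": 0x8001,
    "<I>": 0x8002,
    "</I>": 0x8003,
    "<BLOCKQUOTE>": 0x8004,
    "</BLOCKQUOTE>": 0x8005,
}
CODE_TO_ELEMENT = {code: elem for elem, code in ELEMENT_TO_CODE.items()}
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def compress(html):
    """Parse HTML into text and tags, code each, and combine them in order."""
    out = bytearray()
    for piece in TAG_PATTERN.split(html):
        if not piece:
            continue
        if TAG_PATTERN.fullmatch(piece):
            # Element: replace the tag by its codeword (KeyError if unknown).
            code = ELEMENT_TO_CODE[piece.upper()]
            out += code.to_bytes(2, "big")
        else:
            # Text: coded here as plain ASCII, one byte per character.
            out += piece.encode("ascii")
    return bytes(out)


def decompress(data):
    """Separate codewords from coded text and restore the tags."""
    parts = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:  # high bit set: start of a two-byte codeword
            code = int.from_bytes(data[i:i + 2], "big")
            parts.append(CODE_TO_ELEMENT[code])
            i += 2
        else:  # plain ASCII text byte
            parts.append(chr(data[i]))
            i += 1
    return "".join(parts)
```

For the page `<BLOCKQUOTE><I>Welcome to my home page</I></BLOCKQUOTE>`, `decompress(compress(page))` returns the original string, and the 55 input bytes become 31, since the four tags shrink from 32 bytes to 8 while the 23 text bytes are unchanged. Tags are restored in upper case, so the round trip is exact only for upper-case input.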

Abstract

A method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML). Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. The codewords are combined (230) with translated (or coded) text (215) to provide compressed HTML data. The codewords may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal, is thereby reduced. Additionally, the invention allows the use of a graphics engine (156) or browser (159), e.g., in a subscriber terminal (150), that processes/renders the compressed HTML data (e.g., codewords) directly without decompressing them. The technique is compatible with other compression techniques to provide even greater compression.
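
As a rough illustration of the reserved bits mentioned in the abstract, the following Python sketch packs a tag into a sixteen-bit codeword and unpacks it again. The bit positions, the three-bit attribute count (encoded here as 0 to 7), and the names `pack_codeword` and `unpack_codeword` are assumptions made for this example; the text only states that such bits may be reserved and gives the most significant bit as one possible choice for the empty/container flag.

```python
# Hypothetical 16-bit codeword layout (bit positions are assumptions):
#   bit 15     : 1 = container tag, 0 = empty tag
#   bit 14     : 1 = ending tag, 0 = starting tag (container tags only)
#   bits 13-11 : number of attributes associated with the tag (0-7)
#   bits 10-0  : tag identifier
CONTAINER_BIT = 1 << 15
END_BIT = 1 << 14
ATTR_SHIFT = 11
ATTR_MASK = 0b111
ID_MASK = (1 << ATTR_SHIFT) - 1


def pack_codeword(tag_id, container, ending=False, n_attrs=0):
    """Build a 16-bit codeword from a tag identifier and its flags."""
    if not 0 <= tag_id <= ID_MASK:
        raise ValueError("tag_id does not fit in 11 bits")
    if not 0 <= n_attrs <= ATTR_MASK:
        raise ValueError("n_attrs does not fit in 3 bits")
    if ending and not container:
        raise ValueError("only container tags have an ending form")
    word = tag_id | (n_attrs << ATTR_SHIFT)
    if container:
        word |= CONTAINER_BIT
    if ending:
        word |= END_BIT
    return word


def unpack_codeword(word):
    """Recover the fields of a codeword without any table look-up."""
    return {
        "tag_id": word & ID_MASK,
        "container": bool(word & CONTAINER_BIT),
        "ending": bool(word & END_BIT),
        "n_attrs": (word >> ATTR_SHIFT) & ATTR_MASK,
    }
```

Because the flags sit at fixed positions, a renderer can test a single bit to learn whether a tag opens or closes a container, which is the kind of processing aid the abstract alludes to.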

Description

METHOD AND APPARATUS FOR COMPRESSING SCRIPTING LANGUAGE CONTENT
BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML).
HTML is a system for marking documents to indicate how the document should be displayed, and how various documents should be linked together. HTML has been used extensively to provide documents (e.g., Web pages) on the Internet. The documents are organized into Web spaces, where a Web space includes a home page and links to other documents which may be in the local Web space or in an external Web space. Such links are known as hyperlinks. Documents may include moving images, text, graphical displays, and sound.
HTML is a form of Standard Generalized Markup Language (SGML), defined by the International Organization for Standardization (ISO), reference number ISO 8879:1986. HTML specifies the grammar and syntax of markup tags which are inserted into a data file to define how the data will be presented (e.g., rendered) when read by a computer program known as a browser. The computer's browser and/or graphics engine processes the data to format a layout for the page so the page can be viewed by the user on a display terminal or device.
An SGML document includes three parts. The first part describes the character set, or codes, which are used in the language. The second part defines the document type, and which markup tags are recognized. The third part is known as the document instance and contains the actual text and markup tags. The three parts may be stored in different files. Furthermore, HTML browsers assume that files of different pages contain a common character set and document type, so only the text and markup tags will change for different pages. HTML elements include tags and character entities.
Character entities are predefined characters from the ISO Latin-1 alphabet that are not defined in ASCII, and characters used to mark the beginning and end of an HTML element. For example, the character entity "&lt;" designates the character "<" ("less than" sign). HTML tags are enclosed in angled brackets to distinguish them from the page text. The tags may appear alone (as standalone or empty tags), or may appear at the start and end of a field of the page text (as non-empty or container tags). For example, <P> is an empty tag that indicates the start of a new paragraph, while <I> and </I> are container tags that modify the contained text (e.g., <I>Welcome to my home page</I> indicates the phrase "Welcome to my home page" should be italicized). "<I>" is the starting tag, and "</I>" is the ending tag.
Generally, HTML tags provide text formatting, hypertext links to other pages, and links to sound and picture elements. HTML tags also define input fields for interactive Web pages. Additionally, some tags have one or more associated attributes that can be specified with the tag. For example, the tags <A> and </A> are anchor codes that define a section of text as a hyperlink, or target of another hyperlink. The attributes of the tag include HREF=url, NAME=name, and TITLE=text. Thus, the HTML code "<A HREF="http://www.uspto.gov">U.S. Patent and Trademark Office</A>" will cause the text "U.S. Patent and Trademark Office" to appear on a browser with special highlighting (such as a special color and/or underlining) that designates the text as a hyperlink. When the user clicks on the text, the Web address "www.uspto.gov" is accessed.
Moreover, tags can have secondary, or sub-attributes. For example, the tag <IMG> is an empty tag that designates that an inline image is to be placed in a page. The attributes include SRC=url, which specifies the URL of the file containing the image to be embedded, ALT=text, which specifies a text string that can be displayed if the image is not available, and ALIGN=[TOP | MIDDLE | BOTTOM], which identifies how the image should be aligned with the adjacent text and other HTML elements. Thus, ALIGN is an attribute, and TOP, MIDDLE and BOTTOM are sub-attributes. An example HTML code is: "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>".
HTML tags and attributes are referred to herein generally as HTML "elements". Moreover, the term "attributes" generally encompasses the different levels of sub-attributes.
An HTML application is made available to users on the Web by storing the HTML file in a directory that is accessible to a server. Such a server is typically a Web server which conforms to a Web browser-supported protocol known as Hypertext Transfer Protocol (HTTP). Alternatively, HTML content may be stored at the headend of a subscriber communication network, such as a cable/satellite television network. There is an increasing trend toward providing HTML content to subscribers via such networks due to the networks' high speed data rates, the potential commercial benefits for tying in the HTML content with traditional television programming services, the expected convergence of telephone, television and computer networks, and the expected rise of in-home computer networks. The HTML content may be selected and provided directly by the headend, or the headend may merely act as a conduit in a high speed link between the subscriber and remote Web servers.
Servers that conform to other protocols, such as the File Transfer Protocol (FTP) or Gopher, may also be accessed by an HTTP browser by using a proxy server. A proxy server is a type of gateway that allows a browser using HTTP to communicate with a server that does not understand HTTP, but which uses, e.g., FTP, Gopher or other protocols. The proxy server accepts HTTP requests from the browser and translates them into a format that is suitable for the origin server, such as an FTP request. Similarly, the proxy server translates FTP replies from the server into HTTP replies so that the browser can understand them. Generally, the FTP file itself is not translated.
FTP is a high level protocol for transferring files (as is HTTP). The said translation would occur at the protocol level. For example, a client browser may send the HTTP request 'GET http://www.myserver.com/somefile.txt HTTP/1.1'. This would be translated at a proxy into an FTP 'GET' request to be forwarded to the FTP origin server. The FTP response from the origin server back to the proxy (which has the requested file attached) is then translated (at the proxy) into an HTTP response that includes the attached file. The file being transferred is not translated or modified. However, in some cases, the browser may indicate that it can decode certain encoding or compression formats. Thus, the proxy may translate (encode or compress) the attached file before it is transmitted to the client.
The proxy server can be a program running on the same machine as the browser, or a free-standing machine somewhere in a network that serves many browsers. For example, the headend of a subscriber communication network may provide a proxy server function.
HTTP defines a set of rules that servers and browsers follow when communicating with each other.
Typically, the process begins when a user clicks on an icon in an HTML page which is the anchor of a hyperlink, or the user types in a Uniform Resource Locator (URL) . The URL contains a host name that is typically resolved into an IP address via a domain name system (DNS) lookup. A connection is then made to the host server using the IP address (and possibly a port number) returned by the DNS lookup. Next, the browser sends a request to retrieve an object from the server, or to post data to an object on the server. The server sends a response to the browser including a status code and the response data. The connection between the browser and server is then closed.
The URL is a unique address which identifies virtually all files and resources on the Internet. However, due to the flexibility of HTML, and the variety of tags with their attributes and sub-attributes that are supported, the amount of data needed to represent any given Web page can be very large. Accordingly, the processing power of a user's terminal and browser may not be sufficient to keep up with the flow of data, thereby resulting in undesirable delays in rendering the data on the user's screen, or other problems.
Moreover, an increasing amount of bandwidth for transmitting the HTML data is consumed, thereby reducing the available bandwidth for other uses, or taxing the capacity of the channel.
The HTML data may be transmitted via a Public Switched Telephone Network (PSTN), via a cable or satellite television network, via a local wireless network, or via a combination of the above, for example.
In particular, the base character set for HTML is Latin-1 (ISO 8859/1), which is an eight-bit alphabet with characters for most American and European languages. The 128-character standard ASCII (ISO 646) is a seven-bit subset of Latin-1. For simplicity and compatibility with different browsers, many Web pages include only an ASCII character set.
With eight bits or one byte of data required for each character (including a letter, number, punctuation symbol or blank space), for example, the HTML code: "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>" has 52 characters, or 52 bytes of data.
Accordingly, it would be desirable to provide a system for compressing scripting language content such as HTML or any similar language.
The system should reduce the amount of bandwidth required to communicate HTML data to a browser or other (graphics) rendering engine. The system should be suitable for use with existing networks over which HTML data is communicated.
The system should allow a browser that is implemented in a terminal (e.g., set-top box/decoder), in a subscriber television network, to directly process and render the compressed data without decompressing it.
The system should reduce the required processing power of a browser in a user terminal in a subscriber television network. The system should provide a consistent and deterministic processing time for all HTML elements and attributes within a given page.
The system should be usable on a client/browser side or server side of a network. The system should be usable on a proxy server that interfaces between a client/browser and a server, or other proxy servers .
The system should be compatible with networks that communicate HTML data using a digital video communication protocol, such as MPEG-2. The system should be compatible with networks that communicate HTML data using the Transmission Control Protocol/Internet Protocol (TCP/IP).
The system should provide compression for current versions of HTML, as well as derivations thereof and other analogous markup languages.
The system should be compatible with other bit level compression techniques.
The present invention provides a system having the above and other advantages.
SUMMARY OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HTML. Codewords are provided for HTML or other scripting language elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. Moreover, the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The technique is compatible with other compression techniques to provide even greater compression. The invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Moreover, the invention enables the use of a graphics engine or browser at the subscriber terminal that processes/renders the compressed HTML data directly, without decompressing it, thereby resulting in significant savings in processing time and complexity.
A particular method for processing scripting language data includes the step of parsing the HTML data to separate text thereof from scripting language elements thereof. The scripting language elements include tags and their attributes, if any. Respective codewords, such as two-byte codewords, are provided for each different tag. The text is coded, such as with ASCII codes. The codewords are then combined with the coded text in the appropriate sequence to provide compressed scripting language data.
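By way of illustration only, the parsing, codeword substitution and combining steps described above can be sketched in Python as follows. The codeword values, the table name TAG_CODEWORDS and the function name compress are hypothetical and do not appear in this disclosure. The sketch further assumes that every codeword is at least 0x8000, so that its first byte cannot be mistaken for 7-bit ASCII text, and it handles only tags without attributes.

```python
import re

# Hypothetical codeword table; the disclosure does not publish concrete values.
# Assumption: every codeword is >= 0x8000, so its first byte has the high bit
# set and cannot be confused with 7-bit ASCII text.
TAG_CODEWORDS = {"P": 0x8001, "I": 0x8002, "/I": 0x8003, "B": 0x8004, "/B": 0x8005}

# Matches attribute-free tags such as <P>, <I> and </I>.
TAG_PATTERN = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)>")


def compress(html: str) -> bytes:
    """Replace each tag with a two-byte codeword and code the text as ASCII."""
    out = bytearray()
    pos = 0
    for match in TAG_PATTERN.finditer(html):
        # Text between the previous tag and this one is coded as ASCII.
        out += html[pos:match.start()].encode("ascii")
        # The tag itself becomes one big-endian 16-bit codeword.
        # A tag missing from the table raises KeyError in this sketch.
        out += TAG_CODEWORDS[match.group(1).upper()].to_bytes(2, "big")
        pos = match.end()
    # Trailing text after the last tag.
    out += html[pos:].encode("ascii")
    return bytes(out)
```

For example, compress("<I>Welcome to my home page</I>") yields 27 bytes in place of the original 30, since each of the two tags is reduced to two bytes.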
The codewords may have reserved bits to designate specific information, such as whether the associated tag is an empty tag or a container tag.
For container tags, the codeword may designate whether the container tag is a starting tag or an ending tag. The codewords may designate whether a tag is a style markup tag or a structural markup tag.
For structural markup tags, the codeword may designate whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
A respective codeword may be provided for each different attribute of a tag, including sub-attributes. The codewords may also indicate the number of attributes that are associated with a tag. In a particularly advantageous implementation, the compressed scripting language data is communicated from a scripting language content server or headend to a subscriber terminal in a communication network.
For decompression of the compressed scripting language data, e.g., at a subscriber terminal, the compressed scripting language data is parsed to separate the coded text thereof from the codewords thereof. The respective scripting language elements are provided for each corresponding different codeword, and the coded text is decoded to provide decoded text. Lastly, the scripting language elements are combined with the decoded text to provide the uncompressed scripting language data.
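The decompression steps just described can be sketched, under stated assumptions, as follows. The table CODEWORD_TO_TAG and the function name decompress are hypothetical. The sketch assumes that the coded text is 7-bit ASCII and that every codeword occupies two bytes whose first byte has its high bit set, which is one possible way (not specified in this disclosure) of telling codewords from text.

```python
# Hypothetical reverse look-up table; the values are illustrative only.
CODEWORD_TO_TAG = {0x8001: "<P>", 0x8002: "<I>", 0x8003: "</I>"}


def decompress(data: bytes) -> str:
    """Separate codewords from coded text and rebuild the scripting language data."""
    parts = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # High bit set: assumed to be the first byte of a two-byte codeword.
            codeword = int.from_bytes(data[i:i + 2], "big")
            parts.append(CODEWORD_TO_TAG[codeword])
            i += 2
        else:
            # Otherwise collect a run of ASCII text up to the next codeword.
            j = i
            while j < len(data) and not data[j] & 0x80:
                j += 1
            parts.append(data[i:j].decode("ascii"))
            i = j
    # Combine the recovered elements and decoded text in their original order.
    return "".join(parts)
```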
Optionally, the compressed scripting language data is communicated to a subscriber terminal in a communication network, and processed without recovering the scripting language elements to provide data suitable for display. Thus, the codewords are processed directly.
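One way to picture the direct processing of codewords is the following sketch, in which a minimal renderer walks the compressed data and tracks an italic state without ever rebuilding the tag strings. The codeword values, the function name render_runs and the framing convention (two-byte codewords whose first byte has the high bit set, interleaved with 7-bit ASCII text) are assumptions made for illustration and are not taken from this disclosure.

```python
# Hypothetical codeword values for the <I> and </I> container tags.
ITALIC_START = 0x8002
ITALIC_END = 0x8003


def render_runs(data: bytes) -> list[tuple[str, bool]]:
    """Return (text, italic) runs computed directly from the compressed data."""
    runs = []
    italic = False
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # Act on the codeword itself; the tag text is never reconstructed.
            codeword = int.from_bytes(data[i:i + 2], "big")
            if codeword == ITALIC_START:
                italic = True
            elif codeword == ITALIC_END:
                italic = False
            i += 2
        else:
            # Collect a run of ASCII text and tag it with the current style.
            j = i
            while j < len(data) and not data[j] & 0x80:
                j += 1
            runs.append((data[i:j].decode("ascii"), italic))
            i = j
    return runs
```

Because each tag is recognized by a single integer comparison rather than by string matching, a renderer of this kind avoids the text parsing that uncompressed HTML would require.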
In addition, an optimal solution would cache (e.g., temporarily store) the compressed data in a proxy server for content that is accessed frequently by subscriber terminals.
A corresponding apparatus is also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
FIG. 2 illustrates HTML compression in accordance with the present invention.
FIG. 3 illustrates HTML decompression in accordance with the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HTML. FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
Although the invention may be implemented in a variety of networks, it is particularly suitable for use in subscriber television networks that allow users (subscribers) to access HTML data, such as on the Internet. Typically, the user can access HTML content, such as Web pages, that is delivered via a downstream channel on the network. For example, a variety of techniques can be used to deliver HTML data via cable and satellite television networks. The user is typically provided with an upstream link via a conventional telephone network to enter commands, such as a URL address to request to view a particular Web page. Some cable television networks have an upstream user data channel that can be used for this purpose.
The request is received at a headend or other central location, and forwarded to the content server that is designated by the URL. The content returned by the server to the headend is then prepared for transport to the user. For example, the HTML data may be encapsulated in digital MPEG-2 packets that are in-band or out-of-band with programming service data (e.g., television programs, audio, etc.). Or, the HTML data may be carried in the vertical blanking interval (VBI) of a digital or analog television signal.
The invention is compatible with essentially any communication technique for providing the HTML data to the end user.
The HTML content is subsequently recovered at the user's terminal and rendered by a browser application or graphics processing engine for viewing on a video monitor, such as a television or computer monitor. The headend may act as a proxy server when interacting with the content server, e.g., when the URL request from the user is in a format that is not compatible with the content server. In this case, the proxy server converts the URL request into the necessary format, and converts the content returned by the server into a format that the user's terminal can understand.
FIG. 1 shows an example embodiment wherein a network 100 includes a content server 110, a headend 130, and a user terminal 150. The content server 110 is representative of any number of available origin or proxy servers that store HTML data in a computer network such as the Internet.
Similarly, the user terminal 150 is representative of a population of terminals that can receive broadcast signals from a common service provider, such as the headend 130 in a cable/optical fiber or satellite television network.
An optional upstream channel 160, such as a conventional telephone link and modems, allows the terminal 150 to communicate directly with content servers.
A channel 162 is used by the headend 130, e.g., to broadcast programming services from function 136 (such as television programs, weather and stock data, shop at home data and the like) to a subscriber terminal population, including the example terminal 150. HTML content is also communicated to the terminal 150 via the one-way or bi-directional channel 162. The channel 162 may physically be implemented as coaxial cable, a satellite link, optical fiber, local wireless channel (such as multi-point microwave distribution - MMDS), or a telephone link, for example, or a combination thereof.
A channel 164 allows the headend 130 and the example content server 110 to communicate with each other. This channel typically is implemented as a telephone link or Ethernet network. The server 110 is generally remote from the headend 130, although it is possible for the headend to store HTML content on a local storage media, such as digital video disc or magnetic tape, or on a hard drive of a file server.
Known networking architectures are used to provide the channel 164.
When the headend 130 provides the content locally, a limited amount of content is provided. The content may be selected to correspond to the programming services. In this case, a graphic may be overlaid with a television program to inform the user that related HTML content is available. For example, during a televised baseball game, the user can be directed to a Web site for baseball scores.
In some cases, the entire local content may be continually or periodically broadcast, e.g., on the same channel (or multiplex) as the programming service, or on a separate channel (or multiplex). This may occur on one-way only networks where the user has no upstream link to the headend. The selection of the desired HTML content then occurs at the user terminal 150.
Known conditional access techniques may be used to provide access to the HTML content on a fee basis. The present invention is suitable for use with any of the above scenarios.
In the example of FIG. 1, it is assumed that the user has some upstream channel (either 160 or 162) to cause selected HTML content to be recovered from the content server 110 and provided to the terminal 150 via the headend 130.
The content server 110, headend 130, and terminal 150 are shown with HTML compression functions 112, 132 and 152, respectively, and HTML decompression functions 114, 134 and 154, respectively. Not all of these functions are required, however.
Generally, compression of the HTML data provided to the terminal is most important. The HTML data output from the terminal, if any, is generally small. This can vary, however, for example, if the user is sending HTML content to another user, or is authorized to send HTML content to modify the remote server 110. The compression function 152 is used to compress HTML data transmitted from the terminal 150 to the headend 130 or the content server 110. The decompression function 154 is used to decompress compressed HTML data received from the headend 130 or content server 110.
The compression function 132 is used to compress HTML data transmitted from the headend 130 to the content server 110 or the terminal 150. The decompression function 134 is used to decompress compressed HTML data received from the content server 110 or the terminal 150.
The compression function 112 is used to compress HTML data transmitted from the content server 110 to the headend 130 or the terminal 150. The decompression function 114 is used to decompress compressed HTML data received from the headend 130 or the terminal 150.
The terminal 150 includes a user interface 158 for receiving user commands, e.g., via a keyboard or infrared remote control. For example, the user may click on a graphic on the display 170 that is associated with a URL, to initiate the downloading of the corresponding HTML content to the terminal 150. A browser 159 may be a full-featured browser application such as used on a personal computer, or a minimal browser that has only some basic functionality, such as text rendering or limited graphics rendering capabilities. The browser 159 is used in conjunction with the graphics engine 156 for rendering text and images for the display 170 from the HTML content received at the terminal 150.
A video decoder 157 may be used for rendering video, associated with the compressed (or uncompressed) scripting language content, for the display 170.
The display 170 may be a television screen or a video monitor for a PC, for example.
The processing power of the terminal 150 will dictate the level of features that can be supported by the browser 159 and the graphics engine 156. The compression functions 112, 132 and 152 can implement an HTML compression scheme as shown in FIG. 2, while the decompression functions 114, 134 and 154 can implement an HTML decompression scheme as shown in FIG. 3. FIG. 2 illustrates HTML compression in accordance with the present invention. The compression function 200 corresponds to the compression functions 112, 132, 152 of FIG. 1. A buffer/parser 210 receives uncompressed HTML data. Note that the HTML data may reference locations where audio, video or graphics data can be found.
The text is parsed and provided to a conventional text coding function 215 to provide coded text, e.g., as ASCII data. The HTML elements, such as tags, including their attributes, sub-attributes, sub-sub-attributes, if any, and so forth, are parsed and provided to a compression function 220, which optionally has a look-up table 225 that can be implemented using known techniques. The look-up table 225 associates a codeword with each HTML element (tag and attribute). The length of the codeword should be selected based on the number of different tags and attributes that are to be coded. A sixteen-bit codeword (two bytes) is believed to be appropriate to handle the existing tags while also allowing for future growth.
Moreover, it is possible to reserve one or more of the sixteen bits to designate whether the tag is an empty tag or a container tag. For example, the most significant bit can be selected. For container tags, one or more other reserved bits can also designate whether the tag is a starting tag or ending tag.
Other information that can be designated is whether the tag is a style markup tag or a structural markup tag. Generally, style markup tags (designating bold style, font, quoted text, and so forth) can be used within structural markup tags (designating lists, tables, anchors, and so forth), while the opposite is not recommended.
For structural markup tags, the codeword can designate whether the tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup, for example.
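A possible sixteen-bit layout using such reserved bits is sketched below. Only the use of the most significant bit for the empty/container distinction is suggested by the text above; the remaining bit positions, the Category numbering and the names pack and unpack are assumptions introduced for illustration. The layout is also independent of how codewords might be delimited from text in a data stream.

```python
from enum import IntEnum


class Category(IntEnum):
    """Structural markup categories named in the text (numbering is assumed)."""
    BLOCK = 0
    LIST = 1
    TABLE = 2
    FORM = 3
    LINK = 4
    IMAGE = 5
    PAGE = 6


# Hypothetical bit assignments within one 16-bit codeword.
CONTAINER_BIT = 1 << 15    # 1 = container tag, 0 = empty tag
ENDING_BIT = 1 << 14       # container tags only: 1 = ending tag, 0 = starting tag
STRUCTURAL_BIT = 1 << 13   # 1 = structural markup, 0 = style markup
CATEGORY_SHIFT = 10        # bits 12..10 hold the structural category
TAG_ID_MASK = 0x03FF       # bits 9..0 identify the tag itself


def pack(tag_id, container=False, ending=False, structural=False,
         category=Category.BLOCK):
    """Build one 16-bit codeword from a tag identifier and its flags."""
    if not 0 <= tag_id <= TAG_ID_MASK:
        raise ValueError("tag_id does not fit in ten bits")
    word = tag_id
    if container:
        word |= CONTAINER_BIT
        if ending:
            word |= ENDING_BIT
    if structural:
        word |= STRUCTURAL_BIT | (int(category) << CATEGORY_SHIFT)
    return word


def unpack(word):
    """Recover the flags using masks and shifts only."""
    structural = bool(word & STRUCTURAL_BIT)
    return {
        "tag_id": word & TAG_ID_MASK,
        "container": bool(word & CONTAINER_BIT),
        "ending": bool(word & ENDING_BIT),
        "structural": structural,
        "category": Category((word >> CATEGORY_SHIFT) & 0b111) if structural else None,
    }
```

With this assumed layout, ten bits remain for the tag identifier, allowing up to 1024 distinct tags, and a processor can test any of the flags with a single mask operation.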
A codeword can also indicate a number of attributes that are associated with each tag. The number of bits reserved for this purpose should correspond to the maximum expected number of attributes. For example, three bits can indicate that there are up to eight attributes associated with a tag. Generally, bits should be reserved in the codeword to designate characteristics of the tag to the extent that this aids in rendering of the HTML data. For example, the designation of starting and ending container tags is useful because it signals to a processor the bounds of the text to be modified.
For example, with eight bits of data required for each character (including a letter, number, punctuation symbol or blank space), the HTML code "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>" has 52 characters, or 52 bytes of data. By substituting a two-byte codeword for each of the HTML elements "IMG", "SRC", "ALT" and "ALIGN", the fourteen bytes needed to code these elements are reduced to eight bytes, for a savings of six bytes. In a given HTML page, the amount of savings with the present invention increases for longer elements (e.g., compare "<BLOCKQUOTE>", which is reduced from twelve to two bytes, to "<A>", which is reduced from three to two bytes), and with the number of elements in a page.
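The byte counts in this example can be checked with a few lines of Python. The variable names are illustrative; the figures follow the accounting used above, in which only the element names are replaced and the delimiters and attribute values are left as they are.

```python
# The example HTML code from the text, one byte per ASCII character.
html = '<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>'
elements = ["IMG", "SRC", "ALT", "ALIGN"]
CODEWORD_BYTES = 2  # sixteen-bit codeword per element

original_bytes = len(html.encode("ascii"))        # size of the whole code
element_bytes = sum(len(e) for e in elements)     # bytes spent on element names
coded_bytes = len(elements) * CODEWORD_BYTES      # bytes after substitution
saved_bytes = element_bytes - coded_bytes

print(original_bytes, element_bytes, coded_bytes, saved_bytes)
```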
For each element, a codeword is output from the compression function 220 and provided to a combiner 230 to be combined with the coded text in the appropriate sequence to provide compressed HTML data in accordance with the present invention. This data comprises text codes for the text, and codewords from the compression function 220 for the HTML elements. Note that additional, known compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, can be used with the compressed HTML data output from the combiner 230, or for the coded text alone or the codewords alone. Moreover, associated video/audio data may be compressed using known techniques.
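As one illustration of layering a known technique on top of the combiner output, the sketch below applies DEFLATE, which combines LZ77 (a Lempel-Ziv variant) with Huffman coding, through Python's standard zlib module. The choice of DEFLATE, the function names and the sample codeword bytes are assumptions; the text above does not prescribe a particular second-stage coder.

```python
import zlib


def second_stage(compressed_html: bytes) -> bytes:
    """Apply DEFLATE (LZ77 plus Huffman coding) to the codeword/text stream."""
    return zlib.compress(compressed_html, 9)


def undo_second_stage(packed: bytes) -> bytes:
    """Recover the codeword/text stream produced by the combiner."""
    return zlib.decompress(packed)


# Hypothetical combiner output: codeword bytes interleaved with ASCII text,
# repeated to imitate a page with many similar elements.
sample = b"\x80\x02Welcome to my home page\x80\x03\x80\x01" * 50
packed = second_stage(sample)
```

Note that a terminal receiving data in this form would have to undo the second stage before it could process the codewords directly, so whether the extra stage is worthwhile depends on the terminal's processing budget.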
FIG. 3 illustrates HTML decompression in accordance with the present invention. The decompression function 300 corresponds to the decompression functions 114, 134, 154 of FIG. 1. Here, the compressed HTML is received at a buffer/parser 310. The coded text comprising text data is provided to a text decoding function 315 to recover the text, which is then provided to a combiner 330, while the HTML codewords are provided to a decompression function 320. A look-up table 325 at the decompression function 320 associates an HTML element with each received codeword. The corresponding elements are output to the combiner 330 to form the uncompressed HTML data.
Accordingly, it can be seen that the present invention provides a method and apparatus for compressing scripting language content, such as HTML. Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. Moreover, the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The technique is compatible with other compression techniques to provide even greater compression.
The invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Additionally, the invention allows the use of a graphics engine or browser, e.g. in a subscriber terminal that processes/renders the compressed HTML data (e.g., codewords) directly without decompressing them. This can provide significant savings in processing time and complexity.
Additionally, each codeword has the same length and therefore generally takes the same amount of time to process, so the processing time becomes more deterministic.
The techniques of the present invention may be implemented using any known hardware, software and/or firmware.
Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous adaptations and modifications may be made thereto without departing from the spirit and scope of the invention as set forth in the claims.
For example, while the invention was discussed in connection with cable or satellite television broadband communication networks, it will be appreciated that other networks such as local area networks (LANs), metropolitan area networks (MANs), wide area networks (WANs), internets, intranets, and the Internet, or combinations thereof, may be used. Moreover, the invention is suitable for use in compressing any scripting language content, including HTML or any similar language (e.g., Extensible Markup Language (XML) or Synchronized Multimedia Integration Language (SMIL)).

Claims

What is claimed is:
1. A method for processing scripting language data, comprising the steps of:
(a) parsing the scripting language data to separate text thereof from scripting language elements thereof, said scripting language elements including tags;
(b) providing a respective codeword for each different tag;
(c) coding the text to provide coded text; and
(d) combining the codewords with the coded text to provide compressed scripting language data.
2. The method of claim 1, wherein: at least one of the codewords designates whether the associated tag is an empty tag or a container tag.
3. The method of claim 1, wherein, for container tags, at least one of the corresponding codewords designates whether the container tag is a starting tag or an ending tag.
4. The method of claim 1, wherein: at least one of the codewords designates whether the corresponding tag is a style markup tag or a structural markup tag.
5. The method of claim 1, wherein, for structural markup tags, at least one of the corresponding codewords designates whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
6. The method of claim 1, wherein said scripting language elements include attributes of the tags, comprising the further step of: providing a respective codeword for each different attribute.
7. The method of claim 1, wherein: said scripting language elements include attributes of the tags; and the respective codewords indicate a number of attributes that are associated with each tag.
8. The method of claim 1, comprising the further step of: communicating the compressed scripting language data from a scripting language content server to a subscriber terminal in a communication network.
9. The method of claim 1, comprising the further step of: communicating the compressed scripting language data from a headend to a subscriber terminal in a communication network.
10. The method of claim 1, comprising the further step of: communicating the compressed scripting language data to a subscriber terminal in a communication network; and processing the compressed scripting language data without recovering the scripting language elements to provide data suitable for display.
11. The method of claim 1, comprising the further steps of:
(d) parsing the compressed scripting language data to separate the coded text from the codewords thereof;
(e) decoding the coded text to provide decoded text;
(f) providing the respective scripting language elements for each corresponding different codeword obtained in said step (d); and
(g) combining the scripting language elements provided in said step (f) with the decoded text to provide uncompressed scripting language data.
12. The method of claim 11, wherein: the uncompressed scripting language data is processed by said steps (d)-(g) at a subscriber terminal in a communication network.
13. The method of claim 1, wherein: the scripting language data comprises Hyper Text Markup Language (HTML) data.
14. The method of claim 1, comprising the further step of: temporarily storing the compressed scripting language data in a proxy server for scripting language content that is accessed frequently by subscriber terminals.
15. An apparatus for processing scripting language data, comprising: a first parser for parsing the scripting language data to separate text thereof from scripting language elements thereof, said scripting language elements including tags; first means for providing a respective codeword for each different tag; means for coding the text to provide coded text; and a first combiner for combining the codewords with the coded text to provide compressed scripting language data.
16. The apparatus of claim 15, wherein: at least one of the codewords designates whether the associated tag is an empty tag or a container tag.
17. The apparatus of claim 15, wherein, for container tags, at least one of the corresponding codewords designates whether the container tag is a starting tag or an ending tag.
18. The apparatus of claim 15, wherein: at least one of the codewords designates whether the corresponding tag is a style markup tag or a structural markup tag.
19. The apparatus of claim 15, wherein, for structural markup tags, at least one of the corresponding codewords designates whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
20. The apparatus of claim 15, wherein said scripting language elements include attributes of the tags, further comprising: means for providing a respective codeword for each different attribute.
21. The apparatus of claim 15, wherein: said scripting language elements include attributes of the tags; and the respective codewords indicate a number of attributes that are associated with each tag.
22. The apparatus of claim 15, wherein: the compressed scripting language data is communicated from a scripting language content server to a subscriber terminal in a communication network.
23. The apparatus of claim 15, wherein: the compressed scripting language data is communicated from a headend to a subscriber terminal in a communication network.
24. The apparatus of claim 15, wherein the compressed scripting language data is communicated to a subscriber terminal in a communication network, further comprising: at least one processor for processing the compressed scripting language data without recovering the scripting language elements to provide data suitable for display.
25. The apparatus of claim 15, further comprising: a second parser for parsing the compressed scripting language data to separate the coded text thereof from the codewords thereof; second means for providing the respective scripting language elements for each corresponding different codeword obtained from said second parser; means for decoding the coded text to provide decoded text; and a second combiner for combining the scripting language elements provided by said second means with the decoded text to provide uncompressed scripting language data.
26. The apparatus of claim 25, wherein: the uncompressed scripting language data is processed by said second parser, second means, means for decoding, and second combiner at a subscriber terminal in a communication network.
27. The apparatus of claim 15, wherein: the scripting language data comprises Hyper Text Markup Language (HTML) data.
28. The apparatus of claim 15, further comprising: means for temporarily storing the compressed scripting language data in a proxy server for scripting language content that is accessed frequently by subscriber terminals.
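
For illustration only, the parse/codeword/code/combine steps of claim 1 and the inverse steps of claim 11 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the codeword table `TAG_CODES`, the one-byte codeword layout with a high-bit ending-tag flag (in the spirit of claim 3), the `TEXT_MARK` record marker, and the use of zlib as the text coder are all hypothetical choices that do not come from the patent. Tags are assumed to carry no attributes, to be lowercase, and to appear in the table.

```python
import re
import zlib

# Hypothetical codeword table (an assumption, not from the patent):
# one 7-bit codeword per distinct tag name.
TAG_CODES = {"html": 1, "body": 2, "p": 3, "b": 4, "br": 5}
CODE_TAGS = {code: name for name, code in TAG_CODES.items()}

END_FLAG = 0x80   # assumed: high bit set means an ending tag
TEXT_MARK = 0x00  # assumed: marks a length-prefixed run of coded text

# Matches either an attribute-free tag or a run of text between tags.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


def compress(markup: str) -> bytes:
    """Parse markup, map tags to codewords, code the text, combine both."""
    out = bytearray()
    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if name is not None:
            code = TAG_CODES[name.lower()]  # KeyError for unknown tags
            out.append((code | END_FLAG) if closing else code)
        else:
            coded = zlib.compress(text.encode("utf-8"))
            out.append(TEXT_MARK)
            out += len(coded).to_bytes(2, "big")  # assumes runs under 64 KiB
            out += coded
    return bytes(out)


def decompress(data: bytes) -> str:
    """Separate codewords from coded text, decode each, and recombine."""
    parts = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte == TEXT_MARK:
            length = int.from_bytes(data[i:i + 2], "big")
            i += 2
            parts.append(zlib.decompress(data[i:i + length]).decode("utf-8"))
            i += length
        else:
            name = CODE_TAGS[byte & 0x7F]
            parts.append(f"</{name}>" if byte & END_FLAG else f"<{name}>")
    return "".join(parts)


page = "<html><body><p>Hello <b>world</b><br></p></body></html>"
restored = decompress(compress(page))
```

In this sketch each tag costs a single byte regardless of the length of its name, while the text runs are handed to a general-purpose coder. For very short text runs zlib may produce output longer than its input, so a practical coder would likely pool the text or use a different scheme; that choice is outside what the claims specify.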
PCT/US2000/040754 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content WO2001019052A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU80351/00A AU8035100A (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content
EP00971058A EP1279267A2 (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content
CA002384687A CA2384687A1 (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US39383599A 1999-09-10 1999-09-10
US09/393,835 1999-09-10

Publications (2)

Publication Number Publication Date
WO2001019052A2 true WO2001019052A2 (en) 2001-03-15
WO2001019052A3 WO2001019052A3 (en) 2002-11-14

Family

ID=23556435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/040754 WO2001019052A2 (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content

Country Status (5)

Country Link
EP (1) EP1279267A2 (en)
AU (1) AU8035100A (en)
CA (1) CA2384687A1 (en)
TW (1) TW473673B (en)
WO (1) WO2001019052A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002060067A2 (en) * 2001-01-26 2002-08-01 Pogo Mobile Solutions Limited A method of data compression
WO2003073719A1 (en) * 2002-02-28 2003-09-04 Nokia Corporation Http message compression
WO2004073278A1 (en) * 2003-02-14 2004-08-26 Research In Motion Limited System and method of compact messaging in network communications
EP1610228A1 (en) * 2003-03-07 2005-12-28 Sharp Kabushiki Kaisha Data conversion method capable of optimally performing mark-up language processing
US7886218B2 (en) 2008-02-27 2011-02-08 Aptimize Limited Methods and devices for post processing rendered web pages and handling requests of post processed web pages
FR2988497A1 (en) * 2012-05-04 2013-09-27 Sagemcom Energy & Telecom Sas XML type message server for use in communication system, has compressing unit compressing XML type message, and decompressing unit that is utilized for decompressing message compressed in format of XML type
WO2013166189A1 (en) * 2012-05-01 2013-11-07 Qualcomm Iskoot, Inc. Selectively exchanging metadata in a wireless communications system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5668548A (en) * 1995-12-28 1997-09-16 Philips Electronics North America Corp. High performance variable length decoder with enhanced throughput due to tagging of the input bit stream and parallel processing of contiguous code words
WO1997034240A1 (en) * 1996-03-15 1997-09-18 University Of Massachusetts Compact tree for storage and retrieval of structured hypermedia documents
EP0848553A2 (en) * 1996-12-10 1998-06-17 Nextlevel Systems, Inc. Mapping uniform resource locators to broadcast addresses in a television signal
EP0896284A1 (en) * 1997-08-05 1999-02-10 Fujitsu Limited Compressing and decompressing data
EP0928070A2 (en) * 1997-12-29 1999-07-07 Unwired Planet, Inc. Compression of documents with markup language that preserves syntactical structure
EP0991018A2 (en) * 1998-09-28 2000-04-05 Fujitsu Limited Method and apparatus for compressing data
WO2000070770A1 (en) * 1999-05-13 2000-11-23 Euronet Uk Limited Compression/decompression method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002060067A2 (en) * 2001-01-26 2002-08-01 Pogo Mobile Solutions Limited A method of data compression
WO2002060067A3 (en) * 2001-01-26 2003-09-18 Pogo Mobile Solutions Ltd A method of data compression
WO2003073719A1 (en) * 2002-02-28 2003-09-04 Nokia Corporation Http message compression
WO2004073278A1 (en) * 2003-02-14 2004-08-26 Research In Motion Limited System and method of compact messaging in network communications
US7448043B2 (en) 2003-02-14 2008-11-04 Research In Motion Limited System and method of compact messaging in network communications by removing tags and utilizing predefined message definitions
US8069451B2 (en) 2003-02-14 2011-11-29 Research In Motion Limited System and method of compact messaging in network communications by removing tags and utilizing predefined message definitions
EP1610228A1 (en) * 2003-03-07 2005-12-28 Sharp Kabushiki Kaisha Data conversion method capable of optimally performing mark-up language processing
EP1610228A4 (en) * 2003-03-07 2009-07-29 Sharp Kk Data conversion method capable of optimally performing mark-up language processing
US7886218B2 (en) 2008-02-27 2011-02-08 Aptimize Limited Methods and devices for post processing rendered web pages and handling requests of post processed web pages
WO2013166189A1 (en) * 2012-05-01 2013-11-07 Qualcomm Iskoot, Inc. Selectively exchanging metadata in a wireless communications system
FR2988497A1 (en) * 2012-05-04 2013-09-27 Sagemcom Energy & Telecom Sas XML type message server for use in communication system, has compressing unit compressing XML type message, and decompressing unit that is utilized for decompressing message compressed in format of XML type

Also Published As

Publication number Publication date
WO2001019052A3 (en) 2002-11-14
AU8035100A (en) 2001-04-10
TW473673B (en) 2002-01-21
CA2384687A1 (en) 2001-03-15
EP1279267A2 (en) 2003-01-29

Similar Documents

Publication Publication Date Title
US6345307B1 (en) Method and apparatus for compressing hypertext transfer protocol (HTTP) messages
US6938270B2 (en) Communicating scripts in a data service channel of a video signal
US6018764A (en) Mapping uniform resource locators to broadcast addresses in a television signal
US7103904B1 (en) Methods and apparatus for broadcasting interactive advertising using remote advertising templates
US7849226B2 (en) Television with set top internet terminal with user interface wherein auxiliary content is received that is associated with current television programming
US7165266B2 (en) Combining real-time and batch mode logical address links
US6400407B1 (en) Communicating logical addresses of resources in a data service channel of a video signal
US5818935A (en) Internet enhanced video system
US20100281042A1 (en) Method and System for Transforming and Delivering Video File Content for Mobile Devices
US20020138849A1 (en) Broadcast enhancement trigger addressed to multiple uniquely addressed information resources
US20050162551A1 (en) Multi-lingual closed-captioning
GB2347329A (en) Converting electronic documents into a format suitable for a wireless device
WO2001019052A2 (en) Method and apparatus for compressing scripting language content
JP3209929B2 (en) Information display method and device
JP3277130B2 (en) Information display device and method
WO2010062761A1 (en) Method and system for transforming and delivering video file content for mobile devices
WO2005069617A1 (en) Subtitling of an audio or video flow in a multimedia document
STANDARD Declarative Data Essence—Content Level
WO2001077894A1 (en) Paged web protocol
KR100417601B1 (en) Apparatus for interfaceing between webbrowser and dsm-cc
JPH1032798A (en) Information display method/device
Koivisto Multimedia Presentation and Transmission Standards and Their Support for Automatic Analysis, Conversion and Scalling: A Survey
WO2002029585A1 (en) Image transfer system and method
MXPA97010026A (en) Uniform configuration resource localizers to disseminate directions in a televis sign

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2384687

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2000971058

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

WWP Wipo information: published in national office

Ref document number: 2000971058

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Ref document number: 2000971058

Country of ref document: EP