WO2003007572A1 - Method for compressing protocols and related system - Google Patents

Publication number
WO2003007572A1
Authority
WO
WIPO (PCT)
Prior art keywords
bnf
protocol
code
rule
receiving terminal
Prior art date
Application number
PCT/EP2002/007876
Other languages
French (fr)
Inventor
Richard Price
Original Assignee
Roke Manor Research Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB0117132.1A external-priority patent/GB0117132D0/en
Application filed by Roke Manor Research Limited filed Critical Roke Manor Research Limited
Publication of WO2003007572A1 publication Critical patent/WO2003007572A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/005 Statistical coding, e.g. Huffman, run length coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/70 Media network packetisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC


Definitions

  • the protocol is that of a "hello” message.
  • hello-message = "hello"
  • name = "andrew"
  • first line is the all-defining line of the protocol.
  • BNF the first line also makes reference to the line <name>.
  • This second line says that the name may be "andrew” or "richard”.
  • the decompressor knows the protocol which is coming next, the only data which needs to be transmitted is the data telling which names are transmitted. In the example, rather than transmitting the strings “andrew” or “richard", because the choices or possibilities are one of two strings, all data can be transmitted as either a "0" or a "1". There is no need to transmit what the protocol (message) is, as a decompressor can derive from the header what message to expect.
  • a binary code may be assigned to each possibility.
  • these 10 possibilities are compressed i.e. coded according to the probability they occur using known compression schemes, such as Huffman.
  • the size of the binary code is made inversely proportional to the probability that that number of characters occurs.
  • hello_message = "Hello " <name>
  • the first step is to split up the two BNF rules into simpler BNF rules taking one of the five "canonical" forms. This is accomplished as follows:
  • hello_message = <rule_1> <name>
  • the compressor inserts some information into the compressed message that allows the decompressor to work out what to do next.
  • the compressor inserts log2(n) bits into the compressed message that can be interpreted as an integer from 0 to (n-1).
  • hello_message = "Hello " <name>
  • the frequency that each choice in a BNF description will be used can be calculated by applying the scheme to a selection of messages. The number of times that each choice is used is recorded, and the results are scaled to give the necessary frequency of occurrence.
  • the resulting set of probability values can then be used to build a "Huffman tree". This tree indicates how to encode each of the choices in an optimally efficient manner, with more common choices communicated to the decompressor using fewer bits than rare choices.
  • the invention has been described with specific reference to examples of compressing and decompressing messages from protocols described using ABNF.
  • An advantage of supporting ABNF is that it is the variant of BNF used to describe many text-based protocols including SIP. It will be appreciated that many other variants of the basic BNF language exist.
  • the "EPIC" variant of BNF provides a number of additional fundamental BNF rules that can be used to describe, in greater detail, protocols to be compressed.
  • EPIC version of BNF provides rules for sequence numbers (which increase by 1 every time the rule is invoked) and length fields (which contain the length of the entire message). Use of these BNF rules improves the compression ratio because the sequence numbers and length fields can be computed or inferred at the decompressor rather than being transmitted as part of the compressed code.
  • any protocol described using the EPIC version of BNF will be compressed to a greater degree than a protocol described using ABNF.
  • dynamic compression may be used to improve the compression ratio.
  • the version of the BNF-based compressor described only compresses messages individually, so the overall compression ratio does not improve. This is also the case even when a large number of consecutive messages are compressed.
  • a more advanced version of the algorithm could modify dynamically the BNF description of the protocol 'on the fly', thereby learning more details of the message flow which it is compressing. For example, any new text strings that occur in the message flow could be added to the BNF description at both the compressor and decompressor.
  • the BNF-based compression algorithm described above can compress any message compatible with the BNF.
  • illegal messages cannot be compressed. In certain cases this may be a drawback, for example if the BNF description becomes outdated. Therefore the ability to compress messages that do not follow the currently available BNF description is ideally also provided at the compressor and decompressor.
  • Suitable generic compression algorithms include DEFLATE and
  • a more advanced method for adding generic message compression is to provide a new BNF rule for generic compression and to add this rule to the BNF description of the protocol. This rule can then be invoked as a default, for any part of a message that contains freeform text. Any non-conforming text therefore would still benefit from generic compression techniques. Compliant portions of a message can still be compressed, using the original BNF rules, so the overall compression ratio is higher than that obtained by using the generic algorithm alone.
  • the invention has been described by way of exemplary embodiments only and it will be appreciated that variation to the embodiments may be made without departing from the scope of the invention.
  • the invention may be included in a base system, for use in mobile telephony, a router used in routing signals along optical networks or a mobile telephone handset or pager.

Abstract

In a communications system where different protocols are sent from a transmitting terminal to a receiving terminal, a method of transmitting specific protocol information, known as protocol fields, comprises pre-storing information/instructions regarding said protocol fields at the receiving terminal. A compressor compresses variable data in said fields. The transmitting terminal sends only compressed data relating to variable data within a specific protocol field, and from this the receiving terminal interprets the specific protocol field by decompressing the received data. The invention has particular application where the protocol fields are in Backus Naur Form.

Description

METHOD FOR COMPRESSING PROTOCOLS AND RELATED SYSTEM
BACKGROUND TO THE INVENTION
The invention relates to a method for compressing protocols and a related system. It is particularly applicable to data compression schemes, which are used in systems to transmit different protocols under which varying protocol information is sent. It has particular, but not exclusive, application to BNF (Backus Naur Form) which is a standard language used to describe new protocols and signalling schemes.
In some data transmission schemes, variable data (e.g. strings) were not sent in full. Instead, shorter reference was made to them by way of a numerical shorthand. These numerical references still required a relatively high number of bits to ensure they could be decompressed when received. Also, under such schemes, the strings and references, as well as the data, had to be carefully ordered. This increased processing time. Such compression algorithms therefore reduced the efficiency of the compression process.
In data systems, such as, for example, packet switching, there is often varying data which is sent within a protocol message. In telephony this variable data is usually found in the header of a packet and includes information about the timing protocol and the routing protocol, as well as so-called checking data. All this data is in addition to the actual voice data, which is usually stored in the "payload". In some packet telephone systems header data was sent in the form of strings of characters. In these systems transmitting protocols was usually performed by sending all the strings of the protocol message. As ever-increasing demands were placed upon telephone networks to optimise available bandwidth and speed, attempts were made to compress data so as to meet the demand.
PRIOR ART
An example of a prior art protocol is described in European Patent Application Number EP-A2-0 844 768 (Webtv Networks Inc), which describes a technique in which data that can be compressed is attached to a compression stream and compressed. Compressed data is then transmitted continuously as it is generated. Previously, data was transmitted in compressed, discrete blocks or packets, so channel carrying capacity was not fully optimised. The technique described overcomes the problem associated with compression techniques in which compressed data was transmitted as intermittent packets. However, it does not address the particular problem of improving the compression ratio. Furthermore, the aforementioned Patent Application does not provide any hint or suggestion as to how to reduce the required memory capacity of a compressor/decompressor system.
It is an object of the invention to improve the compression ratio of a compressor/decompressor system. Another object of the invention is to reduce the required memory capacity of a compressor/decompressor system. A further object of the invention is to transmit protocol messages in a more efficient manner.
SUMMARY OF THE INVENTION
According to a first aspect of the invention there is provided a method of transmitting specific protocol field information comprising the following steps: a) pre-storing predictable data portions of protocol fields, at the receiving terminal; b) removing variable data from the protocol field; c) compressing the variable data into a code; d) transmitting the code to said receiving terminal; e) receiving the code at the receiving terminal; f) decompressing the code into said variable data; and g) reconstructing the protocol field using the variable data and the predictable portion of said transmission specific protocol stored at the receiving terminal. The process of transmitting protocols is therefore made more efficient by applying various data compression schemes to the protocols. The amount of transmitted data is reduced, for example by transmitting only the variable data, such as the strings of characters that may vary within a particular protocol.
The invention utilises an algorithm to provide efficient compression of arbitrary protocols including SIP [RFC-2543] and RTSP [RFC-2326]. The algorithm preferably incorporates in its input a Backus Naur Form (BNF) description of the protocol to be compressed, which is stored at the compressor and decompressor and used as part of the compression process. Since the algorithm is preprogrammed with knowledge of how protocols behave, the compression ratio obtained is very high and the processing and memory requirements are low.
BNF descriptors are available for a wide range of important protocols including Hyper-text Mark-up Protocol (HTML) and Session Initiation Protocol (SIP). Under such protocols the first line of a protocol message references a strict, unchanging set of protocol rules. Such protocol rules are stored at a receiving terminal of a transmission system. For example, a BNF descriptor of a protocol tells the receiving terminal the exact type of messages that are to be transmitted, and references further rules in such a way that every possible message that the protocol can send is defined. In other words, it describes the format of the message, effectively stating what a SIP or HTML message will comprise.
Therefore rather than transmitting the variable data or indeed numerical references thereto, only data relating to the options is sent; the options being interpreted at the receiving terminal, for example from a look-up table of known relationships.
According to another aspect of the invention there is provided a communication system, in which one of a plurality of protocols are transmitted to a receiving terminal, said system comprising: a) means for pre-storing predictable data portions of protocol fields, at the receiving terminal; b) means for removing variable data from the protocol field; c) means for compressing the variable data into a code; d) a transmitter for transmitting the code to said receiving terminal; e) receiving the code at the receiving terminal; f) means for decompressing the code into said variable data; and g) means for reconstructing the protocol field using the variable data and the predictable portion of said transmission specific protocol stored at the receiving terminal.
Non-predictable data includes information which is neither stored nor derivable at the receiving terminal. For example, the type of pre-stored data to be combined may include a flag or label pointing to which element of pre-stored data needs to be included in the reconstruction of the transmission specific protocol; which data portions need to be taken from a look-up table or index; or which data needs to be derived, for example by way of an iteration to compute a number in a sequence from a previous number according to a stored algorithm.
Preferably, means are also provided for determining which of said protocols is to be sent by said transmitting terminal, and when.
The means for compressing the variable data into a code; means for decompressing the code into said variable data; and means for reconstructing the protocol field, may all be performed by dedicated hardware devices or in a Programmable Micro-controller, such as a Digital Signal Processor (DSP) or Application Specific Integrated Circuit (ASIC).
In a particularly preferred embodiment the number of bits, which may be used to code for each option, is generally inversely proportional to the frequency of that option occurring. This improves the compression ratio yet further. It may use for example a Huffman method.
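The relationship between option frequency and code length can be illustrated with a minimal Huffman sketch. This is an illustration only, not the patent's implementation: the function name `huffman_code` and the usage counts for the four names are assumptions made for the example. More frequent options receive shorter bit strings.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Return {option: bit string} for a dict of option -> frequency."""
    if len(freqs) == 1:
        # A single option offers no choice, so it needs no bits.
        return {next(iter(freqs)): ""}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {opt: ""}) for opt, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {opt: "0" + c for opt, c in codes0.items()}
        merged.update({opt: "1" + c for opt, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical usage counts, chosen only for illustration.
codes = huffman_code({"Tom": 50, "Dick": 25, "Harry": 15, "Sally": 10})
```

With these assumed counts the most common option is coded in one bit and the two rarest in three bits each.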
The protocol may be in any suitable form, but is preferably in Backus Naur Form. An advantage of this is that the invention is able to make use of the fact that only the variable data needs to be sent in order to complete the specific protocol instruction.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
The invention will now be described with reference to examples. A brief description of BNF (Backus Naur Form), as a language, is included to assist the understanding of the examples.
BNF (Backus Naur Form) is a language used to describe the behaviour of many well-known protocols. All versions of the BNF language are based around the concept of a "BNF rule". Each BNF rule describes the syntax of a portion of the protocol, and is defined in terms of existing BNF rules. The "top-level" BNF rule describes the entire protocol in question.
The following basic constructs are used to define new BNF rules:
BNF_rule = ...           Defines a new BNF rule in terms of existing BNF rules
<BNF_rule>               Reference to an existing BNF rule
<a> | <b> ... | <z>      Choice of different BNF rules
For example, a new BNF rule for an alphanumeric character can be defined in terms of existing BNF rules for letters and digits as follows:
alphanum = <letter> | <digit>
Clearly new BNF rules cannot be defined in terms of existing BNF rules ad infinitum, so one or more "fundamental" BNF rules must also be provided. For example, many variants of BNF allow new BNF rules to be specified in terms of ASCII text as illustrated below:
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
A number of variants exist for the basic BNF language, each offering different fundamental BNF rules. For example the Augmented BNF specification used in SIP offers a selection of fundamental rules including the following:
"string"                 String of ASCII characters
[<BNF_rule>]             Optional BNF rule
x*y <BNF_rule>           List of between x and y occurrences of BNF_rule. If x is omitted then x = 0, and if y is omitted then y = infinity
An example BNF description taken from SIP [RFC-2543] is described below. The top-level BNF rule in this case describes the format of a website address. Note that website addresses can be expressed either as digits (for example "66.218.71.87"), or in the form of text (for example "www.yahoo.com"). The BNF description allows both the numerical form and the text form of a website address to be used:
webaddress = <IPv4address> | <hostname>
IPv4address = 1*<digit> "." 1*<digit> "." 1*<digit> "." 1*<digit>
hostname = *(<domainlabel> "." ) <toplabel> [ "." ]
domainlabel = *<alphanum>
toplabel = *<letter>
alphanum = <letter> | <digit>
letter = <lowercase> | <uppercase>
uppercase = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
lowercase = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Example BNF description of a website address
The BNF-based compression algorithm can compress a protocol described using any variant of BNF, provided that the actions to take for each fundamental BNF rule have been defined. Compression protocols are defined using ABNF [RFC-2234], as this covers a wide range of protocols including SIP [RFC-2543].
The basic idea of the BNF-based compression algorithm is to provide a BNF (Backus Naur Form) description of the relevant protocol at the compressor and decompressor.
At the decompressor this BNF description is parsed to reconstruct the original uncompressed message. The information required in the compressed message is therefore just a set of instructions telling the decompressor which BNF rule to parse when more than one choice is available.
For simplicity, it is assumed that every BNF rule used to describe the protocol takes one of the following five "canonical" forms:
and_rule = <rule_1> <rule_2> ... <rule_n>
or_rule = <rule_1> | <rule_2> | ... | <rule_n>
text_rule = "string"
optional_rule = [<rule>]
list_rule = x*y (<rule>)
If a particular BNF rule does not fall into one of the above categories then it must be broken down into a set of simpler BNF rules. For example, consider the following rule:
hostname = *(<domainlabel> "." ) <toplabel> [ "." ]
This rule is too complex to take one of the five canonical forms; however it can be rewritten as the following set of simpler BNF rules:
hostname = <new_rule_1> <toplabel> <new_rule_2>
new_rule_1 = *<new_rule_3>
new_rule_2 = [<new_rule_4>]
new_rule_3 = <domainlabel> <new_rule_4>
new_rule_4 = "."
All of the rules listed above are in canonical form. More generally, it is always possible to rewrite any BNF rule as a set of one or more rules in canonical form.
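One possible way to hold the rewritten hostname rules in memory is sketched below. The tuple layout, the constant `HOSTNAME_RULES` and the helper names are assumptions made for illustration; the source does not prescribe a data structure. Each rule is tagged with the canonical form it takes, which makes it straightforward to check that a rewritten description uses only the five forms.

```python
# Each rule is a tuple whose first element names its canonical form.
HOSTNAME_RULES = {
    "hostname":   ("and", ["new_rule_1", "toplabel", "new_rule_2"]),
    "new_rule_1": ("list", 0, None, "new_rule_3"),   # *<new_rule_3>
    "new_rule_2": ("optional", "new_rule_4"),        # [<new_rule_4>]
    "new_rule_3": ("and", ["domainlabel", "new_rule_4"]),
    "new_rule_4": ("text", "."),
}

CANONICAL_FORMS = {"and", "or", "text", "optional", "list"}

def is_canonical(rules):
    """True if every rule is tagged with one of the five canonical forms."""
    return all(rule[0] in CANONICAL_FORMS for rule in rules.values())

def referenced_rules(rule):
    """Names of the existing rules that a single rule refers to."""
    kind = rule[0]
    if kind in ("and", "or"):
        return list(rule[1])
    if kind == "optional":
        return [rule[1]]
    if kind == "list":
        return [rule[3]]
    return []  # "text" rules reference nothing

def undefined_references(rules):
    """Names used by the rules but defined elsewhere in the description."""
    used = {name for rule in rules.values() for name in referenced_rules(rule)}
    return sorted(used - set(rules))
```

Here `undefined_references(HOSTNAME_RULES)` lists `domainlabel` and `toplabel`, which are defined in the wider website address description rather than in the rewritten fragment.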
Once the BNF description of the protocol has been rewritten so that all of the BNF rules are in canonical form, the compressor and the decompressor make use of the BNF description as follows:
BNF "and" rule
BNF description: and_rule = <rule_1> <rule_2> ... <rule_n>
The BNF "and" rule is built up from n consecutive existing BNF rules. For example:
<byte> = <hex_digit> <hex_digit>
The above rule specifies that a single byte can be built up from two consecutive hex digits.
Observe that a BNF "and" rule does not offer any choice in the way that it is built up from simpler BNF rules; in the above example a byte is always built from exactly two consecutive hex digits. This means that any instance of an "and" rule can be compressed down to 0 bits. When the compressor or decompressor encounters a BNF "and" rule built up from n existing BNF rules, it just follows each of the n existing rules in turn, taking the actions as herein defined.
BNF "or" rule
BNF description: or_rule = <rule_1> | <rule_2> | ... | <rule_n>
The BNF "or" rule specifies that the next part of the protocol can take one of n different forms. For example:
alphanum = <letter> | <digit>
The above rule specifies that an alphanumeric character can be either a letter or a digit.
When the compressor encounters a BNF "or" rule, it tries each of the choices in turn until it finds the rule that successfully matches the uncompressed message. The compressor then appends k bits to the end of the compressed message, where k = log2(n) rounded up to the nearest integer. The value of the k bits is set to 0 if the first rule is chosen, 1 if the second rule is chosen and so on. The decompressor can read these bits to determine which of the n possible BNF rules it needs to follow in order to successfully rebuild the uncompressed message.
As an example, consider the following instance of an "or" rule:
name = "Tom" | "Dick" | "Harry" | "Sally"
If the uncompressed message contains the name "Tom" then the compressor appends the bits 00 to the end of the compressed message; if it contains the name "Dick" then the compressor appends the bits 01 and so on. When the decompressor reads these bits in the compressed message it can work out which of the four names must occur in the uncompressed message.
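The fixed-width coding of an "or" rule can be sketched as follows. The function names `encode_choice` and `decode_choice` and the use of Python bit strings are assumptions for illustration; only the rule k = log2(n) rounded up, and the numbering of choices from 0, come from the text.

```python
NAMES = ["Tom", "Dick", "Harry", "Sally"]

def choice_width(n):
    # (n - 1).bit_length() equals log2(n) rounded up, using integers only.
    return (n - 1).bit_length()

def encode_choice(index, n):
    """Bits appended by the compressor for choice `index` out of n."""
    k = choice_width(n)
    return format(index, "b").zfill(k) if k else ""

def decode_choice(bits, pos, n):
    """Read one choice starting at `pos`; return (index, new position)."""
    k = choice_width(n)
    index = int(bits[pos:pos + k], 2) if k else 0
    return index, pos + k

tom_bits = encode_choice(NAMES.index("Tom"), len(NAMES))
dick_bits = encode_choice(NAMES.index("Dick"), len(NAMES))
```

A rule with a single alternative has k = 0, so it contributes no bits, consistent with the 0-bit behaviour of rules that offer no choice.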
BNF "text" rule
BNF description: text_rule = "string"
A BNF "text" rule specifies that a certain text string must occur in the uncompressed message. Since the BNF rule does not offer any choice of which text string occurs, it can be compressed down to 0 bits (i.e. the decompressor can successfully reconstruct the text string without needing any information from the compressed message).
For example, consider the following BNF rule:
name = "Tom"
When this BNF rule is encountered, the decompressor simply copies the string "Tom" into the uncompressed message.
BNF "optional" rule
BNF description: optional_rule = [<rule>]
The BNF "optional" rule specifies that a certain BNF rule may or may not be present in the uncompressed message.
When the compressor encounters a BNF "optional" rule, it appends a single bit to the compressed message. The bit takes value "1" if the optional rule is present in the uncompressed message, and takes value "0" if it is not present. When the decompressor encounters the BNF "optional" rule, it reads the value of the bit to determine whether the optional rule is present or not.
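The single presence bit might be sketched in Python as below. This is an illustration only: it assumes the optional rule wraps a literal text string, and the names compress_optional and decompress_optional and the example string " there" are hypothetical.

```python
def compress_optional(literal, text):
    """Emit "1" and consume the literal if present, otherwise emit "0"."""
    if text.startswith(literal):
        return "1", text[len(literal):]
    return "0", text


def decompress_optional(literal, bits):
    """Read one bit and rebuild the optional text (or nothing)."""
    flag, rest = bits[0], bits[1:]
    return (literal if flag == "1" else ""), rest


present = compress_optional(" there", " there")
absent = compress_optional(" there", "")
```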
BNF "list" rule
BNF description: list_rule = x*y (<rule>)
The BNF "list" rule allows between x and y instances of an existing BNF rule to appear in the protocol. If x is omitted then x = 0 and if y is omitted then y = infinity. For example:
textstring = *<alphanum>
The above rule specifies that a text string can contain 0 or more alphanumeric characters.
When the compressor encounters a BNF "list" rule, it counts the number of instances of the existing BNF rule that occur in the uncompressed message. Suppose that k instances of the BNF rule occur (where k lies between x and y inclusive). If k = x then the compressor appends the single bit "1" to the compressed message; if k = x + 1 then the compressor appends "01" to the compressed message; if k = x + 2 then the compressor appends "001" and so on. When the decompressor encounters the BNF "list" rule it reads the compressed message to find out how many instances of the BNF rule it needs to append to the uncompressed message.
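The count encoding described above (k - x zero bits followed by a single one bit) might be sketched in Python as follows. The function names are hypothetical, and the sketch covers only the instance count, not the encoding of the instances themselves.

```python
def compress_list_count(k, x=0, y=None):
    """Encode the instance count k as (k - x) zeros followed by a one."""
    if k < x or (y is not None and k > y):
        raise ValueError("count lies outside the range allowed by the rule")
    return "0" * (k - x) + "1"


def decompress_list_count(bits, x=0):
    """Count leading zeros up to the first one; return (k, unread bits)."""
    zeros = bits.index("1")
    return x + zeros, bits[zeros + 1:]
```

With x = 0, a list of two instances is thus announced by the bits "001".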
Example 1
In this example the protocol is that of a "hello" message. The two lines below describe the message in BNF format:
hello-message = "hello" <name>
name = "andrew" | "richard"
where "|" denotes the option "or", and the optional data is highlighted in bold script.
Note the first line is the all-defining line of the protocol. In BNF the first line also makes reference to the line <name>. This second line says that the name may be "andrew" or "richard".
If the decompressor knows the protocol which is coming next, the only data which needs to be transmitted is the data telling which names are transmitted. In the example, rather than transmitting the strings "andrew" or "richard", because the choices or possibilities are one of two strings, all data can be transmitted as either a "0" or a "1". There is no need to transmit what the protocol (message) is, as a decompressor can derive from the header what message to expect.
In the example there are only two alternatives and these options are known at the decompressor end. Therefore, only the choices are sent. The compressed data is therefore in the form of a "0" or a "1" because there are two choices.
Example 2
This example is similar to the previous example, but there are 10 possibilities instead:
"a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"
A binary code may be assigned to each possibility. In order to further enhance efficiency, in a preferred embodiment these 10 possibilities are compressed i.e. coded according to the probability they occur using known compression schemes, such as Huffman.
For example if the frequency of these letters occurring happened to be in the following ratios: "a" 50%, "b" 20%, "c" 10% etc then, for example, the following bits could represent the characters:
a = 0
b = 10
c = 110
d = 11100
e = 11101
f = 11110
g = 111110
h = 1111110
j = 1111111
In this way more frequent characters are compressed into (coded by) fewer bits because they are transmitted most frequently. All that is required by such schemes is a prerequisite knowledge of the probability of these occurring. Various methods of determining these would be well known to the skilled person. An example of such is sampling.
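As an illustration, the code assignments listed in Example 2 can be used directly as a prefix code. The Python sketch below encodes and decodes with that table; the names CODES, encode and decode are hypothetical, and because the listing gives no code for "i", the sketch omits that character rather than guessing one.

```python
# Code table as listed in Example 2 (no entry for "i" is given there).
CODES = {"a": "0", "b": "10", "c": "110", "d": "11100", "e": "11101",
         "f": "11110", "g": "111110", "h": "1111110", "j": "1111111"}


def encode(text):
    """Concatenate the code of each character."""
    return "".join(CODES[ch] for ch in text)


def decode(bits):
    """Read bits until they form a complete code, emit it, and repeat."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)
```

Because no code in the table is the start of another, the decoder never needs a separator between characters.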
Example 3
There is also in BNF the so called star (*) rule, whereby a number of identical characters are sent; this number may vary from 0 to any integer.
message = * ("a")
for example zero "a"s, or "a", or "aa", or "aaa", or "aaaa".
These options are coded into binary form purely by the binary number representing the number of "a" characters in this example.
Alternatively in a preferred embodiment, the size of the binary code is made inversely proportional to the probability that that number of characters occurs.
No. of "a"s    Probability    Assigned binary code
zero           15%            100
a              5%             1010
aa             5%             1011
aaa            1%             110000
aaaa           1%             110001
aaaaa          50%            0
etc
Another example of the BNF-based compression algorithm is described below. Consider a very simple protocol defined by the following BNF:
hello_message = "Hello " <name>
name = "Tom" | "Dick" | "Harry" | "Sally"
The first step is to split up the two BNF rules into simpler BNF rules taking one of the five "canonical" forms. This is accomplished as follows:
hello_message = <rule_1> <name>
rule_1 = "Hello "
name = <rule_2> | <rule_3> | <rule_4> | <rule_5>
rule_2 = "Tom"
rule_3 = "Dick"
rule_4 = "Harry"
rule_5 = "Sally"
When reconstructing an uncompressed message generated using the above BNF, all that the decompressor needs to know is the correct choice of name to insert after the "Hello " text. Therefore each message can be compressed down to just 2 bits.
The four possible uncompressed messages and their compressed equivalents are given below:
Uncompressed message:    Compressed message:
Hello Tom                00
Hello Dick               01
Hello Harry              10
Hello Sally              11
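The canonical rules for this protocol can be driven by a small recursive procedure. The Python sketch below is one possible illustration, not the definitive implementation: the GRAMMAR dictionary layout and the function names compress and decompress are assumptions, and only the "and", "or" and "text" rule types needed for this protocol are handled.

```python
GRAMMAR = {
    "hello_message": ("and", ["rule_1", "name"]),
    "rule_1": ("text", "Hello "),
    "name": ("or", ["rule_2", "rule_3", "rule_4", "rule_5"]),
    "rule_2": ("text", "Tom"),
    "rule_3": ("text", "Dick"),
    "rule_4": ("text", "Harry"),
    "rule_5": ("text", "Sally"),
}


def compress(rule, text, pos=0):
    """Return (bits, new_pos), or None if the rule does not match at pos."""
    kind, body = GRAMMAR[rule]
    if kind == "text":  # fixed string: costs 0 bits
        return ("", pos + len(body)) if text.startswith(body, pos) else None
    if kind == "and":  # follow each sub-rule in turn
        bits = ""
        for sub in body:
            result = compress(sub, text, pos)
            if result is None:
                return None
            sub_bits, pos = result
            bits += sub_bits
        return bits, pos
    if kind == "or":  # emit the index of the first matching choice
        width = (len(body) - 1).bit_length()
        for index, sub in enumerate(body):
            result = compress(sub, text, pos)
            if result is not None:
                sub_bits, new_pos = result
                prefix = format(index, f"0{width}b") if width else ""
                return prefix + sub_bits, new_pos
        return None
    raise ValueError(f"unsupported rule type: {kind}")


def decompress(rule, bits, pos=0):
    """Return (text, new_pos) rebuilt from the bits starting at pos."""
    kind, body = GRAMMAR[rule]
    if kind == "text":
        return body, pos
    if kind == "and":
        out = ""
        for sub in body:
            piece, pos = decompress(sub, bits, pos)
            out += piece
        return out, pos
    if kind == "or":
        width = (len(body) - 1).bit_length()
        index = int(bits[pos:pos + width], 2) if width else 0
        return decompress(body[index], bits, pos + width)
    raise ValueError(f"unsupported rule type: {kind}")
```

Under these assumptions, compress("hello_message", "Hello Harry") produces the two bits "10", and a message that does not fit the grammar returns None rather than a bit string.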
A number of further enhancements can be made to improve the efficiency of this base scheme. For each of the BNF rules that allow some flexibility in how the protocol behaves (this includes the "or" rule, "optional" rule and "list" rule), the compressor inserts some information into the compressed message that allows the decompressor to work out what to do next.
For example, if a BNF "or" rule is encountered that offers n possible choices of which BNF rule to follow next, then the compressor inserts log2(n) bits (rounded up to the nearest integer) into the compressed message, which can be interpreted as an integer from 0 to (n-1).
To reduce the average number of bits required to transmit a choice to the decompressor, it is possible to take into account the probability, from the frequency of occurrence, that each choice will be used in practice. For example, consider the simple protocol mentioned above:
hello_message = "Hello " <name>
name = "Tom" | "Dick" | "Harry" | "Sally"
If all four names occur on a roughly equal basis then it is most efficient to communicate the choice of name as a 2-bit integer. However if the name "Tom" occurs 95% of the time then the following encoding will give better efficiency:
Uncompressed message:    Compressed message:
Hello Tom                0
Hello Dick               10
Hello Harry              110
Hello Sally              111
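The gain from this encoding can be checked with a short calculation. In the Python sketch below only the 95% figure for "Tom" comes from the text; the split of the remaining 5% between the other three names is an assumption made purely for illustration, as are the names SKEWED_CODE and PERCENT.

```python
SKEWED_CODE = {"Tom": "0", "Dick": "10", "Harry": "110", "Sally": "111"}

# Assumed frequencies in percent; only the 95% for "Tom" is from the text.
PERCENT = {"Tom": 95, "Dick": 3, "Harry": 1, "Sally": 1}

# Average number of bits per message under the skewed code.
average_bits = sum(PERCENT[n] * len(SKEWED_CODE[n]) for n in PERCENT) / 100

# The fixed-length alternative always costs 2 bits per message.
fixed_bits = 2
```

Under these assumed frequencies the average is about 1.07 bits per message, compared with 2 bits for the fixed-length code. Even in the worst split of the remaining 5% the average cannot exceed 0.95 x 1 + 0.05 x 3 = 1.10 bits.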
The frequency with which each choice in a BNF description is used can be calculated by applying the scheme to a selection of messages. The number of times that each choice is used is recorded, and the results are scaled to give the necessary frequency of occurrence.
Note that the selection of messages provided to the BNF-based compressor in this "learning" phase should reflect as accurately as possible the mix of messages that will be compressed. More accurate probability values give a higher compression ratio when the compression algorithm is applied.
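The "learning" phase might be sketched in Python as follows. This is an illustration only: the function name learn_frequencies and the sample of 20 training names are hypothetical, and the sketch records the choices made for a single "or" rule.

```python
from collections import Counter


def learn_frequencies(observed_choices):
    """Count how often each choice is used and scale the counts to sum to 1."""
    counts = Counter(observed_choices)
    total = sum(counts.values())
    return {choice: count / total for choice, count in counts.items()}


# Hypothetical training sample: 19 messages naming Tom, 1 naming Dick.
sample = ["Tom"] * 19 + ["Dick"]
frequencies = learn_frequencies(sample)
```

A choice that never appears in the training sample receives no entry here, so a practical scheme would need some way of handling choices with no recorded use.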
The resulting set of probability values can then be used to build a "Huffman tree". This tree indicates how to encode each of the choices in an optimally efficient manner, with more common choices communicated to the decompressor using fewer bits than rare choices.
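A standard Huffman construction can be sketched in Python with a priority queue, as below. The function name huffman_code and the weights used in the example are assumptions for illustration; only the 95% weight for "Tom" is taken from the text.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Build a prefix code {symbol: bit string} from {symbol: weight}."""
    if len(weights) == 1:
        return {symbol: "0" for symbol in weights}
    tiebreak = count()  # unique counter so equal weights never compare dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing their codes.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Assumed weights in percent; only the value for "Tom" is from the text.
name_code = huffman_code({"Tom": 95, "Dick": 3, "Harry": 1, "Sally": 1})
code_lengths = {name: len(bits) for name, bits in name_code.items()}
```

With these weights the code lengths are 1, 2, 3 and 3 bits for Tom, Dick, Harry and Sally respectively, the same lengths as the encoding tabulated above, although the exact bit patterns produced by this construction may differ.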
The invention has been described with specific reference to examples of compressing and decompressing messages from protocols described using ABNF. An advantage of supporting ABNF is that it is the variant of BNF used to describe many text-based protocols including SIP. It will be appreciated that many other variants exist on the basic BNF language. For example the "EPIC" variant of BNF provides a number of additional fundamental BNF rules that can be used to describe, in greater detail, protocols to be compressed.
The EPIC version of BNF provides rules for sequence numbers (which increase by 1 every time the rule is invoked) and length fields (which contain the length of the entire message). Use of these BNF rules improves the compression ratio because the sequence numbers and length fields can be computed or inferred at the decompressor rather than being transmitted as part of the compressed code.
Consequently, any protocol described using the EPIC version of BNF will be compressed to a greater degree than a protocol described using ABNF.
Furthermore, so-called dynamic compression may be used to improve the compression ratio. The version of the BNF-based compressor described above compresses each message individually, so the overall compression ratio does not improve even when a large number of consecutive messages are compressed. However, a more advanced version of the algorithm could dynamically modify the BNF description of the protocol 'on the fly', thereby learning more details of the message flow which it is compressing. For example, any new text strings that occur in the message flow could be added to the BNF description at both the compressor and decompressor. If the same text string occurred in a later message, then it would already be available at the decompressor, so it would not have to be transmitted in full in subsequently compressed messages. This aspect further improves the overall compression ratio.
For a given BNF description, the BNF-based compression algorithm described above can compress any message compatible with the BNF. However, illegal messages cannot be compressed. In certain cases this may be a drawback, for example if the BNF description becomes outdated. Therefore the ability to compress messages that do not follow the BNF description currently available is ideally also available at the compressor and decompressor.
The simplest way to ensure that arbitrary messages can be compressed is to invoke a "backup" compression algorithm for those messages that do not match the BNF description of the protocol.
Suitable generic compression algorithms include DEFLATE and LZW.
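One possible arrangement of such a "backup" algorithm is sketched below in Python, using the standard zlib module (which wraps DEFLATE in the zlib format). The leading flag byte that tells the decompressor which algorithm was used is an assumption of this sketch and is not described in the text; bnf_compress is a hypothetical stand-in for the BNF-based compressor, and all function names are illustrative.

```python
import zlib

PREFIX = b"Hello "
NAMES = [b"Tom", b"Dick", b"Harry", b"Sally"]


def bnf_compress(message):
    """Stand-in BNF compressor: the name index, or None if illegal."""
    if message.startswith(PREFIX) and message[len(PREFIX):] in NAMES:
        return bytes([NAMES.index(message[len(PREFIX):])])
    return None


def compress_with_fallback(message):
    """Use the BNF scheme when the message matches, else fall back to zlib."""
    payload = bnf_compress(message)
    if payload is not None:
        return b"\x00" + payload  # assumed flag 0: BNF-compressed
    return b"\x01" + zlib.compress(message)  # assumed flag 1: generic


def decompress_with_fallback(data):
    """Inspect the flag byte and invert whichever algorithm was used."""
    flag, payload = data[0], data[1:]
    if flag == 0:
        return PREFIX + NAMES[payload[0]]
    return zlib.decompress(payload)
```

In this sketch a conforming message such as b"Hello Sally" becomes two bytes, while a non-conforming message is still transmitted, at the cost of the generic algorithm's lower compression ratio.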
Alternatively, a more advanced method for adding generic message compression is to provide a new BNF rule for generic compression and to add this rule to the BNF description of the protocol. This rule can then be invoked as a default, for any part of a message that contains freeform text. Any non-conforming text therefore would still benefit from generic compression techniques. Compliant portions of a message can still be compressed, using the original BNF rules, so the overall compression ratio is higher than that obtained by using the generic algorithm alone.
The invention has been described by way of exemplary embodiments only and it will be appreciated that variation to the embodiments may be made without departing from the scope of the invention. For example, the invention may be included in a base system, for use in mobile telephony, a router used in routing signals along optical networks or a mobile telephone handset or pager.

Claims

1. A method of transmitting specific protocol field information comprises the following steps: a) pre-storing predictable data portions of protocol fields, at the receiving terminal; b) removing variable data from the protocol field; c) compressing the variable data into a code; d) transmitting the code to said receiving terminal; e) receiving the code at the receiving terminal; f) decompressing the code into said variable data; and g) reconstructing the protocol field using the variable data and the predictable portion of said transmission specific protocol stored at the receiving terminal.
2. A method as claimed in claim 1 wherein the number of bits which code for each option is generally inversely proportional to the probability of that option occurring.
3. A method as claimed in claim 2 wherein said coding is performed using a Huffman method.
4. A method as claimed in any preceding claim wherein said protocol is in Backus Naur form.
5. A communication system comprising: a) means for pre-storing predictable data portions of protocol fields, at a receiving terminal; b) means for removing variable data from the protocol field; c) means for compressing the variable data into a code; d) a transmitter for transmitting the code to said receiving terminal; e) receiving the code at the receiving terminal; f) means for decompressing the code into said variable data; and g) means for reconstructing the protocol field using the variable data and the predictable portion of said transmission specific protocol stored at the receiving terminal.
6. A communications system as claimed in claim 5 wherein the number of bits which code for each option is generally inversely proportional to the probability of that option occurring.
7. A communications system as claimed in claim 6 wherein said coding is performed using a Huffman method.
8. A communications system as claimed in any preceding claim wherein said protocol is in Backus Naur form.
9. A Programmable Integrated Circuit (PIC), for use in the system of any of claims 5 to 8 in, has means for decompressing the code into said variable data; and means for reconstructing the protocol field using the variable data and the predictable portion of said transmission specific protocol stored at the receiving terminal.
10. A Programmable Integrated Circuit (PIC) according to claim 9 which is an Application Specific Integrated Circuit (ASIC).
11. A Programmable Integrated Circuit (PIC) according to claim 9 includes Electrically Erasable Programmable Memory (EPROM).
12. A signal compressed according to the method of claims 1 to 4.
13. A router including the Programmable Integrated Circuit (PIC) according to any of claims 9 to 11.
PCT/EP2002/007876 2001-07-13 2002-07-15 Method for compressing protocols and related system WO2003007572A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0117132.1 2001-07-13
GBGB0117132.1A GB0117132D0 (en) 2001-07-13 2001-07-13 Data compression
GB0125866.4 2001-10-29
GB0125866A GB2377597B (en) 2001-07-13 2001-10-29 Method of compressing protocols

Publications (1)

Publication Number Publication Date
WO2003007572A1 true WO2003007572A1 (en) 2003-01-23

Family

ID=26246310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2002/007876 WO2003007572A1 (en) 2001-07-13 2002-07-15 Method for compressing protocols and related system

Country Status (1)

Country Link
WO (1) WO2003007572A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11431990B2 (en) 2015-06-04 2022-08-30 Thales Holdings Uk Plc Video compression with increased fidelity near horizon

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0512174A1 (en) * 1991-05-08 1992-11-11 Semaphore, Inc. Parallel rule-based data transmission method and apparatus
US5293379A (en) * 1991-04-22 1994-03-08 Gandalf Technologies, Inc. Packet-based data compression method
US6040790A (en) * 1998-05-29 2000-03-21 Xerox Corporation Method of building an adaptive huffman codeword tree
WO2000049748A1 (en) * 1999-02-17 2000-08-24 Nokia Mobile Phones Ltd. Header compression in real time services
WO2000079763A1 (en) * 1999-06-18 2000-12-28 Telefonaktiebolaget L M Ericsson (Publ) Robust header compression in packet communications

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JACOBSON V: "Compressing TCP/IP Headers for Low-Speed Serial Links", RFC 1144, INTERNET ENGINEERING TASK FORCE, February 1990 (1990-02-01), XP002139708 *

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): GB US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase