US20060139187A1 - Pattern-driven, message-oriented compression apparatus and method - Google Patents

Pattern-driven, message-oriented compression apparatus and method

Info

Publication number
US20060139187A1
Authority
US
United States
Prior art keywords
pattern
message
event
stream
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/269,148
Other versions
US7321322B2 (en)
Inventor
Nadav Helfman
Guy Keren
Alex Drobinsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAP SE
SAP Portals Israel Ltd
Original Assignee
SAP SE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAP SE
Priority to US11/269,148 (US7321322B2)
Assigned to SAP PORTALS ISRAEL LTD. Assignment of assignors interest (see document for details). Assignors: VIRTUAL LOCALITY LTD.
Publication of US20060139187A1
Application granted
Publication of US7321322B2
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78

Abstract

A compression and decompression method and apparatus comprising at least one data source providing a stream of data to at least one data destination, employing at least one pattern classifier processing the stream of data of the at least one data source into a single stream of messages and generating at least one pattern event, a message encoder and a message decoder changing an internal state in response to the at least one pattern event.

Description

    CROSS-REFERENCES TO RELATED APPLICATIONS
  • This application claims priority to International Application No. PCT/IL2004/000377 filed May 6, 2004.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates generally to lossless data compression. More particularly, the present invention relates to repeating compression tasks of data generated by similar sources and possible enactments of universal data compression to utilize the attributes of such sources.
  • 2. Discussion of the Related Art
  • The performance of data compression depends on what can be determined about the characteristics of the source. When given an incoming data stream, its characteristics can be used to devise a model for better prediction of forthcoming strings. If such characteristics are determined prior to compression, a priori knowledge of source characteristics can be obtained, providing a significant advantage and allowing for more efficient compression. However, in most cases a priori knowledge of the source characteristics cannot be determined. This often occurs in real-world applications where properties of a source are dynamic. In particular, the symbol probability distribution of a source usually changes over time.
  • Some substitutional compression processes can be used to compress such data, since they do not require a priori knowledge of the source properties. Such processes can adaptively learn the source characteristics on the fly during the coding phase. Moreover, the decoder can regenerate the source characteristics during decoding, so that characteristics are not required to be transmitted from encoder to decoder.
  • These compression processes can be applied to universal data content and are sometimes called universal data compression algorithms. The LZ compression algorithm is a universal compression algorithm that is based on substitutional compression. The main reason the LZ compression algorithm works universally is the adaptability of its dictionary to the incoming stream. In general, the LZ compression algorithm processes the input data stream and adaptively constructs two identical dictionary buffers, one at the encoder and one at the decoder. This building process is performed during the coding and decoding of the stream, without explicit transmission of the dictionary, and the dictionary is updated to adapt to the input stream. Matching procedures using this adapted dictionary are expected to give the desired compression result, since the dictionary reflects the incoming statistics quite accurately. Many applications that may benefit from data compression have repeating usage patterns. Examples of such applications are a client/server application working session that repeats frequently, or a periodic remote backup process. There is therefore a need for a priori knowledge about the source data.
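As an illustration of the adaptive dictionary construction described above, the following is a minimal Python sketch of an LZ78-style coder. It is not taken from the application; the function names `lz78_encode` and `lz78_decode` and the `(index, character)` pair format are assumptions chosen for clarity. The point it shows is that the decoder rebuilds the same dictionary from the encoded pairs alone, so the dictionary is never transmitted.

```python
def lz78_encode(data: str) -> list[tuple[int, str]]:
    """Encode text as (dictionary index, next character) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrase = ""
    output = []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn a new phrase
            phrase = ""
    if phrase:
        # The input ended inside a known phrase; emit it as prefix + last char.
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text, growing the same dictionary the encoder grew."""
    phrases = [""]  # index -> phrase, mirrors the encoder's dictionary
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    text = "abababababa"
    pairs = lz78_encode(text)
    print(pairs)
    print(lz78_decode(pairs) == text)
```

Such a dictionary starts empty for every stream, which is the limitation the usage-pattern approach of this application is meant to address.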
  • BRIEF SUMMARY OF THE INVENTION
  • The present invention regards a compression apparatus that includes a usage pattern classifier, an encoder, a decoder and a mechanism for signaling classified usage patterns between the encoder and the decoder. The input stream is delivered to the encoder as messages, which are detected by the classifier. The encoder matches each message with one or both of (a) a dictionary of previously detected strings and (b) a buffer of the N most recent messages. This matching results in (a) detection of new repeating strings, (b) a collection of “badly compressed message segments” for future “off-line” analysis, and (c) encoded messages in which content is replaced with a token that includes one or both of (i) a reference to an existing string in the dictionary, with the length used from the beginning of the string, and (ii) a location in the buffer of the N most recent messages. A location in the buffer of the N most recent messages is also considered the declaration of a new string in the dictionary. Offline learning is triggered by a break in the transmitted data detected by the classifier. A pause in the current session results in “internal session redundancy analysis”, which matches all “badly compressed message” segments from the current session and yields (a) new strings in the dictionary, and (b) a remainder of message segments saved for future “cross session” redundancy analysis. During the process, the dictionary is “aged”: strings are removed to make room for new items according to some “aging policy” process. The end of the current session results in a cross session redundancy analysis, which resolves the remainder segments left from the internal session redundancy analysis process. Several versions of the data structure may co-exist to enable analysis in the background. In this case an identifier of the data structure version used is added to the format of the encoded message. An actual realization of the mechanism may also include an exchange of state-structure signatures between the encoder and the decoder, and disk persistency of the data structures for initialization and recovery.
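The signature exchange mentioned at the end of this summary can be pictured with a short Python sketch. The application does not specify how a signature is computed; the use of SHA-256, the function name `context_signature`, and the byte layout below are all assumptions made for illustration only. The idea is that each side hashes its own copy of the versioned dictionary and the two digests are compared before encoded messages are trusted.

```python
import hashlib


def context_signature(version: int, dictionary: dict[int, bytes]) -> str:
    """Hash a context version and its dictionary in a canonical order.

    Hypothetical layout: 4-byte version, then for each string id in
    ascending order a 4-byte id, a 4-byte length and the string bytes.
    """
    digest = hashlib.sha256()
    digest.update(version.to_bytes(4, "big"))
    for string_id in sorted(dictionary):
        value = dictionary[string_id]
        digest.update(string_id.to_bytes(4, "big"))
        digest.update(len(value).to_bytes(4, "big"))
        digest.update(value)
    return digest.hexdigest()


if __name__ == "__main__":
    encoder_side = {1: b"HTTP/1.1 200 OK", 2: b"Content-Type: text/html"}
    decoder_side = {2: b"Content-Type: text/html", 1: b"HTTP/1.1 200 OK"}
    # Same content in a different insertion order gives the same signature.
    print(context_signature(7, encoder_side) == context_signature(7, decoder_side))
```

If the signatures differ, one plausible recovery, again an assumption, is for both sides to fall back to the last persisted context version that they agree on.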
  • In accordance with one aspect of the present invention, there is provided a compression or decompression apparatus comprising at least one data source for providing a stream of data to at least one data destination; at least one pattern classifier for processing the stream of data of the at least one data source into a single stream of messages and for generating at least one pattern event, a message encoder and a message decoder for changing an internal state in response to the at least one pattern event. The stream of messages can comprise continuous content segments in time, application layer or proximity. The encoder internal data structure comprises at least one string dictionary, and a store for most recent messages comprising at least one most recent message. The at least one most recent message is matched with at least one string within the at least one string dictionary. The apparatus can further comprise a pattern classifier for detecting a pattern event in the data stream. The pattern event can be a silence in the session event or an end session event. The encoder or decoder further comprises a badly compressed message segments store for processing at least one badly compressed message into at least one new dictionary string in response to a silence in session event.
  • In accordance with another aspect of the present invention, there is provided a compression or decompression method comprising at least one data source providing a stream of data to at least one data destination, employing at least one pattern classifier processing the stream of data of the at least one data source into a single stream of messages and generating at least one pattern event, a message encoder and a message decoder changing an internal state in response to the at least one pattern event. The method can further comprise a step of matching messages from a store for most recent messages within the encoder internal data structure with strings stored in a string dictionary. The method can further comprise the step of the pattern classifier detecting a pattern event in the data stream. The method can further comprise the step of processing a badly compressed message segments store within the encoder or decoder into new dictionary strings in response to a silence in session event. The step of matching can comprise the matching of a hash value of a fixed size prefix within the matched context.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram that illustrates a data compression system with a pattern classifier, an encoder, a decoder and a classified patterns events signaling mechanism, in accordance with a preferred embodiment of the present invention;
  • FIG. 2 is a block diagram that illustrates the internal structure of the encoder and operation scenarios, in accordance with a preferred embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Message: a continuous content segment with time, application layer or other proximity as detected by the classifier. A message is the basic unit of processing and is encoded and decoded as an atomic operation.
  • Session: a stream of Messages with time, application layer or other proximity as detected by the classifier, all generated by the same collection of data sources. A session is associated with begin session and end session events.
  • Session silence: a time, application layer or other pause in the stream of messages in a session.
  • The present invention provides a method and apparatus that adds an a priori assumption of the existence of usage patterns to universal data compression methods. Two identical data structures are maintained in the encoder and the decoder, based on the content of the stream with the addition of signals of detected patterns sent from the encoder to the decoder. The history covered by the data structures of the current invention extends back to the initial usage of the application. The compression ratio achieved may be about one to three orders of magnitude larger than in common universal data compression, while the present mechanism is highly efficient and suitable for real-time communication.
  • Referring to FIG. 1, a collection of similar or logically related data sources 100 produces data streams, which are processed by a pattern classifier 110 into a stream of messages. Each message is processed by an Encoder 120 as an atomic unit, providing a stream of encoded messages. A Decoder 130 decodes each message, providing the original stream of messages, which is processed by the messages to streams unit 140 into streams flowing to the associated collection of destinations 150. The pattern classifier 110 also detects the Session Silent and End Session events, which are signaled both to the Encoder 120 and the Decoder 130, triggering a modification of the encoder/decoder mutual data structure, named context in this text. The constituent components of the present invention as described in FIGS. 1-2 can operate within a computerized system having one or more central processing units. Persons skilled in the art will appreciate that the present invention can be operated and applied in many computerized systems, including such systems associated with personal and business computers, network environments, and the like.
  • Referring now to FIG. 2, the internal structure of an Encoder is described. Upon activation of the system (step 0), the most recent context is loaded into a memory structure 220 from a contexts store 260. The context includes a Dictionary of strings 230 and a store of message segments 250. The dictionary 230 is indexed by two methods: (a) a “fingerprint”, which is a hash value calculated on a fixed size prefix of the string by any efficient hashing method such as UHASH, and (b) an identifier. The identifier can be a sequential number, a randomly generated identifier or any other identifier suitable for indexing a dictionary. In actual implementations an ID (and context signature) may be associated with each Context data structure to match with the remote instance for synchronization validation.
  • When a message is handed to the encoder (step 1.0), it is stored in the current message data structure. For every location in the message, a fingerprint value is calculated in a particular manner, which can be identical to that of the dictionary string fingerprints described above. The fingerprint values are used to query the Dictionary 230 (step 1.1) and, optionally for this message, the N most recent messages store 215 (step 1.2), using a one-to-many cross redundancy analyzer 210. Any match with the dictionary 230 is encoded as an ESCAPE code (string id, length) token in the encoded message. Any match of a defined minimal length with a message in the store is encoded as an ESCAPE code (relative message id in the store, location in message, length). In addition, the matched segment is added to the dictionary (step 1.2.1) on both sides. Badly compressed segments larger than a given threshold (step 1.4) are added to a message segments store data structure.
  • A session silent event (step 2.0) signaled by the pattern classifier activates a many-to-many cross redundancy analyzer 240. This analyzer is handed the current active context (step 2.1) and, after the analysis, replaces said active context with a new active context (step 2.2), while also saving the new context in the contexts store (step 2.4). An End session event (step 3.0), signaled by the pattern classifier, activates the many-to-many cross redundancy analyzer unit to read the previous (step 3.1) and current (step 3.2) contexts and to resolve items in the message segments store into a new context (step 3.3) with disk persistency (step 3.4).
  • Both the one-to-many analyzer 210 and the many-to-many analyzer 240 operate by mapping each fingerprint value into a list of its instances.
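The fingerprint-to-instances mapping used by both analyzers can be sketched in Python as follows. This is only an illustration under stated assumptions: CRC32 from the standard `zlib` module stands in for UHASH, the 4-byte `PREFIX` length is arbitrary, and the names `build_instance_map` and `repeated_strings` are hypothetical. The sketch indexes every position of every stored segment by the fingerprint of its fixed-size prefix, then extends each pair of candidate positions to find strings shared by different segments.

```python
import zlib
from collections import defaultdict

PREFIX = 4  # hypothetical fixed prefix length, in bytes


def fingerprint(data: bytes, pos: int) -> int:
    """Hash of the fixed-size prefix starting at pos (CRC32 as a stand-in)."""
    return zlib.crc32(data[pos:pos + PREFIX])


def build_instance_map(segments: list[bytes]) -> dict[int, list[tuple[int, int]]]:
    """Map each fingerprint to the (segment index, offset) pairs where it occurs."""
    instances = defaultdict(list)
    for seg_id, seg in enumerate(segments):
        for pos in range(len(seg) - PREFIX + 1):
            instances[fingerprint(seg, pos)].append((seg_id, pos))
    return instances


def common_length(a: bytes, i: int, b: bytes, j: int) -> int:
    """Number of equal bytes starting at a[i] and b[j]."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def repeated_strings(segments: list[bytes], min_len: int) -> set[bytes]:
    """Strings of at least min_len bytes that occur in two different segments."""
    found = set()
    for locations in build_instance_map(segments).values():
        for x in range(len(locations)):
            for y in range(x + 1, len(locations)):
                (sa, pa), (sb, pb) = locations[x], locations[y]
                if sa == sb:
                    continue  # this sketch only looks across segments
                if pa > 0 and pb > 0 and segments[sa][pa - 1] == segments[sb][pb - 1]:
                    continue  # not left-maximal; an earlier position covers it
                n = common_length(segments[sa], pa, segments[sb], pb)
                if n >= min_len:
                    found.add(segments[sa][pa:pa + n])
    return found


if __name__ == "__main__":
    store = [b"xxHELLO_WORLDyy", b"zzHELLO_WORLDqq"]
    print(repeated_strings(store, min_len=8))
```

The pairwise comparison is quadratic in the number of instances per fingerprint, which is acceptable for a sketch; a production analyzer would presumably bound the list lengths or process them incrementally.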
  • The process in the Decoder is similar, in the opposite direction. The Decoder has the same context data structures, which are used to resolve tokens back into content segments using methods known to persons skilled in the related art.
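A minimal Python sketch of the dictionary-matching path and its inverse is given here for illustration. It covers only the (string id, length) token; the recent-messages store, the addition of new strings, and the wire format of the ESCAPE code are omitted. The `Context` class, the `("REF", id, length)` and `("LIT", bytes)` token tuples, the 4-byte prefix, and the use of CRC32 in place of UHASH are all assumptions, not details from the application.

```python
import zlib

PREFIX = 4  # hypothetical fixed prefix length, in bytes


def fingerprint(data: bytes) -> int:
    """Hash of the first PREFIX bytes (CRC32 stands in for UHASH)."""
    return zlib.crc32(data[:PREFIX])


class Context:
    """Dictionary of strings indexed by identifier and by prefix fingerprint."""

    def __init__(self, strings: list[bytes]):
        self.by_id = dict(enumerate(strings))
        self.by_fingerprint: dict[int, list[int]] = {}
        for string_id, value in self.by_id.items():
            self.by_fingerprint.setdefault(fingerprint(value), []).append(string_id)


def encode(message: bytes, ctx: Context, min_len: int = PREFIX) -> list[tuple]:
    """Replace dictionary matches with ("REF", string id, length) tokens."""
    tokens, literal, pos = [], bytearray(), 0
    while pos < len(message):
        best_id, best_len = None, 0
        for string_id in ctx.by_fingerprint.get(fingerprint(message[pos:]), []):
            candidate = ctx.by_id[string_id]
            n = 0
            while (n < len(candidate) and pos + n < len(message)
                   and candidate[n] == message[pos + n]):
                n += 1
            if n > best_len:
                best_id, best_len = string_id, n
        if best_id is not None and best_len >= min_len:
            if literal:
                tokens.append(("LIT", bytes(literal)))
                literal.clear()
            tokens.append(("REF", best_id, best_len))  # length counts from string start
            pos += best_len
        else:
            literal.append(message[pos])
            pos += 1
    if literal:
        tokens.append(("LIT", bytes(literal)))
    return tokens


def decode(tokens: list[tuple], ctx: Context) -> bytes:
    """Resolve tokens back into content using an identical context."""
    out = bytearray()
    for token in tokens:
        if token[0] == "REF":
            _, string_id, length = token
            out += ctx.by_id[string_id][:length]
        else:
            out += token[1]
    return bytes(out)


if __name__ == "__main__":
    shared = [b"GET /index.html HTTP/1.1", b"Content-Type: text/html"]
    message = b"GET /index.html HTTP/1.0\r\nContent-Type: text/html\r\n"
    tokens = encode(message, Context(shared))
    print(tokens)
    print(decode(tokens, Context(shared)) == message)
```

Decoding is correct only if both sides hold the same dictionary, which is why the context is kept identical at the encoder and the decoder.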
  • One embodiment of the present invention is provided as follows: A data Source of a web-based application, running on a computer system with one CPU, is generating replies (in response to requests from a web client application, for example).
  • The stream of communication might have the following pattern. Packets are transmitted contiguously with a delay of less than 50 msec (milliseconds) between each packet, until the content the web based application “wishes” to transmit to the Destination (the web client) is entirely transmitted. A pattern classifier module running on the network gateway computer captures the stream from the Source to the Destination, via a method such as redirecting the traffic to a local listening TCP port using DNAT (Destination network address translation), which is a well-known networking method. After a period of more than 50 msec from the previous packet, the classifier receives the content of a new packet of the stream and starts buffering the content until the flow of packet stops for more than 50 msec. Then, the content is packed with meta-information regarding the original stream into a message data structure and delivered to the Encoder. The Encoder matches the message with a string of previously detected strings using a method such as comparing signatures of fixed length segments in the message. Then, the encoder matches the message with a buffer of N previously transmitted messages for repeating strings. Any segment whose size is more than 10% of the message (or some other measure) and which is not covered by the dictionary or previous messages is added into a segments store. Every matched string is replaced with an escape-char (a sequence to alert a decoder that this is a replaced string) and an index value. The encoded message is transmitted into the other side and handled by the Decoder, which is running on the gateway computer to the Destination's network. The Decoder replaces every escape-char and index with the original string and transmits the content into the destination using a local TCP connection, which matches the meta-information in the Message. In addition, it adds segments, that are larger than 10% of the message into a segments store. 
This process repeats for every reply message from the web-based application to the web client. When the user "takes a break" of more than 120 seconds (for example) and stops generating new requests, the web-based application will eventually also stop generating new replies. A software timer in the classifier, which is reset and retriggered after every generated message to fire an event within 120 seconds (or some other period), eventually triggers a "stream-silence" event. The event is delivered to both the Encoder and the Decoder. In reaction to the event, both the Encoder and the Decoder analyze the content of the segments store. Each string that is larger than 32 bytes (or some other threshold length) and repeats at least twice is added to the strings dictionary. With a new version of the strings dictionary in place, the internal state of both the Encoder and the Decoder has changed in reaction to the stream-silence event.
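The stream-silence handling can be sketched in the same way. The following Python fragment is an illustration under stated assumptions, not the method of the specification: `SilenceTimer` and `on_stream_silence` are hypothetical names, clock values are passed in explicitly instead of using a real software timer, a "repeating string" is simplified to an identical whole segment in the segments store, and clearing the store after analysis is an assumption.

```python
from collections import Counter

SILENCE_SECONDS = 120.0  # stream-silence period from the text
MIN_STRING_BYTES = 32    # threshold length from the text
MIN_REPEATS = 2          # "repeats at least twice"


class SilenceTimer:
    """Deadline that is re-armed after every message; the caller supplies the clock."""

    def __init__(self, period=SILENCE_SECONDS):
        self.period = period
        self.deadline = None

    def on_message(self, now):
        # Reset and retrigger: silence is measured from the latest message.
        self.deadline = now + self.period

    def poll(self, now):
        """Return True exactly once when the stream has been silent for the period."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False


def on_stream_silence(segments_store, dictionary):
    """Promote repeated, sufficiently long segments into the strings dictionary.

    Simplification: only identical whole segments are counted as repeats.
    """
    counts = Counter(segments_store)  # keeps first-seen order, so both sides agree
    for segment, n in counts.items():
        if (len(segment) > MIN_STRING_BYTES and n >= MIN_REPEATS
                and segment not in dictionary):
            dictionary.append(segment)
    segments_store.clear()  # assumption: the store is emptied after analysis
    return dictionary
```

Because the Encoder and the Decoder each run the same deterministic update on equivalent segment stores, their dictionaries stay in step without the new strings being transmitted.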

Claims (14)

1. Within a computerized environment having at least one central processing unit for processing incoming data to perform one of compression or decompression, a method comprising:
providing a stream of incoming data from at least one data source directed to at least one data destination;
processing the stream of incoming data using a pattern classifier into a single stream of messages;
generating one or more pattern events;
for compression, encoding messages using a message encoder that changes an internal state in response to the at least one pattern event; and
for decompression, decoding messages using a message decoder that changes an internal state in response to the at least one pattern event.
2. The method of claim 1, wherein the single stream of messages comprises content segments continuous in one or more of time, application layer or proximity.
3. The method of claim 1, further comprising matching messages from a store for most recent messages within the encoder internal data structure with strings stored in a string dictionary.
4. The method of claim 3, wherein matching comprises matching a hash value of a fixed size prefix within a matched context.
5. The method of claim 1, further comprising detecting a pattern event in the data stream using the pattern classifier.
6. The method of claim 5, wherein the pattern event comprises one or more of a silence in the session event or an end session event.
7. The method of claim 1, further comprising processing a badly compressed message segments store within the encoder or decoder into new dictionary strings in response to a silence in session event.
8. Within a computerized environment having at least one central processing unit for processing incoming data to perform one of compression or decompression, an apparatus comprising:
at least one data source that provides a stream of incoming data directed to at least one data destination;
a pattern classifier for processing the stream of incoming data into a single stream of messages;
a pattern event generator for generating at least one pattern event;
a message encoder that changes an internal state in response to the at least one pattern event; and
a message decoder that changes an internal state in response to the at least one pattern event.
9. The apparatus of claim 8, wherein the single stream of messages comprises content segments continuous in one or more of time, application layer or proximity.
10. The apparatus of claim 8, wherein an encoder internal data structure comprises:
at least one string dictionary, and
a store for most recent messages comprising at least one most recent message.
11. The apparatus of claim 10, wherein the at least one most recent message is matched with at least one string within the at least one string dictionary.
12. The apparatus of claim 8, wherein the pattern classifier is a pattern classifier that detects pattern events in the incoming data stream.
13. The apparatus of claim 12, wherein pattern events include one or more of a silence in a session event or an end session event.
14. The apparatus of claim 8, wherein the encoder or decoder further comprises a badly compressed message segments store for processing at least one badly compressed message into at least one new dictionary string in response to a silence in session event.
US11/269,148 2003-05-08 2005-11-07 Pattern-driven, message-oriented compression apparatus and method Active US7321322B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/269,148 US7321322B2 (en) 2003-05-08 2005-11-07 Pattern-driven, message-oriented compression apparatus and method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US46866103P 2003-05-08 2003-05-08
WOPCT/IL04/00377 2004-05-06
PCT/IL2004/000377 WO2004100420A2 (en) 2003-05-08 2004-05-06 A pattern driven message oriented compression apparatus and method
US11/269,148 US7321322B2 (en) 2003-05-08 2005-11-07 Pattern-driven, message-oriented compression apparatus and method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2004/000377 Continuation WO2004100420A2 (en) 2003-05-08 2004-05-06 A pattern driven message oriented compression apparatus and method

Publications (2)

Publication Number Publication Date
US20060139187A1 (en) 2006-06-29
US7321322B2 (en) 2008-01-22

Family

ID=33435198

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/269,148 Active US7321322B2 (en) 2003-05-08 2005-11-07 Pattern-driven, message-oriented compression apparatus and method

Country Status (4)

Country Link
US (1) US7321322B2 (en)
EP (1) EP1620951A4 (en)
IL (1) IL171818A (en)
WO (1) WO2004100420A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8051043B2 (en) 2006-05-05 2011-11-01 Hybir Inc. Group based complete and incremental computer file backup system, process and apparatus
US7751486B2 (en) * 2006-05-19 2010-07-06 Platform Computing Corporation Systems and methods for transmitting data
US9137162B2 (en) 2013-07-23 2015-09-15 Sap Se Network traffic routing optimization
US8937562B1 (en) 2013-07-29 2015-01-20 Sap Se Shared data de-duplication method and system

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4862167A (en) * 1987-02-24 1989-08-29 Hayes Microcomputer Products, Inc. Adaptive data compression method and apparatus
US5572206A (en) * 1994-07-06 1996-11-05 Microsoft Corporation Data compression method and system
US5606317A (en) * 1994-12-09 1997-02-25 Lucent Technologies Inc. Bandwidth efficiency MBNB coding and decoding method and apparatus
US5640563A (en) * 1992-01-31 1997-06-17 International Business Machines Corporation Multi-media computer operating system and method
US5737594A (en) * 1994-07-05 1998-04-07 Trustus Pty Ltd. Method for matching elements of two groups
US5822746A (en) * 1994-07-05 1998-10-13 Trustus Pty Ltd Method for mapping a file specification to a sequence of actions
US5990810A (en) * 1995-02-17 1999-11-23 Williams; Ross Neil Method for partitioning a block of data into subblocks and for storing and communcating such subblocks
US6038593A (en) * 1996-12-30 2000-03-14 Intel Corporation Remote application control for low bandwidth application sharing
US6054943A (en) * 1998-03-25 2000-04-25 Lawrence; John Clifton Multilevel digital information compression based on lawrence algorithm
US6115384A (en) * 1996-06-20 2000-09-05 Fourelle Systems, Inc Gateway architecture for data communication bandwidth-constrained and charge-by-use networks
US6163811A (en) * 1998-10-21 2000-12-19 Wildseed, Limited Token based source file compression/decompression and its application
US6230160B1 (en) * 1997-07-17 2001-05-08 International Business Machines Corporation Creating proxies for distributed beans and event objects
US6269402B1 (en) * 1998-07-20 2001-07-31 Motorola, Inc. Method for providing seamless communication across bearers in a wireless communication system
US20010039565A1 (en) * 1998-06-29 2001-11-08 Abhay K. Gupta Application computing environment
US6333932B1 (en) * 1994-08-22 2001-12-25 Fujitsu Limited Connectionless communications system, its test method, and intra-station control system
US6415329B1 (en) * 1998-03-06 2002-07-02 Massachusetts Institute Of Technology Method and apparatus for improving efficiency of TCP/IP protocol over high delay-bandwidth network
US6445313B2 (en) * 2000-02-07 2002-09-03 Lg Electronics Inc. Data modulating/demodulating method and apparatus for optical recording medium
US6449658B1 (en) * 1999-11-18 2002-09-10 Quikcat.Com, Inc. Method and apparatus for accelerating data through communication networks
US6480123B2 (en) * 1999-12-10 2002-11-12 Sony Corporation Encoding apparatus and method, recording medium, and decoding apparatus and method
US6553141B1 (en) * 2000-01-21 2003-04-22 Stentor, Inc. Methods and apparatus for compression of transform data
US6667700B1 (en) * 2002-10-30 2003-12-23 Nbt Technology, Inc. Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999067886A1 (en) 1998-06-23 1999-12-29 Infit Communications Ltd. Data compression for a multi-flow data stream
US20010039585A1 (en) 1999-12-06 2001-11-08 Leonard Primak System and method for directing a client to a content source

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050025232A1 (en) * 2003-07-28 2005-02-03 International Business Machines Corporation Apparatus, system and method for data compression using irredundant patterns
US7479905B2 (en) * 2003-07-28 2009-01-20 International Business Machines Corporation Apparatus, system and method for data compression using irredundant patterns
US20060098585A1 (en) * 2004-11-09 2006-05-11 Cisco Technology, Inc. Detecting malicious attacks using network behavior and header analysis
US20060161986A1 (en) * 2004-11-09 2006-07-20 Sumeet Singh Method and apparatus for content classification
US7535909B2 (en) 2004-11-09 2009-05-19 Cisco Technology, Inc. Method and apparatus to process packets in a network
US7936682B2 (en) 2004-11-09 2011-05-03 Cisco Technology, Inc. Detecting malicious attacks using network behavior and header analysis
US8010685B2 (en) * 2004-11-09 2011-08-30 Cisco Technology, Inc. Method and apparatus for content classification
US20120197943A1 (en) * 2011-01-28 2012-08-02 International Business Machines Corporation Method, computer system, and physical computer storage medium for organizing data into data structures
US8732211B2 (en) * 2011-01-28 2014-05-20 International Business Machines Corporation Method, computer system, and physical computer storage medium for organizing data into data structures
US9292546B2 (en) 2011-01-28 2016-03-22 International Business Machines Corporation Method, computer system, and physical computer storage medium for organizing data into data structures
US20120243551A1 (en) * 2011-03-22 2012-09-27 Interdisciplinary Center Herzliya Efficient Processing of Compressed Communication Traffic
US8909813B2 (en) * 2011-03-22 2014-12-09 Ramot At Tel-Aviv University Ltd. Efficient processing of compressed communication traffic

Also Published As

Publication number Publication date
WO2004100420A2 (en) 2004-11-18
US7321322B2 (en) 2008-01-22
EP1620951A4 (en) 2006-06-21
WO2004100420A3 (en) 2005-06-23
EP1620951A2 (en) 2006-02-01
IL171818A (en) 2010-02-17

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAP PORTALS ISRAEL LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VIRTUAL LOCALITY LTD.;REEL/FRAME:017290/0261

Effective date: 20060212

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12