CONTENT STORAGE AND REDUNDANCY ELIMINATION
FIELD OF THE INVENTION
The present invention relates to packet-based networks in general, and in particular to apparatus and methods for reducing data traffic associated with the transmission of packets in such networks.
BACKGROUND OF THE INVENTION
In computer networks having a client-server architecture, data files are often sent multiple times from a server to the same destination or the same subnet in response to multiple requests. For example, on the Internet a server may transmit the same Graphics Interchange Format (GIF) file over and over again to different users at a single Internet Service Provider (ISP). In packet-switched networks such as the Internet, a data file is segmented prior to transmission into one or more "packets" that are transmitted in a "data stream" to the destination, where they are reassembled into the original data file. In order to facilitate the transmission of the packets over the network, they are "wrapped" by one or more protocols. The Internet server in the previous example may use the HyperText Transfer Protocol (HTTP) in order to transmit the GIF file. The HTTP protocol adds a header that provides additional information about the GIF file (such as its size, the time, and the type of server), and may also concatenate several files together (according to the HTTP/1.1 protocol). The Transmission Control Protocol (TCP) then splits the HTTP transmission into packets for physical transmission over the Internet using the Internet Protocol (IP).
SUMMARY OF THE INVENTION
The present invention seeks to provide apparatus and methods for reducing data traffic associated with the transmission of packets in a packet-based network, such as between two routers in the network. There is thus provided in accordance with a preferred embodiment of the present invention a method for reducing data traffic associated with the transmission of packets in a packet-based network, including a) maintaining data transmitted in a data stream at a network source and at a network destination, wherein the network source is in communication with the network destination via a network path, b) subsequently receiving a packet associated with the data stream at the network source, c) extracting the packet's data, d) comparing the data extracted in step c) with the data maintained in step a), and e) where the data extracted in step c) at least partially matches the data maintained in step a), f) sending a system packet to the network destination identifying the data extracted in step c) with the data maintained at the network destination, and g) recreating the packet at the network destination from the data maintained at the destination.
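Steps a) through g) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each end keeps completed streams in a dictionary keyed by a SHA-256 digest of the stream (the description specifies only "a checksum"), and it handles only the simplest case, in which the extracted data appears in full within one stored stream. The names `StreamStore`, `Reference`, `Raw`, `source_encode`, and `destination_decode` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Reference:
    """Hypothetical system-packet body: which stored stream, where, how much."""
    stream_checksum: str
    offset: int
    length: int


@dataclass(frozen=True)
class Raw:
    """Data not yet held at the destination, sent as is."""
    data: bytes


class StreamStore:
    """Step a): copy of previously transmitted streams, kept at each end."""

    def __init__(self) -> None:
        self.streams: Dict[str, bytes] = {}

    def add_stream(self, data: bytes) -> str:
        # Assumption: SHA-256 stands in for the unspecified stream checksum.
        key = hashlib.sha256(data).hexdigest()
        self.streams[key] = data
        return key

    def find(self, data: bytes) -> Optional[Reference]:
        # Step d): look for the extracted data inside any stored stream.
        if not data:
            return None
        for key, stream in self.streams.items():
            pos = stream.find(data)
            if pos != -1:
                return Reference(key, pos, len(data))
        return None

    def read(self, ref: Reference) -> bytes:
        stream = self.streams[ref.stream_checksum]
        return stream[ref.offset:ref.offset + ref.length]


def source_encode(store: StreamStore, payload: bytes) -> Union[Reference, Raw]:
    # Steps c) to f): send a reference when the data is already at the destination.
    ref = store.find(payload)
    return ref if ref is not None else Raw(payload)


def destination_decode(store: StreamStore, msg: Union[Reference, Raw]) -> bytes:
    # Step g): recreate the packet data from the destination's own copy.
    if isinstance(msg, Reference):
        return store.read(msg)
    return msg.data
```

For example, if both ends have stored the same GIF stream, a later packet carrying bytes 10 to 59 of that stream is replaced by a `Reference` with offset 10 and length 50, and the destination returns exactly those 50 bytes. The linear `bytes.find` scan is used only for clarity; a practical device would likely need an index over the stored data.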
Further in accordance with a preferred embodiment of the present invention the system packet includes a checksum of the data stream.
Still further in accordance with a preferred embodiment of the present invention the checksum uniquely identifies the data stream.
Additionally in accordance with a preferred embodiment of the present invention the system packet includes an offset identifying the position of the data extracted in step c) in the data maintained in step a).
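One way to picture such a system packet on the wire is a fixed-size record holding the stream checksum and the offset. The layout below is purely an assumption for illustration: a 32-byte SHA-256 digest as the checksum, a 4-byte offset, and a 2-byte length (the detailed description also mentions a length), all packed big-endian with Python's `struct` module. `pack_reference` and `unpack_reference` are hypothetical names.

```python
import hashlib
import struct
from typing import Tuple

# Hypothetical wire layout, network byte order ("!"), no padding:
#   32s -> 32-byte SHA-256 checksum identifying the data stream
#   I   -> 4-byte unsigned offset of the data within the stream
#   H   -> 2-byte unsigned length of the data
REF_FORMAT = "!32sIH"
REF_SIZE = struct.calcsize(REF_FORMAT)  # 38 bytes


def pack_reference(stream_checksum: bytes, offset: int, length: int) -> bytes:
    if len(stream_checksum) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return struct.pack(REF_FORMAT, stream_checksum, offset, length)


def unpack_reference(buf: bytes) -> Tuple[bytes, int, int]:
    checksum, offset, length = struct.unpack(REF_FORMAT, buf[:REF_SIZE])
    return checksum, offset, length


# Example: reference 512 bytes starting at offset 1460 of a stored stream.
digest = hashlib.sha256(b"example stream").digest()
wire = pack_reference(digest, 1460, 512)
print(len(wire))  # 38
```

Whatever exact layout is chosen, a record of this kind stays a few tens of bytes no matter how much stored data it points to.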
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:
Fig. 1 is a simplified block diagram of a system for reducing data traffic associated with the transmission of packets in a network, the system constructed and operative in accordance with a preferred embodiment of the present invention; and
Fig. 2 is a simplified flowchart illustration of a method of operation of the system of Fig. 1 in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
Reference is now made to Fig. 1, which is a simplified block diagram of a system for reducing data traffic associated with the transmission of packets in a network, the system constructed and operative in accordance with a preferred embodiment of the present invention, and Fig. 2, which is a simplified flowchart illustration of a method of operation of the system of Fig. 1 in accordance with a preferred embodiment of the present invention. In the system of Fig. 1 a stream of packets is transmitted via a network path 8 from a server 10 to a client 12. Client 12 may be any recipient of the data stream sent by server 10, such as an end-user computer or an ISP server. Server 10 is connected to network path 8 via a router 14 and a source device 16. Client 12 is connected to network path 8 via a router 20 and a destination device 18.
Typical operation of the system of Fig. 1 is now explained with specific reference to Fig. 2. Source device 16 preferably receives each packet sent by server 10 for transmission via network path 8 (block 102), extracts the data contained in the packet, and notes the data stream or streams to which the packet belongs (blocks 104 and 106). Next, the data from each packet is checked to determine whether some or all of it was already sent over network path 8 to destination device 18 (block 108), which preferably stores the data sent to it (block 110). One method of determining whether data received at source device 16 from server 10 already exists at destination device 18 is to maintain in both source device 16 and destination device 18 a copy of all data sent in all data streams, or in specific data streams such as those containing GIF data, that previously traversed network path 8 (blocks 110 and 118). Thus if data in a data stream is present in source device 16, it is also present in destination device 18, assuming that a reliable connection exists between source device 16 and destination device 18. If a reliable connection does not exist, destination device 18 may inform source device 16 that it is missing certain data and request a retransmission of the original data.
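The mirrored store, together with the retransmission request used when the connection is not reliable, might be sketched as follows. The sketch assumes, hypothetically, that destination device 18 stores each stream's data in order by stream offset and can therefore tell when a reference points at bytes it never received. `MirroredStream`, `append`, and `missing_range` are illustrative names, not part of the description.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class MirroredStream:
    """Hypothetical copy of one data stream as held by destination device 18."""
    data: bytearray = field(default_factory=bytearray)

    def append(self, offset: int, chunk: bytes) -> bool:
        """Store a chunk at its stream offset; return False if bytes before it are missing."""
        if offset > len(self.data):
            return False  # gap: earlier data was lost, so the chunk cannot be placed yet
        # Contiguous or overlapping data overwrites/extends the stored copy in place.
        self.data[offset:offset + len(chunk)] = chunk
        return True

    def missing_range(self, offset: int, length: int) -> Optional[Tuple[int, int]]:
        """Return the (start, end) byte range a reference needs but the store lacks."""
        end = offset + length
        have = len(self.data)
        if end <= have:
            return None
        return (max(offset, have), end)


stream = MirroredStream()
stream.append(0, b"abcd")
print(stream.append(10, b"xyz"))   # False: bytes 4..9 never arrived
print(stream.missing_range(2, 8))  # (4, 10): ask the source to retransmit these
```

In this sketch the destination would send the returned range back to source device 16 as a retransmission request; how that request is encoded is left open, as it is in the description.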
If the data was already sent over network path 8, source device 16 communicates to destination device 18 that the data is already available at destination device 18 (block 130). One way of doing this is to send a stream identifier that includes the checksum of the stream, together with an offset into the stream and a length. Destination device 18 retrieves the data from memory (block 132), regenerates the packet (block 134), and sends the regenerated packet to client 12 via router 20 (block 120).
Data not found at destination device 18 is sent by source device 16 via network path 8 (block 112). The data may be sent using any conventional technique, compressed or "as is," provided that the destination device 18 can recreate the original data packet.
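Since unmatched data may be sent either compressed or "as is," a one-byte flag can tell destination device 18 which form arrived. The sketch below assumes zlib (DEFLATE) as the compressor and keeps the compressed form only when it is actually smaller; the flag values and the names `encode_unmatched` and `decode_unmatched` are hypothetical.

```python
import zlib

RAW = 0x00         # hypothetical flag: payload sent "as is"
COMPRESSED = 0x01  # hypothetical flag: payload compressed with zlib (DEFLATE)


def encode_unmatched(data: bytes) -> bytes:
    """Prefix a one-byte flag; compress only when that actually saves space."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return bytes([COMPRESSED]) + packed
    return bytes([RAW]) + data


def decode_unmatched(buf: bytes) -> bytes:
    """Recover the original bytes so the destination can rebuild the packet."""
    flag, body = buf[0], buf[1:]
    if flag == COMPRESSED:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError(f"unknown flag {flag:#x}")


encoded = encode_unmatched(b"A" * 1000)
print(encoded[0])  # 1: highly repetitive data is worth compressing
```

Very short or already-compressed payloads (GIF image data, for example, is itself compressed) tend to grow under zlib, which is why the sketch falls back to the raw form.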
It is appreciated that data streams may be transmitted in the context of one or more network sessions. In the TCP/IP model, a session might be composed of all packets related to a certain TCP connection. A network session may also be defined as all the IP packets sent from one IP address to another IP address that share an additional element, such as a certain identifier. A session may also include one or more data streams.
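Associating packets with sessions, and the three-way check described in the next paragraph (existing session, new session, or bypass), could be sketched with a table keyed by the TCP/IP 5-tuple. The policy of tracking only TCP traffic on port 80 (HTTP) is an assumption for the example, as are the names `SessionKey`, `SessionTable`, and `Decision`. Each key describes one direction of a connection, matching a source device that sees one direction of the hop.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Set

TCP = 6  # IANA protocol number for TCP (UDP is 17)


class Decision(Enum):
    EXISTING = "existing session"
    NEW = "new session"
    BYPASS = "send as is"


@dataclass(frozen=True)
class SessionKey:
    """5-tuple identifying one direction of a transport-layer connection."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


class SessionTable:
    def __init__(self, tracked_ports: FrozenSet[int] = frozenset({80})) -> None:
        self.sessions: Set[SessionKey] = set()
        self.tracked_ports = tracked_ports  # hypothetical policy: which services to process

    def classify(self, key: SessionKey) -> Decision:
        if key in self.sessions:
            return Decision.EXISTING
        if key.protocol == TCP and (
            key.src_port in self.tracked_ports or key.dst_port in self.tracked_ports
        ):
            self.sessions.add(key)  # define a new session for this transmission
            return Decision.NEW
        return Decision.BYPASS  # forward unchanged, without further processing
```

The first HTTP packet from a server creates a session; later packets with the same 5-tuple are matched to it, and traffic outside the assumed policy is forwarded untouched.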
A preferred method of associating the data found in a packet with the appropriate data stream is now described. Each packet that is received by source device 16 is checked to determine whether it is associated with an existing network session, whether a new network session should be defined for the packet transmission, or whether it is not to be associated with any session and should therefore be sent "as is," without additional processing by source device 16.
In one preferred embodiment of the present invention, source and destination devices 16 and 18 are installed at each end of an IP hop. For example, devices 16 and 18 may be installed between the routers at each end of the IP hop. The device at the sending end of the hop, i.e. source device 16, intercepts each IP packet the router sends over the hop, analyzes the packet, and creates a "system packet." The system packet typically includes a packet identifier combined with data. The identifier indicates whether this is a "regular" (bypass) packet, for which no matching data is found at destination device 18 and which is hence sent in its entirety. Alternatively, if some matching data is found, the packet may include both the unmatched data and indexing information that destination device 18 may use to identify the matched stream, such as a unique checksum of the original stream and information about the location of the data in the stream, such as an offset-length pair. The device at the receiving end of the hop, i.e. destination device 18, receives the system packet and checks the identifier. If the identifier indicates a "regular" (bypass) packet, the packet is analyzed, and the data found in the packet is stored in destination device 18's memory. The original IP packet is then sent to router 20. However, if the packet identifier indicates that the packet contains information about data streams already present at destination device 18, destination device 18 analyzes the packet and recreates the original IP packet from its memory using the information found in the system packet.
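The receiving side of this exchange can be sketched as a dispatch on the packet identifier. In this illustration, which uses assumed names (`SystemPacket`, `Literal`, `Ref`, `Destination`) rather than a defined format, a matched packet is a sequence of segments that either carry unmatched bytes directly or point into a stored stream by checksum and offset-length pair, so a partial match is expressed by mixing the two.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

BYPASS = 0   # hypothetical identifier: "regular" packet, sent in its entirety
MATCHED = 1  # hypothetical identifier: packet references data already at the destination


@dataclass(frozen=True)
class Literal:
    data: bytes  # unmatched bytes carried inside the system packet


@dataclass(frozen=True)
class Ref:
    checksum: str  # unique checksum of a stream the destination already holds
    offset: int
    length: int


@dataclass
class SystemPacket:
    kind: int
    segments: List[Union[Literal, Ref]]


class Destination:
    def __init__(self) -> None:
        self.streams: Dict[str, bytes] = {}  # stored streams keyed by checksum
        self.learned: List[bytes] = []       # bypass data kept for future matches

    def receive(self, pkt: SystemPacket) -> bytes:
        if pkt.kind == BYPASS:
            # Bypass packets carry only Literal segments: store, then forward unchanged.
            original = b"".join(seg.data for seg in pkt.segments)
            self.learned.append(original)
            return original
        if pkt.kind != MATCHED:
            raise ValueError(f"unknown packet identifier {pkt.kind}")
        out = bytearray()
        for seg in pkt.segments:
            if isinstance(seg, Literal):
                out += seg.data
            else:
                stream = self.streams[seg.checksum]
                out += stream[seg.offset:seg.offset + seg.length]
        return bytes(out)  # recreated original packet
```

For instance, with the stream `b"<html>hello world</html>"` stored under the key `"s1"`, the segments `Literal(b"HDR|")`, `Ref("s1", 6, 11)`, and `Literal(b"!")` rebuild `b"HDR|hello world!"`.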
Other approaches to data stream identification may also be employed, for example to support data streams that are not yet complete and for which a checksum of the entire stream therefore cannot yet be calculated.
In another preferred embodiment of the present invention, the functionality of source device 16 and destination device 18 may be incorporated into routers 14 and 20, respectively, in hardware and/or software using conventional techniques. The routers may thus be configured to process packets as described above, with the data being stored in the routers' internal memory as necessary.
Source device 16 may use several methods for matching incoming packets to data stored at destination device 18. For example, source device 16 may perform matching on a packet-by-packet basis, then reassemble the data file (e.g., a GIF file) and compare the received and stored files. A system utilizing the teachings of the present invention may provide additional data throughput as compared with existing systems, as it is typically more efficient to send indexing information as described above, which is usually several bytes in length, instead of the entire packet, which may contain hundreds of bytes of data.
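The throughput argument can be made concrete with a little arithmetic. The figures are assumptions chosen for illustration: a 1,460-byte payload (the usual TCP maximum segment size on a 1,500-byte Ethernet MTU with 20-byte IP and TCP headers) replaced by a 38-byte reference (a 32-byte checksum, a 4-byte offset, and a 2-byte length) plus a hypothetical 1-byte packet identifier. The function names are likewise hypothetical.

```python
def bytes_saved(payload_len: int, reference_len: int, identifier_len: int = 1) -> int:
    """Bytes kept off network path 8 when a payload is replaced by indexing information."""
    return payload_len - (reference_len + identifier_len)


def fraction_sent(payload_len: int, reference_len: int, identifier_len: int = 1) -> float:
    """Share of the original payload size that still crosses the network path."""
    return (reference_len + identifier_len) / payload_len


# Assumed figures: full-size TCP payload versus a 38-byte reference.
print(bytes_saved(1460, 38))              # 1421
print(round(fraction_sent(1460, 38), 3))  # 0.027
```

This accounting covers payload bytes only; how the original packet headers are conveyed is not specified above and would add to the cost of each system packet.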
The methods and apparatus disclosed herein have been described without reference to specific hardware or software. Rather, the methods and apparatus have been described in a manner sufficient to enable persons of ordinary skill in the art to readily adapt commercially available hardware and software as may be needed to reduce any of the embodiments of the present invention to practice without undue experimentation and using conventional techniques.
While the present invention has been described with reference to a few specific embodiments, the description is intended to be illustrative of the invention as a whole and is not to be construed as limiting the invention to the embodiments shown. It is appreciated that various modifications may occur to those skilled in the art that, while not specifically shown herein, are nevertheless within the true spirit and scope of the invention.