US20020154645A1 - System for bypassing a server to achieve higher throughput between data network and data storage system - Google Patents


Info

Publication number
US20020154645A1
Authority
US
United States
Prior art keywords
server
data
network
storage
interface
Prior art date
Legal status
Abandoned
Application number
US10/172,853
Inventor
Lee Hu
Jordi Ros
Calvin Shen
Roger Thorpe
Wei Tsai
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to US10/172,853
Publication of US20020154645A1
Status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/40 Network security protocols
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L 69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L 69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L 69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • the present invention relates to computer networks, client/server based computing, and data storage (or storage network). More particularly, this invention relates to network management, performance enhancement and reliability improvement for network data access through servers.
  • Server: a computer system that controls data access and data flow.
  • Server-oriented: refers to data that requires significant computation or processing, usually carried out by a server CPU. Examples are network user login processes going through authorization, authentication and accounting (AAA).
  • Storage-oriented: simple storage access such as disk read and/or write is considered storage-oriented. Most operations are data fetching and transport without the involvement of the CPU. JPEG and MPEG file transport are examples of storage-oriented data.
  • Server system bus contention causes two problems for networks. Since each peripheral component must contend for the bus usage without any guarantee of bandwidth, latency and time of usage, the user data throughput varies, and the latency for data transfer cannot be bounded.
  • the server OS inefficiency puts a heavy toll on the network throughput.
  • an interrupt causes two context switching operations on a server.
  • Context switching is an OS process in which the operating system suspends its current activity, saves the information required to resume the activity later and shifts to execute a new process. Once the new process is completed or suspended, a second context switching occurs during which the OS recovers its previous state and resumes processing.
  • Each context switch represents an undesirable loss of effective CPU utilization for the task and network throughput. For example, a server handles thousands of requests and data switches at high speed. Further, heavy loading and extensive context-switching can cause a server to crash.
  • a small loss of data can cause TCP to retransmit, and retransmissions will cause more interrupts which in turn may cause more OS crashes.
  • the OS interrupt-induced stability problem is very acute in a web hosting system where millions of hits can be received within a short period of time.
  • SAN Storage Area Network
  • NAS Network Attached Storage
  • An NAS is a storage device with an added thin network layer so the storage can be connected to a network directly. It bypasses servers, so server bottlenecks may be non-existent for NAS systems. (We do not consider a storage-dedicated server as NAS.)
  • the major disadvantages are the lack of the flexibility that servers have, and the overhead associated with the network layer(s) if it is too thick.
  • An NAS can be used in secured environments like an internal LAN or SAN. Authorization, accounting, and authentication (AAA) and firewall functions are unlikely to be performed by an NAS, since an overly complicated function may not be implemented due to the cost. Furthermore, it is not easy to upgrade software or protocols under the limited design of interfaces for NAS.
  • SAN is an architecture for storage systems with the advantages of flexibility and scalability. While NAS is limited due to its thin network interface, SAN defines an environment dedicated to storage without worrying about security or other heterogeneous design concerns. Essentially, storage devices in SAN can be viewed as a special kind of NAS, e.g. hard disks with Fibre Channel interfaces. Servers (which are more versatile) are still needed to connect the storage devices to the network. Therefore, the server bottleneck is still present. Furthermore, access control and other server functions are not specified in SAN systems, so other components must be added for full functionality.
  • Objects of the invention include the following:
  • the invention aims to provide the highest levels of server-based Reliability, Availability and Scalability (RAS) for a network system and the highest levels of QoS for the end users.
  • RAS server-based Reliability, Availability and Scalability
  • an apparatus that causes the majority of data to bypass the server(s).
  • This design improves the end-to-end performance of network access by achieving higher throughput between the network and storage system, improving reliability of the system, yet retaining the security, flexibility, and services that a server-based system provides.
  • the apparatus that provides this improvement logically consists of a network interface, server computer interface, and storage interface. It also has a switching element and a high-layer protocol decoding and control unit. Incoming traffic (either from the network or storage system) is decoded and compared against a routing table. If there is a matching entry, it will be routed, according to the information, to the network, the storage interface, or sent to the server for further processing (default).
  • the routing table entries are set up by the server based on the nature of the applications when an application or user request initially comes in. Subsequently, barring any changes or errors, there will be no data exchange between the server and the device (although, a control message may still flow between them). There may also be a speed matching function between the network and storage, load balancing functions for servers, and flow control for priority and QoS purposes. Because the majority of data traffic will bypass the bus and the operating system (OS) of the server(s), the reliability and throughput can be significantly improved. Therefore, for a given capacity of a server, much more data traffic can be handled, thus making the system more scalable.
  • OS operating system
  • FIG. 1 is a top-level logical diagram for the data-driven multi-processor pipelined model.
  • FIG. 2 is a top-level hardware diagram for the data-driven multi-processor pipelined model.
  • FIG. 3 describes the software structure for the preferred embodiment for the data-driven multi-processor pipelined model.
  • FIG. 4 describes the data queues and processes in the preferred embodiment of the data-driven multi-processor pipelined model.
  • FIG. 5 describes the traffic detour to host for Method 2 for file system consistency between the bypass board and the host.
  • FIG. 6 describes the buffer cache relation with the file system and FS device driver.
  • FIG. 7 is the diagram of the TCP retransmission stack.
  • FIG. 8 is top-level diagram for the relation between the device and server and storage.
  • FIG. 9 is general function blocks inside the device with three logical interfaces, namely network, server and storage.
  • FIG. 10 gives an example of major detailed functions performed to achieve claimed improvements.
  • FIGS. 11 and 12 are flow charts for data flow from network to storage or vice-versa.
  • FIG. 13 is a depiction of information decoded in various layers of protocols.
  • FIG. 14 shows an example of the Expanded Routing Table (ERT) with assumed contents.
  • FIG. 15 is an example of pipelining process to maximize the performance.
  • FIGS. 1 - 15 The preferred embodiment of the invention is illustrated in FIGS. 1 - 15 , and described in the text that follows. Although the invention has been most specifically illustrated with particular preferred embodiments, it should be understood that the invention concerns the principles by which such embodiments may be constructed and operated, and is by no means limited to the specific configurations shown.
  • a three-way network server bypass device has two main function blocks ( 100 and 101 ) as shown in FIG. 8. Based on decoded high-layer protocol information, the control unit, CU ( 100 ), decides to switch the data to the server or to the storage through the switching element (SE, 101 ).
  • the device may be physically inside the server housing, but may also be supplied as an external unit.
  • the present invention improves performance and reliability of network data access with the flexibility of a server-based system. It avoids multiple data-copying in a server system, where all the traffic in one direction has to be copied at least twice (and to interrupt the server system at least twice) along the data path from the network to the storage or the other way around.
  • the invention lets the majority of traffic bypass the server system bus, operating system (OS) and CPU or any other involvement with the server. It can also support quality of service (QoS) like prioritized traffic streams for real-time applications with video and audio, with bounded delay.
  • QoS quality of service
  • it can provide load balancing and flow control combining with the CPU/bus/OS bypassing to optimize the overall system performance and improve fault-tolerance.
  • the above-mentioned improvements are achieved by decoding high-layer protocol(s) in real-time and using the information to direct the traffic flow between network interfaces, storage system (or SAN), and server(s),
  • the traffic can be categorized as server-oriented, which will be sent to the server system, or storage-oriented (data retrieving), which will be transferred between the network and storage directly without the server's (CPU, OS and Bus) involvement.
  • server-oriented data retrieving
  • the invention dynamically identifies such traffic as storage-oriented and allows such traffic to bypass server (bus, OS and CPU).
  • the example application presented describes a single packet or a stream of packets with a particular purpose (e.g. user request for a web page.) Therefore, such a request-reply pair session may consist of several sub-applications.
  • a user-initiated request may have to go through log-in and authorization processes that should be handled by server(s). This is a server-oriented process. But after a request is authorized, the transfer of data from the storage to the user can bypass the server and be sent directly to the user through the network interface; it is storage-oriented.
  • the log-in and authorization can be a different type of application from the main session. For example, a request may not be real-time in nature, while the data transfer could be an isochronous video or audio stream like the case of “video-on-demand.”
  • Simplified examples of application categorizing include:
  • Server-oriented traffic: for example, a new request to access a web page or a user log-in from the network, or storage system control traffic between the server and the storage system.
  • Traffic types (1) and (2) will be routed to respective network or storage interfaces (e.g. from storage to network or vice-versa.) while (3) and (4) will be sent to server(s).
  • the decoding process is to look into necessary protocol (layers) and to categorize incoming traffic (from where and for what). Then, the decoded header information (IP address, port ID, sequence number, etc.) is used as an index to the routing table for a match.
  • a matched entry means the direct connection between network and storage has been “authorized.”
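  • As an illustration of the lookup just described, the following C sketch shows how decoded header fields might index an Expanded Routing Table, with unmatched traffic defaulting to the server. All field names, the hash, and the table size are illustrative assumptions, not taken from the patent.

```c
/* Minimal sketch of the routing-table match described above.  All field
 * names, the hash, and the table size are illustrative assumptions. */
#include <stdint.h>

typedef enum { ROUTE_TO_SERVER = 0, ROUTE_TO_NETWORK, ROUTE_TO_STORAGE } route_t;

struct ert_entry {                  /* one Expanded Routing Table row */
    uint32_t src_ip;                /* decoded from the IP header     */
    uint16_t src_port;              /* decoded from the TCP header    */
    uint32_t next_seq;              /* expected TCP sequence number   */
    route_t  target;                /* filled in by the server when the session is authorized */
    uint8_t  in_use;
};

#define ERT_SIZE 1024
static struct ert_entry ert[ERT_SIZE];

/* Hash the decoded header fields into a table index. */
static unsigned ert_hash(uint32_t ip, uint16_t port)
{
    return (ip ^ (ip >> 16) ^ port) % ERT_SIZE;
}

/* Return the authorized route for a decoded packet, or the default
 * (the server) when no entry matches. */
route_t ert_lookup(uint32_t src_ip, uint16_t src_port, uint32_t seq)
{
    struct ert_entry *e = &ert[ert_hash(src_ip, src_port)];

    if (e->in_use && e->src_ip == src_ip && e->src_port == src_port) {
        e->next_seq = seq;          /* track the stream for later segments    */
        return e->target;           /* matched: the bypass path is authorized */
    }
    return ROUTE_TO_SERVER;         /* unmatched traffic goes to the host     */
}
```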
  • Exemplary decoded header information is shown in FIG. 13.
  • the HTTP header is in the payload of TCP, which in turn is in the IP packet.
  • the decoding process is to look into the HTTP headers for the nature of the data (GET, POST, DELETE, etc., and possibly the application payload length).
  • the data content then is divided into segments of integral multiples of a fixed base, a process that we call “base-multiple segmentation” (BMS) technology.
  • BMS base-multiple segmentation
  • a base of y bytes, say 2 Kbytes, is chosen, and all data streams or files are segmented into chunks of integral multiples of 2 Kbytes, like 2, 4, or 8 Kbytes (padding the last chunk if it is not an exact integral multiple of 2 Kbytes), with an upper limit of, say, 40 Kbytes (20 times y).
  • the maximum size is chosen based on the requirement of isochronous real-time traffic and the switching speed, such that it will still meet the tightest real-time needs while the switching element serves the largest segments.
  • BMS advantages are that it is easier to pipeline multiple data streams or files yet still has the flexibility of variable segment size, which reduces overhead (in setup and headers) and improves performance of the device.
  • the BMS technique described above can be used to advantage not only with the apparatus of the preferred embodiment, but in general data switching applications as well.
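  • The following C sketch illustrates one way base-multiple segmentation could be computed, assuming the 2 Kbyte base and 40 Kbyte upper limit given above; the greedy split and all names are assumptions for illustration only.

```c
/* Hedged sketch of base-multiple segmentation (BMS): every segment is an
 * integral multiple of a fixed base (2 Kbytes here), the last chunk is
 * padded, and segments never exceed the 40 Kbyte upper limit. */
#include <stdio.h>
#include <stddef.h>

#define BMS_BASE  (2  * 1024)   /* base y = 2 Kbytes    */
#define BMS_MAX   (40 * 1024)   /* upper limit = 20 * y */

/* Fill seg[] with segment sizes for a payload of 'len' bytes and
 * return the number of segments produced. */
size_t bms_segment(size_t len, size_t seg[], size_t max_segs)
{
    size_t n = 0;

    while (len > 0 && n < max_segs) {
        size_t chunk = (len >= BMS_MAX) ? BMS_MAX
                     /* round the tail up to the next multiple of the base */
                     : ((len + BMS_BASE - 1) / BMS_BASE) * BMS_BASE;
        seg[n++] = chunk;                 /* the last chunk may carry padding */
        len = (len > chunk) ? len - chunk : 0;
    }
    return n;
}

int main(void)
{
    size_t seg[16];
    size_t n = bms_segment(45 * 1024 + 300, seg, 16);  /* a 45.3 Kbyte file */

    for (size_t i = 0; i < n; i++)
        printf("segment %zu: %zu bytes\n", i, seg[i]); /* 40 Kbytes, then 6 Kbytes (padded) */
    return 0;
}
```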
  • ERT Expanded Routing Table
  • a synchronization scheme is employed to interlock the decoding and switching processes.
  • Multiple incoming data streams are queued for decoding and parsing (e.g. at the application layer with HTTP) to decide which path to forward the data.
  • Synchronization is necessary between different phases of a request-reply session. For example, a reply to a request from a network user must be forwarded to the user after the authorization (or log-in) process.
  • While the server is running the authorization process, the storage data fetching can be handled concurrently to speed up the process. By the time a request is granted, the data may be ready or getting ready for transmission; otherwise, if it is denied, the transmission is aborted.
  • the invention uses a high-layer or cross-layered (cross protocol layers) switching architecture, because the traffic pattern is significantly influenced by the upper layer applications while the transport unit or packet format is mostly determined by the low layer protocols.
  • web applications determine the size and nature of the transfer (e.g. text-only, still pictures and/or video clips) in the headers of application layer.
  • Low layer protocols decide the size(s) of the packets at various network or system segments and the way to handle them (e.g. fixed size packet vs. variable size, packet size, delay tolerance and flow control methods such as window-based flow control).
  • the benefits can be significant. For example, for streaming applications, data transport is streamed instead of switched packet-by-packet, thus achieving higher throughput.
  • the switching element provides a data path for three-way switching (although it can have more than three physical connections) to and from the network, storage and server function units (CPU), through their respective interfaces with bounded delay.
  • the switching element may be a fully-connected crossbar, memory-based switching, shared medium or other switching construct.
  • the switching element has the capability of switching data traffic between any two (or more) of the interconnected interfaces. It is controlled by the control unit (CU) through a routing table that is set by the server and on-board control based on user request information.
  • the decoding block(s) will look into parts of the packet payload to parse higher layer header and/or content information to be used in making routing decisions in real-time. The information will be compared with a routing table entry for a potential match.
  • the purpose of using higher protocol layer information is to direct and optimize the traffic flow (throughput, utilization, delay, losses, etc.) for performance and reliability improvement.
  • HTTP/html application is given as an example.
  • Other applications like ftp and RTSP/RTP can also be implemented.
  • a control signal is sent to the switching element (SE).
  • SE will set up a circuitry moving the data or packet(s) to the proper outgoing interface(s) through the switching element.
  • Data or packets can be moved either individually and/or in batch (streaming), depending on the relations among them. It also controls routing table update, format conversions (which format to use) and other housekeeping tasks.
  • the scheduler decides the order of execution based on the priority and QoS information in the routing table.
  • Some flow control mechanisms can also be exercised for the network interface and/or storage interface for further improvement of performance.
  • the router keeps a routing table, switching status and history and certain statistics, and controls the path traversed by the packets.
  • the content in the routing table is provided by the server, based on storage controller (or SAN interface) information and/or decoded packet information.
  • the switching and routing elements may be of a predetermined latency, and the routing table may include routing information (which port to route), the priority, delay sensitivity and nature of the applications, and other contents for QoS measurement.
  • Basically, there are two kinds of buffers in the device. One is to buffer the asynchronous parts between the network, storage and server interfaces. The other serves as a waiting space for decoding higher layer protocols. In other words, the latter is to synchronize the decoding process and the switching process.
  • the decoding time is pre-determined by design, so that the buffer size requirement can be calculated.
  • a common pool of memory may be shared to save memory. This requires buffer management to dynamically allocate the memory for all pending threads/sessions.
  • Decodings and conversions: there are several formats with respect to different interfaces and layers of protocols. These decodings and conversions have to be done in the device and involve multiple protocol layers. Examples of decoding and format conversions are HTTP, RTSP, ftp, IP/TCP/UDP, Ethernet, SCSI, Fibre Channel, and/or PCI interfaces.
  • There are three types of logical medium interfaces: the network, storage and server(s).
  • various physical interfaces are possible, e.g., multiple network interfaces or storage interfaces or multiple servers.
  • Buffers are used to synchronize transmission between interfaces.
  • An example implementation may use Ethernet, ATM or SONET for the network interface, and SCSI, Fibre Channel, PCI, InfiniBand, or other system I/O technology for the storage and server interfaces.
  • Such a speed matching function may be effectuated through buffering.
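  • A minimal sketch of such a speed-matching buffer is given below: a bounded ring that the faster interface fills and the slower interface drains, with a full ring marking the point where flow control would push back. Sizes and names are illustrative assumptions, not taken from the patent.

```c
/* Speed-matching buffer sketch: a bounded ring of queued segments shared
 * by a fast producer interface and a slower consumer interface.  The
 * struct is assumed to be zero-initialized by its owner. */
#include <stddef.h>

#define RING_SLOTS 64

struct seg_ring {
    void    *seg[RING_SLOTS];     /* pointers to queued segments               */
    size_t   len[RING_SLOTS];
    unsigned head, tail;          /* producer writes head, consumer reads tail */
};

/* Producer side (e.g. the network interface): returns 0 when the ring
 * is full, which is where flow control would be asserted upstream. */
int ring_put(struct seg_ring *r, void *s, size_t n)
{
    unsigned next = (r->head + 1) % RING_SLOTS;
    if (next == r->tail)
        return 0;                 /* full: exercise flow control */
    r->seg[r->head] = s;
    r->len[r->head] = n;
    r->head = next;
    return 1;
}

/* Consumer side (e.g. the storage interface) drains at its own pace. */
void *ring_get(struct seg_ring *r, size_t *n)
{
    if (r->tail == r->head)
        return NULL;              /* empty */
    void *s = r->seg[r->tail];
    *n = r->len[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    return s;
}
```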
  • load balancing may be executed between or among any homogeneous interfaces in the device, and is effected based on message exchange comprising feedback information from the targeted device or other means well known in the art.
  • FIG. 10 describes an implementation with Ethernet interface ( 310 ) for networking, PCI ( 340 ) for server and SCSI ( 350 ) for storage.
  • An incoming user/client request is received from Ethernet interface ( 310 ) and decoded at different layers from the Ethernet format ( 311 ) and IP/TCP ( 312 ) format. Then HTTP header is parsed against the Expanded Routing Table residing in the Router ( 313 , 314 and 315 ). If a match is found, the subsequent data (until the end of the HTTP payload; perhaps an html file) will be forwarded per the router; otherwise, the HTTP payload will be sent to the server for further processing (the default route).
  • a routing table match indicates an established (authorized) connection. For example, if the data is sent to storage, it may be an authorized WRITE to the storage.
  • the data routed to the server can either be an initial request for access or server-oriented traffic.
  • the server may process the request with a log-in (if applicable) using an authentication, authorization, and accounting (AAA) process.
  • AAA authentication, authorization, and accounting
  • the software on the server will communicate with the device for all necessary setup (e.g. routing table and file system for the storage) through the Router Control ( 316 ) and Scheduler (in 315 ) and then pass the control to the device and notify the storage to start a response to that request with a given file ID (or name) for the file system through the control path.
  • the file system then can issue commands to SCSI Interface ( 350 ) to fetch the data.
  • Higher layer traffic information e.g. HTTP or even html
  • HTTP HyperText Transfer Protocol
  • a single initial web access request from the network is forwarded to the server.
  • the server decides the access is legitimate, it sets up both the CU and storage control (or through the CU).
  • Subsequent traffic (responses) will bypass the server and be directly forwarded to the network interface for further transfer.
  • a new request from a user will be directed to server for processing. This may include the case of accessing a new web page or area or from different applications (windows).
  • html vs. real-time video clip for example
  • differentiated services can be provided. Further, streaming based on the content can improve even non-real-time applications.
  • the default traffic path is through the server(s). For cases like initial user login, storage access error, or interrupted web page access, the server(s) would take over the control. Signaling is used to communicate between the server and the device. For the majority of data transfer, however, the server(s) is not in the data path so bus contention, OS involvement (interrupt) and CPU loading are significantly reduced. The traffic reduction through the server is very significant while the flexibility of having server(s) handling unusual cases is maintained in the design, as contrasted with the NAS approach.
  • the device is bidirectional.
  • the server sets up the router ( 315 ) mechanism and the subsequent incoming traffic from network for the same session will bypass the server through the decoding processes (( 310 , 311 , 312 , and 313 ).
  • the decoded high layer information is parsed against the routing table (in 315 ).
  • Proper connection to either server or storage can then be established by the Switching Element ( 303 ). If it is through the server, the data will go through the server bus and the OS for proper processing. Otherwise, a direct connection will be set up to route data (say, html files) to storage through the file system (to handle file format), drivers (to handle storage controller interface, e.g. SCSI) and storage controller.
  • the traffic through server and through SE is synchronized by the Scheduler ( 31 b ) and Memory Pool ( 301 ) before it is sent to SCSI Interface ( 350 ). This process is shown in FIG. 11.
  • a priority mechanism can be implemented to support different QoS requirements in the Router and Scheduler ( 315 and 300 ). In the case of multiple servers and/or storage and network devices, a load balancing and flow control mechanism can be applied based-on application tasks.
  • the server's role is supervisory, and is not involved in the byte-by-byte transfer.
  • the CPU, operating system and server bus(es) are not in the normal path of the data transfer in either direction (from network to storage or from storage to network.)
  • This invention represents a change from interrupt-based server and OS to switching-based networking architecture.
  • Performance improvements provided by the invention include:
  • Priority of services: higher layer(s) information, combined with server loading, should further improve the QoS for high-priority and regular traffic.
  • Scalability: multiple devices can be used within a single server, or a single device among multiple servers, to support large-scale applications.
  • the type of server(s), operating system(s), network(s), storage system(s) or the speeds of the networks are not essential to the invention.
  • Various interfaces can be designed.
  • the three-way switching is a logical concept.
  • the system can involve multiple networks and/or storage networks, e.g. four-way switching among ATM, Ethernet and storage area network (SAN) interfaces.
  • the basic idea is a high layer or cross-(protocol) layered switching mechanism among heterogeneous (network) systems with embedded real-time protocol conversion to bypass the server(s) as much as possible.
  • a load balancing scheme can improve the overall system performance further.
  • “Bypass Board” an enhancement board designed to reduce average CPU load when installed into an existing host server. It achieves this in two ways: (1) Reduction in the processing of specific traffic, and (2) Reduction in I/O bus transactions.
  • TWIP Three-Way Internet Port
  • the TWIP Board (or simply referred to as TWIP) is a peripheral interface board that plugs into a PCI socket, replacing at least one existing SCSI disk controller and one NIC board.
  • the host's existing hard drive is then plugged into the TWIP along with a 1 Gbit switched Ethernet connection.
  • Host-side drivers that permit visibility of the SCSI disk and network connection from all existing applications will also accompany the TWIP board.
  • the traffic to be bypassed is assumed to be HTTP traffic.
  • Other types of traffic, such as FTP, RTP, etc., may also be bypassed.
  • the FTP capability can be used, among other things, to support an efficient backup of the storage device concurrent with, for example, production HTTP operation, thereby avoiding downtime dedicated to backup.
  • TWIP The design of TWIP is based on a data-driven multi-processor pipelined model as shown in FIG. 1.
  • In a data-driven multi-processor pipelined model, tasks are assigned to specific processors whose actions are triggered by the arrival of input data.
  • each processor puts the output data into the input queues of the processor for the next tasks to be performed on the output data.
  • the operations of the processors are executed asynchronously, and the processing of the data forms multistage pipelines.
  • the payload data are not copied or moved once they are loaded into the payload buffers until they are finally sent out.
  • Each processor, along with the input/output queues only operates on the labels associated with the payload data.
  • the labels could be the pointers or headers of the payload data. All inter-processor communications involve labels but not the payload.
  • this model is depicted in FIG. 2.
  • the payload data do not even go through any of the processors.
  • the traffic between the payload buffers and the host can also involve labels if the application program (which could be modified) on the host does not need to process the payload data.
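  • The label-passing idea can be sketched as follows; the payload stays put in its buffer and only a small label moves through the per-stage queues. All structures and names here are hypothetical, not taken from the patent.

```c
/* Sketch of label passing between pipeline stages: the payload is written
 * once into a payload buffer, and only a label (pointer plus metadata)
 * travels between the stages' input queues. */
#include <stddef.h>
#include <stdint.h>

struct label {                     /* what actually moves between stages   */
    uint8_t *payload;              /* fixed location in the payload buffer */
    size_t   len;
    uint32_t session_id;           /* e.g. the TCP connection it belongs to */
    uint8_t  is_http;              /* classification recorded when the headers were decoded */
};

#define QDEPTH 128
struct label_queue {               /* assumed zero-initialized by its owner */
    struct label *slot[QDEPTH];
    unsigned head, tail;
};

static int lq_push(struct label_queue *q, struct label *l)
{
    unsigned next = (q->head + 1) % QDEPTH;
    if (next == q->tail) return 0;            /* downstream stage is busy */
    q->slot[q->head] = l;
    q->head = next;
    return 1;
}

static struct label *lq_pop(struct label_queue *q)
{
    if (q->tail == q->head) return NULL;
    struct label *l = q->slot[q->tail];
    q->tail = (q->tail + 1) % QDEPTH;
    return l;
}

/* One pipeline stage (a stand-in for the on-board TCP/IP processor): it
 * forwards each label to the next stage based on the classification in
 * the label, never touching the payload itself. */
void tcpip_stage(struct label_queue *in, struct label_queue *to_http,
                 struct label_queue *to_host)
{
    struct label *l;
    while ((l = lq_pop(in)) != NULL)
        lq_push(l->is_http ? to_http : to_host, l);
}
```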
  • FIG. 3 shows the software structure for the TWIP preferred embodiment. The functional relationship among the software modules is described below.
  • the Network Interface Card (NIC, 701 ) receives data from the network.
  • the NIC Device Driver (NIC DD, 702 ) fetches the data from the buffer on NIC ( 701 ).
  • the NIC DD ( 702 ) checks to see if the traffic is non-HTTP (the traffic not to be handled by TWIP). If so ( 702 ) redirects the traffic to the Host ( 718 ) through Ethernet DD ( 704 ) using DMA (DMA 1 ), else ( 702 ) directs the traffic to the TCP/IP processor ( 705 ) through packet descriptor module ( 703 ).
  • the TCP/IP processor ( 705 ) passes HTTP payload labels to TWIP HTTP engine ( 707 ) through a socket (TCB Socket, 706 ).
  • the TWIP HTTP engine (HTTP 707 ) parses the HTTP payload and decides to use one of the two file subsystems, ( 709 or 710 ) and then issues file system requests to ( 709 ) or ( 710 ) through the buffer cache ( 708 ).
  • Both file subsystems (tFS, 709 ) and (xFS, 710 ) request data from the buffer cache module ( 711 ). If the data is not cached in ( 711 ) and the request comes from subsystem tFS ( 709 ), ( 711 ) will ask the TWIP block device driver (tBlock DD, 712 ) to fetch the data. If the data is not cached in the buffer cache ( 711 ) and the request comes from subsystem xFS ( 710 ), ( 711 ) will ask another TWIP block device driver (xBlock DD, 714 ) to fetch the data.
  • the tBlock DD ( 712 ) asks tSCSI DD ( 713 ) to fetch data from the SCSI disk ( 716 ) using DMA 5 and the xBlock DD ( 714 ) asks tVirtual DD ( 715 ) to fetch data from the virtual disk ( 719 ) from the host using DMA 2 .
  • Both the non-HTTP traffic host ( 718 ), which handles all non-HTTP traffic, and the virtual disk for the HTTP traffic ( 719 ), which handles all HTTP traffic that TWIP cannot handle, are on the server computer.
  • the non-HTTP host consists of TWIP Host DD ( 722 ), Host TCP/IP engine ( 723 ) and Host non-HTTP application program ( 724 ).
  • TWIP Host DD 722
  • Host TCP/IP engine 723
  • Host non-HTTP application program 724 .
  • the Virtual Disk ( 719 ) simulates a virtual disk to TWIP and it handles the HTTP traffic that cannot be handled by TWIP.
  • the virtual disk ( 719 ) consists of TWIP Host DD II ( 725 ), t-protocol engine ( 726 ), and the Host HTTP application program, assumed to be Apache, ( 727 ).
  • the Virtual Disk ( 719 ) serves as a disk to TWIP for dynamic web content.
  • the HTTP application program ( 727 ) is used to generate the dynamic web content data. Once the data is generated, t-protocol ( 726 ) will create a virtual disk environment for xFS ( 710 ) so that ( 710 ) may load the dynamic web content data for different requests as files from the virtual disk ( 719 ).
  • NIC ( 701 )—Network Interface Card is a piece of hardware that is used by the server to communicate with the network.
  • NIC DD ( 702 )—Network Interface Card Device Driver is a piece of software that knows how to interact with the NIC ( 701 ) to obtain data from the network and to send data to the network.
  • Packet Descriptor ( 703 ) Packet Descriptor Module is used to provide data structures and interfaces for TCP/IP ( 705 ) and NIC DD ( 702 ) for them to communicate.
  • Ethernet DD ( 704 )—Ethernet Device Driver is used to forward Ethernet packets to the host when the Ethernet packets are used by non-HTTP application.
  • TCP/IP ( 705 )—The on-board TCP/IP is used to handle all HTTP-related TCP/IP traffic.
  • TCBSocket is a communication gateway between TCP/IP ( 705 ) and tHTTP ( 707 ).
  • THTTP ( 707 )—THTTP is an on board HTTP protocol. THTTP consists of simple HTTP functions in order to process simple HTTP requests.
  • FSRequest ( 708 )—The FSRequest module provides data structures and interfaces for tHTTP to communicate with the file system modules ( 709 and 710 ).
  • TFS ( 709 )—TFS is a file system that understands the file system format that was used to partition the SCSI disk ( 716 ). (e.g. EXT 2 , NTFS)
  • XFS ( 710 )—XFS is a file system that understands the virtual file system format on the Virtual Disk ( 719 ).
  • Buffer Cache ( 711 )—The Buffer Cache helps reduce disk accesses by providing a caching algorithm.
  • TBlock DD ( 712 )—TBlock Device Driver is one part of the block device driver used to encapsulate the underlying SCSI device drivers ( 713 ).
  • TSCSI DD ( 713 )—The TSCSI Device Driver retrieves data from the SCSI Disk ( 716 ) and presents it to the TBlock Device Driver ( 712 ) in the “Block” format defined by the block device driver.
  • XBlock DD ( 714 )—The XBlock Device Driver is one part of the block device driver used to encapsulate the underlying Virtual Disk device driver ( 715 ).
  • TVirtual DD ( 715 )—The TVirtual Device Driver retrieves data from the Virtual Disk ( 719 ) and presents it to the XBlock Device Driver ( 714 ) in the “Block” format defined by the block device driver.
  • SCSI Disk ( 716 )—SCSI Disk contains data for the web server.
  • TSCSI DD to Host ( 717 )—The TSCSI Device Driver to Host is used to provide a tunnel for the OS SCSI DD ( 720 ) to access tSCSI DD ( 713 ) to retrieve data from the SCSI Disk ( 716 ).
  • Host for non-HTTP traffic ( 718 )—This is an abstraction on the host that consists of TWIP Host DD I ( 722 ), TCP/IP ( 723 ) and the non-HTTP Application ( 724 ). This abstraction represents the processing of non-HTTP traffic from the network. (NOTE: any non-HTTP traffic, including traffic that does not use TCP/IP.)
  • the middle layer protocol ( 723 ) may change depending on the application, but the idea should be similar.
  • Virtual Disk ( 719 )—The Virtual Disk is an abstraction that consists of TWIP Host DD II ( 725 ), t-protocol ( 726 ) and the HTTP Application ( 727 ). This abstraction provides TWIP with a “virtual disk” so that all HTTP traffic, whether TWIP can handle it directly or not, will be treated as if it can be handled. Each request that is forwarded to the host is a “file” on the “virtual disk”. The content of the file is created on the fly and is presented to TWIP as if the file were a static file.
  • OS SCSI DD ( 720 )—The OS SCSI Device Driver provides the interfaces to the server FS ( 721 ) for SCSI disk access on the server computer.
  • FS ( 721 )—This File System is on the server computer and is defined by the OS that the server is running with.
  • TWIP Host DD I ( 722 )—TWIP Host Device Driver I is used to control the data transfer using DMA between TWIP NIC Device Driver ( 702 ) and Server TCP/IP ( 723 ).
  • TCP/IP ( 723 )—The host TCP/IP is used only for non-HTTP Applications.
  • Non-HTTP Application ( 724 )—Any application protocol that is not the Hypertext Transfer Protocol (e.g. FTP, Telnet).
  • TWIP Host DD II ( 725 )—TWIP Host Device Driver II is used to control the data transfer using DMA between tVirtual DD ( 715 ) and t-protocol ( 726 ).
  • t-Protocol ( 726 )—t-Protocol is used to intercept the data from the HTTP application ( 727 ) to TCP/IP so that these data can be sent out to the network using the TWIP TCP/IP ( 705 ).
  • HTTP Application ( 727 )—Any application that uses the Hypertext Transfer Protocol (e.g. Apache).
  • the NIC Device Driver processor ( 802 ) constantly grabs packets from the Network Interface Card's receiving buffer ( 801 ).
  • the NIC DD processor ( 802 ) first checks whether the packet is a fragmented IP packet. If so, ( 802 ) will put the packet in the IP receiving queue ( 806 ) that will be handled by the TCP/IP processor ( 807 ) on TWIP. If the packet is not fragmented, the NIC DD processor ( 802 ) will determine if the packet is an HTTP packet. If so, ( 802 ) puts it in the IP receiving queue ( 806 ); otherwise, ( 802 ) puts the packet in the queue ( 803 ) that will be transferred to the server through the Ethernet Device Driver ( 840 ).
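  • A hedged sketch of the receive-side check in ( 802 ) follows: fragmented IP packets and HTTP packets go to the on-board IP receiving queue, everything else is queued for the host. The header layouts are simplified stand-ins, and using TCP port 80 as the HTTP criterion is an assumption.

```c
/* Simplified receive-side classification: fragments and HTTP go to the
 * on-board TCP/IP queue, everything else is queued for the host. */
#include <arpa/inet.h>   /* ntohs */
#include <stdint.h>
#include <stddef.h>

struct ipv4_hdr {                  /* only the fields used here */
    uint8_t  ver_ihl;
    uint8_t  tos;
    uint16_t total_len;
    uint16_t id;
    uint16_t flags_frag;           /* 3 flag bits + 13-bit fragment offset */
    uint8_t  ttl;
    uint8_t  proto;                /* 6 = TCP */
    uint16_t checksum;
    uint32_t saddr, daddr;
};

struct tcp_hdr {
    uint16_t sport;
    uint16_t dport;                /* 80 = HTTP (assumed service port) */
    /* remaining TCP fields are not needed for this check */
};

enum nic_verdict { TO_TWIP_IP_QUEUE, TO_HOST_QUEUE };

enum nic_verdict classify_rx(const struct ipv4_hdr *ip, const struct tcp_hdr *tcp)
{
    uint16_t ff = ntohs(ip->flags_frag);
    int more_fragments = (ff & 0x2000) != 0;   /* MF flag                  */
    int frag_offset    = (ff & 0x1FFF) != 0;   /* non-zero fragment offset */

    if (more_fragments || frag_offset)
        return TO_TWIP_IP_QUEUE;   /* let the on-board TCP/IP reassemble it */

    if (ip->proto == 6 && tcp != NULL && ntohs(tcp->dport) == 80)
        return TO_TWIP_IP_QUEUE;   /* HTTP: take the bypass path            */

    return TO_HOST_QUEUE;          /* non-HTTP: forward to the server       */
}
```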
  • TWIP TCP/IP processor ( 807 ) constantly grabs packets from ( 806 ) to process. After processing the packet (e.g. de-fragmentation), ( 807 ) can determine if the packet belongs to HTTP traffic.
  • For non-HTTP traffic, ( 807 ) will forward the packet to the server by putting it in the IP Forward queue ( 808 ). The t-Eth Device Driver ( 840 ) then combines the packets in the IP Forward queue ( 808 ) and the packets in the queue ( 803 ) and puts them in Queue X ( 841 ). For HTTP traffic, ( 807 ) will hand the HTTP payload portion of the packet to tHTTP ( 813 ) through the TCB receiving queue ( 811 ). As we can see, the traffic has been divided into two paths: non-HTTP and HTTP traffic.
  • the Ethernet Device Driver module DMAs the data over to the receiving ring ( 851 ) on the server.
  • the TCP/IP ( 853 ) on the server will process the packet in the receiving ring ( 851 ) and present the application layer data to the non-HTTP applications ( 854 ). Normally these non-HTTP applications ( 854 ) will issue file system calls. If so, the file system processor ( 856 ) will communicate with the OS SCSI device driver processor ( 866 ) on the server to obtain data from the SCSI Disk ( 831 ).
  • the OS SCSI device driver ( 866 ) must communicate with TWIP's device driver (tFS DD, 829 ) to obtain data from the SCSI Disk ( 831 ). To do so, ( 866 ) forwards the requests issued by the server file system ( 856 ) to the on board queue ( 835 ) using DMA ( 834 ). TFS device driver ( 829 ) will read requests from the queue ( 835 ) and access the SCSI Disk ( 831 ) to retrieve data from the disk. tFS device driver processor ( 830 ) will then take the data from the disk and put it in the queue ( 832 ) which will be DMA ( 833 ) over to the server queue ( 867 ).
  • the server SCSI device driver ( 865 ) is already waiting for the data to come back in queue ( 867 ). Once the data comes back, ( 865 ) wakes up the processors that are waiting for this piece of data, which is the file system processor ( 857 ). Finally, ( 854 ) will obtain data from the file system processor ( 857 ) and then send it to the network using the server TCP/IP protocol ( 852 ). The server TCP/IP protocol ( 852 ) puts the data in the transmission ring ( 850 ). The data in the ring ( 850 ) will then be forwarded by DMA over to TWIP in Queue Y ( 844 ). Once the data is ready in the queue ( 844 ), the NIC Device Driver processor ( 804 ) will take the data from the queue ( 844 ) and put it into the NIC transmission queue ( 805 ), which is then sent out to the network.
  • the other path is the HTTP path.
  • the tHTTP processor ( 813 ) will grab the HTTP payload that was put in TCB transmission queue by TWIP TCP/IP processor ( 807 ).
  • the tHTTP processor ( 813 ) will process this payload and determine if HTTP request data can be found on SCSI Disk ( 831 ) or Virtual Disk. If on SCSI disk, tHTTP processor ( 813 ) will use tFS ( 821 ) otherwise it will use xFS ( 819 ).
  • xFS ( 819 ) is a file system processor that will understand the format of a Virtual Disk, which is an abstraction for handling dynamic content requests. This abstraction provides tHTTP processor ( 813 ) an effect as if tHTTP always deals with static content requests.
  • To obtain data from the SCSI Disk ( 831 ), tHTTP ( 813 ) must issue file system requests to the file system request queue ( 816 ), because the tFS processor ( 821 ) continually looks for requests to process from the queue ( 816 ). Once ( 821 ) processes the file system request, it will try to access the disk through the buffer cache. The buffer cache gives tFS ( 822 ) the buffer handler to the memory area where the requested data will be placed when it comes from the disk. If the requested data is not in the buffer cache, the buffer cache will queue the request up in ( 825 ), where it will be processed by the tFS device driver processor ( 829 ).
  • When the data comes back from the SCSI Disk ( 831 ), the tFS device driver ( 830 ) will put the data in the location ( 826 ) that was associated with the buffer handler. Finally, tFS ( 822 ) will notify tHTTP processor ( 814 ), through the queue ( 818 ), that the data is ready.
  • If tHTTP ( 813 ) needs to obtain data from the Virtual Disk, it must also go through the buffer cache ( 823 ), as described in the case of tFS ( 821 ), to communicate with the device driver (xFS DD, 827 ).
  • the xFS device driver ( 827 ) will look for requests in the queue ( 823 ).
  • When the xFS device driver ( 828 ) retrieves data from the Virtual Disk, it puts the data in the location ( 824 ) that is associated with the buffer handler.
  • xFS ( 820 ) will notify, through the queue ( 817 ), tHTTP processor ( 814 ) that the data is ready.
  • THTTP ( 814 ) will put the data coming from both ( 818 ) and ( 817 ) on the TCB transmission queue ( 812 ), which will be taken by TWIP TCP/IP ( 809 ) and processed into a packet. ( 809 ) will put the packet in the IP transmission queue which is then transferred to the network through NIC device driver ( 804 ) and NIC transmission queue ( 805 ).
  • xFS device driver ( 827 ) issues request through the queue ( 862 ) to T-Protocol ( 860 ).
  • T-Protocol processor ( 861 ) provides data structures that make xFS ( 820 ) behave as if it is interacting with a disk.
  • the server HTTP application ( 864 ) will process the request from queue ( 862 ) and create HTTP payload that is presented as the data of a static file from Virtual Disk. The data is then put on the queue ( 863 ).
  • the detail of how HTTP Application ( 864 ) uses the file system ( 856 ) parallels the prior discussion of how the non-HTTP Application accesses the file system ( 856 ).
  • NIC DD ( 802 ), ( 804 )—This is a Network Interface Device Driver processor for receiving ( 802 ) and transmitting ( 804 ).
  • NIC Tx Queue ( 805 )/ Rx Queue ( 801 ) This is the NIC queue for transmitting ( 805 ) and receiving ( 801 ).
  • TCP/IP ( 809 ), ( 807 )—TWIP TCP/IP for transmitting ( 809 ) and receiving ( 807 )
  • IP Fw Queue ( 808 ) This is the forward queue for non-HTTP traffic that is determined by TCP/IP ( 807 )
  • THTTP ( 814 ), ( 813 )—TWIP HTTP processor for transmitting ( 814 ) and receiving ( 813 )
  • TFS ( 821 ), ( 822 )—tFS is the pair of file system processors that understand the file system format on the SCSI Disk ( 831 ), transmitting ( 822 ) and receiving ( 821 ).
  • XFS ( 820 ), ( 819 )—xFS is the pair of file system processors that understand the file system format on the Virtual Disk, transmitting ( 819 ) and receiving ( 820 ).
  • tFS DD ( 829 ), ( 830 )—tFS device driver processors, transmitting ( 829 ) and receiving ( 830 ), used to retrieve data from the SCSI Disk.
  • xFS DD ( 828 ), ( 827 )—xFS device driver processors, transmitting ( 828 ) and receiving ( 827 ), used to retrieve data from the Virtual Disk.
  • SCSI Disk ( 831 )—SCSI Disk that is formatted using the format that is supported by tFS. (e.g. EXT 2 , NTFS).
  • DMA X ( 834 ), DMA Y ( 833 ), ( 843 )—DMA processors that use the DMA channels to transfer data between TWIP and the server.
  • Queue X ( 841 )—This queue is used to queue up all non-HTTP requests from the network.
  • t-Eth DD ( 840 )—The t-Eth device driver processor grabs data from the IP Fw queue ( 808 ) and the queue ( 803 ) and puts it into one queue ( 841 ).
  • Queue Y ( 844 )—This queue is used to store all the data from the host that needs to be sent out as Ethernet packets.
  • TCP/IP ( 853 ), ( 852 )—These are the transmitting ( 852 ) and receiving ( 853 ) host TCP/IP processors.
  • Non-HTTP Application ( 854 )—Any application protocol that is not the Hypertext Transfer Protocol (e.g. FTP, Telnet).
  • HTTP Application ( 864 )—Any application that uses the Hypertext Transfer Protocol (e.g. Apache).
  • SCSI device driver ( 865 ) and ( 866 )—These two SCSI device driver processors, receiving ( 866 ) and transmitting ( 865 ), are used to issue SCSI requests to the TWIP file system device driver in order to complete requests from the host file system ( 856 ).
  • a TWIP file system data consistency problem arises when TWIP issues a read to storage before or after the host initiates a write to the same file. This could lead to inconsistent data fetched by TWIP. Fundamentally this is caused by dual storage accesses without synchronization.
  • TWIP sends the filename to the host before the TWIP HTTP engine issues a file read.
  • the host TWIP device driver generates a fake fileopen(Filename) to block any potential host write to the same file.
  • the host TWIP device driver sends a write_block_ack signal back to TWIP. If, on the other hand, the host fails to open the file for read, meaning that the host may be writing to the same file and the TWIP read request should be held back, no write_block_ack is issued, and the process should retry opening the file later.
  • Once TWIP receives the write_block_ack, it starts reading the file. When TWIP finishes the read, it sends the signal write_block_clear to the host, and the host TWIP device driver then does a fileclose(Filename).
  • This method relies on the host OS to enforce file (storage) access synchronization. It works much the same way all applications run on the host—they have to register with the host OS before proceeding. The registration process, however, can be pipelined. Once a file read request is sent to the host, the TWIP file system does not have to wait for response. It can proceed to process the next connection. After the host acknowledges the request (registration), the TWIP file system will go back to read the file.
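  • The Method 1 handshake can be sketched as a single-process simulation, shown below. The signal names (write_block_ack, write_block_clear) follow the text; the file name, helper functions and overall structure are hypothetical stand-ins, and the pipelined registration is only noted in a comment.

```c
/* Minimal single-process simulation of the Method 1 handshake. */
#include <stdio.h>

static FILE *blocked_file;                 /* host-side "fake open" handle */

/* Host TWIP device driver: try to register the read; per the text, a
 * successful open blocks host writes to the same file until
 * write_block_clear arrives. */
static int host_handle_read_intent(const char *filename)
{
    blocked_file = fopen(filename, "rb");
    return blocked_file != NULL;           /* 1 => write_block_ack */
}

static void host_handle_write_block_clear(void)
{
    if (blocked_file) { fclose(blocked_file); blocked_file = NULL; }
}

/* TWIP side: announce the filename, wait for the ack, do the bypass read,
 * then clear the block.  In the real design this wait is pipelined, so
 * the TWIP file system moves on to the next connection instead of
 * spinning here. */
static void twip_bypass_read(const char *filename)
{
    if (!host_handle_read_intent(filename)) {
        printf("no write_block_ack: host may be writing %s, retry later\n",
               filename);
        return;
    }
    printf("write_block_ack received: reading %s on the bypass path\n",
           filename);
    /* ... tFS fetches the blocks through the buffer cache here ... */
    host_handle_write_block_clear();       /* write_block_clear => fileclose */
}

int main(void)
{
    twip_bypass_read("/var/www/index.html");   /* hypothetical path */
    return 0;
}
```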
  • the host write request is intercepted by the TWIP host device driver.
  • the TWIP host device driver then generates a write request (w_req).
  • TWIP completes all outstanding read requests and sends back a write acknowledgement (w_ack) to the host and routes all future read requests to the host.
  • w_ack write acknowledgement
  • the TWIP host device driver releases the hold on the original write requests and proceeds to write (thick vertical line on host in FIG. 5).
  • the TWIP device driver detects this and sends write-release (w_rel) to TWIP.
  • When TWIP receives w_rel, it resumes the bypass function if it can handle the new incoming requests.
  • One disadvantage of this approach is that a write blocks any read from TWIP, regardless of whether the write targets a file currently being read (global blocking).
  • One advantage of this approach is that it is transparent to clients and gracefully transfers the traffic from TWIP to the host. The global blocking may not be significant if host writes do not happen often.
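  • A compact sketch of the Method 2 signalling, written as a TWIP-side state machine, follows. The signal names (w_req, w_ack, w_rel) come from the text; the states, counters and helper names are illustrative assumptions.

```c
/* TWIP-side state machine for the Method 2 write detour. */
#include <stdio.h>

enum twip_mode { BYPASS, DRAINING, DETOUR };  /* DETOUR: reads go via the host */

struct twip_state {
    enum twip_mode mode;
    int outstanding_reads;                    /* bypass reads still in flight */
};

/* Host write intercepted by the TWIP host device driver: w_req received. */
void on_w_req(struct twip_state *t)
{
    t->mode = DRAINING;                       /* stop admitting new bypass reads */
    if (t->outstanding_reads == 0) {
        t->mode = DETOUR;
        printf("w_ack -> host (global blocking starts)\n");
    }
}

/* Called whenever a bypass read completes. */
void on_read_done(struct twip_state *t)
{
    if (t->outstanding_reads > 0)
        t->outstanding_reads--;
    if (t->mode == DRAINING && t->outstanding_reads == 0) {
        t->mode = DETOUR;
        printf("w_ack -> host (all outstanding reads finished)\n");
    }
}

/* Host finished writing: w_rel received. */
void on_w_rel(struct twip_state *t)
{
    t->mode = BYPASS;                         /* resume the bypass function */
}

/* New client read request arrives. */
void on_read_request(struct twip_state *t)
{
    if (t->mode == BYPASS)
        t->outstanding_reads++;               /* serve it on the bypass path */
    else
        printf("read routed to the host while the write is pending\n");
}

int main(void)
{
    struct twip_state t = { BYPASS, 0 };
    on_read_request(&t);    /* one bypass read in flight          */
    on_w_req(&t);           /* host wants to write                */
    on_read_done(&t);       /* drain completes, w_ack is sent     */
    on_read_request(&t);    /* detoured to the host               */
    on_w_rel(&t);           /* write done, bypass resumes         */
    return 0;
}
```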
  • the bypass board creates a second data path to concurrently and asynchronously handle HTTP traffic, within a single TCP connection. This may cause the data arriving at the client within the same connection to be out of order.
  • TWIP inserts a fake request (or a trace command) to the host HTTP server.
  • TWIP then releases the response from bypass operation.
  • Since TWIP has control over the pattern to catch, hardware-assisted parsing can be implemented to further speed up the process.
  • Caching is useful to speed up data access for frequently accessed files.
  • TWIP will come with a buffer cache ( 711 ).
  • the cache effect is achieved by maintaining a usage table.
  • When a file is loaded into the data memory, it is also logged in the usage table with a time stamp.
  • the time stamp is updated every time a file is used.
  • the table is searched to delete first those files which have not been used for the longest time, by comparing the time stamps.
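  • A minimal sketch of such a usage table is shown below, assuming a fixed-size table keyed by file name; the field names and sizes are illustrative, not taken from the patent.

```c
/* Usage table behind the cache effect: each cached file gets a time stamp
 * refreshed on every use, and the oldest entry is deleted first. */
#include <string.h>
#include <time.h>

#define USAGE_SLOTS 256

struct usage_entry {
    char   name[128];      /* file identifier                    */
    time_t last_used;      /* time stamp updated on every access */
    int    valid;
};

static struct usage_entry usage[USAGE_SLOTS];

/* Log or refresh a file in the usage table. */
void usage_touch(const char *name)
{
    int free_slot = -1;
    for (int i = 0; i < USAGE_SLOTS; i++) {
        if (usage[i].valid && strcmp(usage[i].name, name) == 0) {
            usage[i].last_used = time(NULL);
            return;
        }
        if (!usage[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {          /* if the table is full, the caller evicts first */
        strncpy(usage[free_slot].name, name, sizeof usage[free_slot].name - 1);
        usage[free_slot].name[sizeof usage[free_slot].name - 1] = '\0';
        usage[free_slot].last_used = time(NULL);
        usage[free_slot].valid = 1;
    }
}

/* Pick the file that has not been used for the longest time. */
int usage_evict_oldest(void)
{
    int victim = -1;
    for (int i = 0; i < USAGE_SLOTS; i++)
        if (usage[i].valid && (victim < 0 ||
            usage[i].last_used < usage[victim].last_used))
            victim = i;
    if (victim >= 0)
        usage[victim].valid = 0;   /* its buffers can now be reclaimed */
    return victim;
}
```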
  • FIG. 6 depicts the relationship among the buffer cache, the TWIP file system, and the TWIP file system device driver.
  • the buffer cache allocates buffer pages for blocks of data on the disk. Each page corresponds to a block on the disk. After the buffer cache allocates the memory, it associates this memory space with a buffer handler. This buffer handler serves as a key to the file system for accessing the data.
  • Given a block number and a device ID, the buffer cache locates the associated buffer handler. If the file system requests data that already exists in the buffer cache, the buffer cache will return a buffer handler that is associated with the existing buffer page without accessing the disk. If the data does not exist in the buffer cache, the buffer cache will allocate a free buffer page according to an optimized algorithm. The free buffer page is associated with the requested block number and the device ID using the buffer handler. A request to the file system device driver will be issued to retrieve data from the disk into the buffer page. Finally, the associated buffer handler will be passed to the file system.
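  • The buffer-handler lookup described above might look like the following sketch, where a (device ID, block number) pair either maps to an existing buffer page or claims a free page and triggers a driver read. All structures, constants, and the stub driver call are simplified stand-ins, not the patent's implementation.

```c
/* Minimal model of the buffer cache: pages keyed by (device ID, block
 * number), handed to the file system through a buffer handler. */
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096           /* assumed block size     */
#define CACHE_PAGES 512            /* assumed number of pages */

struct buf_handler {
    uint32_t dev_id;
    uint32_t block_no;
    uint8_t  data[BLOCK_SIZE];
    int      valid;                /* 1 once the block is present in memory */
    int      in_use;
};

static struct buf_handler cache[CACHE_PAGES];

/* Stand-in for the tBlock/xBlock device driver read described in the text. */
static void block_dd_read(uint32_t dev_id, uint32_t block_no, uint8_t *dst)
{
    (void)dev_id; (void)block_no;
    memset(dst, 0, BLOCK_SIZE);    /* real code would DMA the block from the disk */
}

struct buf_handler *buffer_cache_get(uint32_t dev_id, uint32_t block_no)
{
    struct buf_handler *free_page = NULL;

    for (int i = 0; i < CACHE_PAGES; i++) {
        if (cache[i].in_use && cache[i].dev_id == dev_id &&
            cache[i].block_no == block_no)
            return &cache[i];      /* hit: no disk access needed */
        if (!cache[i].in_use && free_page == NULL)
            free_page = &cache[i]; /* remember a free page       */
    }
    if (free_page == NULL)
        return NULL;               /* caller must evict first (see the usage table) */

    /* Miss: bind the free page to (dev, block) and ask the driver for it. */
    free_page->dev_id = dev_id;
    free_page->block_no = block_no;
    free_page->in_use = 1;
    free_page->valid = 0;
    block_dd_read(dev_id, block_no, free_page->data);
    free_page->valid = 1;
    return free_page;              /* the handler is passed back to the file system */
}
```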
  • Information about the file that needs to be retransmitted can consist of an ID label that uniquely identifies the file (e.g. on a Linux platform the inode ID would be a good candidate). Also, the offset within the file can be saved. This offset could be derived from the sequence number of the packet to be retransmitted.
  • Information about the packet should include those header fields that are supposed to be dynamic on a per-packet basis within a connection. For example, it is not mandatory to keep information about the IP addresses for the packet since this information does not change within the packets belonging to the same connection. Instead, this information can be retrieved from the connection structure when rebuilding the packet.
  • a packet consists of different parts. Because reconstruction of some parts may be easier than others, a hybrid approach where not the whole packet is removed from memory could be useful. In general, the preferred parts to be removed are those parts of the packet that occupy a large amount of memory and that at the same time are easy to reconstruct.
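  • A hedged sketch of such a retransmission record is given below: the payload is not kept in memory, only a file label, an offset derived from the sequence number, and the per-packet dynamic fields, with the payload re-read from the file on demand. The structure fields and the rebuild path are assumptions for illustration.

```c
/* Retransmission record that replaces the stored payload. */
#include <stdint.h>
#include <stdio.h>

struct tcp_conn {                  /* per-connection state TCP already keeps */
    uint32_t saddr, daddr;         /* not duplicated per packet              */
    uint16_t sport, dport;
    uint32_t iss;                  /* initial send sequence number           */
};

struct retx_record {               /* stored instead of the payload */
    uint64_t file_id;              /* unique file label, e.g. the inode number */
    uint64_t file_offset;          /* derived from the TCP sequence number     */
    uint32_t seq;                  /* per-packet dynamic header fields only    */
    uint16_t payload_len;
};

/* Rebuild the payload of a segment to retransmit by re-reading the file.
 * Headers are rebuilt separately from struct tcp_conn, so no addresses or
 * ports need to be saved per packet. */
size_t rebuild_payload(const struct retx_record *r, FILE *f,
                       uint8_t *buf, size_t buflen)
{
    if (f == NULL || r->payload_len > buflen)
        return 0;
    if (fseek(f, (long)r->file_offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, r->payload_len, f);
}
```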
  • the proposed TCP retransmission scheme can be implemented by adding an extra layer to the stack.
  • the actual code in this case is not inserted in the same TCP module but as an extra module.
  • This approach requires the definition of interfaces between [retransmission layer]-[TCP] and [retransmission layer]-[File System].
  • the protocol stack is depicted in FIG. 7.
  • the data consistency problem can also arise for the TCP retransmission scheme. If the data to be retransmitted is modified by the host while waiting to be retransmitted, inconsistent contents will result at the client site.
  • a simple solution is to make an image copy of the entire file in a swap file on the hard disk when it is first opened for transmission. In order to reduce overhead, only large files greater than a specific threshold will be copied. If the file is requested to be retransmitted in part, the image copy in the swap file is used, solving the inconsistency problem.

Abstract

A networked system is described in which the majority of data bypass the server(s). This design improves the end-to-end performance of network access by achieving higher throughput between the network and storage system, improving reliability of the system, yet retaining the security, flexibility, and services that a server-based system provides. The apparatus that provides this improvement consists of a network interface, server computer interface, and storage interface. It also has a switching element and a high-layer protocol decoding and control unit. Incoming traffic (either from the network or storage system) is decoded and compared against a routing table. If there is a matching entry, it will be routed, according to the information, to the network or the storage interface, or sent to the server for further processing (default). The routing table entries are set up by the server based on the nature of the applications when an application or user request initially comes in. Subsequently, barring any changes or errors, there will be no data exchange between the server and the device (although a control message may still flow between them). There may also be a speed matching function between the network and storage, load balancing function for servers, and flow control for priority and QoS purposes. Because the majority of data traffic will bypass the bus and the operating system (OS) of the server(s), the reliability and throughput can also be significantly improved. Therefore, for a given capacity of a server, much more data traffic can be handled. Certain improvements concerning one particular embodiment of the invention are also disclosed.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates to computer networks, client/server based computing, and data storage (or storage network). More particularly, this invention relates to network management, performance enhancement and reliability improvement for network data access through servers. [0002]
  • 2. Prior Art [0003]
  • The following definitions will be useful in discussing the prior art in this field, and how the present invention overcomes the limitations of the prior art: [0004]
  • “Server”: a computer system that controls data access and data flow. [0005]
  • “Server-oriented”: Refers to data that requires significant computation or processing, that usually is carried out by a server CPU. The examples are network user login processes going through authorization, authentication and accounting (AAA). [0006]
  • “Storage-oriented”: Simple storage access such as disk read and/or write is considered storage-oriented. Most operations are data fetching and transport without the involvement of the CPU. JPEG and MPEG file transport are examples of storage-oriented data. [0007]
  • In the current server-based Internet infrastructure, for an end user to access data from a remote website, the following sequence of events will occur: First, the request packets from the user host have to travel to the remote network access point via the wide area network, through the network gateway at the remote web system, and then (after authorization) to a server in the web system. Second, the server sends a command to the storage device for the data, the requested data travels from the device back to the server, and traverses the reverse path back to the user host. In this end-to-end set-up, the server is situated between the data sources and the user and is often the limiting element of the entire data access operation. Such a configuration has caused servers to become a major bottleneck between the clients (or network end users) and their requested data. Both data and control traffic must pass through the servers: the request and control traffic must travel to the servers and then to the storage devices. The requested data must then return to the server before they are forwarded through the network to the clients. [0008]
  • Most network systems are constructed with this architecture, with server clustering and load-balanced server farms being the two most common variations. The main advantages of current systems are their flexibility and security, since they allow the servers to control all the traffic flows. However, this architecture also comes with a number of disadvantages: server system bus contention (in many cases, a PCI bus), server OS inefficiency (specifically including unreliable and costly interrupt handling), and multiple data copying. Each of these causes different problems. [0009]
  • Server system bus contention causes two problems for networks. Since each peripheral component must contend for the bus usage without any guarantee of bandwidth, latency and time of usage, the user data throughput varies, and the latency for data transfer cannot be bounded. [0010]
  • The server OS inefficiency puts a heavy toll on the network throughput. In particular, an interrupt causes two context switching operations on a server. Context switching is an OS process in which the operating system suspends its current activity, saves the information required to resume the activity later and shifts to execute a new process. Once the new process is completed or suspended, a second context switching occurs during which the OS recovers its previous state and resumes processing. Each context switch represents an undesirable loss of effective CPU utilization for the task and network throughput. For example, a server handling thousands of requests and data transfers at high speed incurs this loss repeatedly. Further, heavy loading and extensive context-switching can cause a server to crash. A small loss of data can cause TCP to retransmit, and retransmissions will cause more interrupts which in turn may cause more OS crashes. The OS interrupt-induced stability problem is very acute in a web hosting system where millions of hits can be received within a short period of time. [0011]
  • Multiple data copying is a problem (also known as "double copy") for normal server operations. According to the current architecture, data received from the storage (or network) have to be copied to the host memory before they are forwarded to the network (or storage). Depending on the design of the storage/network interface and the OS, data could be copied more than two times between their reception and departure at the server, despite the fact that the server CPU does not perform many meaningful functions other than verifying data integrity. The multiple data-copying problem represents a very wasteful use of CPU resources. When this is coupled with the OS inefficiency, it also represents a significant degradation of QoS (Quality of Service) for the data transfer. [0012]
  • The current solutions to the above-mentioned problems have involved two different approaches: improving the network performance and improving the storage performance. [0013]
  • From the storage approach, SAN (Storage Area Network) and NAS (Network Attached Storage) represent large current efforts. Another solution is to replace the server bus with a serial I/O architecture (the InfiniBand architecture, which is under development). [0014]
  • An NAS is a storage device with an added thin network layer so the storage can be connected to a network directly. It bypasses servers, so server bottlenecks may be non-existent for NAS systems. (We do not consider a storage-dedicated server as NAS.) The major disadvantages are the lack of the flexibility that servers have, and the overhead associated with the network layer(s) if they are too thick. An NAS can be used in secured environments like an internal LAN or SAN. Authorization, accounting, and authentication (AAA) and firewall functions are unlikely to be performed by an NAS, since overly complicated functions may not be implemented due to cost. Furthermore, it is not easy to upgrade software or protocols under the limited design of interfaces for NAS. [0015]
  • SAN is an architecture for storage systems with the advantages of flexibility and scalability. While NAS is limited due to its thin network interface, SAN defines an environment dedicated to storage without worrying about security or other heterogeneous design concerns. Essentially, storage devices in SAN can be viewed as a special kind of NAS, e.g. hard disks with Fibre Channel interfaces. Servers (which are more versatile) are still needed to connect the storage devices to the network. Therefore, the server bottleneck is still present. Furthermore, access control and other server functions are not specified in SAN systems, so other components must be added for full functionality. [0016]
  • From the network approach, two techniques have been devised: Web Switching and Intelligent Network Interface. Among the goals of web switching is load balancing servers in a web hosting system. While web switching has many platforms, the basic approach is to capture the IP packets and use the information they contain in the layers [0017] 4 through 7 to switch the traffic to the most suitable servers, thus keeping the servers with balanced load. This approach does not address the problems of multiple data copying and server system bus contention. The server OS inefficiency problem is only indirectly addressed.
  • In the Intelligent Network Interface approach, functionalities are added to the NIC (Network Interface Card) that reduce server interrupts by batch processing. This approach does not address the Server system bus contention problem directly, and as a result, the latency of data transfer is still unbounded and data transfer throughput is still not guaranteed. In addition, this approach only reduces switching overhead but does not address the multiple data-copying problem. [0018]
  • BRIEF SUMMARY OF THE INVENTION
  • Objects of the invention include the following: [0019]
  • 1. To increase the network and storage access performance and throughput. [0020]
  • 2. To reduce traffic delay and loss between network(s) and storage due to server congestion or to bound the latency for real-time streaming (QoS improvement). [0021]
  • 3. To increase server and network system availability and reliability, and to reduce server system failures, by reducing the traffic going through the server bus, OS and CPU. [0022]
  • 4. To maintain the flexibility of a server-based system (vs. a network attached storage or NAS). [0023]
  • 5. To be scalable and reduce the total system cost. [0024]
  • In sum, the invention aims to provide the highest levels of server-based Reliability, Availability and Scalability (RAS) for a network system and the highest levels of QoS for the end users. [0025]
  • These and other objects of the invention are achieved in the following solution strategies: [0026]
  • 1. Throughput improvement by the data-driven multi-processor pipelined model. [0027]
  • 2. File system consistency between the bypass board and the host. [0028]
  • 3. HTTP synchronization between the bypass board and the host. [0029]
  • 4. Caching on the bypass board. [0030]
  • 5. Storage-based TCP retransmission on the bypass board. [0031]
  • In a networked system, an apparatus is introduced that causes the majority of data to bypass the server(s). This design improves the end-to-end performance of network access by achieving higher throughput between the network and storage system, improving reliability of the system, yet retaining the security, flexibility, and services that a server-based system provides. The apparatus that provides this improvement logically consists of a network interface, server computer interface, and storage interface. It also has a switching element and a high-layer protocol decoding and control unit. Incoming traffic (either from the network or storage system) is decoded and compared against a routing table. If there is a matching entry, it will be routed, according to the information, to the network, the storage interface, or sent to the server for further processing (default). The routing table entries are set up by the server based on the nature of the applications when an application or user request initially comes in. Subsequently, barring any changes or errors, there will be no data exchange between the server and the device (although, a control message may still flow between them). There may also be a speed matching function between the network and storage, load balancing functions for servers, and flow control for priority and QoS purposes. Because the majority of data traffic will bypass the bus and the operating system (OS) of the server(s), the reliability and throughput can be significantly improved. Therefore, for a given capacity of a server, much more data traffic can be handled, thus making the system more scalable.[0032]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a top-level logical diagram for the data-driven multi-processor pipelined model. [0033]
  • FIG. 2 is a top-level hardware diagram for the data-driven multi-processor pipelined model. [0034]
  • FIG. 3 describes the software structure for the preferred embodiment for the data-driven multi-processor pipelined model. [0035]
  • FIG. 4 describes the data queues and processes in the preferred embodiment of the data-driven multi-processor pipelined model. [0036]
  • FIG. 5 describes the traffic detour to host for [0037] Method 2 for file system consistency between the bypass board and the host.
  • FIG. 6 describes the buffer cache relation with the file system and FS device driver. [0038]
  • FIG. 7 is the diagram of the TCP retransmission stack. [0039]
  • FIG. 8 is a top-level diagram of the relation between the device, server and storage. [0040]
  • FIG. 9 shows the general function blocks inside the device with three logical interfaces, namely network, server and storage. [0041]
  • FIG. 10 gives an example of major detailed functions performed to achieve claimed improvements. [0042]
  • FIGS. 11 and 12 are flow charts for data flow from network to storage or vice-versa. [0043]
  • FIG. 13 is a depiction of information decoded in various layers of protocols. [0044]
  • FIG. 14 shows an example of the Expanded Routing Table (ERT) with assumed contents. [0045]
  • FIG. 15 is an example of pipelining process to maximize the performance.[0046]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The preferred embodiment of the invention is illustrated in FIGS. [0047] 1-15, and described in the text that follows. Although the invention has been most specifically illustrated with particular preferred embodiments, it should be understood that the invention concerns the principles by which such embodiments may be constructed and operated, and is by no means limited to the specific configurations shown.
  • In one embodiment, a three-way network server bypass device has two main function blocks ([0048] 100 and 101) as shown in FIG. 8. Based on decoded high-layer protocol information, the control unit (CU, 100) decides whether to switch the data to the server or to the storage through the switching element (SE, 101). The device may be physically inside the server housing, but may also be supplied as an external unit.
  • The present invention improves performance and reliability of network data access with the flexibility of a server-based system. It avoids multiple data-copying in a server system, where all the traffic in one direction has to be copied at least twice (and interrupts the server system at least twice) along the data path from the network to the storage or the other way around. The invention lets the majority of traffic bypass the server system bus, operating system (OS) and CPU or any other involvement with the server. It can also support quality of service (QoS) like prioritized traffic streams for real-time applications with video and audio, with bounded delay. Lastly, in a multiple-server system, it can provide load balancing and flow control combined with the CPU/bus/OS bypassing to optimize the overall system performance and improve fault-tolerance. [0049]
  • The above-mentioned improvements are achieved by decoding high-layer protocol(s) in real-time and using the information to direct the traffic flow between network interfaces, storage system (or SAN), and server(s). Depending on the nature of the application (in part or in whole), the traffic can be categorized as server-oriented, which will be sent to the server system, or storage-oriented (data retrieving), which will be transferred between the network and storage directly without the server's (CPU, OS and bus) involvement. As Internet and web applications become more prevalent, the resulting ever-increasing traffic will tend to be storage-oriented. The invention dynamically identifies such traffic as storage-oriented and allows such traffic to bypass the server (bus, OS and CPU). [0050]
  • The example application presented describes a single packet or a stream of packets with a particular purpose (e.g. a user request for a web page). Therefore, such a request-reply pair session may consist of several sub-applications. For instance, a user-initiated request may have to go through log-in and authorization processes that should be handled by server(s). This is a server-oriented process. But after a request is authorized, the transfer of data from the storage to the user can bypass the server and be sent directly to the user through the network interface; it is storage-oriented. Furthermore, the log-in and authorization can be a different type of application from the main session. For example, a request may not be real-time in nature, while the data transfer could be an isochronous video or audio stream like the case of "video-on-demand."[0051]
  • Simplified examples of application categorizing include: [0052]
  • 1. Authorized real-time data transfer between a network interface and a storage interface. [0053]
  • 2. Authorized non-real-time data transfer between a network interface and a storage interface. [0054]
  • 3. Server-oriented traffic. For example, a new request to access a web page or a user log-in from the network, or storage system control traffic between the server and the storage system. [0055]
  • 4. All other traffic defaults to the server (e.g., local traffic between server and storage). [0056]
  • Traffic types (1) and (2) will be routed to respective network or storage interfaces (e.g. from storage to network or vice-versa.) while (3) and (4) will be sent to server(s). The decoding process is to look into necessary protocol (layers) and to categorize incoming traffic (from where and for what). Then, the decoded header information (IP address, port ID, sequence number, etc.) is used as an index to the routing table for a match. A matched entry means the direct connection between network and storage has been “authorized.”[0057]
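  • To make the matching step concrete, the following minimal Python sketch (with hypothetical field names, not the device's actual data structures) shows decoded header fields forming a key into the routing table; a hit yields an authorized direct path, while a miss defaults to the server.

```python
# Minimal sketch of the routing-table match described above.  Field names and
# the key structure are illustrative; the device implements this in hardware.
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Routing table: flow key -> outgoing interface ("network" or "storage").
# Entries are installed by the server once a request has been authorized.
routing_table: dict[FlowKey, str] = {}

def route(decoded: FlowKey) -> str:
    """Return the interface for this traffic; the default is the server."""
    return routing_table.get(decoded, "server")

# The server authorizes a session and installs a direct storage-to-network path.
key = FlowKey("10.0.0.5", 40321, "192.168.1.9", 80)
routing_table[key] = "network"

assert route(key) == "network"                                           # authorized: bypasses the server
assert route(FlowKey("10.0.0.7", 5000, "192.168.1.9", 80)) == "server"   # default path
```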
  • Exemplary decoded header information is shown in FIG. 13. For example, the HTTP header is in the payload of TCP, which in turn is in the IP packet. The decoding process is to look into the HTTP headers for the nature of data (GET, POST, DELETE, etc, and maybe application payload length.) [0058]
  • The data content then is divided into segments of integral multiples of a fixed base, a process that we call “base-multiple segmentation” (BMS) technology. For example, a base of y bytes, say 2 Kbytes, is chosen, and all data streams or files are segmented into chunks of integral multiples of 2 Kbytes, like 2, 4, or 8 Kbytes (padding it for the last chunk if it is not an exact integral multiple of 2 Kbytes), with an upper limit of, say, 40 Kbytes (20 times y). The maximum size is chosen based-on the requirement of isochronous real-time traffic and the switching speed, such that it will still meet the tightest real-time needs while the switching element serves the largest segments. The advantages of BMS are that it is easier to pipeline multiple data streams or files yet still has the flexibility of variable segment size, which reduces overhead (in setup and headers) and improves performance of the device. The BMS technique described above can be used to advantage not only with the apparatus of the preferred embodiment, but in general data switching applications as well. [0059]
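  • A minimal sketch of BMS in Python, using the example figures above (a 2-Kbyte base and a 40-Kbyte upper limit); the choice to fill each segment to the maximum and pad only the final chunk is illustrative, not mandated by the design.

```python
# Base-multiple segmentation (BMS) sketch: every segment is an integral
# multiple of the base, capped at the maximum; only the last chunk is padded.
BASE = 2 * 1024        # 2 Kbytes
MAX_SEG = 40 * 1024    # 20 times the base

def bms_segments(data: bytes, base: int = BASE, max_seg: int = MAX_SEG) -> list[bytes]:
    segments = []
    for off in range(0, len(data), max_seg):
        chunk = data[off:off + max_seg]
        remainder = len(chunk) % base
        if remainder:                                   # pad the short final chunk
            chunk += b"\x00" * (base - remainder)
        segments.append(chunk)
    return segments

segs = bms_segments(b"x" * (90 * 1024))                 # a 90-Kbyte stream
assert [len(s) // 1024 for s in segs] == [40, 40, 10]   # 40 KB, 40 KB, 10 KB
```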
  • Once the nature of the traffic is determined, by consulting the Expanded Routing Table (ERT) (with more information than a regular routing table), as shown in FIG. 14, a proper switching path can be selected to forward the traffic with proper QoS measurement. For instance, higher priority traffic can be given more bandwidth and/or lower delay. The forwarded traffic to the network will then be processed with the proper protocol format conversion for transmission with all the necessary error checking and/or correction. [0060]
  • A synchronization scheme is employed to interlock the decoding and switching processes. Multiple incoming data streams are queued for decoding and parsing (e.g. at the application layer with HTTP) to decide which path to forward the data on. Synchronization is necessary between different phases of a request-reply session. For example, a reply to a request from a network user must be forwarded to the user after the authorization (or log-in) process. While the server is running the authorization process, the storage data fetching can be handled concurrently to speed up the process. By the time a request is granted, the data may be ready or getting ready for transmission; otherwise, if it is denied, the transmission is aborted. These concurrently pipelined processes are illustrated in FIG. 15. [0061]
  • The invention uses a high-layer or cross-layered (cross protocol layers) switching architecture, because the traffic pattern is significantly influenced by the upper layer applications while the transport unit or packet format is mostly determined by the low layer protocols. For instance, web applications determine the size and nature of the transfer (e.g. text-only, still pictures and/or video clips) in the headers of the application layer. Low layer protocols decide the size(s) of the packets at various network or system segments and the way to handle them (e.g. fixed size packet vs. variable size, packet size, delay tolerance and flow control methods such as window-based flow control). By using upper layer information to help direct the low layer storage data transport, the benefits can be significant. For example, for streaming applications, data transport is streamed instead of switched packet-by-packet, thus achieving higher throughput. [0062]
  • In networking, end-to-end user experience depends on the network bandwidth (transport), server response time and storage access time. Among these factors, server congestion and the associated cost to handle the ever-growing network traffic are the major concerns and uncertainties for delivering QoS. By doing real-time high layer protocol decoding and parsing, and switching the majority of traffic to bypass the server with delay bound, the overall system performance and QoS can be improved greatly. [0063]
  • Functional Description of Main Components: [0064]
  • Switching Element: [0065]
  • The switching element provides a data path for three-way switching (although it can have more than three physical connections) to and from the network, storage and server function units (CPU), through their respective interfaces with bounded delay. The switching element may be a fully-connected crossbar, memory-based switching, shared medium or other switching construct. The switching element has the capability of switching data traffic between any two (or more) of the interconnected interfaces. It is controlled by the control unit (CU) through a routing table that is set by server and on-board control based-on user request information. [0066]
  • Decoding and Control Unit (CU): [0067]
  • Decoding: [0068]
  • Based-on the targeted protocol layer(s), the decoding block(s) will look into parts of the packet payload to parse higher layer header and/or content information to be used in making routing decisions in real-time. The information will be compared with a routing table entry for a potential match. The purpose of using higher protocol layer information is to direct and optimize the traffic flow (throughput, utilization, delay, losses, etc.) for performance and reliability improvement. In FIG. 10, only an HTTP/html application is given as an example. Other applications like ftp and RTSP/RTP can also be implemented. [0069]
  • Control: [0070]
  • Based on the decoded information and the routing table content, a control signal is sent to the switching element (SE). The SE will set up a circuitry moving the data or packet(s) to the proper outgoing interface(s) through the switching element. Data or packets can be moved either individually and/or in batch (streaming), depending on the relations among them. It also controls routing table update, format conversions (which format to use) and other housekeeping tasks. [0071]
  • Scheduler and Flow Control: [0072]
  • While multiple concurrent streams are waiting to be routed, the scheduler decides the order of execution based on the priority and QoS information in the routing table. Some flow control mechanisms can also be exercised for the network interface and/or storage interface for further improvement of performance. [0073]
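  • A minimal sketch of the scheduling decision in Python; the numeric priority is assumed to come from the routing-table entry, and the tie-breaking by arrival order is an illustrative policy rather than the device's actual scheduler.

```python
# Pending streams are served in priority order (smaller number = higher
# priority here), with arrival order breaking ties.
import heapq
import itertools

class Scheduler:
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()     # tie-breaker preserves arrival order

    def enqueue(self, stream_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), stream_id))

    def next_stream(self) -> str:
        _, _, stream_id = heapq.heappop(self._heap)
        return stream_id

sched = Scheduler()
sched.enqueue("bulk-ftp", priority=5)
sched.enqueue("video-on-demand", priority=1)   # real-time stream is served first
assert sched.next_stream() == "video-on-demand"
assert sched.next_stream() == "bulk-ftp"
```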
  • Router: [0074]
  • The router keeps a routing table, switching status and history and certain statistics, and controls the path traversed by the packets. The content in the routing table is provided by the server, based on storage controller (or SAN interface), and/or decoded packet information. [0075]
  • The switching and routing elements may be of a predetermined latency, and the routing table may include routing information (which port to route), the priority, delay, sensitivity and nature of the applications, and other contents for QoS measurement. [0076]
  • Buffering, Format Conversion and Medium Interfaces: [0077]
  • Buffering: [0078]
  • Basically, there are two kinds of buffers in the device. One buffers between asynchronous parts of the network, storage and server interfaces. The other serves as a waiting space for decoding higher layer protocols. In other words, the latter is to synchronize the decoding process and the switching process. The decoding time is pre-determined by design, so that the buffer size requirement can be calculated. A common pool of memory may be shared to save memory. This requires buffer management to dynamically allocate the memory for all pending threads/sessions. [0079]
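  • For illustration only, a back-of-the-envelope sizing of the decode-synchronization buffer under assumed numbers (a 1 Gbit/s line rate and a 50-microsecond worst-case decode time; neither figure is taken from the design above).

```python
# Data arriving at line rate must be held for the fixed decoding latency
# before the switching decision is known; the pool figure is a conservative
# upper bound, since concurrent streams actually share the line rate.
LINE_RATE_BPS = 1_000_000_000        # 1 Gbit/s network interface (assumed)
DECODE_LATENCY_S = 50e-6             # 50 us worst-case decode time (assumed)
CONCURRENT_STREAMS = 1000            # pending threads/sessions sharing the pool (assumed)

per_stream_bytes = LINE_RATE_BPS / 8 * DECODE_LATENCY_S
pool_bytes = per_stream_bytes * CONCURRENT_STREAMS
print(f"{per_stream_bytes:.0f} B per stream, {pool_bytes / 2**20:.1f} MiB pool")
# -> 6250 B per stream, 6.0 MiB pool
```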
  • Format Conversions: [0080]
  • There are several formats with respect to different interfaces and layers of protocols. These decodings and conversions have to be done in the device and involve multiple protocol layers. Examples of decoding and format conversions are HTTP, RTSP, ftp, IP/TCP/UDP, Ethernet, SCSI, Fibre Channel, and/or PCI interfaces. [0081]
  • Medium Interfaces: [0082]
  • In this description, there are three types of logical medium interfaces: the network, storage and server(s). In actual implementation, various physical interfaces are possible, e.g., multiple network interfaces or storage interfaces or multiple servers. Buffers are used to synchronize transmission between interfaces. An example implementation may use Ethernet, ATM or SONET for the network interface, and SCSI, Fibre Channel, PCI, InfiniBand, or another system I/O technology for the other interfaces. [0083]
  • There may also be a speed matching function between the network and storage, load balancing functions for servers, and flow control for priority and QoS purposes. Such speed matching function may be effectuated through buffering. Such load balancing may be executed between or among any homogeneous interfaces in the device, and is effected based on message exchange comprising feedback information from the targeted device or other means well known in the art. [0084]
  • Description/Example: [0085]
  • FIG. 10 describes an implementation with Ethernet interface ([0086] 310) for networking, PCI (340) for server and SCSI (350) for storage.
  • Storage to Network Traffic Bypass: [0087]
  • An incoming user/client request is received from Ethernet interface ([0088] 310) and decoded at different layers from the Ethernet format (311) and IP/TCP (312) format. Then the HTTP header is parsed against the Expanded Routing Table residing in the Router (313, 314 and 315). If a match is found, the subsequent data (until the end of the HTTP payload; perhaps an html file) will be forwarded per the router; otherwise, the HTTP payload will be sent to the server for further processing (the default route). A routing table match indicates an established (authorized) connection. For example, if the data is sent to storage, it may be an authorized WRITE to the storage. The data routed to the server can either be an initial request for access or server-oriented traffic. The server may process the request with a log-in (if applicable) using an authentication, authorization, and accounting (AAA) process. The software on the server will communicate with the device for all necessary setup (e.g. routing table and file system for the storage) through the Router Control (316) and Scheduler (in 315) and then pass the control to the device and notify the storage to start a response to that request with a given file ID (or name) for the file system through the control path. The file system then can issue commands to SCSI Interface (350) to fetch the data. When the response data in html format comes back from storage, it will be correlated to an established connection in the ERT (315) for proper path (314). Then an HTTP header will be added (322). TCP/IP protocol conversion is carried out on the device (321 and 320). Finally, the data will be packed in Ethernet packets and sent out through the Ethernet Interface (310). The transfer from the storage to the network through the device for this connection will continue until it is completed or the device is notified by the server or storage to stop sending under certain events (e.g. error or user jumping to another web page). A pool of memory is used to dynamically control the traffic and buffer asynchronous flows. Control Unit (300) coordinates all the activities. FIG. 12 shows the flow chart of the data flow.
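  • A minimal sketch (hypothetical names and message fields) of the control-path setup performed by the server software once a request passes AAA: it installs an Expanded Routing Table entry on the device and identifies the file the storage side should fetch.

```python
# Illustrative only: the real setup flows through the Router Control and
# Scheduler; here a plain dictionary stands in for the ERT on the device.
class BypassDevice:
    def __init__(self):
        self.ert = {}                      # connection key -> per-flow ERT entry

    def install_entry(self, conn_key, entry):
        self.ert[conn_key] = entry         # later packets on this flow bypass the server

def authorize_and_setup(device: BypassDevice, conn_key, file_id: str, priority: int):
    # ... AAA (log-in, authorization, accounting) runs on the server here ...
    device.install_entry(conn_key, {
        "route": "storage->network",       # direct path; the server leaves the data path
        "file_id": file_id,                # file the storage controller is told to fetch
        "priority": priority,              # consulted by the scheduler for QoS
    })

dev = BypassDevice()
authorize_and_setup(dev, ("10.0.0.5", 40321, 80), file_id="/htdocs/index.html", priority=1)
assert ("10.0.0.5", 40321, 80) in dev.ert
```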
  • Higher layer traffic information (e.g. HTTP or even html) is used to optimize the performance. For instance, a single initial web access request from the network is forwarded to the server. Once the server decides the access is legitimate, it sets up both the CU and storage control (or through the CU). Subsequent traffic (responses) will bypass the server and be directly forwarded to the network interface for further transfer. But a new request from a user will be directed to server for processing. This may include the case of accessing a new web page or area or from different applications (windows). Also, based on the nature of traffic (html vs. real-time video clip for example), differentiated services can be provided. Further, streaming based on the content can improve even non-real-time applications. [0089]
  • The default traffic path is through the server(s). For cases like initial user login, storage access error, or interrupted web page access, the server(s) would take over the control. Signaling is used to communicate between the server and the device. For the majority of data transfer, however, the server(s) is not in the data path so bus contention, OS involvement (interrupt) and CPU loading are significantly reduced. The traffic reduction through the server is very significant while the flexibility of having server(s) handling unusual cases is maintained in the design, as contrasted with the NAS approach. [0090]
  • Network to Storage Traffic Bypass: [0091]
  • The device is bidirectional. To write to storage, once granted access, the server sets up the router ([0092] 315) mechanism and the subsequent incoming traffic from the network for the same session will bypass the server through the decoding processes (310, 311, 312, and 313). The decoded high layer information is parsed against the routing table (in 315). Proper connection to either server or storage can then be established by the Switching Element (303). If it is through the server, the data will go through the server bus and the OS for proper processing. Otherwise, a direct connection will be set up to route data (say, html files) to storage through the file system (to handle file format), drivers (to handle storage controller interface, e.g. SCSI) and storage controller. The traffic through the server and through the SE is synchronized by the Scheduler (315) and Memory Pool (301) before it is sent to SCSI Interface (350). This process is shown in FIG. 11.
  • In both of the traffic directions, the storage and the network interfaces will carry out the proper protocol and format conversions with necessary buffering as shown in FIGS. 11 and 12. [0093]
  • Other Features: [0094]
  • Because the decoding time and switching time can be pre-determined, the delay for a packet going through the device is bounded. Further, for the same reason, the potential loss of packets can be reduced. A priority mechanism can be implemented to support different QoS requirements in the Router and Scheduler ([0095] 315 and 300). In the case of multiple servers and/or storage and network devices, a load balancing and flow control mechanism can be applied based-on application tasks.
  • The server's role is supervisory, and is not involved in the byte-by-byte transfer. The CPU, operating system and server bus(es) are not in the normal path of the data transfer in either direction (from network to storage or from storage to network.) This invention represents a change from interrupt-based server and OS to switching-based networking architecture. [0096]
  • Performance improvements provided by the invention include: [0097]
  • 1. Higher throughput: a significant (or majority) portion of traffic will directly go through the switching device, so data throughput can be dramatically improved while the server bus and operating system (OS) are bypassed. [0098]
  • 2. Less delay: the server and bus contention and OS interrupt handling are out of the data path, through the switching element. [0099]
  • 3. Real-time applications: the switching nature of the design bounds latency, which enables real-time applications. [0100]
  • 4. Better reliability: less traffic going through server means less potential for server caused packet loss and malfunctions (server crashes). With added traffic control mechanism in the device, a shield can be implemented to protect server(s) from overloading and potential malfunctions. [0101]
  • 5. Flexibility and versatility: due to the architecture, the device is still very flexible by having server-oriented or computation intensive services immediately available to the applications, e.g. authorizing, security check, data mining, and data synchronization. [0102]
  • 6. Priority of services: higher-layer information, combined with server loading, can further improve the QoS for both high-priority and regular traffic. [0103]
  • 7. Scalability: multiple devices can be used within a single server or a single device among multiple servers to support large-scale applications. [0104]
  • The type of server(s), operating system(s), network(s), storage system(s) or the speeds of the networks are not essential to the invention. Various interfaces can be designed. The three-way switching is a logical concept. In an actual implementation, the system can involve multiple networks and/or storage networks, e.g. a four-way switching among an ATM, Ethernet and storage area network (SAN) interfaces. The basic idea is a high layer or cross-(protocol) layered switching mechanism among heterogeneous (network) systems with embedded real-time protocol conversion to bypass the server(s) as much as possible. In addition, if multiple servers are involved, a load balancing scheme can improve the overall system performance further. [0105]
  • Certain Improvements [0106]
  • The following additional definitions will be useful in discussing certain improvements in connection with the invention discussed above: [0107]
  • “Bypass Board”: an enhancement board designed to reduce average CPU load when installed into an existing host server. It achieves this in two ways: (1) Reduction in the processing of specific traffic, and (2) Reduction in I/O bus transactions. [0108]
  • “TWIP (Three-Way Internet Port) Board”: The preferred embodiment of the Bypass Board. The TWIP Board (or simply referred to as TWIP) is a peripheral interface board that plugs into a PCI socket, replacing at least one existing SCSI disk controller and one NIC board. The host's existing hard drive is then plugged into the TWIP along with a 1 Gbit switched Ethernet connection. Host-side drivers that permit visibility of the SCSI disk and network connection from all existing applications will also accompany the TWIP board. [0109]
  • In this embodiment, the traffic to be bypassed is assumed to be HTTP traffic. However, other types of traffic (such as FTP, RTP, etc.) are possible. The FTP capability can be used, among other things, to support an efficient backup of the storage device concurrent with, for example, production HTTP operation, thereby avoiding downtime dedicated to backup. [0110]
  • The drawings are described in detail below with respect to the five specific solution strategies set forth above. [0111]
  • Throughput Improvement by the Data-Driven Multi-Processor Pipelined Model [0112]
  • The design of TWIP is based on a data-driven multi-processor pipelined model as shown in FIG. 1. In a data-driven multi-processor pipelined model, tasks are assigned to specific processors whose actions are triggered by the arrival of input data. Upon completion of the required processing, each processor puts the output data into the input queues of the processor(s) that perform the next tasks on that data. The operations of the processors are asynchronously executed and the processing of the data forms multistage pipelines. The payload data are not copied or moved once they are loaded into the payload buffers until they are finally sent out. Each processor, along with the input/output queues, only operates on the labels associated with the payload data. The labels could be the pointers or headers of the payload data. All inter-processor communications involve labels but not the payload. [0113]
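  • A minimal sketch of the label-passing idea, with illustrative structures rather than the TWIP data types: the payload is written once into a buffer pool, and the pipeline stages exchange only small labels through their queues.

```python
from collections import deque
from dataclasses import dataclass

payload_buffers: list[bytes] = []          # payload stays here until final transmission

@dataclass
class Label:
    buf_index: int                         # where the payload sits in the pool
    length: int
    note: str = ""                         # per-stage metadata (e.g. parsed header info)

def receive(packet: bytes, out_q: deque) -> None:
    payload_buffers.append(packet)         # the single copy into the buffer pool
    out_q.append(Label(len(payload_buffers) - 1, len(packet)))

def parse_stage(in_q: deque, out_q: deque) -> None:
    while in_q:                            # triggered by the arrival of input labels
        label = in_q.popleft()
        label.note = "HTTP GET"            # the stage works on the label only
        out_q.append(label)

q1, q2 = deque(), deque()
receive(b"GET /index.html HTTP/1.0\r\n\r\n", q1)
parse_stage(q1, q2)
label = q2.popleft()
assert payload_buffers[label.buf_index].startswith(b"GET")   # payload was never copied
```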
  • This model has many advantages. Among them is the saving in label processing as compared to payload processing. For example, most Ethernet packets are at most about 1.5 KB in size, while the average header/pointer is 20 bytes; this results in a 75:1 ratio in bus and memory traffic, a 75-fold saving. [0114]
  • From the hardware perspective, this model is depicted in FIG. 2. As shown in FIG. 2, the payload data do not even go through any of the processors. The traffic between the payload buffers and the host can also involve labels if the application program (which could be modified) on the host does not need to process the payload data. [0115]
  • FIG. 3 shows the software structure for the TWIP preferred embodiment. The functional relationship among the software modules is described below. [0116]
  • The Network Interface Card (NIC, [0117] 701) receives data from the network. The NIC Device Driver (NIC DD, 702) fetches the data from the buffer on NIC (701). The NIC DD (702) checks to see if the traffic is non-HTTP (the traffic not to be handled by TWIP). If so, (702) redirects the traffic to the Host (718) through Ethernet DD (704) using DMA (DMA1), else (702) directs the traffic to the TCP/IP processor (705) through packet descriptor module (703). The TCP/IP processor (705) passes HTTP payload labels to TWIP HTTP engine (707) through a socket (TCB Socket, 706). The TWIP HTTP engine (HTTP 707) parses the HTTP payload and decides to use one of the two file subsystems (709 or 710), and then issues file system requests to (709) or (710) through the FSRequest module (708).
  • Both file subsystems (tFS, [0118] 709) and (xFS, 710) request data from the buffer cache module (711). If the data is not cached in (711) and the request comes from subsystem tFS (709), (711) will ask the TWIP block device driver (tBlock DD, 712) to fetch the data. If the data is not cached in the buffer cache (711) and the request comes from subsystem xFS (710), (711) will ask another TWIP block device driver (xBlock DD, 714) to fetch the data.
  • The tBlock DD ([0119] 712) asks tSCSI DD (713) to fetch data from the SCSI disk (716) using DMA5 and the xBlock DD (714) asks tVirtual DD (715) to fetch data from the virtual disk (719) from the host using DMA2.
  • Both the non-HTTP traffic host ([0120] 718), which handles all non-HTTP traffic, and the virtual disk for the HTTP traffic (719), which handles all HTTP traffic that TWIP cannot handle, are on the server computer. The non-HTTP host consists of TWIP Host DD (722), Host TCP/IP engine (723) and Host non-HTTP application program (724). When a request is forwarded from the NIC DD (702), it is transferred to (722) by DMA1. The TWIP host DD I (722) communicates with a network protocol, assumed to be TCP/IP (723), to provide services for the non-HTTP application (724).
  • The Virtual Disk ([0121] 719) simulates a virtual disk to TWIP and it handles the HTTP traffic that cannot be handled by TWIP. The virtual disk (719) consists of TWIP Host DD II (725), t-protocol engine (726), and the Host HTTP application program, assumed to be Apache, (727). The Virtual Disk (719) serves as a disk to TWIP for dynamic web content. The HTTP application program (727) is used to generate the dynamic web content data. Once the data is generated, t-protocol (726) will create a virtual disk environment for xFS (710) so that (710) may load the dynamic web content data for different requests as files from the virtual disk (719).
  • When an application, including both non-HTTP and HTTP programs, needs to access the SCSI disk, it asks the Host File System ([0122] 721) and the Host SCSI DD (720) to complete the disk access request.
  • The following are the definitions of all of the items listed in FIG. 3 for clarification. [0123]
  • NIC ([0124] 701)—Network Interface Card is a piece of hardware that is used by the server to communicate to the network.
  • NIC DD ([0125] 702)—Network Interface Card Device Driver is a piece of software that knows how to interact with NIC (701) to obtain data from the network and to send data to the network.
  • Packet Descriptor ([0126] 703)—Packet Descriptor Module is used to provide data structures and interfaces for TCP/IP (705) and NIC DD (702) for them to communicate.
  • Ethernet DD ([0127] 704)—Ethernet Device Driver is used to forward Ethernet packets to the host when the Ethernet packets are used by non-HTTP application.
  • TCP/IP ([0128] 705)—The on-board TCP/IP is used to handle all HTTP-related TCP/IP traffic.
  • TCBSocket ([0129] 706)—TCBSocket is a communication gateway between TCP/IP (705) and tHTTP (707).
  • THTTP ([0130] 707)—THTTP is an on board HTTP protocol. THTTP consists of simple HTTP functions in order to process simple HTTP requests.
  • FSRequest ([0131] 708)—The FSRequest module provides data structures and interfaces for tHTTP to communicate with the file system modules (709 and 710).
  • TFS ([0132] 709)—TFS is a file system that understands the file system format that was used to partition the SCSI disk (716). (e.g. EXT2, NTFS)
  • XFS ([0133] 710)—XFS is a file system that understands the virtual file system format on the Virtual Disk (719).
  • Buffer Cache ([0134] 711)—The Buffer Cache helps reduce disk accesses by providing a caching algorithm.
  • TBlock DD ([0135] 712)—TBlock Device Driver is one part of the block device driver used to encapsulate the underlying SCSI device drivers (713).
  • TSCSI DD ([0136] 713)—TSCSI Device Driver will retrieve data from the SCSI Disk (716) and present to TBlock Device Driver (712) in the format of “Block” defined by block device driver.
  • XBlock DD ([0137] 714)—XBlock Device Driver is one part of the block device driver used to encapsulate the underlying Virtual disk device drivers (715)
  • TVirtual DD ([0138] 715)—TVirtual Device Driver will retrieve data from the Virtual Disk (719) and present to XBlock Device Driver (714) in the format of “Block” defined by block device driver.
  • SCSI Disk ([0139] 716)—SCSI Disk contains data for the web server.
  • TSCSI DD to Host ([0140] 717)—TSCSI Device Driver to Host is used to provide a tunnel for the OS SCSI DD (720) to access tSCSI DD (713) to retrieve data from SCSI Disk (716)
  • Host for non-HTTP traffic ([0141] 718)—This is an abstraction on the host that consists of TWIP Host DD I (722), TCP/IP (723) and non-HTTP Application (724). This abstraction represents the processing of non-HTTP traffic from the network (NOTE: any non-HTTP traffic, including traffic that does not use TCP/IP). The middle layer protocol (723) may change depending on the application, but the idea should be similar.
  • Virtual Disk ([0142] 719)—Virtual Disk is an abstraction that consists of TWIP Host DD II (725), t-protocol (726) and HTTP Application (727). This abstraction provides TWIP with a "virtual disk" so that all HTTP traffic, whether or not TWIP can handle it directly, will be treated as if it can be handled. Each request that is forwarded to the host is a "file" in the "virtual disk". The content of the file is created on the fly and will be presented to TWIP as if the file is a static file.
  • OS SCSI DD ([0143] 720)—OS SCSI Device Driver is used to provide the interfaces to the server FS (721) for SCSI disk access for the server computer.
  • FS ([0144] 721)—This File System is on the server computer and is defined by the OS that the server is running with.
  • TWIP Host DD I ([0145] 722)—TWIP Host Device Driver I is used to control the data transfer using DMA between TWIP NIC Device Driver (702) and Server TCP/IP (723).
  • TCP/IP ([0146] 723)—The host TCP/IP is used only for non-HTTP Applications.
  • Non-HTTP Application ([0147] 724)—Any application protocol that is not the Hypertext Transfer Protocol (e.g. FTP, Telnet).
  • TWIP Host DD II ([0148] 725)—TWIP Host Device Driver II is used to control the data transfer using DMA between tVirtual DD (715) and t-protocol (726).
  • t-Protocol ([0149] 726)—t-Protocol is used to intercept the data from HTTP applications (727) to TCP/IP so that these data can be sent out to the network using TWIP TCP/IP (705).
  • HTTP Application ([0150] 727)—Any application program that uses the Hypertext Transfer Protocol (e.g. Apache).
  • From the perspective of queues and processes, the TWIP operations are depicted in FIG. 4. A brief description of this queue-and-process architecture is provided below. [0151]
  • The NIC Device Driver processor ([0152] 802) constantly grabs packets from the Network Interface Card's receiving buffer (801). The NIC DD processor (802) first checks whether the packet is a fragmented IP packet. If so, (802) will put the packet in the IP receiving queue (806) that will be handled by the TCP/IP processor (807) on TWIP. If the packet is not fragmented, the NIC DD processor (802) will determine if the packet is an HTTP packet. If so, (802) puts it in the IP receiving queue (806), else (802) puts the packet in the queue (803) that will be transferred to the server through the Ethernet Device Driver (840).
  • TWIP TCP/IP processor ([0153] 807) constantly grabs packets from (806) to process. After processing the packet (e.g. de-fragmentation), (807) can determine if the packet belongs to HTTP traffic.
  • For non-HTTP traffic, ([0154] 807) will forward the packet to the server by putting it in the IP Forward queue (808). The t-Eth Device Driver (840) then combines the packets in IP Forward queue (808) and the packets in the queue (803) and then puts the packets in Queue X (841). For HTTP traffic, (807) will hand the HTTP payload portion of the packet to tHTTP (813) through TCB receiving queue (811). As we can see, the traffic has been divided into two paths: non-HTTP and HTTP traffic.
  • For the non-HTTP path, the Ethernet Device Driver module DMAs the data over to the receiving ring ([0155] 851) on the server. The TCP/IP (853) on the server will process the packet in the receiving ring (851) and present the application layer data to the non-HTTP applications (854). Normally these non-HTTP applications (854) will issue file system calls. If so, the file system processor (856) will communicate with the OS SCSI device driver processor (866) on the server to obtain data from the SCSI Disk (831).
  • The OS SCSI device driver ([0156] 866) must communicate with TWIP's device driver (tFS DD, 829) to obtain data from the SCSI Disk (831). To do so, (866) forwards the requests issued by the server file system (856) to the on-board queue (835) using DMA (834). TFS device driver (829) will read requests from the queue (835) and access the SCSI Disk (831) to retrieve data from the disk. tFS device driver processor (830) will then take the data from the disk and put it in the queue (832), which will be transferred by DMA (833) to the server queue (867). The server SCSI device driver (865) anticipates the data coming back in queue (867). Once the data comes back, (865) wakes up the processors that are waiting for this piece of data, which is the file system processor (857). Finally, (854) will obtain data from the file system processor (857) and then send it to the network using the server TCP/IP protocol (852). The server TCP/IP protocol (852) puts the data in the transmission ring (850). The data in the ring (850) will then be forwarded by DMA over to TWIP in Queue Y (844). Once the data is ready in the queue (844), the NIC Device Driver processor (804) will take the data in the queue (844) and put it into the NIC transmission Queue (805), which is then sent out to the network.
  • The other path is the HTTP path. For the HTTP path, the tHTTP processor ([0157] 813) will grab the HTTP payload that was put in the TCB receiving queue (811) by the TWIP TCP/IP processor (807). The tHTTP processor (813) will process this payload and determine if the HTTP request data can be found on SCSI Disk (831) or Virtual Disk. If on SCSI disk, tHTTP processor (813) will use tFS (821); otherwise it will use xFS (819). Once again, xFS (819) is a file system processor that will understand the format of a Virtual Disk, which is an abstraction for handling dynamic content requests. This abstraction gives the tHTTP processor (813) the effect that tHTTP always deals with static content requests.
  • To obtain data from SCSI Disk ([0158] 831), tHTTP (813) must issue file system requests to file system request queue (816) because tFS processor (821) will continue to look for requests to process from the queue (816). Once (821) processes the file system request, it will try to access the disk through the buffer cache. The buffer cache gives tFS (822) the buffer handler for the area in memory where the requested data will be placed when it comes from the disk. If the requested data is not in the buffer cache, then the buffer cache will queue the request in (825), where it will be processed by the tFS device driver processor (829). When the data comes back from the SCSI Disk (831), the tFS device driver (830) will put the data in the location (826) that was associated with the buffer handler. Finally, tFS (822) will notify, through the queue (818), tHTTP processor (814) that the data is ready.
  • If tHTTP ([0159] 813) needs to obtain data from Virtual Disk, it must also go through buffer cache (823) as described in the case of tFS (821) to communicate with the device driver (xFS DD, 827). The xFS device driver (827) will look for request in the queue (823). When xFS device driver (828) retrieves data from Virtual Disk, it puts the data in the location (824) that is associated with buffer handler. Finally xFS (820) will notify, through the queue (817), tHTTP processor (814) that the data is ready.
  • THTTP ([0160] 814) will put the data coming from both (818) and (817) on the TCB transmission queue (812), which will be taken by TWIP TCP/IP (809) and processed into a packet. (809) will put the packet in the IP transmission queue, from which it is sent to the network through the NIC device driver (804) and the NIC transmission queue (805).
  • In the Virtual Disk, xFS device driver ([0161] 827) issues requests through the queue (862) to T-Protocol (860). T-Protocol processor (861) provides data structures that make xFS (820) behave as if it is interacting with a disk. The server HTTP application (864) will process the request from queue (862) and create an HTTP payload that is presented as the data of a static file from the Virtual Disk. The data is then put on the queue (863). The details of how the HTTP Application (864) uses the file system (856) parallel the prior discussion of how the non-HTTP Application accesses the file system (856).
  • The following are the definitions of all of the items listed in FIG. 4 for clarification. [0162]
  • NIC DD ([0163] 802), (804)—This is a Network Interface Device Driver processor for receiving (802) and transmitting (804).
  • NIC Tx Queue ([0164] 805)/ Rx Queue (801)—This is the NIC queue for transmitting (805) and receiving (801).
  • ([0165] 803)—This is a queue that holds the non-HTTP traffic that is determined by NIC DD (802)
  • TCP/IP ([0166] 809), (807)—TWIP TCP/IP for transmitting (809) and receiving (807)
  • IP Tx Queue ([0167] 810)/Rx Queue (806)—This is the IP queue for transmitting (810) and receiving (806).
  • IP Fw Queue ([0168] 808)—This is the forward queue for non-HTTP traffic that is determined by TCP/IP (807)
  • TCB Tx Queue ([0169] 812)/Rx Queue (811)—This is the socket queue for transmitting (812) and receiving (811). This is the communication portal between tHTTP module and TCP/IP module.
  • THTTP ([0170] 814), (813)—TWIP HTTP processor for transmitting (814) and receiving (813).
  • Fs request queue ([0171] 816), (818)—file system request queues for tFS, both transmitting (816) and receiving (818).
  • Fs request queue ([0172] 817), (819)—file system request queues for xFS, both transmitting (817) and receiving (815)
  • TFS ([0173] 821), (822)—tFS denotes the file system processors that understand the file system format on the SCSI Disk (831), both transmitting (822) and receiving (821).
  • XFS ([0174] 820), (819)—xFS denotes the file system processors that understand the file system format on the Virtual Disk, both transmitting (819) and receiving (820).
  • ([0175] 825), (826)—Transmitting queue (825) and receiving queue (826) are used for the device driver request to retrieve data from SCSI Disk (831).
  • ([0176] 824), (823)—Transmitting queue (823) and receiving queue (824) are used for the device driver request to retrieve data from Virtual Disk.
  • tFS DD ([0177] 829), (830)—tFS device driver processors, transmitting (829) and receiving (830), that are used to retrieve data from the SCSI Disk.
  • xFS DD ([0178] 828), (827)—xFS device driver processors, transmitting (828) and receiving (827), that are used to retrieve data from the Virtual Disk.
  • SCSI Disk ([0179] 831)—SCSI Disk that is formatted using the format that is supported by tFS. (e.g. EXT2, NTFS).
  • ([0180] 835), (832)—Transmitting queue (832) and receiving queue (835) are used to store the file system request from the OS SCSI device driver processor (866).
  • DMA X ([0181] 834) (842), DMA Y (833) (843)—These four DMA processors use the DMA channels to transfer data between TWIP and the server.
  • Queue X ([0182] 841)—This queue is used to queue up all non-HTTP requests from the network.
  • t-ETH DD—t-Eth device driver processor grabs data from the IP Fw queue ([0183] 808) and (803) and puts them into one queue (841).
  • Queue Y ([0184] 844)—This queue is used to store all the data from the host that needs to be sent out as Ethernet packets.
  • Rx Ring ([0185] 851) and Tx Ring (850)—These are the queues that store the transmitted and received non-HTTP packets from the network.
  • TCP/IP ([0186] 853), (852)—These are the transmitting (852) and receiving (853) host TCP/IP processors.
  • Non-HTTP Application ([0187] 854)—Any application protocol that is not the Hypertext Transfer Protocol (e.g. FTP, Telnet).
  • Other App ([0188] 855)—This is used as an example that there are other applications, besides the HTTP application and the non-HTTP application, that use the file system (856). Someone who accesses the server through a terminal may start these applications.
  • FS ([0189] 856), (857)—These are the receiving (856) and transmitting (857) file system processors on the server.
  • T-Protocol ([0190] 861), (860)—These are the receiving (860) and transmitting (861) t-protocol processors that are used to communicate with the HTTP Application (864) to obtain the return payload and then provide xFS (820) with a virtual file system.
  • ([0191] 863), (862)—These two queues are used to store transmitted (863) and received (862) data between the host and TWIP.
  • HTTP Application ([0192] 864)—Any application program that uses the Hypertext Transfer Protocol (e.g. Apache).
  • SCSI device driver ([0193] 865) and (866)—These two SCSI device driver processors, receiving (866) and transmitting (865), are used to issue SCSI requests to the TWIP file system device driver in order to complete requests from the host file system (856).
  • ([0194] 868), (867)—These two are the queues that transfer SCSI disk commands and SCSI disk data between host and TWIP.
  • File System Consistency Between the Bypass Board and the Host [0195]
  • A TWIP file system data consistency problem arises when TWIP issues a read to storage before or after the host initiates a write to the same file. This could lead to inconsistent data fetched by TWIP. Fundamentally this is caused by dual storage accesses without synchronization. [0196]
  • There are two solutions to the problem. To reduce overhead and unnecessary interlocking, either method could be applied only when large files are being read. [0197]
  • In the first method, TWIP sends the filename to the host before the TWIP HTTP engine issues a file read. The host TWIP device driver generates a fake fileopen(Filename) to block any potential host write to the same file. Then the host TWIP device driver sends a write_block_ack signal back to TWIP. If, on the other hand, the host fails to open the file for read, meaning that the host may be writing to the same file and the TWIP read request should be held back, no write_block_ack is issued, and the process should retry to open the file later. Once TWIP receives the write_block_ack, TWIP starts reading the file. When TWIP finishes the read, it sends the signal write_block_clear to the host, and the host TWIP device driver then does a fileclose(Filename). [0198]
  • This method relies on the host OS to enforce file (storage) access synchronization. It works much the same way all applications run on the host—they have to register with the host OS before proceeding. The registration process, however, can be pipelined. Once a file read request is sent to the host, the TWIP file system does not have to wait for response. It can proceed to process the next connection. After the host acknowledges the request (registration), the TWIP file system will go back to read the file. [0199]
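  • A minimal sketch of the first method; an in-process lock stands in for the host OS file-open synchronization, the write_block_ack and write_block_clear names follow the text, and everything else is illustrative.

```python
import threading

class HostTwipDriver:
    """Host-side driver: the 'fake fileopen' blocks writers for the board."""
    def __init__(self):
        self._locks: dict[str, threading.Lock] = {}

    def write_block(self, filename: str) -> bool:
        lock = self._locks.setdefault(filename, threading.Lock())
        return lock.acquire(blocking=False)     # True models the write_block_ack

    def write_block_clear(self, filename: str) -> None:
        self._locks[filename].release()         # models the fileclose(Filename)

host = HostTwipDriver()

def twip_read(filename: str) -> None:
    while not host.write_block(filename):       # no ack: the host may be writing; retry
        pass
    try:
        pass                                    # ... board reads the file from storage ...
    finally:
        host.write_block_clear(filename)        # signal write_block_clear when done

twip_read("/htdocs/big_video.mpg")
```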
  • In the second method, the host write request is intercepted by the TWIP host device driver. The TWIP host device driver then generates a write request (w_req). Then TWIP completes all outstanding read requests and sends back a write acknowledgement (w_ack) to the host and routes all future read requests to the host. Upon receiving the signal w_ack at the host, the TWIP host device driver releases the hold on the original write requests and proceeds to write (thick vertical line on host in FIG. 5). Once the host finishes all outstanding write operations, the TWIP device driver detects this and sends write-release (w_rel) to TWIP. When TWIP receives w_rel it resumes the bypass function if it can handle the new incoming requests. [0200]
  • One disadvantage of this approach is that a write blocks any read from TWIP whether or not the write targets a file currently being read (global blocking). One advantage of this approach is that it is transparent to clients and gracefully transfers the traffic from TWIP to the host. The global blocking may not be significant if host writes do not happen often. [0201]
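  • A minimal sketch of the second method as a single state machine on the board; the w_req, w_ack and w_rel names follow the text, and the bookkeeping is illustrative.

```python
class BypassState:
    def __init__(self):
        self.bypassing = True          # normal mode: reads bypass the host
        self.outstanding_reads = 0

    def on_w_req(self) -> str:
        """Host write request: drain outstanding bypass reads, then detour."""
        while self.outstanding_reads:
            self.outstanding_reads -= 1          # complete all outstanding reads
        self.bypassing = False                   # route all future reads to the host
        return "w_ack"                           # host may now proceed with its writes

    def on_w_rel(self) -> None:
        self.bypassing = True                    # writes finished: resume the bypass

state = BypassState()
assert state.on_w_req() == "w_ack" and not state.bypassing   # global blocking in effect
state.on_w_rel()
assert state.bypassing
```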
  • HTTP Synchronization Between the Bypass Board and the Host [0202]
  • The bypass board creates a second data path to concurrently and asynchronously handle HTTP traffic, within a single TCP connection. This may cause the data arriving at the client within the same connection to be out of order. [0203]
  • There exist many ways to solve the problem. The solution approach described here assumes that no modification of the HTTP application program on the host is allowed. A key to the problem is to find the end of the response economically (in terms of computation power). An obvious solution is to parse the responses to match HTTP request lengths against the HTTP size fields. This solution may work in some scenarios but it is likely to be very costly due to the parsing process (for every byte of data). Further, in some cases the size field of the HTTP file may not be available. [0204]
  • The following is a method to quickly signal the end of an HTTP response. [0205]
  • 1. A table keeping track of all outstanding requests within a connection is set up. [0206]
  • 2. If a request to the host HTTP server is followed by a request through TWIP (bypass) according to the table, TWIP inserts a fake request (or a trace command) to the host HTTP server. [0207]
  • 3. After the host HTTP server processes the first request, it responds to the second (fake) request without accessing the storage (e.g., a trace command) and produces a given pattern signal. [0208]
  • 4. The arrival of the given pattern signals the end of the response for the original request. [0209]
  • 5. TWIP then releases the response from bypass operation. [0210]
  • Since TWIP has control over the pattern to catch, hardware-assisted parsing can be implemented to further speed up the process. [0211]
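  • The sketch below illustrates steps 1-5 under stated assumptions: the sentinel value, the class HttpSynchronizer, and the callbacks send_to_host and release_bypassed_response are invented for illustration, and a TRACE-style request stands in for the fake request.

    SENTINEL = b"X-TWIP-END-7f3a"   # end-of-response pattern chosen by TWIP (hypothetical)

    class HttpSynchronizer:
        """Per-connection table of outstanding requests (step 1)."""

        def __init__(self, send_to_host, release_bypassed_response):
            self.outstanding = []               # "host" or "twip", in request order
            self.send_to_host = send_to_host
            self.release_bypassed_response = release_bypassed_response

        def on_request(self, handled_by):
            # Step 2: a host-handled request followed by a bypassed request gets a
            # fake request queued behind it so the host echoes the sentinel.
            if handled_by == "twip" and self.outstanding and self.outstanding[-1] == "host":
                self.send_to_host(b"TRACE /" + SENTINEL + b" HTTP/1.1\r\nHost: x\r\n\r\n")
            self.outstanding.append(handled_by)

        def on_host_response_bytes(self, data):
            # Steps 3-5: the sentinel marks the end of the preceding host response,
            # so the buffered bypass response can be released in order.  A real
            # implementation would handle the pattern straddling buffer boundaries,
            # possibly with hardware-assisted matching.
            if SENTINEL in data:
                self.outstanding.pop(0)         # the host request has completed
                self.release_bypassed_response()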
  • Caching on the Bypass Board [0212]
  • Caching is useful to speed up data access for frequently accessed files. TWIP will come with a buffer cache (711). The cache effect is achieved by maintaining a usage table. When a file is loaded into the data memory, it is also logged in the usage table with a time stamp. The time stamp is updated every time the file is used. When memory is nearly full, the table is searched, by comparing the time stamps, to delete first those files which have not been used for the longest time. [0213]
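  • A minimal sketch of the usage table follows, assuming eviction is driven purely by the stored time stamps; the class FileCache and the byte-count capacity are illustrative.

    import time

    class FileCache:
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.table = {}                     # filename -> [time stamp, data]

        def get(self, filename, load_from_storage):
            if filename not in self.table:
                data = load_from_storage(filename)
                self.table[filename] = [0.0, data]
                self.used += len(data)
                self.evict_if_needed(keep=filename)
            entry = self.table[filename]
            entry[0] = time.monotonic()         # time stamp updated on every use
            return entry[1]

        def evict_if_needed(self, keep):
            # When memory is nearly full, delete the files unused the longest
            # by comparing time stamps.
            while self.used > self.capacity and len(self.table) > 1:
                oldest = min((f for f in self.table if f != keep),
                             key=lambda f: self.table[f][0])
                self.used -= len(self.table.pop(oldest)[1])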
  • FIG. 6 depicts the relationship among the buffer cache, the TWIP file system, and the TWIP file system device driver. [0214]
  • The buffer cache allocates buffer pages for blocks of data on the disk. Each page corresponds to a block on the disk. After the buffer cache allocates the memory, it associates this memory space with a buffer handler. This buffer handler serves as a key to the file system for accessing the data. [0215]
  • Given a block number and a device ID, the buffer cache locates the associated buffer handler. If the file system requests data that already exists in the buffer cache, the buffer cache returns a buffer handler associated with the existing buffer page without accessing the disk. If the data does not exist in the buffer cache, the buffer cache allocates a free buffer page according to an optimized algorithm. The free buffer page is associated with the requested block number and device ID using the buffer handler. A request is then issued to the file system device driver to retrieve the data from the disk into the buffer page. Finally, the associated buffer handler is passed to the file system. [0216]
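  • The lookup path can be summarized by the sketch below; BufferCache, the page count, and the device_driver_read callback are assumptions made for illustration, and the "optimized algorithm" for freeing a page is left as a placeholder.

    class BufferCache:
        def __init__(self, num_pages, device_driver_read):
            self.pages = {}                     # buffer handler -> page contents
            self.index = {}                     # (device ID, block number) -> handler
            self.free = list(range(num_pages))  # free buffer handlers
            self.read_block = device_driver_read

        def get_handler(self, device_id, block_no):
            key = (device_id, block_no)
            if key in self.index:
                return self.index[key]          # hit: no disk access needed
            handler = self.free.pop() if self.free else self.reclaim_page()
            # File system device driver retrieves the block into the buffer page.
            self.pages[handler] = self.read_block(device_id, block_no)
            self.index[key] = handler
            return handler                      # handler is the key handed to the file system

        def reclaim_page(self):
            # Placeholder for the optimized free-page selection (e.g. least recently used).
            key, handler = next(iter(self.index.items()))
            del self.index[key]
            return handler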
  • Storage-Based TCP Retransmission on the Bypass Board [0217]
  • The following discussion concerns storage-based TCP retransmission, meaning that the retransmission uses storage as the buffer for transmitted data that has yet to be acknowledged. This kind of retransmission scheme is especially useful in high-speed WAN situations where the required retransmission buffer size is huge, making in-memory buffering impractical. [0218]
  • TCP Retransmission Layer [0219]
  • In general, there are three approaches to the retransmission problem, as follows: [0220]
  • 1. The whole packet is kept in the memory. [0221]
  • 2. The whole packet is removed from the memory and instead enough information is stored in the memory to recover the packet. [0222]
  • 3. Only a part of the packet is in the memory and the rest is removed. Enough information is stored in the memory to recover the removed part of the packet. [0223]
  • Current TCP/IP stacks use approach (1). This patent proposes two new methodologies based on approaches (2) and (3). [0224]
  • Approach 2: Removing the Whole Packet [0225]
  • In general, in order to retransmit a packet two types of information are needed: (a) information about the file that the packet is transmitting and (b) information about the packet itself. This information can be saved on a per-packet basis within a connection. [0226]
  • Information about the file that needs to be retransmitted can consist of an ID label that uniquely identifies the file (e.g., on a Linux platform the inode ID would be a good candidate). Also, the offset within the file can be saved. This offset could be derived from the sequence number of the packet to be retransmitted. [0227]
  • Information about the packet should include those header fields that are supposed to be dynamic on a per-packet basis within a connection. For example, it is not mandatory to keep information about the IP addresses for the packet since this information does not change within the packets belonging to the same connection. Instead, this information can be retrieved from the connection structure when rebuilding the packet. [0228]
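  • A sketch of the per-packet record for approach (2) is given below, assuming a Linux-style inode number identifies the file; RetransmitRecord and the callbacks read_file_range and build_headers are illustrative names.

    from dataclasses import dataclass

    @dataclass
    class RetransmitRecord:
        inode: int      # ID label that uniquely identifies the file
        offset: int     # byte offset within the file (derivable from the sequence number)
        length: int     # payload length of the original packet
        seq: int        # per-packet TCP fields that cannot be taken from the connection
        flags: int

    def rebuild_packet(record, connection, read_file_range, build_headers):
        # Static fields such as IP addresses come from the connection structure;
        # the payload is re-read from storage using the inode and offset.
        payload = read_file_range(record.inode, record.offset, record.length)
        headers = build_headers(connection, seq=record.seq, flags=record.flags,
                                payload_len=len(payload))
        return headers + payload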
  • Approach 3: Removing Some Parts of the Packet [0229]
  • A packet consists of different parts. Because some parts may be easier to reconstruct than others, a hybrid approach in which only part of the packet is removed from memory can be useful. In general, the preferred parts to remove are those that occupy a large amount of memory and are at the same time easy to reconstruct. [0230]
  • This approach intends to maximize the ratio [memory freed per packet]/[complexity of packet reconstruction]. Because the headers occupy a small percentage of the packet (usually <10%) and require more effort to reconstruct, the headers are kept in memory. Because the body of the file transmitted in the packets occupies a large percentage of each packet and requires relatively little effort to reconstruct, the payload (the file body) is removed. Therefore, this approach keeps two types of data: [0231]
  • (a) Information about the file that needs to be recovered. This can consist of an ID label that uniquely identifies the file (e.g., on a Linux platform the inode ID would be a good candidate). Also, the offset within the file can be saved. This offset could be derived from the sequence number of the packet to be retransmitted. [0232]
  • (b) The headers of the packets. [0233]
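  • For approach (3), a corresponding sketch keeps the raw headers and only drops the payload; PartialPacketStore and read_file_range are hypothetical, and a real implementation would also refresh the TCP checksum after reattaching the payload.

    class PartialPacketStore:
        def __init__(self, read_file_range):
            self.read_file_range = read_file_range
            self.records = {}                   # sequence number -> (headers, inode, offset, length)

        def save(self, seq, headers, inode, offset, length):
            # Keep only the headers plus a reference to the file; free the payload.
            self.records[seq] = (bytes(headers), inode, offset, length)

        def retransmit(self, seq):
            headers, inode, offset, length = self.records[seq]
            payload = self.read_file_range(inode, offset, length)   # rebuild the file body
            return headers + payload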
  • In order to maintain the layering properties of the TCP/IP stack, the proposed TCP retransmission scheme can be implemented by adding an extra layer to the stack. The actual code in this case is not inserted into the TCP module itself but is provided as an extra module. This approach requires the definition of interfaces between [retransmission layer]-[TCP] and [retransmission layer]-[File System]. The protocol stack is depicted in FIG. 7. [0234]
  • The data consistency problem can also arise for the TCP retransmission scheme. If the data to be retransmitted is modified by the host while waiting to be retransmitted, inconsistent contents will result at the client side. [0235]
  • A simple solution is to make an image copy of the entire file to a swap file on the hard disk when the file is first opened for transmission. In order to reduce overhead, only large files greater than a specific threshold are copied. If part of the file must be retransmitted, the image copy in the swap file is used, solving the inconsistency problem. [0236]
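  • The swap-file copy can be sketched as follows; the 16 MB threshold, the temporary-file location, and the class RetransmitSource are illustrative assumptions, not values from the specification.

    import os, shutil, tempfile

    COPY_THRESHOLD = 16 * 1024 * 1024           # only copy files larger than this

    class RetransmitSource:
        def __init__(self, path):
            self.path = path
            self.swap_path = None
            if os.path.getsize(path) > COPY_THRESHOLD:
                fd, self.swap_path = tempfile.mkstemp(suffix=".twip_swap")
                os.close(fd)
                shutil.copyfile(path, self.swap_path)   # frozen image made at first open

        def read_range(self, offset, length):
            # Retransmissions read the image copy if one exists, so later host
            # writes to the original file cannot change the retransmitted bytes.
            with open(self.swap_path or self.path, "rb") as f:
                f.seek(offset)
                return f.read(length)

        def close(self):
            if self.swap_path:
                os.remove(self.swap_path)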
  • It is apparent from the foregoing that the present invention achieves the specified objectives of higher levels of RAS and throughput between the network and storage system, while retaining the security, flexibility, and services normally associated with server-based systems, as well as the other objectives outlined herein. While the currently preferred embodiment of the invention has been described in detail, it will be apparent to those skilled in the art that the principles of the invention are readily adaptable to implementations, system configurations and protocols other than those mentioned herein without departing from the scope and spirit of the invention, as defined in the following claims. [0237]

Claims (14)

What is claimed is:
1. An apparatus for interconnecting at least one data network, at least one storage device, and at least one server, comprising:
a) a network interface;
b) a storage interface; and
c) a server interface;
wherein said apparatus can transfer data between any two of said at least one data network, said at least one storage device and said at least one server.
2. The apparatus as described in claim 1, wherein said data comprises audio or video real time streaming traffic.
3. The apparatus as described in claim 1, wherein said network interface is selected from the group essentially consisting of an Ethernet, ATM, or Sonet network or any other standard-based or proprietary network.
4. The apparatus as described in claim 1, wherein said storage interface is selected from the group essentially consisting of a single disk, a raid system or a Storage Area Network.
5. The apparatus as described in claim 1, wherein said server interface is a single server interface or a server cluster interface.
6. The apparatus as described in claim 1, wherein said server is accessed through interfaces including peripheral component interconnect (PCI) or InfiniBand or other so called system I/O.
7. The apparatus as described in claim 1, wherein said apparatus is located within the same physical housing as said server.
8. The apparatus as described in claim 1, wherein said apparatus is physically housed separately from said server.
9. The apparatus as described in claim 1, further comprising a switching element.
10. The apparatus as described in claim 9, wherein said switching element has predetermined latency.
11. The apparatus as described in claim 1, wherein said apparatus further comprises a routing element having a routing table.
12. The apparatus as described in claim 9, wherein said switching element may be a fully-connected crossbar, memory-based switching shared medium, or other switching construct.
13. The apparatus as described in claim 11, wherein said routing table comprises information from the group essentially consisting of port to route mapping, priority, delay sensitivity, nature of the applications, and information for Quality of Service measurement.
14. A method of using a switch, employing base multiple segmentation (BMS), whereby data flow is subdivided into segments which are each an integral multiple of a fixed base segment size.
US10/172,853 2000-02-10 2002-06-13 System for bypassing a server to achieve higher throughput between data network and data storage system Abandoned US20020154645A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/172,853 US20020154645A1 (en) 2000-02-10 2002-06-13 System for bypassing a server to achieve higher throughput between data network and data storage system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/501,189 US6757291B1 (en) 2000-02-10 2000-02-10 System for bypassing a server to achieve higher throughput between data network and data storage system
US09/631,849 US6535518B1 (en) 2000-02-10 2000-08-03 System for bypassing a server to achieve higher throughput between data network and data storage system
US10/172,853 US20020154645A1 (en) 2000-02-10 2002-06-13 System for bypassing a server to achieve higher throughput between data network and data storage system

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US09/501,189 Continuation US6757291B1 (en) 2000-02-10 2000-02-10 System for bypassing a server to achieve higher throughput between data network and data storage system
US09/631,849 Continuation US6535518B1 (en) 2000-02-10 2000-08-03 System for bypassing a server to achieve higher throughput between data network and data storage system

Publications (1)

Publication Number Publication Date
US20020154645A1 true US20020154645A1 (en) 2002-10-24

Family

ID=27053743

Family Applications (3)

Application Number Title Priority Date Filing Date
US09/501,189 Expired - Lifetime US6757291B1 (en) 2000-02-10 2000-02-10 System for bypassing a server to achieve higher throughput between data network and data storage system
US09/631,849 Expired - Lifetime US6535518B1 (en) 2000-02-10 2000-08-03 System for bypassing a server to achieve higher throughput between data network and data storage system
US10/172,853 Abandoned US20020154645A1 (en) 2000-02-10 2002-06-13 System for bypassing a server to achieve higher throughput between data network and data storage system

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US09/501,189 Expired - Lifetime US6757291B1 (en) 2000-02-10 2000-02-10 System for bypassing a server to achieve higher throughput between data network and data storage system
US09/631,849 Expired - Lifetime US6535518B1 (en) 2000-02-10 2000-08-03 System for bypassing a server to achieve higher throughput between data network and data storage system

Country Status (3)

Country Link
US (3) US6757291B1 (en)
AU (1) AU2001234432A1 (en)
WO (1) WO2001059967A1 (en)

Families Citing this family (128)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5978379A (en) 1997-01-23 1999-11-02 Gadzoox Networks, Inc. Fiber channel learning bridge, learning half bridge, and protocol
US7430171B2 (en) 1998-11-19 2008-09-30 Broadcom Corporation Fibre channel arbitrated loop bufferless switch circuitry to increase bandwidth without significant increase in cost
US6728779B1 (en) * 1999-12-01 2004-04-27 Lucent Technologies Inc. Method and apparatus for exchanging routing information in a packet-based data network
DE10009570A1 (en) * 2000-02-29 2001-08-30 Partec Ag Method for controlling the communication of individual computers in a computer network
DE10016236C2 (en) * 2000-03-31 2003-12-24 Infineon Technologies Ag Modular server
AU2001249821A1 (en) * 2000-04-07 2001-10-30 Broadcom Homenetworking, Inc. A transceiver method and signal therefor embodied in a carrier wave for a frame-based communications network
US7546337B1 (en) 2000-05-18 2009-06-09 Aol Llc, A Delaware Limited Liability Company Transferring files
DE60137564D1 (en) * 2000-06-14 2009-03-19 Sap Ag N OVER HTTP, PROCEDURE, COMPUTER PROGRAM PRODUCT AND SYSTEM
US7366779B1 (en) * 2000-06-19 2008-04-29 Aol Llc, A Delaware Limited Liability Company Direct file transfer between subscribers of a communications system
US9444785B2 (en) 2000-06-23 2016-09-13 Cloudshield Technologies, Inc. Transparent provisioning of network access to an application
US7032031B2 (en) * 2000-06-23 2006-04-18 Cloudshield Technologies, Inc. Edge adapter apparatus and method
US7003555B1 (en) 2000-06-23 2006-02-21 Cloudshield Technologies, Inc. Apparatus and method for domain name resolution
US8204082B2 (en) 2000-06-23 2012-06-19 Cloudshield Technologies, Inc. Transparent provisioning of services over a network
US6950871B1 (en) * 2000-06-29 2005-09-27 Hitachi, Ltd. Computer system having a storage area network and method of handling data in the computer system
US7200666B1 (en) * 2000-07-07 2007-04-03 International Business Machines Corporation Live connection enhancement for data source interface
US6850491B1 (en) * 2000-08-21 2005-02-01 Nortel Networks Limited Modeling link throughput in IP networks
JP4839554B2 (en) * 2000-10-19 2011-12-21 ソニー株式会社 Wireless communication system, client device, server device, and wireless communication method
US6725393B1 (en) * 2000-11-06 2004-04-20 Hewlett-Packard Development Company, L.P. System, machine, and method for maintenance of mirrored datasets through surrogate writes during storage-area network transients
FI20002437A (en) * 2000-11-07 2002-05-08 Nokia Corp Service flow control
US20020065907A1 (en) * 2000-11-29 2002-05-30 Cloonan Thomas J. Method and apparatus for dynamically modifying service level agreements in cable modem termination system equipment
US6907457B2 (en) * 2001-01-25 2005-06-14 Dell Inc. Architecture for access to embedded files using a SAN intermediate device
US7225242B2 (en) * 2001-01-26 2007-05-29 Dell Products L.P. System and method for matching storage device queue depth to server command queue depth
US7149817B2 (en) * 2001-02-15 2006-12-12 Neteffect, Inc. Infiniband TM work queue to TCP/IP translation
JP4483100B2 (en) * 2001-02-20 2010-06-16 株式会社日立製作所 Network connection device
US20020133539A1 (en) * 2001-03-14 2002-09-19 Imation Corp. Dynamic logical storage volumes
US7401126B2 (en) * 2001-03-23 2008-07-15 Neteffect, Inc. Transaction switch and network interface adapter incorporating same
US20020159437A1 (en) * 2001-04-27 2002-10-31 Foster Michael S. Method and system for network configuration discovery in a network manager
US6876656B2 (en) * 2001-06-15 2005-04-05 Broadcom Corporation Switch assisted frame aliasing for storage virtualization
US7343410B2 (en) * 2001-06-28 2008-03-11 Finisar Corporation Automated creation of application data paths in storage area networks
US7239636B2 (en) 2001-07-23 2007-07-03 Broadcom Corporation Multiple virtual channels for use in network devices
US9836424B2 (en) * 2001-08-24 2017-12-05 Intel Corporation General input/output architecture, protocol and related methods to implement flow control
US7231486B2 (en) * 2001-08-24 2007-06-12 Intel Corporation General input/output architecture, protocol and related methods to support legacy interrupts
US7194550B1 (en) * 2001-08-30 2007-03-20 Sanera Systems, Inc. Providing a single hop communication path between a storage device and a network switch
US20030046335A1 (en) * 2001-08-30 2003-03-06 International Business Machines Corporation Efficiently serving large objects in a distributed computing network
US7389332B1 (en) 2001-09-07 2008-06-17 Cisco Technology, Inc. Method and apparatus for supporting communications between nodes operating in a master-slave configuration
US7558264B1 (en) * 2001-09-28 2009-07-07 Emc Corporation Packet classification in a storage system
US20030070027A1 (en) * 2001-10-09 2003-04-10 Yiu-Keung Ng System for interconnecting peripheral host computer and data storage equipment having signal repeater means
US20030110154A1 (en) * 2001-12-07 2003-06-12 Ishihara Mark M. Multi-processor, content-based traffic management system and a content-based traffic management system for handling both HTTP and non-HTTP data
US7177943B1 (en) 2001-12-27 2007-02-13 Cisco Technology, Inc. System and method for processing packets in a multi-processor environment
US20030126283A1 (en) * 2001-12-31 2003-07-03 Ramkrishna Prakash Architectural basis for the bridging of SAN and LAN infrastructures
US7421478B1 (en) 2002-03-07 2008-09-02 Cisco Technology, Inc. Method and apparatus for exchanging heartbeat messages and configuration information between nodes operating in a master-slave configuration
US7295555B2 (en) 2002-03-08 2007-11-13 Broadcom Corporation System and method for identifying upper layer protocol message boundaries
US20040006636A1 (en) * 2002-04-19 2004-01-08 Oesterreicher Richard T. Optimized digital media delivery engine
US20040006635A1 (en) * 2002-04-19 2004-01-08 Oesterreicher Richard T. Hybrid streaming platform
US7899924B2 (en) * 2002-04-19 2011-03-01 Oesterreicher Richard T Flexible streaming hardware
US7415535B1 (en) * 2002-04-22 2008-08-19 Cisco Technology, Inc. Virtual MAC address system and method
US7188194B1 (en) * 2002-04-22 2007-03-06 Cisco Technology, Inc. Session-based target/LUN mapping for a storage area network and associated method
US6895461B1 (en) * 2002-04-22 2005-05-17 Cisco Technology, Inc. Method and apparatus for accessing remote storage using SCSI and an IP network
US7281062B1 (en) 2002-04-22 2007-10-09 Cisco Technology, Inc. Virtual SCSI bus for SCSI-based storage area network
US7433952B1 (en) 2002-04-22 2008-10-07 Cisco Technology, Inc. System and method for interconnecting a storage area network
US7200610B1 (en) 2002-04-22 2007-04-03 Cisco Technology, Inc. System and method for configuring fibre-channel devices
US7587465B1 (en) 2002-04-22 2009-09-08 Cisco Technology, Inc. Method and apparatus for configuring nodes as masters or slaves
US7165258B1 (en) * 2002-04-22 2007-01-16 Cisco Technology, Inc. SCSI-based storage area network having a SCSI router that routes traffic between SCSI and IP networks
US7385971B1 (en) 2002-05-09 2008-06-10 Cisco Technology, Inc. Latency reduction in network data transfer operations
US7509436B1 (en) 2002-05-09 2009-03-24 Cisco Technology, Inc. System and method for increased virtual driver throughput
US7240098B1 (en) 2002-05-09 2007-07-03 Cisco Technology, Inc. System, method, and software for a virtual host bus adapter in a storage-area network
US8028077B1 (en) * 2002-07-12 2011-09-27 Apple Inc. Managing distributed computers
US7398300B2 (en) * 2002-08-14 2008-07-08 Broadcom Corporation One shot RDMA having a 2-bit state
EP1540473B1 (en) * 2002-08-29 2012-02-29 Broadcom Corporation System and method for network interfacing in a multiple network environment
US8631162B2 (en) * 2002-08-30 2014-01-14 Broadcom Corporation System and method for network interfacing in a multiple network environment
US7411959B2 (en) 2002-08-30 2008-08-12 Broadcom Corporation System and method for handling out-of-order frames
US7934021B2 (en) 2002-08-29 2011-04-26 Broadcom Corporation System and method for network interfacing
US7346701B2 (en) 2002-08-30 2008-03-18 Broadcom Corporation System and method for TCP offload
US8180928B2 (en) 2002-08-30 2012-05-15 Broadcom Corporation Method and system for supporting read operations with CRC for iSCSI and iSCSI chimney
US7313623B2 (en) 2002-08-30 2007-12-25 Broadcom Corporation System and method for TCP/IP offload independent of bandwidth delay product
US20040078474A1 (en) * 2002-10-17 2004-04-22 Ramkumar Ramaswamy Systems and methods for scheduling user access requests
KR100449806B1 (en) * 2002-12-23 2004-09-22 한국전자통신연구원 A network-storage apparatus for high-speed streaming data transmission through network
US7831736B1 (en) 2003-02-27 2010-11-09 Cisco Technology, Inc. System and method for supporting VLANs in an iSCSI
US7295572B1 (en) 2003-03-26 2007-11-13 Cisco Technology, Inc. Storage router and method for routing IP datagrams between data path processors using a fibre channel switch
US7904599B1 (en) 2003-03-28 2011-03-08 Cisco Technology, Inc. Synchronization and auditing of zone configuration data in storage-area networks
US7433300B1 (en) 2003-03-28 2008-10-07 Cisco Technology, Inc. Synchronization of configuration data in storage-area networks
DE10314548B4 (en) * 2003-03-31 2007-10-18 OCé PRINTING SYSTEMS GMBH Method, computer and computer program modules for the transmission of data in a computer network
US7526527B1 (en) 2003-03-31 2009-04-28 Cisco Technology, Inc. Storage area network interconnect server
US7555515B1 (en) * 2003-04-21 2009-06-30 Microsoft Corporation Asynchronous pipeline
US20040221123A1 (en) * 2003-05-02 2004-11-04 Lam Wai Tung Virtual data switch and method of use
US20040257990A1 (en) * 2003-06-19 2004-12-23 International Business Machines Corporation Interchassis switch controlled ingress transmission capacity
US7500055B1 (en) * 2003-06-27 2009-03-03 Beach Unlimited Llc Adaptable cache for dynamic digital media
US7912954B1 (en) * 2003-06-27 2011-03-22 Oesterreicher Richard T System and method for digital media server load balancing
US7451208B1 (en) 2003-06-28 2008-11-11 Cisco Technology, Inc. Systems and methods for network address failover
US7558850B2 (en) 2003-09-15 2009-07-07 International Business Machines Corporation Method for managing input/output (I/O) performance between host systems and storage volumes
US8060619B1 (en) 2003-11-07 2011-11-15 Symantec Operating Corporation Direct connections to a plurality of storage object replicas in a computer network
JP2005165852A (en) * 2003-12-04 2005-06-23 Hitachi Ltd Storage system, storage control device, and control method of storage system
JP2005217815A (en) * 2004-01-30 2005-08-11 Hitachi Ltd Path control method
JP2005293478A (en) * 2004-04-05 2005-10-20 Hitachi Ltd Storage control system, channel controller equipped with the same system and data transferring device
US7484016B2 (en) * 2004-06-30 2009-01-27 Intel Corporation Apparatus and method for high performance volatile disk drive memory access using an integrated DMA engine
KR100868820B1 (en) * 2004-07-23 2008-11-14 비치 언리미티드 엘엘씨 A method and system for communicating a data stream and a method of controlling a data storage level
CN1294728C (en) * 2004-08-05 2007-01-10 华为技术有限公司 Method and system for providing QoS assurance in edge router
JP2006127201A (en) * 2004-10-29 2006-05-18 Hitachi Ltd Storage system and conduction confirmation method
KR100807817B1 (en) * 2004-12-17 2008-02-27 엔에이치엔(주) Method for balancing load among subsystems in communication network system of bus network structure
WO2006065102A1 (en) * 2004-12-17 2006-06-22 Nhn Corporation Communication network system of bus network structure and method for transmitting and receiving data using the system
KR100807815B1 (en) * 2004-12-17 2008-02-27 엔에이치엔(주) Communication network system of bus network structure and method using the communication network system
US8458280B2 (en) * 2005-04-08 2013-06-04 Intel-Ne, Inc. Apparatus and method for packet transmission over a high speed network supporting remote direct memory access operations
US7552240B2 (en) * 2005-05-23 2009-06-23 International Business Machines Corporation Method for user space operations for direct I/O between an application instance and an I/O adapter
US20070005815A1 (en) * 2005-05-23 2007-01-04 Boyd William T System and method for processing block mode I/O operations using a linear block address translation protection table
US20060265525A1 (en) * 2005-05-23 2006-11-23 Boyd William T System and method for processor queue to linear block address translation using protection table control based on a protection domain
US7464189B2 (en) * 2005-05-23 2008-12-09 International Business Machines Corporation System and method for creation/deletion of linear block address table entries for direct I/O
US7502872B2 (en) * 2005-05-23 2009-03-10 International Bsuiness Machines Corporation Method for out of user space block mode I/O directly between an application instance and an I/O adapter
US7502871B2 (en) * 2005-05-23 2009-03-10 International Business Machines Corporation Method for query/modification of linear block address table entries for direct I/O
US7500071B2 (en) * 2005-08-31 2009-03-03 International Business Machines Corporation Method for out of user space I/O with server authentication
US7657662B2 (en) * 2005-08-31 2010-02-02 International Business Machines Corporation Processing user space operations directly between an application instance and an I/O adapter
US7577761B2 (en) * 2005-08-31 2009-08-18 International Business Machines Corporation Out of user space I/O directly between a host system and a physical adapter using file based linear block address translation
US20070168567A1 (en) * 2005-08-31 2007-07-19 Boyd William T System and method for file based I/O directly between an application instance and an I/O adapter
US7639715B1 (en) 2005-09-09 2009-12-29 Qlogic, Corporation Dedicated application interface for network systems
US8762507B1 (en) * 2005-12-23 2014-06-24 Hewlett-Packard Development Company, L.P. Method and system for managing an information technology system
US7889762B2 (en) 2006-01-19 2011-02-15 Intel-Ne, Inc. Apparatus and method for in-line insertion and removal of markers
US7782905B2 (en) * 2006-01-19 2010-08-24 Intel-Ne, Inc. Apparatus and method for stateless CRC calculation
US7849232B2 (en) * 2006-02-17 2010-12-07 Intel-Ne, Inc. Method and apparatus for using a single multi-function adapter with different operating systems
US20070208820A1 (en) * 2006-02-17 2007-09-06 Neteffect, Inc. Apparatus and method for out-of-order placement and in-order completion reporting of remote direct memory access operations
US8078743B2 (en) * 2006-02-17 2011-12-13 Intel-Ne, Inc. Pipelined processing of RDMA-type network transactions
US8316156B2 (en) 2006-02-17 2012-11-20 Intel-Ne, Inc. Method and apparatus for interfacing device drivers to single multi-function adapter
US8295275B2 (en) * 2006-03-20 2012-10-23 Intel Corporation Tagging network I/O transactions in a virtual machine run-time environment
US20080155050A1 (en) * 2006-12-23 2008-06-26 Simpletech, Inc. Direct file transfer host processor
US20080155051A1 (en) * 2006-12-23 2008-06-26 Simpletech, Inc. Direct file transfer system and method for a computer network
US20080155049A1 (en) * 2006-12-23 2008-06-26 Simpletech, Inc. Direct file transfer communication processor
US20080288498A1 (en) * 2007-05-14 2008-11-20 Hinshaw Foster D Network-attached storage devices
WO2008148181A1 (en) * 2007-06-05 2008-12-11 Steve Masson Methods and systems for delivery of media over a network
US8285719B1 (en) 2008-08-08 2012-10-09 The Research Foundation Of State University Of New York System and method for probabilistic relational clustering
US8214505B2 (en) * 2009-06-22 2012-07-03 Citrix Systems, Inc. Systems and methods of handling non-HTTP client or server push on HTTP Vserver
US9325625B2 (en) 2010-01-08 2016-04-26 Citrix Systems, Inc. Mobile broadband packet switched traffic optimization
US8560552B2 (en) * 2010-01-08 2013-10-15 Sycamore Networks, Inc. Method for lossless data reduction of redundant patterns
US8514697B2 (en) * 2010-01-08 2013-08-20 Sycamore Networks, Inc. Mobile broadband packet switched traffic optimization
US8468135B2 (en) * 2010-04-14 2013-06-18 International Business Machines Corporation Optimizing data transmission bandwidth consumption over a wide area network
US8504718B2 (en) * 2010-04-28 2013-08-06 Futurewei Technologies, Inc. System and method for a context layer switch
WO2013070800A1 (en) 2011-11-07 2013-05-16 Nexgen Storage, Inc. Primary data storage system with quality of service
US8930737B2 (en) 2011-12-13 2015-01-06 Omx Technology Ab Method and devices for controlling operations of a central processing unit
US9240023B1 (en) * 2013-01-30 2016-01-19 Amazon Technologies, Inc. Precomputing processes associated with requests
US20150254196A1 (en) * 2014-03-10 2015-09-10 Riverscale Ltd Software Enabled Network Storage Accelerator (SENSA) - network - disk DMA (NDDMA)
US11102114B2 (en) 2018-12-28 2021-08-24 Alibaba Group Holding Limited Method, apparatus, and computer-readable storage medium for network optimization for accessing cloud service from on-premises network

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4004277A (en) 1974-05-29 1977-01-18 Gavril Bruce D Switching system for non-symmetrical sharing of computer peripheral equipment
US4246637A (en) 1978-06-26 1981-01-20 International Business Machines Corporation Data processor input/output controller
US4503497A (en) 1982-05-27 1985-03-05 International Business Machines Corporation System for independent cache-to-cache transfer
US4682304A (en) 1983-08-04 1987-07-21 Tektronix, Inc. Asynchronous multiple buffered communications interface having an independent microprocessor for controlling host/peripheral exchanges
US4688166A (en) 1984-08-03 1987-08-18 Motorola Computer Systems, Inc. Direct memory access controller supporting multiple input/output controllers and memory units
US5131081A (en) 1989-03-23 1992-07-14 North American Philips Corp., Signetics Div. System having a host independent input/output processor for controlling data transfer between a memory and a plurality of i/o controllers
US5163131A (en) 1989-09-08 1992-11-10 Auspex Systems, Inc. Parallel i/o network file server architecture
US5404527A (en) * 1992-12-31 1995-04-04 Unisys Corporation System and method for remote program load
US5408465A (en) 1993-06-21 1995-04-18 Hewlett-Packard Company Flexible scheme for admission control of multimedia streams on integrated networks
US5737549A (en) * 1994-01-31 1998-04-07 Ecole Polytechnique Federale De Lausanne Method and apparatus for a parallel data storage and processing server
US5884028A (en) 1994-07-29 1999-03-16 International Business Machines Corporation System for the management of multiple time-critical data streams
US5742759A (en) * 1995-08-18 1998-04-21 Sun Microsystems, Inc. Method and system for facilitating access control to system resources in a distributed computer system
US5913028A (en) 1995-10-06 1999-06-15 Xpoint Technologies, Inc. Client/server data traffic delivery system and method
US6055577A (en) 1996-05-06 2000-04-25 Oracle Corporation System for granting bandwidth for real time processes and assigning bandwidth for non-real time processes while being forced to periodically re-arbitrate for new assigned bandwidth
US5715453A (en) * 1996-05-31 1998-02-03 International Business Machines Corporation Web server mechanism for processing function calls for dynamic data queries in a web page
US5867733A (en) 1996-06-04 1999-02-02 Micron Electronics, Inc. Mass data storage controller permitting data to be directly transferred between storage devices without transferring data to main memory and without transferring data over input-output bus
US5892913A (en) 1996-12-02 1999-04-06 International Business Machines Corporation System and method for datastreams employing shared loop architecture multimedia subsystem clusters
US6069895A (en) 1997-08-29 2000-05-30 Nortel Networks Corporation Distributed route server
US6097955A (en) 1997-09-12 2000-08-01 Lucent Technologies, Inc. Apparatus and method for optimizing CPU usage in processing paging messages within a cellular communications system
US6389479B1 (en) 1997-10-14 2002-05-14 Alacritech, Inc. Intelligent network interface device and system for accelerated communication
US6427173B1 (en) 1997-10-14 2002-07-30 Alacritech, Inc. Intelligent network interfaced device and system for accelerated communication
US6434620B1 (en) 1998-08-27 2002-08-13 Alacritech, Inc. TCP/IP offload network interface device
US6226680B1 (en) 1997-10-14 2001-05-01 Alacritech, Inc. Intelligent network interface system method for protocol processing
US5941969A (en) 1997-10-22 1999-08-24 Auspex Systems, Inc. Bridge for direct data storage device access
US6014692A (en) 1997-11-25 2000-01-11 International Business Machines Corporation Web browser file system attachment
US6081883A (en) 1997-12-05 2000-06-27 Auspex Systems, Incorporated Processing system with dynamically allocatable buffer memory
US5950203A (en) 1997-12-31 1999-09-07 Mercury Computer Systems, Inc. Method and apparatus for high-speed access to and sharing of storage devices on a networked digital data processing system
US6260040B1 (en) 1998-01-05 2001-07-10 International Business Machines Corporation Shared file system for digital content
AU3075899A (en) 1998-03-10 1999-09-27 Quad Research High speed fault tolerant mass storage network information server
US6195703B1 (en) 1998-06-24 2001-02-27 Emc Corporation Dynamic routing for performance partitioning in a data processing network
US6249294B1 (en) 1998-07-20 2001-06-19 Hewlett-Packard Company 3D graphics in a single logical sreen display using multiple computer systems
US6226684B1 (en) 1998-10-26 2001-05-01 Pointcast, Inc. Method and apparatus for reestablishing network connections in a multi-router network
US6269410B1 (en) 1999-02-12 2001-07-31 Hewlett-Packard Co Method and apparatus for using system traces to characterize workloads in a data storage system
US6430570B1 (en) 1999-03-01 2002-08-06 Hewlett-Packard Company Java application manager for embedded device
US6324581B1 (en) 1999-03-03 2001-11-27 Emc Corporation File server system using file system storage, data movers, and an exchange of meta data among data movers for file locking and direct access to shared file systems
US6243737B1 (en) 1999-04-09 2001-06-05 Translink Software, Inc. Method and apparatus for providing direct transaction access to information residing on a host system
AU4717901A (en) 1999-12-06 2001-06-25 Warp Solutions, Inc. System and method for dynamic content routing
US20010016878A1 (en) 2000-02-17 2001-08-23 Hideki Yamanaka Communicating system and communicating method for controlling throughput
US20020107989A1 (en) 2000-03-03 2002-08-08 Johnson Scott C. Network endpoint system with accelerated data path

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4942520A (en) * 1987-07-31 1990-07-17 Prime Computer, Inc. Method and apparatus for indexing, accessing and updating a memory
US5706434A (en) * 1995-07-06 1998-01-06 Electric Classifieds, Inc. Integrated request-response system and method generating responses to request objects formatted according to various communication protocols
US5778367A (en) * 1995-12-14 1998-07-07 Network Engineering Software, Inc. Automated on-line information service and directory, particularly for the world wide web
US6421321B1 (en) * 1997-02-25 2002-07-16 Fujitsu Limited Apparatus and a method for transferring a packet flow in a communication network
US6006264A (en) * 1997-08-01 1999-12-21 Arrowpoint Communications, Inc. Method and system for directing a flow between a client and a server
US5974443A (en) * 1997-09-26 1999-10-26 Intervoice Limited Partnership Combined internet and data access system
US6115370A (en) * 1998-05-26 2000-09-05 Nera Wireless Broadband Access As Method and system for protocols for providing voice, data, and multimedia services in a wireless local loop system
US6687732B1 (en) * 1998-09-28 2004-02-03 Inktomi Corporation Adaptive traffic bypassing in an intercepting network driver
US6452921B1 (en) * 1998-11-24 2002-09-17 International Business Machines Corporation Method and system within a computer network for maintaining source-route information at a router bypassed by shortcut communication
US20020073218A1 (en) * 1998-12-23 2002-06-13 Bill J. Aspromonte Stream device management system for multimedia clients in a broadcast network architecture
US6640278B1 (en) * 1999-03-25 2003-10-28 Dell Products L.P. Method for configuration and management of storage resources in a storage network
US6539518B1 (en) * 1999-09-10 2003-03-25 Integrated Memory Logic, Inc. Autodisk controller

Cited By (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7765554B2 (en) 2000-02-08 2010-07-27 Mips Technologies, Inc. Context selection and activation mechanism for activating one of a group of inactive contexts in a processor core for servicing interrupts
US7649901B2 (en) * 2000-02-08 2010-01-19 Mips Technologies, Inc. Method and apparatus for optimizing selection of available contexts for packet processing in multi-stream packet processing
US7877481B2 (en) 2000-02-08 2011-01-25 Mips Technologies, Inc. Method and apparatus for overflowing data packets to a software-controlled memory when they do not fit into a hardware-controlled memory
US8081645B2 (en) 2000-02-08 2011-12-20 Mips Technologies, Inc. Context sharing between a streaming processing unit (SPU) and a packet management unit (PMU) in a packet processing environment
US20070233893A1 (en) * 2000-03-22 2007-10-04 Yottayotta, Inc. Method and system for providing multimedia information on demand over wide area networks
US8260949B2 (en) * 2000-03-22 2012-09-04 Emc Corporation Method and system for providing multimedia information on demand over wide area networks
US9584665B2 (en) 2000-06-21 2017-02-28 International Business Machines Corporation System and method for optimizing timing of responses to customer communications
US7849044B2 (en) * 2000-06-21 2010-12-07 International Business Machines Corporation System and method for automatic task prioritization
US7752159B2 (en) 2001-01-03 2010-07-06 International Business Machines Corporation System and method for classifying text
US20070294199A1 (en) * 2001-01-03 2007-12-20 International Business Machines Corporation System and method for classifying text
US6973666B1 (en) * 2001-02-28 2005-12-06 Unisys Corporation Method of moving video data thru a video-on-demand system which avoids paging by an operating system
US8135678B1 (en) * 2001-06-25 2012-03-13 Netapp, Inc. System and method for restoring a single data stream file from a snapshot
US7162486B2 (en) * 2001-06-25 2007-01-09 Network Appliance, Inc. System and method for representing named data streams within an on-disk structure of a file system
US8010503B1 (en) 2001-06-25 2011-08-30 Netapp, Inc. System and method for restoring a single data stream file from a snapshot
US20040059866A1 (en) * 2001-06-25 2004-03-25 Kayuri Patel System and method for representing named data streams within an on-disk structure of a file system
US20030147349A1 (en) * 2002-02-01 2003-08-07 Burns Daniel J. Communications systems and methods utilizing a device that performs per-service queuing
US20040210584A1 (en) * 2003-02-28 2004-10-21 Peleg Nir Method and apparatus for increasing file server performance by offloading data path processing
US8180843B2 (en) 2003-04-24 2012-05-15 Neopath Networks, Inc. Transparent file migration using namespace replication
US20080114854A1 (en) * 2003-04-24 2008-05-15 Neopath Networks, Inc. Transparent file migration using namespace replication
US7831641B2 (en) * 2003-04-24 2010-11-09 Neopath Networks, Inc. Large file support for a network file server
US20040267831A1 (en) * 2003-04-24 2004-12-30 Wong Thomas K. Large file support for a network file server
US8495002B2 (en) 2003-05-06 2013-07-23 International Business Machines Corporation Software tool for training and testing a knowledge base
US7756810B2 (en) 2003-05-06 2010-07-13 International Business Machines Corporation Software tool for training and testing a knowledge base
US20050047440A1 (en) * 2003-08-25 2005-03-03 Jerome Plun Division of data structures for efficient simulation
US20050125503A1 (en) * 2003-09-15 2005-06-09 Anand Iyengar Enabling proxy services using referral mechanisms
US8539081B2 (en) 2003-09-15 2013-09-17 Neopath Networks, Inc. Enabling proxy services using referral mechanisms
US20050166239A1 (en) * 2003-10-09 2005-07-28 Olympus Corporation Surgery support system
US20050135352A1 (en) * 2003-12-18 2005-06-23 Roe Bryan Y. Efficient handling of HTTP traffic
US7843952B2 (en) * 2003-12-18 2010-11-30 Intel Corporation Efficient handling of HTTP traffic
US7779033B2 (en) * 2003-12-30 2010-08-17 Wibu-Systems Ag Method for controlling a data processing device
US20070186037A1 (en) * 2003-12-30 2007-08-09 Wibu-Systems Ag Method for controlling a data processing device
KR101105177B1 (en) 2004-02-19 2012-01-12 퀄컴 캠브리지 리미티드 Data container for user interface content data
US20050235364A1 (en) * 2004-04-15 2005-10-20 Wilson Christopher S Authentication mechanism permitting access to data stored in a data processing device
US7681007B2 (en) 2004-04-15 2010-03-16 Broadcom Corporation Automatic expansion of hard disk drive capacity in a storage device
US20050231849A1 (en) * 2004-04-15 2005-10-20 Viresh Rustagi Graphical user interface for hard disk drive management in a data storage system
US20050235128A1 (en) * 2004-04-15 2005-10-20 Viresh Rustagi Automatic expansion of hard disk drive capacity in a storage device
US20050235063A1 (en) * 2004-04-15 2005-10-20 Wilson Christopher S Automatic discovery of a networked device
US20050235283A1 (en) * 2004-04-15 2005-10-20 Wilson Christopher S Automatic setup of parameters in networked devices
US20060271598A1 (en) * 2004-04-23 2006-11-30 Wong Thomas K Customizing a namespace in a decentralized storage environment
US8195627B2 (en) 2004-04-23 2012-06-05 Neopath Networks, Inc. Storage policy monitoring for a storage network
US20060080371A1 (en) * 2004-04-23 2006-04-13 Wong Chi M Storage policy monitoring for a storage network
US8190741B2 (en) 2004-04-23 2012-05-29 Neopath Networks, Inc. Customizing a namespace in a decentralized storage environment
US7881325B2 (en) 2005-04-27 2011-02-01 Cisco Technology, Inc. Load balancing technique implemented in a storage area network
US20060245361A1 (en) * 2005-04-27 2006-11-02 Cisco Technology, Inc Load balancing technique implemented in a storage area network
WO2006116604A3 (en) * 2005-04-27 2007-12-21 Cisco Tech Inc Improved load balancing technique implemented in a storage area network
US20060248252A1 (en) * 2005-04-27 2006-11-02 Kharwa Bhupesh D Automatic detection of data storage functionality within a docking station
US7647434B2 (en) 2005-05-19 2010-01-12 Cisco Technology, Inc. Technique for in order delivery of traffic across a storage area network
US20060262784A1 (en) * 2005-05-19 2006-11-23 Cisco Technology, Inc. Technique for in order delivery of traffic across a storage area network
US8832697B2 (en) 2005-06-29 2014-09-09 Cisco Technology, Inc. Parallel filesystem traversal for transparent mirroring of directories and files
US20070024919A1 (en) * 2005-06-29 2007-02-01 Wong Chi M Parallel filesystem traversal for transparent mirroring of directories and files
US20070061077A1 (en) * 2005-09-09 2007-03-15 Sina Fateh Discrete inertial display navigation
US8131689B2 (en) 2005-09-30 2012-03-06 Panagiotis Tsirigotis Accumulating access frequency and file attributes for supporting policy based storage management
US8601211B2 (en) * 2006-12-06 2013-12-03 Fusion-Io, Inc. Storage system with front-end controller
US11847066B2 (en) 2006-12-06 2023-12-19 Unification Technologies Llc Apparatus, system, and method for managing commands of solid-state storage using bank interleave
US11640359B2 (en) 2006-12-06 2023-05-02 Unification Technologies Llc Systems and methods for identifying storage resources that are not in use
US11573909B2 (en) 2006-12-06 2023-02-07 Unification Technologies Llc Apparatus, system, and method for managing commands of solid-state storage using bank interleave
US10103991B2 (en) * 2006-12-07 2018-10-16 Cisco Technology, Inc. Scalability of providing packet flow management
US20080137671A1 (en) * 2006-12-07 2008-06-12 Kaitki Agarwal Scalability of providing packet flow management
US8724463B2 (en) * 2006-12-07 2014-05-13 Cisco Technology, Inc. Scalability of providing packet flow management
US20140369354A1 (en) * 2006-12-07 2014-12-18 Cisco Technology, Inc. Scalability of providing packet flow management
US9219680B2 (en) * 2006-12-07 2015-12-22 Cisco Technology, Inc. Scalability of providing packet flow management
US20160112323A1 (en) * 2006-12-07 2016-04-21 Cisco Technology, Inc. Scalability of providing packet flow management
US20080183662A1 (en) * 2007-01-31 2008-07-31 Benjamin Clay Reed Resolving at least one file-path for a change-record of a computer file-system object in a computer file-system
US20100332536A1 (en) * 2009-06-30 2010-12-30 Hewlett-Packard Development Company, L.P. Associating attribute information with a file system object
US20110145449A1 (en) * 2009-12-11 2011-06-16 Merchant Arif A Differentiated Storage QoS
US9104482B2 (en) * 2009-12-11 2015-08-11 Hewlett-Packard Development Company, L.P. Differentiated storage QoS
US10587720B2 (en) * 2014-02-11 2020-03-10 T-Mobile Usa, Inc. Network aware dynamic content delivery tuning
US20150229569A1 (en) * 2014-02-11 2015-08-13 T-Mobile Usa, Inc. Network Aware Dynamic Content Delivery Tuning
US10097595B2 (en) * 2014-03-06 2018-10-09 Huawei Technologies Co., Ltd. Data processing method in stream computing system, control node, and stream computing system
US20160373494A1 (en) * 2014-03-06 2016-12-22 Huawei Technologies Co., Ltd. Data Processing Method in Stream Computing System, Control Node, and Stream Computing System
US10630737B2 (en) 2014-03-06 2020-04-21 Huawei Technologies Co., Ltd. Data processing method in stream computing system, control node, and stream computing system
CN114302394A (en) * 2021-11-19 2022-04-08 深圳震有科技股份有限公司 Network direct memory access method and system under 5G UPF

Also Published As

Publication number Publication date
US6535518B1 (en) 2003-03-18
WO2001059967A1 (en) 2001-08-16
AU2001234432A1 (en) 2001-08-20
US6757291B1 (en) 2004-06-29

Similar Documents

Publication Publication Date Title
US6535518B1 (en) System for bypassing a server to achieve higher throughput between data network and data storage system
Kaufmann et al. High performance packet processing with flexnic
Regnier et al. TCP onloading for data center servers
Prylli et al. BIP: a new protocol designed for high performance networking on myrinet
US7519650B2 (en) Split socket send queue apparatus and method with efficient queue flow control, retransmission and sack support mechanisms
US6449656B1 (en) Storing a frame header
US20040210584A1 (en) Method and apparatus for increasing file server performance by offloading data path processing
US20050149529A1 (en) Efficient handling of download requests
US20120226307A1 (en) Devices and methods for reshaping cartilage structures
US20060259644A1 (en) Receive queue device with efficient queue flow control, segment placement and virtualization mechanisms
US20050021764A1 (en) Apparatus and method for hardware implementation or acceleration of operating system functions
US7596634B2 (en) Networked application request servicing offloaded from host
JP2004526218A (en) Highly scalable and fast content-based filtering and load balancing system and method in interconnected fabric
Buonadonna et al. Queue pair IP: a hybrid architecture for system area networks
EP1759317B1 (en) Method and system for supporting read operations for iscsi and iscsi chimney
Shashidhara et al. {FlexTOE}: Flexible {TCP} Offload with {Fine-Grained} Parallelism
EP1839162A1 (en) RNIC-BASED OFFLOAD OF iSCSI DATA MOVEMENT FUNCTION BY TARGET
US7149808B2 (en) Application protocol offloading
US20090292825A1 (en) Method and apparatus for in-kernel application-specific processing of content streams
Barak et al. Performance of the Communication Layers of TCP/IP with the Myrinet Gigabit LAN
Mansley Engineering a user-level TCP for the CLAN network
Kim et al. Building a high-performance communication layer over virtual interface architecture on Linux clusters
Lorenz et al. Modular TCP handoff design in STREAMS–based TCP/IP implementation
Rosu et al. Kernel Support for Faster Web Proxies.
Zhao et al. SpliceNP: a TCP splicer using a network processor

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION