US20150124824A1 - Incast drop cause telemetry - Google Patents
- Publication number
- US20150124824A1 (U.S. application Ser. No. 14/484,181)
- Authority
- US
- United States
- Prior art keywords
- packet
- packets
- network device
- buffer
- dequeued
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4641—Virtual LANs, VLANs, e.g. virtual private networks [VPN]
- H04L12/4645—Details on frame tagging
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
- H04L41/0894—Policy-based network configuration management
- H04L41/0895—Configuration of virtualised networks or elements, e.g. virtualised network function or OpenFlow elements
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
- H04L45/745—Address table lookup; Address filtering
- H04L45/74591—Address table lookup; Address filtering using content-addressable memories [CAM]
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/20—Traffic policing
- H04L47/32—Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
- H04L47/50—Queue scheduling
- H04L47/62—Queue scheduling characterised by scheduling criteria
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/16—Implementing security features at a particular protocol layer
- H04L63/164—Implementing security features at a particular protocol layer at the network layer
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
Definitions
- FIG. 1 illustrates an example network device 110 (e.g., a router) suitable for implementing the present invention.
- Network device 110 includes a master central processing unit (CPU) 162, interfaces 168, and a bus 115 (e.g., a PCI bus).
- When acting under the control of appropriate software or firmware, CPU 162 is responsible for executing packet management, error detection, and/or routing functions, such as miscabling detection functions, for example. CPU 162 can accomplish all these functions under the control of software, including an operating system and any appropriate applications software.
- CPU 162 may include one or more processors 163, such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In alternative aspects, processor 163 is specially designed hardware for controlling the operations of router 110.
- Memory 161, such as non-volatile RAM and/or ROM, also forms part of CPU 162. However, there are many different ways in which memory could be coupled to the system.
- Interfaces 168 are typically provided as interface cards (sometimes referred to as “line cards”). Generally, they control the sending and receiving of data packets over the network and sometimes support other peripherals used with router 110 .
- interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like.
- various very high-speed interfaces may be provided such as fast token ring interfaces, wireless interfaces, Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like.
- While FIG. 1 illustrates an example of a network device implementation, it is not the only network device architecture on which the subject technology may be implemented. For example, an architecture having a single processor that handles communications as well as routing computations is often used. Further, other types of interfaces and media can also be implemented.
- FIG. 2 illustrates a data center network structure in which an environment 200 includes Top of Rack (TOR) switches 202-208, aggregate switches 210 and 212, aggregate routers 214 and 216, an access router 218, and the Internet 220.
- FIG. 2 illustrates an example of how one of the TOR switches 208 can be connected to a plurality of servers 222-226.
- each of TOR switches 202 - 208 can be similarly connected to the same plurality of servers 222 - 226 and/or different servers.
- the environment 200 may represent a basic topology, at the abstract level of a data center network.
- each of the TOR switches 202-208 may be connected to each of the aggregate switches 210 and 212.
- TOR switch 202 can be connected to both aggregate switch 210 and aggregate switch 212 .
- each of aggregate switches 210 and 212 can be connected to each of the aggregate routers 214 and 216 , which may be connected to the access router 218 .
- access router 218 can be connected to Internet 220 . It is contemplated that any number of TOR switches 202 - 208 , aggregate switches 210 and 212 , aggregate routers 214 and 216 , and access routers 218 can be implemented in environment 200 .
- a network data center may be a facility used to house computer systems and associated components, such as TOR switches 202 - 208 , aggregate switches 210 and 212 , aggregate routers 214 and 216 and/or access router 218 , for example.
- TOR switches 202 - 208 can refer to small port count switches that are situated on the top or near the top of a rack included in a network data center.
- aggregate switches 210 and 212 can be used to increase the link speed beyond the limits of any single cable or port.
- each of TOR switches 202-208 may be connected to a plurality of servers 222-226. Although three servers 222-226 are shown in FIG. 2, it is contemplated that each TOR switch 202-208 can be connected to any number of servers 222-226.
- TOR switch 208 is representative of TOR switches 202 - 206 and it may be directly connected to servers 222 - 226 .
- TOR switch 208 may be connected to dozens of servers 222 - 226 .
- the number of servers 222 - 226 under the same TOR switch 208 is from 44 to 48, and the TOR switch 208 is a 48-port Gigabit switch with one or multiple 10 Gigabit uplinks.
- In the environment 200 such as a network data center, data may be stored on multiple servers 222 - 226 .
- Incast congestion can occur when a file, or a portion thereof, is fetched from multiple of the servers 222-226. More specifically, incast congestion may occur when multiple senders (i.e., servers 222-226), which may be operating under the same TOR switch 202-208, send data to a single receiver either simultaneously or at approximately the same time.
- the receiver can include any type of server and/or computing device. Even if the senders simultaneously transmit data to the receiver, if the number of senders or the amount of data transmitted by each sender is relatively small, incast congestion may be avoided.
- the environment 200 includes ten servers 222 - 226 and an allocator that assigns one or more of the servers 222 - 226 to provide data in response to a request for that data.
- When the servers 222-226 send their respective data packets to a receiver at approximately the same time, the receiver may not have available bandwidth to receive the data packets (i.e., incast congestion). As a result, data packets may be lost, and the server 222-226 that transmitted the lost data packet(s) may need to retransmit those data packets.
- the receiver may need to wait for the lost data packet to be retransmitted in order to receive the data responsive to the request. That is, the performance of environment 200 may be dependent upon the TCP connections between servers 222 - 226 and the receiver. Therefore, the time associated with retransmitting the lost data packets may cause unneeded delay in the environment 200 .
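The fan-in dynamic described above can be modeled with a toy simulation (a sketch only; the sender counts, buffer size, and drain rate below are hypothetical parameters, not values from this disclosure):

```python
from collections import deque

def simulate_incast(num_senders, packets_per_sender, buffer_size, drain_per_tick):
    """Toy model: each tick, every sender enqueues one packet into a shared
    FIFO buffer that can drain only `drain_per_tick` packets per tick."""
    buffer = deque()
    dropped = 0
    for _ in range(packets_per_sender):          # one burst round per tick
        for sender in range(num_senders):        # synchronized senders
            if len(buffer) < buffer_size:
                buffer.append(sender)
            else:
                dropped += 1                     # tail drop: buffer is full
        for _ in range(min(drain_per_tick, len(buffer))):
            buffer.popleft()                     # receiver drains the buffer
    return dropped

# Few senders: the drain keeps pace, no incast drops.
print(simulate_incast(num_senders=2, packets_per_sender=10,
                      buffer_size=16, drain_per_tick=4))   # 0
# Many synchronized senders sharing the same buffer overwhelm it.
print(simulate_incast(num_senders=10, packets_per_sender=10,
                      buffer_size=16, drain_per_tick=4))   # tail drops occur
```

With few senders nothing is lost; once enough synchronized senders share the buffer, every burst round overflows it, mirroring the incast scenario described for servers 222-226 above.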
- FIG. 3 illustrates an example of a buffer (queue) of a network device 300 (e.g., similar to network device 110, discussed above with respect to FIG. 1).
- network device 300 includes buffer 302 , which stores multiple packets e.g., packets ‘A,’ ‘B,’ ‘C,’ and ‘D.’
- Network device 300 also includes multiple network connections, e.g., a dequeue channel, which removes data from buffer 302, and multiple enqueue channels, incoming data from which is stored into buffer 302.
- network device 300 receives data via the multiple enqueue channels, and stores the data in buffer 302 .
- network device 300 will dequeue the data in buffer 302 at a rate that is equal to, or faster than, a rate at which new data is being stored or added to buffer 302 .
- buffer 302 can fill to capacity, and subsequently received packets, such as packet ‘E,’ are dropped.
- an incast event can occur when multiple enqueue channels are used to push data/packets onto buffer 302 faster than the data/packets can be dequeued.
- Following an incast event, it can be helpful to know more about the network conditions preceding the event, for example, by observing the packets stored in buffer 302 before the incast event occurred.
- data may be collected about the packets stored to buffer 302, for example, by capturing packet header metadata for each packet as it is dequeued from buffer 302.
- the storing/capturing of packet metadata can be initiated by the detection of an incast event and can continue, for example, until a marked/flagged packet is dequeued.
- the marked/flagged packet can be a packet last stored to buffer 302 , before the incast event was detected.
- the packet last stored to buffer 302 can be flagged/marked e.g., to indicate a time immediately preceding a packet drop.
- the marking/flagging of a last packet stored to buffer 302 can be performed by modifying packet header information of the marked packet.
- packet ‘D’ is a last packet stored to buffer 302 .
- Packet ‘E’ represents a first packet dropped after buffer 302 is filled, e.g., due to data incast.
- packet ‘D’ has been marked, by network device 300 , such that a bit in the packet header has been flipped, distinguishing packet ‘D’ from the other packets in buffer 302 .
- FIG. 3 is merely an illustration of an example marking process. However, depending on implementation, the manner and/or process in which packet marking is performed may vary.
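The disclosure leaves the exact marking mechanism open; one illustrative possibility, sketched below, is flipping a single designated bit in the packet header. The byte offset and bit position here are hypothetical choices for the sketch, not values specified by the patent:

```python
FLAG_BYTE = 14      # hypothetical offset of a scratch/reserved header byte
FLAG_BIT  = 0x01    # hypothetical bit used as the "last before drop" marker

def mark_packet(pkt: bytearray) -> None:
    """Flip the marker bit in the packet header (in place)."""
    pkt[FLAG_BYTE] |= FLAG_BIT

def is_marked(pkt: bytes) -> bool:
    """Check whether the marker bit is set."""
    return bool(pkt[FLAG_BYTE] & FLAG_BIT)

pkt = bytearray(64)          # a blank 64-byte packet for illustration
assert not is_marked(pkt)
mark_packet(pkt)             # the device flags the last-enqueued packet
assert is_marked(pkt)
```

Because only one header bit changes, the marked packet (packet ‘D’ in FIG. 3) remains distinguishable from the other buffered packets without altering its payload.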
- the respective metadata information for each packet can be captured/recorded and stored for later analysis.
- network administrators may better troubleshoot the causes of incast events.
- packet metadata information may be analyzed locally, or remotely (e.g., across one or more remote collectors), depending on the desired implementation. That is, packet metadata may be stored and/or analyzed locally, e.g. on a network device in which metadata information is captured. Alternatively, any portion of captured metadata information may be sent to one or more remote systems/collectors for further storage and/or analysis.
- FIG. 4 illustrates an example block diagram of a process 400 that can be used to implement aspects of the subject technology.
- Process 400 begins with step 402 , in which one or more data packets are received at a network device.
- the network device can include any of a variety of network enabled, processor-based devices, such as one or more switches (e.g. TOR switches) or routers, etc.
- each of the received data packets is stored in a buffer (e.g., a queue) associated with the network device.
- the packets can be stored in a queue or buffer as they are processed/routed, e.g., before being dequeued and transmitted/routed to another node or network end-point.
- a packet drop condition is determined. If in decision step 406 it is determined that no packet drop has been detected, process 400 proceeds back to step 404 , in which incoming packets continue to be stored in a queue of the network device. Alternatively, if in decision step 406 it is determined that a packet drop has been detected, process 400 proceeds to step 408 in which a packet presently stored in the queue (buffer) is marked, indicating a time marker before the drop event. As discussed in further detail below, the marked packet can be used to identify a time-frame for which packet information (for dequeued packets) is to be captured/collected.
- the marked packet is the last packet enqueued before the drop event was detected. That is, the most recent packet stored to the buffer is identified and marked, for example, by modifying one or more bits in the packet header.
- step 410 packets stored in the buffer prior to the drop event are dequeued.
- the packets are dequeued in a particular order, such as in a first-in-first-out order.
- the marked packet is the last packet to be dequeued, from among the set of total packets residing in the buffer when the packet drop was detected.
- packet data (e.g., packet metadata) is captured for each packet as it is dequeued from the buffer.
- the capturing of packet metadata is stopped after the marked packet has been dequeued, e.g., once a ‘snapshot’ of buffered metadata has been captured.
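The steps of process 400 can be sketched as a toy FIFO model. This is an illustration only (class and field names are hypothetical; a real network device would implement this in hardware at line rate):

```python
from collections import deque

class TelemetryBuffer:
    """Toy FIFO that, on a tail drop, marks the last-enqueued packet and
    then captures metadata for every dequeued packet up to that marker."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.capturing = False
        self.snapshot = []          # metadata captured after a drop event

    def enqueue(self, packet):
        """Steps 402-404: store the packet; steps 406-408: on a drop,
        mark the most recently enqueued packet."""
        if len(self.queue) < self.capacity:
            self.queue.append({"meta": packet, "marked": False})
            return True
        if self.queue and not self.capturing:
            self.queue[-1]["marked"] = True      # flag the last-stored packet
            self.capturing = True
        return False                # the incoming packet itself is dropped

    def dequeue(self):
        """Steps 410-412: record metadata until the marked packet leaves."""
        entry = self.queue.popleft()
        if self.capturing:
            self.snapshot.append(entry["meta"])
            if entry["marked"]:
                self.capturing = False           # snapshot is complete
        return entry["meta"]

buf = TelemetryBuffer(capacity=3)
for name in "ABC":
    buf.enqueue(name)       # buffer fills with A, B, C
buf.enqueue("D")            # D is dropped; C gets marked
while buf.queue:
    buf.dequeue()
print(buf.snapshot)         # ['A', 'B', 'C'] — the pre-drop snapshot
```

Note that capture stops as soon as the marked packet is dequeued, so packets enqueued after the drop event do not pollute the snapshot.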
- captured metadata information is analyzed, for example, to better understand the circumstances preceding the incast event.
- a network administrator, or other user diagnosing the cause of a packet drop event may find such information useful, for example, in determining what applications or network paths/links are associated with the incast.
- captured packet metadata can contain information indicating one or more originating applications, source/origination addresses, destination addresses, tenant network identifier(s), virtual local area network (VLAN) identification(s), etc.
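As an illustration of the kind of fields such a capture might record, the sketch below parses the source/destination addresses and protocol number out of a raw IPv4 header, following the standard IPv4 header layout. Tenant and VLAN identifiers would come from encapsulation headers that are omitted here, and the sample header bytes are fabricated for the example:

```python
import socket
import struct

def capture_metadata(ip_header: bytes) -> dict:
    """Pull a few fields of interest out of a raw IPv4 header (>= 20 bytes)."""
    version_ihl, tos, total_len = struct.unpack_from("!BBH", ip_header, 0)
    proto = ip_header[9]                          # protocol field, byte 9
    src, dst = struct.unpack_from("!4s4s", ip_header, 12)
    return {
        "version": version_ihl >> 4,
        "length": total_len,
        "protocol": proto,                        # e.g. 6 = TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# A hand-built header: IPv4, 40 bytes total, TCP, 10.0.0.1 -> 10.0.0.2
hdr = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6, 0, 0,
             10, 0, 0, 1, 10, 0, 0, 2])
print(capture_metadata(hdr))
```

A per-packet record like this is compact enough to accumulate for an entire buffer snapshot and ship to a local or remote collector for analysis.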
- any specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged, or that only a portion of the illustrated steps be performed. Some of the steps may be performed simultaneously. For example, in certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
- a phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology.
- a disclosure relating to an aspect may apply to all configurations, or one or more configurations.
- a phrase such as an aspect can refer to one or more aspects and vice versa.
- a phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology.
- a disclosure relating to a configuration may apply to all configurations, or one or more configurations.
- a phrase such as a configuration may refer to one or more configurations and vice versa.
Abstract
Aspects of the subject disclosure relate to ways to capture packet metadata following an incast event. In some implementations, a method of the subject technology can include steps for receiving a plurality of data packets at a network device, storing each of the plurality of packets in a buffer, and detecting a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the queue. In some aspects, the method can further include steps for indicating a marked packet from among the received data packets, dequeuing each of the plurality of packets in the buffer, and capturing metadata for each dequeued packet until the marked packet is dequeued.
Description
- This application claims the benefit of U.S. Provisional Application No. 61/900,324, filed Nov. 5, 2013, entitled “SYSTEMS AND METHODS FOR DETERMINING METRICS AND WORKLOAD MANAGEMENT,” which is incorporated herein by reference in its entirety.
- The subject technology relates to data gathering for packets that are enqueued and dequeued in a buffer and in particular, for collecting packet metadata for use in analyzing incast events.
- As data centers grow in the number of server nodes and operating speed of the interconnecting network, it has become challenging to ensure reliable packet delivery. Moreover, the workload in large data centers is generated by an increasingly heterogeneous mix of applications, such as search, retail, high-performance computing and storage, and social networking.
- There are two main causes of packets loss/drops: (1) drops due to congestion episodes, particularly “incast” events, and (2) corruption on the channel due to increasing line rates. Packet losses can cause timeouts at the transport and application levels, leading to a loss of throughput and an increase in flow transfer times and the number of aborted jobs.
- Certain features of the subject technology are set forth in the appended claims. However, the accompanying drawings, which are included to provide further understanding, illustrate disclosed aspects and together with the description serve to explain the principles of the subject technology. In the drawings:
- FIG. 1 illustrates an example network device, according to certain aspects of the subject technology.
- FIG. 2 illustrates an example of a network configuration in which an incast event can occur, according to some implementations.
- FIG. 3 illustrates a conceptual block diagram of a buffer implemented in a network device, according to some aspects.
- FIG. 4 illustrates a block diagram of an example method for capturing packet metadata, according to some implementations.
- The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which aspects of the disclosure can be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a more thorough understanding of the subject technology. However, it will be clear and apparent that the subject technology is not limited to the specific details set forth herein and may be practiced without these details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology.
- A computer network is a geographically distributed collection of nodes interconnected by communication links and segments for transporting data between endpoints, such as personal computers and workstations. Many types of networks are available, with the types ranging from local area networks (LANs) and wide area networks (WANs) to overlay and software-defined networks, such as virtual extensible local area networks (VXLANs).
- LANs typically connect nodes over dedicated private communication links located in the same geographic region, such as a building or campus. WANs, on the other hand, typically connect geographically dispersed nodes over long-distance communications links, such as common carrier telephone lines, optical lightpaths, synchronous optical networks (SONET), or synchronous digital hierarchy (SDH) links. LANs and WANs can include layer 2 (L2) and/or layer 3 (L3) networks and devices.
- The Internet is an example of a WAN that connects disparate networks throughout the world, providing global communication between nodes on various networks. The nodes typically communicate over the network by exchanging discrete frames or packets of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). In this context, a protocol can refer to a set of rules defining how the nodes interact with each other. Computer networks may be further interconnected by an intermediate network node, such as a router, to extend the effective “size” of each network.
- Transmission Control Protocol (TCP) is widely used to provide reliable, ordered delivery of data from one network entity to another. More particularly, TCP is frequently relied upon to implement Internet applications, such as, for example, the World Wide Web, e-mail, and file transfer. In a high-bandwidth and low latency network utilizing TCP, multiple servers may independently send data to a single common receiver. Provided that the multiple senders simultaneously transmit data to the receiver, congestion, or incast congestion, can occur if the receiver is not capable of receiving the quantity of data being transmitted.
- The congestion episode termed “incast” or “fan-in” congestion can lead to bursty losses and TCP timeouts. Essentially, incasting occurs when multiple sources simultaneously transfer data to a common client/receiver, overwhelming the buffers to which the client is connected. Incast can cause severe losses of throughput and vastly increase flow transfer times, making its prevention an important factor in ensuring reliable packet delivery across data center interconnect fabrics.
- Two approaches are conventionally implemented to address the incast problem in data centers: (1) reducing the duration of TCP timeouts using high resolution timers (HRTs), and (2) increasing switch buffer sizes to reduce loss events.
- The use of HRTs is designed to drastically reduce the minimum retransmission timeout (min-RTO). The approach of reducing the value of the TCP's min-RTO has the effect of drastically reducing the amount of time a TCP source is timed out after bursty packet losses. However, high resolution timers can be difficult to implement, especially in virtual-machine-rich environments. For instance, reducing min-RTO can require making operating system-specific changes to the TCP stack—imposing potentially serious deployment challenges because of the widespread use of closed-source operating systems like Windows and legacy operating systems.
- The other approach to the incasting problem is to reduce packet losses using switches with very large buffers. However, increasing switch buffer sizes is very expensive, and increases latency and power dissipation. Moreover, large, high-bandwidth buffers such as needed for high-speed data center switches require expensive, complex and power-hungry memories. In terms of performance, while they can reduce packet drops and hence timeouts due to incast, they may also increase the latency of short messages, potentially leading to the violation of service level agreements (SLAs) for latency-sensitive applications.
- In some other implementations, incast events may be predicted, for example, by monitoring a rate at which packets are dequeued from a buffer, as compared to a buffer fill rate. However, such information is often of limited use because a given buffer can (on average) be empty; thus, time-varying measurements based on bandwidth utilization, or on buffer use, may be too coarse-grained to yield insight into the actual cause(s) of an incast event.
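By way of a non-limiting illustration, a rate comparison of this kind might be sketched as follows; the function name, units, and headroom parameter are assumptions for illustration and are not part of the disclosure:

```python
# Illustrative sketch of rate-based incast prediction: compare the rate at
# which packets are enqueued (buffer fill) against the dequeue (drain) rate
# over a sampling window. Names and thresholds are assumed, not disclosed.
def incast_risk(enqueued_per_sec: float, dequeued_per_sec: float,
                headroom: float = 1.0) -> bool:
    """Return True when the fill rate exceeds the drain rate by `headroom`x."""
    return enqueued_per_sec > dequeued_per_sec * headroom

# A sustained fill rate above the drain rate suggests the buffer will
# eventually overflow, but says nothing about *which* flows caused it.
print(incast_risk(12_000.0, 10_000.0))   # → True
print(incast_risk(8_000.0, 10_000.0))    # → False
```

As the paragraph above notes, such a signal is coarse-grained: it can flag that overflow is likely, but it identifies neither the sources nor the applications responsible.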
- Accordingly, there remains a need to better understand network conditions that exist just before, and during, the occurrence of an incast event.
- The subject technology addresses the foregoing need by providing a way to capture data about packets enqueued just before an incast occurrence. With information regarding enqueued packets, network administrators can better analyze and understand the conditions leading to an overflow event. Enqueued packet information can yield clues as to the systemic cause of an incast event, for example, by providing information regarding source(s) and/or destination(s) of buffered packets, as well as information identifying the application(s) with which they are associated.
- In some aspects, the subject technology can be implemented by capturing packet metadata for packets residing in a buffer when an incast occurs. As discussed in further detail below, packet metadata for each dequeued packet can be captured, up to the last packet that was added to the buffer before the incast is detected. In some implementations, a last packet stored to the buffer can be marked or “flagged” upon the detection of an incast event. Thereafter, packet metadata is captured (e.g., packet header information can be recorded) as each packet is subsequently dequeued. The recordation/capturing of dequeued packet metadata can continue until it is determined that the flagged packet has been dequeued. Thus, a “snapshot” of packet metadata, e.g., representing all packets in the filled buffer (before the incast event), can be recorded for later analysis. A brief introductory description of example systems and networks for which metadata information can be captured, as illustrated in
FIGS. 1 and 2, is disclosed herein. -
FIG. 1 illustrates an example network device 110 (e.g., a router) suitable for implementing the present invention. Network device 110 includes a master central processing unit (CPU) 162, interfaces 168, and bus 115 (e.g., a PCI bus). When acting under the control of appropriate software or firmware, CPU 162 is responsible for executing packet management, error detection, and/or routing functions, such as miscabling detection functions, for example. CPU 162 can accomplish all these functions under the control of software including an operating system and any appropriate applications software. CPU 162 may include one or more processors 163 such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In alternative aspects, processor 163 is specially designed hardware for controlling the operations of router 110. In a specific implementation, memory 161 (such as non-volatile RAM and/or ROM) also forms part of CPU 162. However, there are many different ways in which memory could be coupled to the system. -
Interfaces 168 are typically provided as interface cards (sometimes referred to as “line cards”). Generally, they control the sending and receiving of data packets over the network and sometimes support other peripherals used with router 110. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided such as fast token ring interfaces, wireless interfaces, Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces, and the like. - Although the system shown in
FIG. 1 illustrates an example of a network device implementation, it is not the only network device architecture on which the subject technology may be implemented. For example, an architecture having a single processor that handles communications as well as routing computations, etc. is often used. Further, other types of interfaces and media can also be implemented. -
FIG. 2 illustrates a data center network structure in which an environment 200 includes Top of Rack (TOR) switches 202-208, aggregate switches 210, 212, and aggregate routers connected to Internet 220. Furthermore, FIG. 2 illustrates an example of how one of the TOR switches 208 can be connected to a plurality of servers 222-226. However, it is contemplated that each of TOR switches 202-208 can be similarly connected to the same plurality of servers 222-226 and/or different servers. In various embodiments, the environment 200 may represent a basic topology, at the abstract level, of a data center network. As shown, each of the TOR switches 202-208 may be connected to each of the aggregate switches 210, 212. For instance, TOR switch 202 can be connected to both aggregate switch 210 and aggregate switch 212. Moreover, each of aggregate switches 210, 212 may further be connected to each of the aggregate routers, which may further be connected to Internet 220. It is contemplated that any number of TOR switches 202-208, aggregate switches 210, 212, and aggregate routers can be implemented in the environment 200. - In various aspects, a network data center may be a facility used to house computer systems and associated components, such as TOR switches 202-208,
aggregate switches 210, 212, and aggregate routers. - As stated above, each of TOR switches 202-208 may be connected to a plurality of servers 222-226. Although three servers 222-226 are shown in
FIG. 2, it is contemplated that each TOR switch 202-208 can be connected to any number of servers 222-226. In this embodiment, TOR switch 208 is representative of TOR switches 202-206 and may be directly connected to servers 222-226. TOR switch 208 may be connected to dozens of servers 222-226. In one embodiment, the number of servers 222-226 under the same TOR switch 208 is from 44 to 48, and the TOR switch 208 is a 48-port Gigabit switch with one or multiple 10 Gigabit uplinks. - In the
environment 200, such as a network data center, data may be stored on multiple servers 222-226. Incast congestion can occur when a file, or a portion thereof, is fetched from multiple servers 222-226. More specifically, incast congestion may occur when multiple senders (i.e., servers 222-226), which may be operating under the same TOR switch 202-208, send data to a single receiver either simultaneously or at approximately the same time. In various implementations, the receiver can include any type of server and/or computing device. Even if the senders simultaneously transmit data to the receiver, incast congestion may be avoided if the number of senders or the amount of data transmitted by each sender is relatively small. However, when the amount of data transmitted by the senders exceeds the available buffering at the receiver's access port, data packets that were transmitted by a sender may be lost and, therefore, not received by the receiver. Hence, throughput can decline due to one or more TCP connections experiencing timeouts caused by data packet drops and/or loss. - For instance, assume that the
environment 200 includes ten servers 222-226 and an allocator that assigns one or more of the servers 222-226 to provide data in response to a request for that data. In various embodiments, if the servers 222-226 send their respective data packets to a receiver at approximately the same time, the receiver may not have available bandwidth to receive the data packets (i.e., incast congestion). As a result, data packets may be lost and the server 222-226 that transmitted the lost data packet(s) may need to retransmit those data packets. Accordingly, if the receiver requested a particular piece of data from the servers 222-226, the receiver may need to wait for the lost data packet to be retransmitted in order to receive the data responsive to the request. That is, the performance of environment 200 may be dependent upon the TCP connections between servers 222-226 and the receiver. Therefore, the time associated with retransmitting the lost data packets may cause unneeded delay in the environment 200. -
FIG. 3 illustrates an example buffer (queue) of a network device 300 (e.g., similar to network device 110, discussed above with respect to FIG. 1). As illustrated, network device 300 includes buffer 302, which stores multiple packets, e.g., packets ‘A,’ ‘B,’ ‘C,’ and ‘D.’ Network device 300 also includes multiple network connections, e.g., a dequeue channel, which removes data from buffer 302, and multiple enqueue channels, from which incoming data is stored into buffer 302. - In practice,
network device 300 receives data via the multiple enqueue channels and stores the data in buffer 302. When properly functioning, network device 300 will dequeue the data in buffer 302 at a rate that is equal to, or faster than, the rate at which new data is being stored or added to buffer 302. However, in some instances new data (e.g., packets) is stored to buffer 302 at a rate exceeding that at which stored packets can be dequeued. In such instances, buffer 302 can fill to capacity, and subsequently received packets, such as packet ‘E,’ are dropped. As discussed above, an incast event can occur when multiple enqueue channels are used to push data/packets onto buffer 302 faster than the data/packets can be dequeued. - To better understand the nature of an incast event, it can be helpful to know more about the network conditions preceding the event, for example, by observing the packets stored in
buffer 302 before the incast event occurred. In practice, data may be collected about the packets stored to buffer 302, for example, by capturing packet header metadata for each packet as it is dequeued from buffer 302. The storing/capturing of packet metadata can be initiated by the detection of an incast event and can be continued, for example, until a marked/flagged packet is dequeued. In such implementations, the marked/flagged packet can be the packet last stored to buffer 302 before the incast event was detected. That is, upon detection of an incast event, the packet last stored to buffer 302 can be flagged/marked, e.g., to indicate a time immediately preceding a packet drop. In some implementations, the marking/flagging of the last packet stored to buffer 302 can be performed by modifying packet header information of the marked packet. - In the example illustrated in
FIG. 3, packet ‘D’ is the last packet stored to buffer 302. Packet ‘E’ represents the first packet dropped after buffer 302 is filled, e.g., due to data incast. As illustrated, packet ‘D’ has been marked, by network device 300, such that a bit in the packet header has been flipped, distinguishing packet ‘D’ from the other packets in buffer 302. It is understood that the foregoing implementation depicted by FIG. 3 is merely an illustration of an example marking process. However, depending on implementation, the manner and/or process in which packet marking is performed may vary. - Further to the example of
FIG. 3, as each of the stored packets is dequeued, the respective metadata information for each packet can be captured/recorded and stored for later analysis. By better understanding the nature of the packets contained in buffer 302 when an incast event occurs, network administrators may better troubleshoot the causes of incast events. - It is understood that packet metadata information may be analyzed locally or remotely (e.g., across one or more remote collectors), depending on the desired implementation. That is, packet metadata may be stored and/or analyzed locally, e.g., on the network device in which the metadata information is captured. Alternatively, any portion of the captured metadata information may be sent to one or more remote systems/collectors for further storage and/or analysis.
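Purely as an illustration, the mark-and-capture sequence described with respect to FIG. 3 can be sketched in the following fragment; the class and field names are assumptions, and an actual implementation would operate on hardware queues and packet headers rather than Python dictionaries:

```python
from collections import deque

class SnapshotBuffer:
    """Illustrative bounded buffer that, on a drop, flags the last enqueued
    packet and then records metadata for every dequeued packet until the
    flagged packet leaves the queue (all names are assumptions)."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity
        self.capturing = False
        self.snapshot = []                 # captured packet metadata

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            # Drop detected (e.g., incast): mark the packet last stored
            # before the drop, and begin capturing on dequeue.
            if self.queue and not self.capturing:
                self.queue[-1]["marked"] = True
                self.capturing = True
            return False                   # incoming packet is dropped
        self.queue.append(packet)
        return True

    def dequeue(self):
        pkt = self.queue.popleft()
        if self.capturing:
            self.snapshot.append(pkt["header"])   # capture metadata
            if pkt.get("marked"):                 # flagged packet reached:
                self.capturing = False            # snapshot is complete
        return pkt

buf = SnapshotBuffer(capacity=4)
for name in "ABCD":
    buf.enqueue({"header": {"id": name}})
buf.enqueue({"header": {"id": "E"}})   # buffer full: 'E' dropped, 'D' marked
while buf.queue:
    buf.dequeue()
print([h["id"] for h in buf.snapshot])   # → ['A', 'B', 'C', 'D']
```

The resulting `snapshot` corresponds to the "snapshot" of packet metadata described above: the headers of every packet that resided in the buffer when the drop occurred.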
-
FIG. 4 illustrates an example block diagram of a process 400 that can be used to implement aspects of the subject technology. Process 400 begins with step 402, in which one or more data packets are received at a network device. It is understood that the network device can include any of a variety of network-enabled, processor-based devices, such as one or more switches (e.g., TOR switches) or routers. - In
step 404, each of the received data packets is stored in a buffer (e.g., a queue) associated with the network device. For example, the packets can be stored in a queue or buffer as they are processed/routed, e.g., before being dequeued and transmitted/routed to another node or network endpoint. - Subsequently, in
decision step 406, a packet drop condition is determined. If in decision step 406 it is determined that no packet drop has been detected, process 400 proceeds back to step 404, in which incoming packets continue to be stored in a queue of the network device. Alternatively, if in decision step 406 it is determined that a packet drop has been detected, process 400 proceeds to step 408, in which a packet presently stored in the queue (buffer) is marked, indicating a point in time before the drop event. As discussed in further detail below, the marked packet can be used to identify a time frame for which packet information (for dequeued packets) is to be captured/collected. - Although any packet in the queue can be marked, in some implementations, the marked packet is the last packet enqueued before the drop event was detected. That is, the most recent packet stored to the buffer is identified and marked, for example, by modifying one or more bits in the packet header.
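The header-bit marking mentioned above might be illustrated as follows; the particular byte and bit positions are assumptions chosen for illustration, since the disclosure does not specify which header bits are modified:

```python
# Illustrative header-bit marking: set one bit in a (hypothetical) header
# byte to distinguish the marked packet from the other buffered packets.
MARK_BYTE, MARK_BIT = 0, 7             # assumed positions, not disclosed

def mark_packet(header: bytearray) -> None:
    """Flip the assumed mark bit on in place."""
    header[MARK_BYTE] |= (1 << MARK_BIT)

def is_marked(header: bytearray) -> bool:
    """Check whether the assumed mark bit is set."""
    return bool(header[MARK_BYTE] & (1 << MARK_BIT))

hdr = bytearray(b"\x45\x00\x00\x28")   # hypothetical header bytes
print(is_marked(hdr))                  # → False
mark_packet(hdr)
print(is_marked(hdr))                  # → True
```

On dequeue, a check such as `is_marked(...)` would signal that the flagged packet has left the buffer and metadata capture can stop.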
- In
step 410, packets stored in the buffer prior to the drop event are dequeued. In some implementations, the packets are dequeued in a particular order, such as in a first-in-first-out (FIFO) order. As such, the marked packet is the last packet to be dequeued from among the set of packets residing in the buffer when the packet drop was detected. In this manner, packet data (e.g., packet metadata) is captured for all packets residing in the buffer when the drop (incast event) occurred. In certain implementations, the capturing of packet metadata is stopped after the marked packet has been dequeued, e.g., once a ‘snapshot’ of buffered metadata has been captured. - Subsequently, in
step 412, captured metadata information is analyzed, for example, to better understand the circumstances preceding the incast event. In some implementations, a network administrator, or other user diagnosing the cause of a packet drop event, may find such information useful, for example, in determining what applications or network paths/links are associated with the incast. For example, captured packet metadata can contain information indicating one or more originating applications, source/origination addresses, destination addresses, tenant network identifier(s), virtual local area network (VLAN) identification(s), etc. By better understanding the network conditions leading to an incast, network administrators are provided more information with which to diagnose network problems. - It is understood that any specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged, or that only a portion of the illustrated steps be performed. Some of the steps may be performed simultaneously. For example, in certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
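As an illustrative sketch of the analysis described in step 412, captured metadata could be tallied to surface a dominant destination or set of senders; the snapshot contents and field names below are invented for illustration:

```python
from collections import Counter

# Hypothetical snapshot captured at an incast event; field names (src, dst,
# vlan) are assumptions standing in for captured packet header metadata.
snapshot = [
    {"src": "10.0.1.5", "dst": "10.0.9.9", "vlan": 100},
    {"src": "10.0.1.6", "dst": "10.0.9.9", "vlan": 100},
    {"src": "10.0.1.7", "dst": "10.0.9.9", "vlan": 200},
]

# A single destination dominating the buffered packets, fed by many distinct
# sources, is consistent with fan-in (incast) congestion toward that receiver.
dst_counts = Counter(p["dst"] for p in snapshot)
src_counts = Counter(p["src"] for p in snapshot)
print(dst_counts.most_common(1))   # → [('10.0.9.9', 3)]
print(len(src_counts))             # → 3
```

The same tallying could be applied to tenant identifiers, VLAN identifications, or protocol types to attribute the event to a particular application or path.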
- The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.”
- A phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. A phrase such as an aspect can refer to one or more aspects and vice versa. A phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A phrase such as a configuration may refer to one or more configurations and vice versa.
- The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
Claims (23)
1. A computer-implemented method comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
2. The computer-implemented method of claim 1 , further comprising:
determining a cause of an incast event at the network device based on the metadata.
3. The computer-implemented method of claim 1 , further comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
4. The computer-implemented method of claim 1 , further comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
5. The computer-implemented method of claim 1 , wherein the packet drop event corresponds with an incast event at the network device.
6. The computer-implemented method of claim 1 , wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
7. The computer-implemented method of claim 1 , wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or a virtual local area network (VLAN) identification.
8. The computer-implemented method of claim 1 , wherein indicating the marked packet from among the plurality of received data packets, further comprises:
modifying packet header information of the marked packet.
9. A system for capturing metadata information after an incast event, the system comprising:
a memory; and
one or more processors coupled to the memory, wherein the one or more processors are configured to perform operations comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
10. The system of claim 9 , wherein the one or more processors are further configured to perform operations comprising:
determining a cause of an incast event at the network device based on the metadata.
11. The system of claim 9 , wherein the one or more processors are further configured to perform operations comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
12. The system of claim 9 , wherein the one or more processors are further configured to perform operations comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
13. The system of claim 9 , wherein the packet drop event corresponds with an incast event at the network device.
14. The system of claim 9 , wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
15. The system of claim 9 , wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or a virtual local area network (VLAN) identification.
16. The system of claim 9 , wherein indicating the marked packet from among the plurality of received data packets, further comprises:
modifying packet header information of the marked packet.
17. A non-transitory computer-readable storage medium comprising instructions stored therein, which when executed by one or more processors, cause the processors to perform operations comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
18. The non-transitory computer-readable storage medium of claim 17 , wherein the processors are further configured to perform operations comprising:
determining a cause of an incast event at the network device based on the metadata.
19. The non-transitory computer-readable storage medium of claim 17 , wherein the one or more processors are further configured to perform operations comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
20. The non-transitory computer-readable storage medium of claim 17 , wherein the processors are further configured to perform operations comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
21. The non-transitory computer-readable storage medium of claim 17 , wherein the packet drop event corresponds with an incast event at the network device.
22. The non-transitory computer-readable storage medium of claim 17 , wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
23. The non-transitory computer-readable storage medium of claim 17 , wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or virtual local area network (VLAN) identification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/484,181 US20150124824A1 (en) | 2013-11-05 | 2014-09-11 | Incast drop cause telemetry |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361900324P | 2013-11-05 | 2013-11-05 | |
US14/484,181 US20150124824A1 (en) | 2013-11-05 | 2014-09-11 | Incast drop cause telemetry |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150124824A1 true US20150124824A1 (en) | 2015-05-07 |
Family
ID=53007003
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/484,181 Abandoned US20150124824A1 (en) | 2013-11-05 | 2014-09-11 | Incast drop cause telemetry |
US14/532,787 Active 2035-06-16 US9667551B2 (en) | 2013-11-05 | 2014-11-04 | Policy enforcement proxy |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/532,787 Active 2035-06-16 US9667551B2 (en) | 2013-11-05 | 2014-11-04 | Policy enforcement proxy |
Country Status (1)
Country | Link |
---|---|
US (2) | US20150124824A1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9667551B2 (en) | 2013-11-05 | 2017-05-30 | Cisco Technology, Inc. | Policy enforcement proxy |
US9996653B1 (en) | 2013-11-06 | 2018-06-12 | Cisco Technology, Inc. | Techniques for optimizing dual track routing |
US10020989B2 (en) | 2013-11-05 | 2018-07-10 | Cisco Technology, Inc. | Provisioning services in legacy mode in a data center network |
US10079761B2 (en) | 2013-11-05 | 2018-09-18 | Cisco Technology, Inc. | Hierarchical routing with table management across hardware modules |
US10116493B2 (en) | 2014-11-21 | 2018-10-30 | Cisco Technology, Inc. | Recovering from virtual port channel peer failure |
US10142163B2 (en) | 2016-03-07 | 2018-11-27 | Cisco Technology, Inc | BFD over VxLAN on vPC uplinks |
US10148586B2 (en) | 2013-11-05 | 2018-12-04 | Cisco Technology, Inc. | Work conserving scheduler based on ranking |
US10164782B2 (en) | 2013-11-05 | 2018-12-25 | Cisco Technology, Inc. | Method and system for constructing a loop free multicast tree in a data-center fabric |
US10182496B2 (en) | 2013-11-05 | 2019-01-15 | Cisco Technology, Inc. | Spanning tree protocol optimization |
US10187302B2 (en) | 2013-11-05 | 2019-01-22 | Cisco Technology, Inc. | Source address translation in overlay networks |
US10193750B2 (en) | 2016-09-07 | 2019-01-29 | Cisco Technology, Inc. | Managing virtual port channel switch peers from software-defined network controller |
US10333828B2 (en) | 2016-05-31 | 2019-06-25 | Cisco Technology, Inc. | Bidirectional multicasting over virtual port channel |
US10382345B2 (en) | 2013-11-05 | 2019-08-13 | Cisco Technology, Inc. | Dynamic flowlet prioritization |
US10516612B2 (en) | 2013-11-05 | 2019-12-24 | Cisco Technology, Inc. | System and method for identification of large-data flows |
US10547509B2 (en) | 2017-06-19 | 2020-01-28 | Cisco Technology, Inc. | Validation of a virtual port channel (VPC) endpoint in the network fabric |
US10778584B2 (en) | 2013-11-05 | 2020-09-15 | Cisco Technology, Inc. | System and method for multi-path load balancing in network fabrics |
US10951522B2 (en) | 2013-11-05 | 2021-03-16 | Cisco Technology, Inc. | IP-based forwarding of bridged and routed IP packets and unicast ARP |
US11102129B2 (en) * | 2018-09-09 | 2021-08-24 | Mellanox Technologies, Ltd. | Adjusting rate of outgoing data requests for avoiding incast congestion |
US11159451B2 (en) | 2018-07-05 | 2021-10-26 | Cisco Technology, Inc. | Stretched EPG and micro-segmentation in multisite fabrics |
US20220038374A1 (en) * | 2019-04-10 | 2022-02-03 | At&T Intellectual Property I, L.P. | Microburst detection and management |
US11509501B2 (en) | 2016-07-20 | 2022-11-22 | Cisco Technology, Inc. | Automatic port verification and policy application for rogue devices |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9699070B2 (en) | 2013-10-04 | 2017-07-04 | Nicira, Inc. | Database protocol for exchanging forwarding state with hardware switches |
WO2015069576A1 (en) * | 2013-11-05 | 2015-05-14 | Cisco Technology, Inc. | Network fabric overlay |
US20150289001A1 (en) * | 2014-04-03 | 2015-10-08 | Piksel, Inc. | Digital Signage System |
CN105490995B (en) * | 2014-09-30 | 2018-04-20 | 国际商业机器公司 | A kind of method and apparatus that NVE E-Packets in NVO3 networks |
US10375043B2 (en) * | 2014-10-28 | 2019-08-06 | International Business Machines Corporation | End-to-end encryption in a software defined network |
US10205658B1 (en) * | 2015-01-08 | 2019-02-12 | Marvell Israel (M.I.S.L) Ltd. | Reducing size of policy databases using bidirectional rules |
US9800508B2 (en) * | 2015-01-09 | 2017-10-24 | Dell Products L.P. | System and method of flow shaping to reduce impact of incast communications |
EP3054646B1 (en) * | 2015-02-06 | 2017-03-22 | Axiomatics AB | Policy separation |
US9992202B2 (en) * | 2015-02-28 | 2018-06-05 | Aruba Networks, Inc | Access control through dynamic grouping |
US9942058B2 (en) | 2015-04-17 | 2018-04-10 | Nicira, Inc. | Managing tunnel endpoints for facilitating creation of logical networks |
US9825814B2 (en) * | 2015-05-28 | 2017-11-21 | Cisco Technology, Inc. | Dynamic attribute based application policy |
US10554484B2 (en) | 2015-06-26 | 2020-02-04 | Nicira, Inc. | Control plane integration with hardware switches |
US9819581B2 (en) | 2015-07-31 | 2017-11-14 | Nicira, Inc. | Configuring a hardware switch as an edge node for a logical router |
US9967182B2 (en) | 2015-07-31 | 2018-05-08 | Nicira, Inc. | Enabling hardware switches to perform logical routing functionalities |
US9847938B2 (en) | 2015-07-31 | 2017-12-19 | Nicira, Inc. | Configuring logical routers on hardware switches |
US10313186B2 (en) | 2015-08-31 | 2019-06-04 | Nicira, Inc. | Scalable controller for hardware VTEPS |
US10263828B2 (en) | 2015-09-30 | 2019-04-16 | Nicira, Inc. | Preventing concurrent distribution of network data to a hardware switch by multiple controllers |
US9948577B2 (en) | 2015-09-30 | 2018-04-17 | Nicira, Inc. | IP aliases in logical networks with hardware switches |
US9998324B2 (en) | 2015-09-30 | 2018-06-12 | Nicira, Inc. | Logical L3 processing for L2 hardware switches |
US10230576B2 (en) | 2015-09-30 | 2019-03-12 | Nicira, Inc. | Managing administrative statuses of hardware VTEPs |
US10079798B2 | 2015-10-23 | 2018-09-18 | International Business Machines Corporation | Domain intercommunication in shared computing environments
US9806911B2 (en) | 2015-11-02 | 2017-10-31 | International Business Machines Corporation | Distributed virtual gateway appliance |
US10250553B2 (en) | 2015-11-03 | 2019-04-02 | Nicira, Inc. | ARP offloading for managed hardware forwarding elements |
US9917799B2 (en) | 2015-12-15 | 2018-03-13 | Nicira, Inc. | Transactional controls for supplying control plane data to managed hardware forwarding elements |
US9998375B2 (en) * | 2015-12-15 | 2018-06-12 | Nicira, Inc. | Transactional controls for supplying control plane data to managed hardware forwarding elements |
CN108886515B (en) * | 2016-01-08 | 2021-06-15 | 百通股份有限公司 | Method and protection device for preventing malicious information communication in an IP network by utilizing a benign networking protocol |
US10200343B2 (en) | 2016-06-29 | 2019-02-05 | Nicira, Inc. | Implementing logical network security on a hardware switch |
US10868737B2 (en) * | 2016-10-26 | 2020-12-15 | Arizona Board Of Regents On Behalf Of Arizona State University | Security policy analysis framework for distributed software defined networking (SDN) based cloud environments |
US10581744B2 (en) * | 2016-12-02 | 2020-03-03 | Cisco Technology, Inc. | Group-based pruning in a software defined networking environment |
US10171344B1 (en) * | 2017-02-02 | 2019-01-01 | Cisco Technology, Inc. | Isolation of endpoints within an endpoint group |
US10382390B1 (en) | 2017-04-28 | 2019-08-13 | Cisco Technology, Inc. | Support for optimized microsegmentation of end points using layer 2 isolation and proxy-ARP within data center |
US10382265B1 (en) * | 2017-08-28 | 2019-08-13 | Juniper Networks, Inc. | Reversible yang-based translators |
US10855766B2 (en) * | 2017-09-28 | 2020-12-01 | Intel Corporation | Networking switch with object storage system intelligence |
US10728288B2 (en) | 2017-11-21 | 2020-07-28 | Juniper Networks, Inc. | Policy-driven workload launching based on software defined networking encryption policies |
US10742690B2 (en) | 2017-11-21 | 2020-08-11 | Juniper Networks, Inc. | Scalable policy management for virtual networks |
US11489872B2 (en) * | 2018-05-10 | 2022-11-01 | Jayant Shukla | Identity-based segmentation of applications and containers in a dynamic environment |
US10742557B1 (en) * | 2018-06-29 | 2020-08-11 | Juniper Networks, Inc. | Extending scalable policy management to supporting network devices |
US10778724B1 (en) | 2018-06-29 | 2020-09-15 | Juniper Networks, Inc. | Scalable port range management for security policies |
US11178071B2 (en) | 2018-07-05 | 2021-11-16 | Cisco Technology, Inc. | Multisite interconnect and policy with switching fabrics |
US11394693B2 (en) * | 2019-03-04 | 2022-07-19 | Cyxtera Cybersecurity, Inc. | Establishing network tunnel in response to access request |
US11201800B2 (en) | 2019-04-03 | 2021-12-14 | Cisco Technology, Inc. | On-path dynamic policy enforcement and endpoint-aware policy enforcement for endpoints |
US11184325B2 (en) | 2019-06-04 | 2021-11-23 | Cisco Technology, Inc. | Application-centric enforcement for multi-tenant workloads with multi site data center fabrics |
US11216309B2 (en) | 2019-06-18 | 2022-01-04 | Juniper Networks, Inc. | Using multidimensional metadata tag sets to determine resource allocation in a distributed computing environment |
US11171992B2 (en) | 2019-07-29 | 2021-11-09 | Cisco Technology, Inc. | System resource management in self-healing networks |
CN113132326B (en) * | 2019-12-31 | 2022-08-09 | 华为技术有限公司 | Access control method, device and system |
US11418435B2 (en) * | 2020-01-31 | 2022-08-16 | Cisco Technology, Inc. | Inband group-based network policy using SRV6 |
US20210266255A1 (en) * | 2020-02-24 | 2021-08-26 | Cisco Technology, Inc. | Vrf segregation for shared services in multi-fabric cloud networks |
US11700236B2 (en) | 2020-02-27 | 2023-07-11 | Juniper Networks, Inc. | Packet steering to a host-based firewall in virtualized environments |
US11277447B2 (en) | 2020-07-17 | 2022-03-15 | Cisco Technology, Inc. | Distributed policy enforcement proxy with dynamic EPG sharding |
WO2022017582A1 (en) * | 2020-07-21 | 2022-01-27 | Siemens Aktiengesellschaft | Method and system for securing data communication in a computing environment |
US11743189B2 (en) * | 2020-09-14 | 2023-08-29 | Microsoft Technology Licensing, Llc | Fault tolerance for SDN gateways using network switches |
US11570109B2 (en) * | 2021-04-28 | 2023-01-31 | Cisco Technology, Inc. | Software-defined service insertion for network fabrics |
US11502872B1 (en) * | 2021-06-07 | 2022-11-15 | Cisco Technology, Inc. | Isolation of clients within a virtual local area network (VLAN) in a fabric network |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020146026A1 (en) * | 2000-05-14 | 2002-10-10 | Brian Unitt | Data stream filtering apparatus & method |
US20030035385A1 (en) * | 2001-08-09 | 2003-02-20 | William Walsh | Method, apparatus, and system for identifying and efficiently treating classes of traffic |
US20030097461A1 (en) * | 2001-11-08 | 2003-05-22 | Paul Barham | System and method for controlling network demand via congestion pricing |
US20030137940A1 (en) * | 1998-11-24 | 2003-07-24 | Schwartz Steven J. | Pass/drop apparatus and method for network switching node |
US20030174650A1 (en) * | 2002-03-15 | 2003-09-18 | Broadcom Corporation | Weighted fair queuing (WFQ) shaper |
US20030231646A1 (en) * | 2002-06-14 | 2003-12-18 | Chandra Prashant R. | Method and system for efficient random packet enqueue, drop or mark processing in network traffic |
US20040062259A1 (en) * | 2002-09-27 | 2004-04-01 | International Business Machines Corporation | Token-based active queue management |
US20040100901A1 (en) * | 2002-11-27 | 2004-05-27 | International Business Machines Corporation | Method and apparatus for automatic congestion avoidance for differentiated service flows |
US20050007961A1 (en) * | 2003-07-09 | 2005-01-13 | Fujitsu Network Communications, Inc. | Processing data packets using markers |
US20060198315A1 (en) * | 2005-03-02 | 2006-09-07 | Fujitsu Limited | Communication apparatus |
US20060221835A1 (en) * | 2005-03-30 | 2006-10-05 | Cisco Technology, Inc. | Converting a network device from data rate traffic management to packet rate |
US20070223372A1 (en) * | 2006-03-23 | 2007-09-27 | Lucent Technologies Inc. | Method and apparatus for preventing congestion in load-balancing networks |
US20070274229A1 (en) * | 2006-05-24 | 2007-11-29 | Sbc Knowledge Ventures, L.P. | Method and apparatus for reliable communications in a packet network |
US20080031247A1 (en) * | 2006-08-04 | 2008-02-07 | Fujitsu Limited | Network device and data control program |
US20090122805A1 (en) * | 2007-11-14 | 2009-05-14 | Gary Paul Epps | Instrumenting packet flows |
US20090268614A1 (en) * | 2006-12-18 | 2009-10-29 | British Telecommunications Public Limited Company | Method and system for congestion marking |
US20100128619A1 (en) * | 2007-10-30 | 2010-05-27 | Sony Corporation | Relay device, relay method, and program |
US7826469B1 (en) * | 2009-03-09 | 2010-11-02 | Juniper Networks, Inc. | Memory utilization in a priority queuing system of a network device |
US20110158248A1 (en) * | 2009-12-24 | 2011-06-30 | Juniper Networks, Inc. | Dynamic prioritized fair share scheduling scheme in over-subscribed port scenario |
US20110310738A1 (en) * | 2010-06-22 | 2011-12-22 | Verizon Patent And Licensing, Inc. | Congestion buffer control in wireless networks |
US20120063318A1 (en) * | 2002-04-04 | 2012-03-15 | Juniper Networks, Inc. | Dequeuing and congestion control systems and methods for single stream multicast |
US20120281697A1 (en) * | 2010-06-24 | 2012-11-08 | Xiaofeng Huang | Method, device and system for implementing multicast |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7530112B2 (en) * | 2003-09-10 | 2009-05-05 | Cisco Technology, Inc. | Method and apparatus for providing network security using role-based access control |
US7877796B2 (en) * | 2004-11-16 | 2011-01-25 | Cisco Technology, Inc. | Method and apparatus for best effort propagation of security group information |
US7840708B2 (en) * | 2007-08-13 | 2010-11-23 | Cisco Technology, Inc. | Method and system for the assignment of security group information using a proxy |
US20150124824A1 (en) | 2013-11-05 | 2015-05-07 | Cisco Technology, Inc. | Incast drop cause telemetry |
- 2014-09-11: US application 14/484,181, published as US20150124824A1 (status: Abandoned)
- 2014-11-04: US application 14/532,787, published as US9667551B2 (status: Active)
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10382345B2 (en) | 2013-11-05 | 2019-08-13 | Cisco Technology, Inc. | Dynamic flowlet prioritization |
US10778584B2 (en) | 2013-11-05 | 2020-09-15 | Cisco Technology, Inc. | System and method for multi-path load balancing in network fabrics |
US11411770B2 (en) | 2013-11-05 | 2022-08-09 | Cisco Technology, Inc. | Virtual port channel bounce in overlay network |
US10079761B2 (en) | 2013-11-05 | 2018-09-18 | Cisco Technology, Inc. | Hierarchical routing with table management across hardware modules |
US10516612B2 (en) | 2013-11-05 | 2019-12-24 | Cisco Technology, Inc. | System and method for identification of large-data flows |
US11528228B2 (en) | 2013-11-05 | 2022-12-13 | Cisco Technology, Inc. | System and method for multi-path load balancing in network fabrics |
US10148586B2 (en) | 2013-11-05 | 2018-12-04 | Cisco Technology, Inc. | Work conserving scheduler based on ranking |
US10164782B2 (en) | 2013-11-05 | 2018-12-25 | Cisco Technology, Inc. | Method and system for constructing a loop free multicast tree in a data-center fabric |
US10182496B2 (en) | 2013-11-05 | 2019-01-15 | Cisco Technology, Inc. | Spanning tree protocol optimization |
US10187302B2 (en) | 2013-11-05 | 2019-01-22 | Cisco Technology, Inc. | Source address translation in overlay networks |
US9667551B2 (en) | 2013-11-05 | 2017-05-30 | Cisco Technology, Inc. | Policy enforcement proxy |
US10225179B2 (en) | 2013-11-05 | 2019-03-05 | Cisco Technology, Inc. | Virtual port channel bounce in overlay network |
US11018898B2 (en) | 2013-11-05 | 2021-05-25 | Cisco Technology, Inc. | Multicast multipathing in an overlay network |
US10374878B2 (en) | 2013-11-05 | 2019-08-06 | Cisco Technology, Inc. | Forwarding tables for virtual networking devices |
US10020989B2 (en) | 2013-11-05 | 2018-07-10 | Cisco Technology, Inc. | Provisioning services in legacy mode in a data center network |
US10951522B2 (en) | 2013-11-05 | 2021-03-16 | Cisco Technology, Inc. | IP-based forwarding of bridged and routed IP packets and unicast ARP |
US10904146B2 (en) | 2013-11-05 | 2021-01-26 | Cisco Technology, Inc. | Hierarchical routing with table management across hardware modules |
US10581635B2 (en) | 2013-11-05 | 2020-03-03 | Cisco Technology, Inc. | Managing routing information for tunnel endpoints in overlay networks |
US10606454B2 (en) | 2013-11-05 | 2020-03-31 | Cisco Technology, Inc. | Stage upgrade of image versions on devices in a cluster |
US10623206B2 (en) | 2013-11-05 | 2020-04-14 | Cisco Technology, Inc. | Multicast multipathing in an overlay network |
US10652163B2 (en) | 2013-11-05 | 2020-05-12 | Cisco Technology, Inc. | Boosting linked list throughput |
US11811555B2 (en) | 2013-11-05 | 2023-11-07 | Cisco Technology, Inc. | Multicast multipathing in an overlay network |
US11888746B2 (en) | 2013-11-05 | 2024-01-30 | Cisco Technology, Inc. | System and method for multi-path load balancing in network fabrics |
US11625154B2 (en) | 2013-11-05 | 2023-04-11 | Cisco Technology, Inc. | Stage upgrade of image versions on devices in a cluster |
US10776553B2 (en) | 2013-11-06 | 2020-09-15 | Cisco Technology, Inc. | Techniques for optimizing dual track routing |
US9996653B1 (en) | 2013-11-06 | 2018-06-12 | Cisco Technology, Inc. | Techniques for optimizing dual track routing |
US10819563B2 (en) | 2014-11-21 | 2020-10-27 | Cisco Technology, Inc. | Recovering from virtual port channel peer failure |
US10116493B2 (en) | 2014-11-21 | 2018-10-30 | Cisco Technology, Inc. | Recovering from virtual port channel peer failure |
US10142163B2 (en) | 2016-03-07 | 2018-11-27 | Cisco Technology, Inc | BFD over VxLAN on vPC uplinks |
US10333828B2 (en) | 2016-05-31 | 2019-06-25 | Cisco Technology, Inc. | Bidirectional multicasting over virtual port channel |
US11509501B2 (en) | 2016-07-20 | 2022-11-22 | Cisco Technology, Inc. | Automatic port verification and policy application for rogue devices |
US10193750B2 (en) | 2016-09-07 | 2019-01-29 | Cisco Technology, Inc. | Managing virtual port channel switch peers from software-defined network controller |
US10749742B2 (en) | 2016-09-07 | 2020-08-18 | Cisco Technology, Inc. | Managing virtual port channel switch peers from software-defined network controller |
US11438234B2 (en) | 2017-06-19 | 2022-09-06 | Cisco Technology, Inc. | Validation of a virtual port channel (VPC) endpoint in the network fabric |
US10873506B2 (en) | 2017-06-19 | 2020-12-22 | Cisco Technology, Inc. | Validation of a virtual port channel (VPC) endpoint in the network fabric |
US10547509B2 (en) | 2017-06-19 | 2020-01-28 | Cisco Technology, Inc. | Validation of a virtual port channel (VPC) endpoint in the network fabric |
US11159451B2 (en) | 2018-07-05 | 2021-10-26 | Cisco Technology, Inc. | Stretched EPG and micro-segmentation in multisite fabrics |
US11949602B2 (en) | 2018-07-05 | 2024-04-02 | Cisco Technology, Inc. | Stretched EPG and micro-segmentation in multisite fabrics |
US11102129B2 (en) * | 2018-09-09 | 2021-08-24 | Mellanox Technologies, Ltd. | Adjusting rate of outgoing data requests for avoiding incast congestion |
US20220038374A1 (en) * | 2019-04-10 | 2022-02-03 | At&T Intellectual Property I, L.P. | Microburst detection and management |
Also Published As
Publication number | Publication date |
---|---|
US9667551B2 (en) | 2017-05-30 |
US20150124809A1 (en) | 2015-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20150124824A1 (en) | Incast drop cause telemetry | |
CN111201757B (en) | Network access node virtual structure dynamically configured on underlying network | |
EP3151470B1 (en) | Analytics for a distributed network | |
US7593331B2 (en) | Enhancing transmission reliability of monitored data | |
US8005012B1 (en) | Traffic analysis of data flows | |
Gebert et al. | Internet access traffic measurement and analysis | |
US20100054123A1 (en) | Method and device for high utilization and efficient flow control over networks with long transmission latency | |
US20060153092A1 (en) | Active response communications network tap | |
US20210297350A1 (en) | Reliable fabric control protocol extensions for data center networks with unsolicited packet spraying over multiple alternate data paths | |
US20210297351A1 (en) | Fabric control protocol with congestion control for data center networks | |
US11102273B2 (en) | Uplink performance management | |
US20100226384A1 (en) | Method for reliable transport in data networks | |
WO2018144234A1 (en) | Data bandwidth overhead reduction in a protocol based communication over a wide area network (wan) | |
US8571049B2 (en) | Setting and changing queue sizes in line cards | |
JP2009055114A (en) | Communication device, communication system, transfer efficiency improvement method, and transfer efficiency improvement program | |
US9525635B2 (en) | Network communication apparatus and method of preferential band limitation of transfer frame | |
Marian et al. | Empirical characterization of uncongested optical lambda networks and 10gbe commodity endpoints | |
US8351426B2 (en) | Ethernet virtualization using assisted frame correction | |
WO2019061302A1 (en) | Message processing method and device | |
US9413627B2 (en) | Data unit counter | |
US8650323B2 (en) | Managing multi-step retry reinitialization protocol flows | |
US20210297343A1 (en) | Reliable fabric control protocol extensions for data center networks with failure resilience | |
US20230403233A1 (en) | Congestion notification in a multi-queue environment | |
US11451998B1 (en) | Systems and methods for communication system resource contention monitoring | |
WO2023280004A1 (en) | Network configuration method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EDSALL, THOMAS J.;ALIZADEH ATTAR, MOHAMMADREZA;SIGNING DATES FROM 20140910 TO 20140911;REEL/FRAME:033725/0658 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |