US20150124824A1 - Incast drop cause telemetry - Google Patents

Incast drop cause telemetry

Info

Publication number
US20150124824A1
US20150124824A1 (application US14/484,181; also referenced as US201414484181A and US 2015/0124824 A1)
Authority
US
United States
Prior art keywords
packet
packets
network device
buffer
dequeued
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/484,181
Inventor
Thomas J. Edsall
Mohammadreza Alizadeh Attar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cisco Technology Inc
Original Assignee
Cisco Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cisco Technology Inc filed Critical Cisco Technology Inc
Priority to US14/484,181 (published as US20150124824A1)
Assigned to CISCO TECHNOLOGY, INC. Assignment of assignors interest (see document for details). Assignors: ALIZADEH ATTAR, MOHAMMADREZA; EDSALL, THOMAS J.
Publication of US20150124824A1
Legal status: Abandoned (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/20 Traffic policing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/32 Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 Interconnection of networks
    • H04L12/4641 Virtual LANs, VLANs, e.g. virtual private networks [VPN]
    • H04L12/4645 Details on frame tagging
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0893 Assignment of logical groups to network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/16 Implementing security features at a particular protocol layer
    • H04L63/164 Implementing security features at a particular protocol layer at the network layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0894 Policy-based network configuration management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0895 Configuration of virtualised networks or elements, e.g. virtualised network function or OpenFlow elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H04L45/745 Address table lookup; Address filtering
    • H04L45/74591 Address table lookup; Address filtering using content-addressable memories [CAM]


Abstract

Aspects of the subject disclosure relate to ways to capture packet metadata following an incast event. In some implementations, a method of the subject technology can include steps for receiving a plurality of data packets at a network device, storing each of the plurality of packets in a buffer, and detecting a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer. In some aspects, the method can further include steps for indicating a marked packet from among the received data packets, dequeuing each of the plurality of packets in the buffer, and capturing metadata for each dequeued packet until the marked packet is dequeued.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of U.S. Provisional Application No. 61/900,324, filed Nov. 5, 2013, entitled “SYSTEMS AND METHODS FOR DETERMINING METRICS AND WORKLOAD MANAGEMENT,” which is incorporated herein by reference in its entirety.
  • BACKGROUND
  • Field of the Invention
  • The subject technology relates to data gathering for packets that are enqueued and dequeued in a buffer and in particular, for collecting packet metadata for use in analyzing incast events.
  • Introduction:
  • As data centers grow in the number of server nodes and operating speed of the interconnecting network, it has become challenging to ensure reliable packet delivery. Moreover, the workload in large data centers is generated by an increasingly heterogeneous mix of applications, such as search, retail, high-performance computing and storage, and social networking.
  • There are two main causes of packet losses/drops: (1) drops due to congestion episodes, particularly “incast” events, and (2) corruption on the channel due to increasing line rates. Packet losses can cause timeouts at the transport and application levels, leading to a loss of throughput and an increase in flow transfer times and the number of aborted jobs.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Certain features of the subject technology are set forth in the appended claims. However, the accompanying drawings, which are included to provide further understanding, illustrate disclosed aspects and together with the description serve to explain the principles of the subject technology. In the drawings:
  • FIG. 1 illustrates an example network device, according to certain aspects of the subject technology.
  • FIG. 2 illustrates an example of a network configuration in which an incast event can occur, according to some implementations.
  • FIG. 3 illustrates a conceptual block diagram of a buffer implemented in a network device, according to some aspects.
  • FIG. 4 illustrates a block diagram of an example method for capturing packet metadata, according to some implementations.
  • DETAILED DESCRIPTION
  • The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which aspects of the disclosure can be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a more thorough understanding of the subject technology. However, it will be clear and apparent that the subject technology is not limited to the specific details set forth herein and may be practiced without these details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology.
  • Overview:
  • A computer network is a geographically distributed collection of nodes interconnected by communication links and segments for transporting data between endpoints, such as personal computers and workstations. Many types of networks are available, with the types ranging from local area networks (LANs) and wide area networks (WANs) to overlay and software-defined networks, such as virtual extensible local area networks (VXLANs).
  • LANs typically connect nodes over dedicated private communication links located in the same geographic region, such as a building or campus. WANs, on the other hand, typically connect geographically dispersed nodes over long-distance communications links, such as common carrier telephone lines, optical lightpaths, synchronous optical networks (SONET), or synchronous digital hierarchy (SDH) links. LANs and WANs can include layer 2 (L2) and/or layer 3 (L3) networks and devices.
  • The Internet is an example of a WAN that connects disparate networks throughout the world, providing global communication between nodes on various networks. The nodes typically communicate over the network by exchanging discrete frames or packets of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). In this context, a protocol can refer to a set of rules defining how the nodes interact with each other. Computer networks may be further interconnected by an intermediate network node, such as a router, to extend the effective “size” of each network.
  • Transmission Control Protocol (TCP) is widely used to provide reliable, ordered delivery of data from one network entity to another. More particularly, TCP is frequently relied upon to implement Internet applications such as the World Wide Web, e-mail, and file transfer. In a high-bandwidth, low-latency network utilizing TCP, multiple servers may independently send data to a single common receiver. When the multiple senders transmit data to the receiver simultaneously, congestion (incast congestion) can occur if the receiver is not capable of receiving the quantity of data being transmitted.
  • Description:
  • The congestion episode termed “incast” or “fan-in” congestion can lead to bursty losses and TCP timeouts. Essentially, incasting occurs when multiple sources simultaneously transfer data to a common client/receiver, overwhelming the buffers to which the client is connected. Incast can cause severe losses of throughput and vastly increase flow transfer times, making its prevention an important factor in ensuring reliable packet delivery across data center interconnect fabrics.
  • Two approaches are conventionally implemented to address the incast problem in data centers: (1) reducing the duration of TCP timeouts using high resolution timers (HRTs), and (2) increasing switch buffer sizes to reduce loss events.
  • The use of HRTs is designed to drastically reduce the minimum retransmission timeout (min-RTO). The approach of reducing the value of the TCP's min-RTO has the effect of drastically reducing the amount of time a TCP source is timed out after bursty packet losses. However, high resolution timers can be difficult to implement, especially in virtual-machine-rich environments. For instance, reducing min-RTO can require making operating system-specific changes to the TCP stack—imposing potentially serious deployment challenges because of the widespread use of closed-source operating systems like Windows and legacy operating systems.
  • The other approach to the incasting problem is to reduce packet losses using switches with very large buffers. However, increasing switch buffer sizes is very expensive, and it increases latency and power dissipation. Moreover, the large, high-bandwidth buffers needed for high-speed data center switches require expensive, complex, and power-hungry memories. In terms of performance, while large buffers can reduce packet drops, and hence timeouts due to incast, they may also increase the latency of short messages, potentially leading to violations of service level agreements (SLAs) for latency-sensitive applications.
  • In some other implementations, incast events may be predicted, for example by monitoring a rate at which packets are dequeued from a buffer, as compared to a buffer fill rate. However, such information is often of limited use because a given buffer can (on average) be empty—thus, time varying measurements based on bandwidth utilization, or on buffer use, may be too coarse-grained to yield insight into the actual cause/s of an incast event.
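  • For context, the rate-based prediction mentioned above amounts to periodically comparing how much the buffer filled against how much it drained during a sampling interval. A minimal Python sketch of such a heuristic follows; the counter names and threshold logic are assumptions for illustration, not details from the disclosure.

```python
def incast_risk(enqueued_bytes, dequeued_bytes, free_buffer_bytes):
    """Coarse heuristic over one sampling interval: flag risk when the buffer grew
    by more than the headroom that remains. As noted above, such time-averaged
    counters say little about *which* flows or applications caused the event."""
    net_growth = enqueued_bytes - dequeued_bytes   # bytes accumulated during the interval
    return net_growth >= free_buffer_bytes

# Example: 4 MB enqueued vs. 3 MB dequeued during the interval, 0.5 MB headroom left.
print(incast_risk(4_000_000, 3_000_000, 500_000))   # True: the queue is on track to overflow
```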
  • Accordingly, there remains a need to better understand the network conditions that exist just before, and during, the occurrence of an incast event.
  • The subject technology addresses the foregoing need by providing a way to capture data about packets enqueued just before an incast occurrence. With information regarding enqueued packets, network administrators can better analyze and understand the conditions leading to an overflow event. Enqueued packet information can yield clues as to the systemic cause of an incast event, for example, by providing information regarding the source(s) and/or destination(s) of buffered packets, as well as information identifying the application(s) with which they are associated.
  • In some aspects, the subject technology can be implemented by capturing packet metadata for packets residing in a buffer when an incast occurs. As discussed in further detail below, packet metadata for each dequeued packet can be captured, up to the last packet that was added to the buffer before the incast was detected. In some implementations, a last packet stored to the buffer can be marked or “flagged” upon the detection of an incast event. Thereafter, packet metadata is captured (e.g., packet header information can be recorded) as each packet is subsequently dequeued. The recordation/capturing of dequeued packet metadata can continue until it is determined that the flagged packet has been dequeued. Thus, a “snapshot” of packet metadata, e.g., representing all packets in the filled buffer (before the incast event), can be recorded for later analysis. A brief introductory description of example systems and networks for which metadata information can be captured, as illustrated in FIGS. 1 and 2, is disclosed herein.
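  • As a rough illustration of this mark-and-capture flow, the following Python sketch models a bounded FIFO buffer that flags the most recently enqueued packet when a drop is detected and then records metadata for every dequeued packet until the flagged packet leaves the queue. The class name, the dict-based packet representation, and the "meta" field are illustrative assumptions, not details taken from the disclosure.

```python
from collections import deque

class IncastTelemetryBuffer:
    """Toy model of the buffer-snapshot telemetry described above (illustrative only).

    Packets are represented as dicts, e.g. {"meta": {...}, "payload": b"..."}.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()      # packets currently enqueued
        self.marked = None        # packet flagged when a drop is first detected
        self.capturing = False    # True while a snapshot is being recorded
        self.snapshot = []        # captured metadata, kept for later analysis

    def enqueue(self, packet):
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
            return True
        # Buffer is full: the incoming packet is dropped (the incast condition).
        # Flag the last packet stored before the drop, unless a capture is already running.
        if not self.capturing and self.queue:
            self.marked = self.queue[-1]
            self.capturing = True
            self.snapshot = []
        return False

    def dequeue(self):
        if not self.queue:
            return None
        packet = self.queue.popleft()
        if self.capturing:
            # Record header metadata for every packet that was resident before the
            # drop, up to and including the marked packet.
            self.snapshot.append(packet["meta"])
            if packet is self.marked:
                self.capturing = False   # snapshot of the filled buffer is complete
        return packet
```

  • Driving this model with enqueues arriving faster than dequeues eventually makes enqueue() return False; that first rejected packet triggers the marking, and draining the queue afterward leaves snapshot holding metadata for exactly the packets that were resident when the drop occurred.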
  • FIG. 1 illustrates an example network device 110 (e.g., a router) suitable for implementing the present invention. Network device 110 includes a master central processing unit (CPU) 162, interfaces 168, and bus 115 (e.g., a PCI bus). When acting under the control of appropriate software or firmware, CPU 162 is responsible for executing packet management, error detection, and/or routing functions, such as miscabling detection functions, for example. CPU 162 can accomplish all these functions under the control of software including an operating system and any appropriate applications software. CPU 162 may include one or more processors 163 such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In alternative aspects, processor 163 is specially designed hardware for controlling the operations of router 110. In a specific implementation, memory 161 (such as non-volatile RAM and/or ROM) also forms part of CPU 162. However, there are many different ways in which memory could be coupled to the system.
  • Interfaces 168 are typically provided as interface cards (sometimes referred to as “line cards”). Generally, they control the sending and receiving of data packets over the network and sometimes support other peripherals used with router 110. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided such as fast token ring interfaces, wireless interfaces, Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like.
  • Although the system shown in FIG. 1 illustrates an example of a network device implementation, it is not the only network device architecture on which the subject technology may be implemented. For example, an architecture having a single processor that handles communications as well as routing computations, etc. is often used. Further, other types of interfaces and media can also be implemented.
  • FIG. 2 illustrates a data center network structure in which an environment 200 includes Top of Rack (TOR) switches 202-208, aggregate switches 210 and 212, aggregate routers 214 and 216, an access router 218, and Internet 220. Furthermore, FIG. 2 illustrates an example of how one of the TOR switches 208 can be connected to a plurality of servers 222-226. However, it is contemplated that each of TOR switches 202-208 can be similarly connected to the same plurality of servers 222-226 and/or different servers. In various embodiments, the environment 200 may represent a basic topology of a data center network at an abstract level. As shown, each of the TOR switches 202-208 may be connected to each of the aggregate switches 210 and 212. For instance, TOR switch 202 can be connected to both aggregate switch 210 and aggregate switch 212. Moreover, each of aggregate switches 210 and 212 can be connected to each of the aggregate routers 214 and 216, which may be connected to the access router 218. Lastly, access router 218 can be connected to Internet 220. It is contemplated that any number of TOR switches 202-208, aggregate switches 210 and 212, aggregate routers 214 and 216, and access routers 218 can be implemented in environment 200.
  • In various aspects, a network data center may be a facility used to house computer systems and associated components, such as TOR switches 202-208, aggregate switches 210 and 212, aggregate routers 214 and 216 and/or access router 218, for example. Moreover, TOR switches 202-208 can refer to small port count switches that are situated on the top or near the top of a rack included in a network data center. In addition, aggregate switches 210 and 212 can be used to increase the link speed beyond the limits of any single cable or port.
  • As stated above, each of TOR switches 202-208 may be connected to a plurality of servers 222-226. Although three servers 222-226 are shown in FIG. 2, it is contemplated that each TOR switch 202-208 can be connected to any number of servers 222-226. In this embodiment, TOR switch 208 is representative of TOR switches 202-206 and may be directly connected to servers 222-226. TOR switch 208 may be connected to dozens of servers 222-226. In one embodiment, the number of servers 222-226 under the same TOR switch 208 is from 44 to 48, and TOR switch 208 is a 48-port Gigabit switch with one or multiple 10 Gigabit uplinks.
  • In the environment 200, such as a network data center, data may be stored on multiple servers 222-226. Incast congestion can occur when a file, or a portion thereof, is fetched from multiple of the servers 222-226. More specifically, incast congestion may occur when multiple senders (i.e., servers 222-226), which may be operating under the same TOR switch 202-208, send data to a single receiver either simultaneously or at approximately the same time. In various implementations, the receiver can include any type of server and/or computing device. Even if the senders simultaneously transmit data to the receiver, incast congestion may be avoided if the number of senders or the amount of data transmitted by each sender is relatively small. However, when the amount of data transmitted by the senders exceeds the available buffering at the receiver's access port, data packets that were transmitted by a sender may be lost and therefore not received by the receiver. Hence, throughput can decline as one or more TCP connections time out because of data packet drops and/or loss.
  • For instance, assume that the environment 200 includes ten servers 222-226 and an allocator that assigns one or more of the servers 222-226 to provide data in response to a request for that data. In various embodiments, if the servers 222-226 send their respective data packets to a receiver at approximately the same time, the receiver may not have available bandwidth to receive the data packets (i.e., incast congestion). As a result, data packets may be lost and the server 222-226 that transmitted the lost data packet(s) may need to retransmit those data packets. Accordingly, provided that the receiver requested a particular piece of data from the servers 222-226, the receiver may need to wait for the lost data packet to be retransmitted in order to receive the data responsive to the request. That is, the performance of environment 200 may be dependent upon the TCP connections between servers 222-226 and the receiver. Therefore, the time associated with retransmitting the lost data packets may cause unneeded delay in the environment 200.
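  • As a rough numeric illustration of the scenario above (all figures are hypothetical, not taken from the disclosure), the sketch below compares the burst arriving from ten simultaneous senders against the buffering behind the receiver's port, and shows how a single retransmission timeout dominates the transfer time once packets are dropped:

```python
senders = 10                   # servers answering the same request (hypothetical)
burst = 64 * 1024              # bytes each server sends at once
port_buffer = 256 * 1024       # buffering available behind the receiver's access port
link_rate = 1e9 / 8            # 1 Gb/s access link, in bytes per second

arriving = senders * burst
dropped = max(0, arriving - port_buffer)     # ignores draining during the burst
ideal_time = arriving / link_rate            # transfer time with no losses
min_rto = 0.200                              # a commonly cited default TCP minimum RTO, in seconds
with_timeout = ideal_time + min_rto          # one bursty loss stalls the flow for a full RTO

print(f"dropped = {dropped} bytes")
print(f"ideal transfer = {ideal_time * 1e3:.2f} ms, with one timeout = {with_timeout * 1e3:.2f} ms")
# dropped = 393216 bytes
# ideal transfer = 5.24 ms, with one timeout = 205.24 ms
```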
  • FIG. 3 illustrates an example buffer (queue) of a network device 300 (e.g., similar to network device 110, discussed above with respect to FIG. 1). As illustrated, network device 300 includes buffer 302, which stores multiple packets, e.g., packets ‘A,’ ‘B,’ ‘C,’ and ‘D.’ Network device 300 also includes multiple network connections, e.g., a dequeue channel, which removes data from buffer 302, and multiple enqueue channels, from which incoming data is stored into buffer 302.
  • In practice, network device 300 receives data via the multiple enqueue channels and stores the data in buffer 302. When properly functioning, network device 300 will dequeue the data in buffer 302 at a rate that is equal to, or faster than, the rate at which new data is being stored or added to buffer 302. However, in some instances new data (e.g., packets) is stored to buffer 302 at a rate exceeding that at which stored packets can be dequeued. In such instances, buffer 302 can fill to capacity, and subsequently received packets, such as packet ‘E,’ are dropped. As discussed above, an incast event can occur when multiple enqueue channels are used to push data/packets onto buffer 302 faster than the data/packets can be dequeued.
  • To better understand the nature of an incast event, it can be helpful to know more about the network conditions preceding the event, for example, by observing the packets stored in buffer 302 before the incast event occurred. In practice, data may be collected about the packets stored to buffer 302, for example by capturing packet header metadata for each packet as it is dequeued from buffer 302. The storing/capturing of packet metadata can be initiated by the detection of an incast event and can be continued, for example, until a marked/flagged packet is dequeued. In such implementations, the marked/flagged packet can be the packet last stored to buffer 302 before the incast event was detected. That is, upon detection of an incast event, the packet last stored to buffer 302 can be flagged/marked, e.g., to indicate a time immediately preceding a packet drop. In some implementations, the marking/flagging of the last packet stored to buffer 302 can be performed by modifying packet header information of the marked packet.
  • In the example illustrated in FIG. 3, packet ‘D’ is the last packet stored to buffer 302. Packet ‘E’ represents the first packet dropped after buffer 302 is filled, e.g., due to data incast. As illustrated, packet ‘D’ has been marked by network device 300 such that a bit in the packet header has been flipped, distinguishing packet ‘D’ from the other packets in buffer 302. It is understood that the implementation depicted by FIG. 3 is merely an illustration of an example marking process; depending on implementation, the manner and/or process in which packet marking is performed may vary.
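  • Purely as an illustration of what flipping a header bit could mean in practice (the disclosure does not specify which bit or field a device would use for the mark), the sketch below sets the drop eligible indicator (DEI) bit in the 802.1Q tag of a VLAN-tagged Ethernet frame held in a bytearray:

```python
import struct

VLAN_TPID = 0x8100   # 802.1Q tag protocol identifier
DEI_BIT = 0x1000     # bit 12 of the 16-bit Tag Control Information field

def mark_vlan_frame(frame: bytearray) -> bool:
    """Set the DEI bit of an 802.1Q-tagged Ethernet frame (illustrative marking only)."""
    if len(frame) < 16:
        return False
    tpid = struct.unpack_from("!H", frame, 12)[0]   # bytes 12-13 follow the two MAC addresses
    if tpid != VLAN_TPID:
        return False                                # not VLAN-tagged; nothing to mark here
    tci = struct.unpack_from("!H", frame, 14)[0]
    struct.pack_into("!H", frame, 14, tci | DEI_BIT)
    return True
```

  • In a real switch the mark would more likely live in an internal packet descriptor rather than on the wire; the point is only that a single spare bit is enough to distinguish the marked packet from the other packets in the buffer.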
  • Further to the example of FIG. 3, as each of the stored packets is dequeued, the respective metadata information for each packet can be captured/recorded and stored for later analysis. By better understanding the nature of the packets contained in buffer 302 when an incast event occurs, network administrators may better troubleshoot the causes of incast events.
  • It is understood that packet metadata information may be analyzed locally or remotely (e.g., across one or more remote collectors), depending on the desired implementation. That is, packet metadata may be stored and/or analyzed locally, e.g., on the network device in which the metadata information is captured. Alternatively, any portion of the captured metadata information may be sent to one or more remote systems/collectors for further storage and/or analysis.
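  • Where records are shipped off-box, the export path could be as simple as serializing each captured record and sending it to a collector. The sketch below uses JSON over UDP; the collector address, port, and record layout are placeholders, not details from the disclosure.

```python
import json
import socket

COLLECTOR = ("198.51.100.10", 6343)   # placeholder collector address (documentation range)

def export_snapshot(snapshot, source_device):
    """Send captured per-packet metadata records to a remote collector (sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for record in snapshot:   # each record is assumed to be a dict of metadata fields
            payload = json.dumps({"device": source_device, **record}).encode("utf-8")
            sock.sendto(payload, COLLECTOR)
    finally:
        sock.close()
```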
  • FIG. 4 illustrates an example block diagram of a process 400 that can be used to implement aspects of the subject technology. Process 400 begins with step 402, in which one or more data packets are received at a network device. It is understood that the network device can include any of a variety of network-enabled, processor-based devices, such as one or more switches (e.g., TOR switches) or routers, etc.
  • In step 404, each of the received data packets is stored in a buffer (e.g., a queue) associated with the network device. For example, the packets can be stored in a queue or buffer as they are processed/routed, e.g., before being dequeued and transmitted/routed to another node or network end-point.
  • Subsequently, in decision step 406, it is determined whether a packet drop condition exists. If in decision step 406 it is determined that no packet drop has been detected, process 400 proceeds back to step 404, in which incoming packets continue to be stored in the queue of the network device. Alternatively, if in decision step 406 it is determined that a packet drop has been detected, process 400 proceeds to step 408, in which a packet presently stored in the queue (buffer) is marked, indicating a time marker before the drop event. As discussed in further detail below, the marked packet can be used to identify a time-frame for which packet information (for dequeued packets) is to be captured/collected.
  • Although any packet in the queue can be marked, in some implementations, the marked packet is the last packet enqueued before the drop event was detected. That is, the most recent packet stored to the buffer is identified and marked, for example, by modifying one or more bits in the packet header.
  • In step 410, packets stored in the buffer prior to the drop event are dequeued. In some implementations, the packets are dequeued in a particular order, such as in first-in-first-out order. As such, the marked packet is the last packet to be dequeued from among the set of packets residing in the buffer when the packet drop was detected. In this manner, packet data (e.g., packet metadata) is captured for all packets residing in the buffer when the drop (incast event) occurred. In certain implementations, the capturing of packet metadata is stopped after the marked packet has been dequeued, e.g., once a ‘snap-shot’ of buffered metadata has been captured.
  • Subsequently, in step 412, captured metadata information is analyzed, for example, to better understand the circumstances preceding the incast event. In some implementations, a network administrator, or other user diagnosing the cause of a packet drop event, may find such information useful, for example, in determining what applications or network paths/links are associated with the incast. For example, captured packet metadata can contain information indicating one or more originating applications, source/origination addresses, destination addresses, tenant network identifier(s), virtual local area network (VLAN) identification(s), etc. By better understanding the network conditions leading to an incast, network administrators are provided more information with which to diagnose network problems.
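  • A captured record might therefore carry fields along these lines; the layout below is a hypothetical example, since the disclosure lists the kinds of information captured but not a concrete format.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class PacketMetadataRecord:
    """Per-packet fields captured at dequeue time (hypothetical layout)."""
    src_addr: str                      # source/origination address
    dst_addr: str                      # destination address
    protocol: str                      # e.g., "TCP" or "UDP"
    vlan_id: Optional[int] = None      # 802.1Q VLAN identification, if tagged
    tenant_id: Optional[int] = None    # tenant network identifier (e.g., an overlay segment ID)
    app_hint: Optional[str] = None     # originating application, if it can be inferred

record = PacketMetadataRecord("10.0.1.12", "10.0.2.40", "TCP", vlan_id=100, tenant_id=5001)
print(asdict(record))   # ready to be stored locally or exported to a collector
```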
  • It is understood that any specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged, or that only a portion of the illustrated steps be performed. Some of the steps may be performed simultaneously. For example, in certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
  • The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.”
  • A phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. A phrase such as an aspect can refer to one or more aspects and vice versa. A phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A phrase such as a configuration may refer to one or more configurations and vice versa.
  • The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.

Claims (23)

What is claimed is:
1. A computer-implemented method comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
2. The computer-implemented method of claim 1, further comprising:
determining a cause of an incast event at the network device based on the metadata.
3. The computer-implemented method of claim 1, further comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
4. The computer-implemented method of claim 1, further comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
5. The computer-implemented method of claim 1, wherein the packet drop event corresponds with an incast event at the network device.
6. The computer-implemented method of claim 1, wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
7. The computer-implemented method of claim 1, wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or a virtual local area network (VLAN) identification.
8. The computer-implemented method of claim 1, wherein indicating the marked packet from among the plurality of received data packets, further comprises:
modifying packet header information of the marked packet.
9. A system for capturing metadata information after an incast event, the system comprising:
a memory; and
one or more processors coupled to the memory, wherein the one or more processors are configured to perform operations comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
10. The system of claim 9, wherein the one or more processors are further configured to perform operations comprising:
determining a cause of an incast event at the network device based on the metadata.
11. The system of claim 9, wherein the one or more processors are further configured to perform operations comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
12. The system of claim 9, wherein the one or more processors are further configured to perform operations comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
13. The system of claim 9, wherein the packet drop event corresponds with an incast event at the network device.
14. The system of claim 9, wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
15. The system of claim 9, wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or a virtual local area network (VLAN) identification.
16. The system of claim 9, wherein indicating the marked packet from among the plurality of received data packets, further comprises:
modifying packet header information of the marked packet.
17. A non-transitory computer-readable storage medium comprising instructions stored therein, which when executed by one or more processors, cause the processors to perform operations comprising:
receiving a plurality of data packets at a network device;
storing, by the network device, each of the plurality of packets in a buffer;
detecting, by the network device, a packet drop event for one or more incoming packets, wherein the one or more incoming packets are not stored in the buffer;
indicating a marked packet from among the plurality of received data packets, wherein the marked packet indicates a last packet enqueued prior to the packet drop event;
dequeuing each of the plurality of packets in the buffer; and
capturing metadata for each dequeued packet until the marked packet is dequeued.
18. The non-transitory computer-readable storage medium of claim 17, wherein the processors are further configured to perform operations comprising:
determining a cause of an incast event at the network device based on the metadata.
19. The non-transitory computer-readable storage medium of claim 17, wherein the one or more processors are further configured to perform operations comprising:
sending the captured metadata for one or more of the dequeued packets to one or more remote collectors for further processing.
20. The non-transitory computer-readable storage medium of claim 17, wherein the processors are further configured to perform operations comprising:
identifying one or more applications associated with one or more of the plurality of packets in the buffer prior to the drop event.
21. The non-transitory computer-readable storage medium of claim 17, wherein the packet drop event corresponds with an incast event at the network device.
22. The non-transitory computer-readable storage medium of claim 17, wherein identifying the marked packet further comprises:
modifying, by the network device, packet header information associated with the marked packet.
23. The non-transitory computer-readable storage medium of claim 17, wherein the metadata for each dequeued packet comprises one or more of the following: a destination address, an origination address, a tenant network identifier, a protocol type, or a virtual local area network (VLAN) identification.
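The independent claims above recite a common sequence of operations: enqueue arriving packets in a buffer, detect a drop event when an incoming packet cannot be stored, mark the last packet enqueued before the drop, and capture metadata for each dequeued packet until the marked packet is dequeued (claims 1, 9 and 17), with the captured records optionally sent to remote collectors (claims 3, 11 and 19). The Python sketch below is only an illustrative model of that flow; the class and field names (Packet, IncastTelemetryBuffer, captured) are invented for the example and are not taken from the patent or from any product implementation.

from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    # Metadata fields mirror the examples listed in claims 7, 15 and 23.
    dst_addr: str
    src_addr: str
    tenant_id: int
    protocol: str
    vlan_id: int
    marked: bool = False  # set on the last packet enqueued before a drop event


class IncastTelemetryBuffer:
    """Toy model of a bounded buffer with drop-triggered metadata capture."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue: deque[Packet] = deque()
        self.capture_active = False  # True from a drop event until the marked packet drains
        self.captured: list[dict] = []

    def enqueue(self, pkt: Packet) -> bool:
        """Store the packet if there is room; otherwise record a drop event."""
        if len(self.queue) < self.capacity:
            self.queue.append(pkt)
            return True
        # Drop event: the incoming packet is not stored.  Mark the last packet
        # that did make it into the buffer and arm metadata capture on dequeue.
        if self.queue and not self.capture_active:
            self.queue[-1].marked = True
            self.capture_active = True
        return False

    def dequeue(self) -> Packet | None:
        """Dequeue one packet, capturing metadata until the marked packet is dequeued."""
        if not self.queue:
            return None
        pkt = self.queue.popleft()
        if self.capture_active:
            self.captured.append({
                "dst": pkt.dst_addr,
                "src": pkt.src_addr,
                "tenant": pkt.tenant_id,
                "proto": pkt.protocol,
                "vlan": pkt.vlan_id,
            })
            if pkt.marked:
                # The marked packet has drained; the records collected so far
                # describe the buffer contents at the time of the drop and could
                # now be handed to a remote collector for further analysis.
                self.capture_active = False
        return pkt

Because any packet enqueued after the drop sits behind the marked packet in FIFO order, stopping capture when the marked packet drains limits the captured records to the packets that occupied the buffer at the moment of the drop. In this model the captured list therefore approximates that buffer snapshot, which a remote collector or analysis process could use to attribute the incast to the contributing flows, tenants, or applications, as the dependent claims describe.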
US14/484,181 2013-11-05 2014-09-11 Incast drop cause telemetry Abandoned US20150124824A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/484,181 US20150124824A1 (en) 2013-11-05 2014-09-11 Incast drop cause telemetry

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361900324P 2013-11-05 2013-11-05
US14/484,181 US20150124824A1 (en) 2013-11-05 2014-09-11 Incast drop cause telemetry

Publications (1)

Publication Number Publication Date
US20150124824A1 (en) 2015-05-07

Family

ID=53007003

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/484,181 Abandoned US20150124824A1 (en) 2013-11-05 2014-09-11 Incast drop cause telemetry
US14/532,787 Active 2035-06-16 US9667551B2 (en) 2013-11-05 2014-11-04 Policy enforcement proxy

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/532,787 Active 2035-06-16 US9667551B2 (en) 2013-11-05 2014-11-04 Policy enforcement proxy

Country Status (1)

Country Link
US (2) US20150124824A1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9667551B2 (en) 2013-11-05 2017-05-30 Cisco Technology, Inc. Policy enforcement proxy
US9996653B1 (en) 2013-11-06 2018-06-12 Cisco Technology, Inc. Techniques for optimizing dual track routing
US10020989B2 (en) 2013-11-05 2018-07-10 Cisco Technology, Inc. Provisioning services in legacy mode in a data center network
US10079761B2 (en) 2013-11-05 2018-09-18 Cisco Technology, Inc. Hierarchical routing with table management across hardware modules
US10116493B2 (en) 2014-11-21 2018-10-30 Cisco Technology, Inc. Recovering from virtual port channel peer failure
US10142163B2 (en) 2016-03-07 2018-11-27 Cisco Technology, Inc. BFD over VxLAN on vPC uplinks
US10148586B2 (en) 2013-11-05 2018-12-04 Cisco Technology, Inc. Work conserving scheduler based on ranking
US10164782B2 (en) 2013-11-05 2018-12-25 Cisco Technology, Inc. Method and system for constructing a loop free multicast tree in a data-center fabric
US10182496B2 (en) 2013-11-05 2019-01-15 Cisco Technology, Inc. Spanning tree protocol optimization
US10187302B2 (en) 2013-11-05 2019-01-22 Cisco Technology, Inc. Source address translation in overlay networks
US10193750B2 (en) 2016-09-07 2019-01-29 Cisco Technology, Inc. Managing virtual port channel switch peers from software-defined network controller
US10333828B2 (en) 2016-05-31 2019-06-25 Cisco Technology, Inc. Bidirectional multicasting over virtual port channel
US10382345B2 (en) 2013-11-05 2019-08-13 Cisco Technology, Inc. Dynamic flowlet prioritization
US10516612B2 (en) 2013-11-05 2019-12-24 Cisco Technology, Inc. System and method for identification of large-data flows
US10547509B2 (en) 2017-06-19 2020-01-28 Cisco Technology, Inc. Validation of a virtual port channel (VPC) endpoint in the network fabric
US10778584B2 (en) 2013-11-05 2020-09-15 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US10951522B2 (en) 2013-11-05 2021-03-16 Cisco Technology, Inc. IP-based forwarding of bridged and routed IP packets and unicast ARP
US11102129B2 (en) * 2018-09-09 2021-08-24 Mellanox Technologies, Ltd. Adjusting rate of outgoing data requests for avoiding incast congestion
US11159451B2 (en) 2018-07-05 2021-10-26 Cisco Technology, Inc. Stretched EPG and micro-segmentation in multisite fabrics
US20220038374A1 (en) * 2019-04-10 2022-02-03 At&T Intellectual Property I, L.P. Microburst detection and management
US11509501B2 (en) 2016-07-20 2022-11-22 Cisco Technology, Inc. Automatic port verification and policy application for rogue devices

Families Citing this family (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9699070B2 (en) 2013-10-04 2017-07-04 Nicira, Inc. Database protocol for exchanging forwarding state with hardware switches
WO2015069576A1 (en) * 2013-11-05 2015-05-14 Cisco Technology, Inc. Network fabric overlay
US20150289001A1 (en) * 2014-04-03 2015-10-08 Piksel, Inc. Digital Signage System
CN105490995B (en) * 2014-09-30 2018-04-20 国际商业机器公司 A kind of method and apparatus that NVE E-Packets in NVO3 networks
US10375043B2 (en) * 2014-10-28 2019-08-06 International Business Machines Corporation End-to-end encryption in a software defined network
US10205658B1 (en) * 2015-01-08 2019-02-12 Marvell Israel (M.I.S.L) Ltd. Reducing size of policy databases using bidirectional rules
US9800508B2 (en) * 2015-01-09 2017-10-24 Dell Products L.P. System and method of flow shaping to reduce impact of incast communications
EP3054646B1 (en) * 2015-02-06 2017-03-22 Axiomatics AB Policy separation
US9992202B2 (en) * 2015-02-28 2018-06-05 Aruba Networks, Inc Access control through dynamic grouping
US9942058B2 (en) 2015-04-17 2018-04-10 Nicira, Inc. Managing tunnel endpoints for facilitating creation of logical networks
US9825814B2 (en) * 2015-05-28 2017-11-21 Cisco Technology, Inc. Dynamic attribute based application policy
US10554484B2 (en) 2015-06-26 2020-02-04 Nicira, Inc. Control plane integration with hardware switches
US9819581B2 (en) 2015-07-31 2017-11-14 Nicira, Inc. Configuring a hardware switch as an edge node for a logical router
US9967182B2 (en) 2015-07-31 2018-05-08 Nicira, Inc. Enabling hardware switches to perform logical routing functionalities
US9847938B2 (en) 2015-07-31 2017-12-19 Nicira, Inc. Configuring logical routers on hardware switches
US10313186B2 (en) 2015-08-31 2019-06-04 Nicira, Inc. Scalable controller for hardware VTEPS
US10263828B2 (en) 2015-09-30 2019-04-16 Nicira, Inc. Preventing concurrent distribution of network data to a hardware switch by multiple controllers
US9948577B2 (en) 2015-09-30 2018-04-17 Nicira, Inc. IP aliases in logical networks with hardware switches
US9998324B2 (en) 2015-09-30 2018-06-12 Nicira, Inc. Logical L3 processing for L2 hardware switches
US10230576B2 (en) 2015-09-30 2019-03-12 Nicira, Inc. Managing administrative statuses of hardware VTEPs
US10079798B2 (en) 2015-10-23 2018-09-18 International Business Machines Corporation Domain intercommunication in shared computing environments
US9806911B2 (en) 2015-11-02 2017-10-31 International Business Machines Corporation Distributed virtual gateway appliance
US10250553B2 (en) 2015-11-03 2019-04-02 Nicira, Inc. ARP offloading for managed hardware forwarding elements
US9917799B2 (en) 2015-12-15 2018-03-13 Nicira, Inc. Transactional controls for supplying control plane data to managed hardware forwarding elements
US9998375B2 (en) * 2015-12-15 2018-06-12 Nicira, Inc. Transactional controls for supplying control plane data to managed hardware forwarding elements
CN108886515B (en) * 2016-01-08 2021-06-15 百通股份有限公司 Method and protection device for preventing malicious information communication in an IP network by utilizing a benign networking protocol
US10200343B2 (en) 2016-06-29 2019-02-05 Nicira, Inc. Implementing logical network security on a hardware switch
US10868737B2 (en) * 2016-10-26 2020-12-15 Arizona Board Of Regents On Behalf Of Arizona State University Security policy analysis framework for distributed software defined networking (SDN) based cloud environments
US10581744B2 (en) * 2016-12-02 2020-03-03 Cisco Technology, Inc. Group-based pruning in a software defined networking environment
US10171344B1 (en) * 2017-02-02 2019-01-01 Cisco Technology, Inc. Isolation of endpoints within an endpoint group
US10382390B1 (en) 2017-04-28 2019-08-13 Cisco Technology, Inc. Support for optimized microsegmentation of end points using layer 2 isolation and proxy-ARP within data center
US10382265B1 (en) * 2017-08-28 2019-08-13 Juniper Networks, Inc. Reversible yang-based translators
US10855766B2 (en) * 2017-09-28 2020-12-01 Intel Corporation Networking switch with object storage system intelligence
US10728288B2 (en) 2017-11-21 2020-07-28 Juniper Networks, Inc. Policy-driven workload launching based on software defined networking encryption policies
US10742690B2 (en) 2017-11-21 2020-08-11 Juniper Networks, Inc. Scalable policy management for virtual networks
US11489872B2 (en) * 2018-05-10 2022-11-01 Jayant Shukla Identity-based segmentation of applications and containers in a dynamic environment
US10742557B1 (en) * 2018-06-29 2020-08-11 Juniper Networks, Inc. Extending scalable policy management to supporting network devices
US10778724B1 (en) 2018-06-29 2020-09-15 Juniper Networks, Inc. Scalable port range management for security policies
US11178071B2 (en) 2018-07-05 2021-11-16 Cisco Technology, Inc. Multisite interconnect and policy with switching fabrics
US11394693B2 (en) * 2019-03-04 2022-07-19 Cyxtera Cybersecurity, Inc. Establishing network tunnel in response to access request
US11201800B2 (en) 2019-04-03 2021-12-14 Cisco Technology, Inc. On-path dynamic policy enforcement and endpoint-aware policy enforcement for endpoints
US11184325B2 (en) 2019-06-04 2021-11-23 Cisco Technology, Inc. Application-centric enforcement for multi-tenant workloads with multi site data center fabrics
US11216309B2 (en) 2019-06-18 2022-01-04 Juniper Networks, Inc. Using multidimensional metadata tag sets to determine resource allocation in a distributed computing environment
US11171992B2 (en) 2019-07-29 2021-11-09 Cisco Technology, Inc. System resource management in self-healing networks
CN113132326B (en) * 2019-12-31 2022-08-09 华为技术有限公司 Access control method, device and system
US11418435B2 (en) * 2020-01-31 2022-08-16 Cisco Technology, Inc. Inband group-based network policy using SRV6
US20210266255A1 (en) * 2020-02-24 2021-08-26 Cisco Technology, Inc. Vrf segregation for shared services in multi-fabric cloud networks
US11700236B2 (en) 2020-02-27 2023-07-11 Juniper Networks, Inc. Packet steering to a host-based firewall in virtualized environments
US11277447B2 (en) 2020-07-17 2022-03-15 Cisco Technology, Inc. Distributed policy enforcement proxy with dynamic EPG sharding
WO2022017582A1 (en) * 2020-07-21 2022-01-27 Siemens Aktiengesellschaft Method and system for securing data communication in a computing environment
US11743189B2 (en) * 2020-09-14 2023-08-29 Microsoft Technology Licensing, Llc Fault tolerance for SDN gateways using network switches
US11570109B2 (en) * 2021-04-28 2023-01-31 Cisco Technology, Inc. Software-defined service insertion for network fabrics
US11502872B1 (en) * 2021-06-07 2022-11-15 Cisco Technology, Inc. Isolation of clients within a virtual local area network (VLAN) in a fabric network

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020146026A1 (en) * 2000-05-14 2002-10-10 Brian Unitt Data stream filtering apparatus & method
US20030035385A1 (en) * 2001-08-09 2003-02-20 William Walsh Method, apparatus, and system for identifying and efficiently treating classes of traffic
US20030097461A1 (en) * 2001-11-08 2003-05-22 Paul Barham System and method for controlling network demand via congestion pricing
US20030137940A1 (en) * 1998-11-24 2003-07-24 Schwartz Steven J. Pass/drop apparatus and method for network switching node
US20030174650A1 (en) * 2002-03-15 2003-09-18 Broadcom Corporation Weighted fair queuing (WFQ) shaper
US20030231646A1 (en) * 2002-06-14 2003-12-18 Chandra Prashant R. Method and system for efficient random packet enqueue, drop or mark processing in network traffic
US20040062259A1 (en) * 2002-09-27 2004-04-01 International Business Machines Corporation Token-based active queue management
US20040100901A1 (en) * 2002-11-27 2004-05-27 International Business Machines Corporation Method and apparatus for automatic congestion avoidance for differentiated service flows
US20050007961A1 (en) * 2003-07-09 2005-01-13 Fujitsu Network Communications, Inc. Processing data packets using markers
US20060198315A1 (en) * 2005-03-02 2006-09-07 Fujitsu Limited Communication apparatus
US20060221835A1 (en) * 2005-03-30 2006-10-05 Cisco Technology, Inc. Converting a network device from data rate traffic management to packet rate
US20070223372A1 (en) * 2006-03-23 2007-09-27 Lucent Technologies Inc. Method and apparatus for preventing congestion in load-balancing networks
US20070274229A1 (en) * 2006-05-24 2007-11-29 Sbc Knowledge Ventures, L.P. Method and apparatus for reliable communications in a packet network
US20080031247A1 (en) * 2006-08-04 2008-02-07 Fujitsu Limited Network device and data control program
US20090122805A1 (en) * 2007-11-14 2009-05-14 Gary Paul Epps Instrumenting packet flows
US20090268614A1 (en) * 2006-12-18 2009-10-29 British Telecommunications Public Limited Company Method and system for congestion marking
US20100128619A1 (en) * 2007-10-30 2010-05-27 Sony Corporation Relay device, relay method, and program
US7826469B1 (en) * 2009-03-09 2010-11-02 Juniper Networks, Inc. Memory utilization in a priority queuing system of a network device
US20110158248A1 (en) * 2009-12-24 2011-06-30 Juniper Networks, Inc. Dynamic prioritized fair share scheduling scheme in over-subscribed port scenario
US20110310738A1 (en) * 2010-06-22 2011-12-22 Verizon Patent And Licensing, Inc. Congestion buffer control in wireless networks
US20120063318A1 (en) * 2002-04-04 2012-03-15 Juniper Networks, Inc. Dequeuing and congestion control systems and methods for single stream multicast
US20120281697A1 (en) * 2010-06-24 2012-11-08 Xiaofeng Huang Method, device and system for implementing multicast

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7530112B2 (en) * 2003-09-10 2009-05-05 Cisco Technology, Inc. Method and apparatus for providing network security using role-based access control
US7877796B2 (en) * 2004-11-16 2011-01-25 Cisco Technology, Inc. Method and apparatus for best effort propagation of security group information
US7840708B2 (en) * 2007-08-13 2010-11-23 Cisco Technology, Inc. Method and system for the assignment of security group information using a proxy
US20150124824A1 (en) 2013-11-05 2015-05-07 Cisco Technology, Inc. Incast drop cause telemetry

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030137940A1 (en) * 1998-11-24 2003-07-24 Schwartz Steven J. Pass/drop apparatus and method for network switching node
US20020146026A1 (en) * 2000-05-14 2002-10-10 Brian Unitt Data stream filtering apparatus & method
US20030035385A1 (en) * 2001-08-09 2003-02-20 William Walsh Method, apparatus, and system for identifying and efficiently treating classes of traffic
US20030097461A1 (en) * 2001-11-08 2003-05-22 Paul Barham System and method for controlling network demand via congestion pricing
US20030174650A1 (en) * 2002-03-15 2003-09-18 Broadcom Corporation Weighted fair queuing (WFQ) shaper
US20120063318A1 (en) * 2002-04-04 2012-03-15 Juniper Networks, Inc. Dequeuing and congestion control systems and methods for single stream multicast
US20030231646A1 (en) * 2002-06-14 2003-12-18 Chandra Prashant R. Method and system for efficient random packet enqueue, drop or mark processing in network traffic
US20040062259A1 (en) * 2002-09-27 2004-04-01 International Business Machines Corporation Token-based active queue management
US20040100901A1 (en) * 2002-11-27 2004-05-27 International Business Machines Corporation Method and apparatus for automatic congestion avoidance for differentiated service flows
US20050007961A1 (en) * 2003-07-09 2005-01-13 Fujitsu Network Communications, Inc. Processing data packets using markers
US20060198315A1 (en) * 2005-03-02 2006-09-07 Fujitsu Limited Communication apparatus
US20060221835A1 (en) * 2005-03-30 2006-10-05 Cisco Technology, Inc. Converting a network device from data rate traffic management to packet rate
US20070223372A1 (en) * 2006-03-23 2007-09-27 Lucent Technologies Inc. Method and apparatus for preventing congestion in load-balancing networks
US20070274229A1 (en) * 2006-05-24 2007-11-29 Sbc Knowledge Ventures, L.P. Method and apparatus for reliable communications in a packet network
US20080031247A1 (en) * 2006-08-04 2008-02-07 Fujitsu Limited Network device and data control program
US20090268614A1 (en) * 2006-12-18 2009-10-29 British Telecommunications Public Limited Company Method and system for congestion marking
US20100128619A1 (en) * 2007-10-30 2010-05-27 Sony Corporation Relay device, relay method, and program
US20090122805A1 (en) * 2007-11-14 2009-05-14 Gary Paul Epps Instrumenting packet flows
US7826469B1 (en) * 2009-03-09 2010-11-02 Juniper Networks, Inc. Memory utilization in a priority queuing system of a network device
US20110158248A1 (en) * 2009-12-24 2011-06-30 Juniper Networks, Inc. Dynamic prioritized fair share scheduling scheme in over-subscribed port scenario
US20110310738A1 (en) * 2010-06-22 2011-12-22 Verizon Patent And Licensing, Inc. Congestion buffer control in wireless networks
US20120281697A1 (en) * 2010-06-24 2012-11-08 Xiaofeng Huang Method, device and system for implementing multicast

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10382345B2 (en) 2013-11-05 2019-08-13 Cisco Technology, Inc. Dynamic flowlet prioritization
US10778584B2 (en) 2013-11-05 2020-09-15 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US11411770B2 (en) 2013-11-05 2022-08-09 Cisco Technology, Inc. Virtual port channel bounce in overlay network
US10079761B2 (en) 2013-11-05 2018-09-18 Cisco Technology, Inc. Hierarchical routing with table management across hardware modules
US10516612B2 (en) 2013-11-05 2019-12-24 Cisco Technology, Inc. System and method for identification of large-data flows
US11528228B2 (en) 2013-11-05 2022-12-13 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US10148586B2 (en) 2013-11-05 2018-12-04 Cisco Technology, Inc. Work conserving scheduler based on ranking
US10164782B2 (en) 2013-11-05 2018-12-25 Cisco Technology, Inc. Method and system for constructing a loop free multicast tree in a data-center fabric
US10182496B2 (en) 2013-11-05 2019-01-15 Cisco Technology, Inc. Spanning tree protocol optimization
US10187302B2 (en) 2013-11-05 2019-01-22 Cisco Technology, Inc. Source address translation in overlay networks
US9667551B2 (en) 2013-11-05 2017-05-30 Cisco Technology, Inc. Policy enforcement proxy
US10225179B2 (en) 2013-11-05 2019-03-05 Cisco Technology, Inc. Virtual port channel bounce in overlay network
US11018898B2 (en) 2013-11-05 2021-05-25 Cisco Technology, Inc. Multicast multipathing in an overlay network
US10374878B2 (en) 2013-11-05 2019-08-06 Cisco Technology, Inc. Forwarding tables for virtual networking devices
US10020989B2 (en) 2013-11-05 2018-07-10 Cisco Technology, Inc. Provisioning services in legacy mode in a data center network
US10951522B2 (en) 2013-11-05 2021-03-16 Cisco Technology, Inc. IP-based forwarding of bridged and routed IP packets and unicast ARP
US10904146B2 (en) 2013-11-05 2021-01-26 Cisco Technology, Inc. Hierarchical routing with table management across hardware modules
US10581635B2 (en) 2013-11-05 2020-03-03 Cisco Technology, Inc. Managing routing information for tunnel endpoints in overlay networks
US10606454B2 (en) 2013-11-05 2020-03-31 Cisco Technology, Inc. Stage upgrade of image versions on devices in a cluster
US10623206B2 (en) 2013-11-05 2020-04-14 Cisco Technology, Inc. Multicast multipathing in an overlay network
US10652163B2 (en) 2013-11-05 2020-05-12 Cisco Technology, Inc. Boosting linked list throughput
US11811555B2 (en) 2013-11-05 2023-11-07 Cisco Technology, Inc. Multicast multipathing in an overlay network
US11888746B2 (en) 2013-11-05 2024-01-30 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US11625154B2 (en) 2013-11-05 2023-04-11 Cisco Technology, Inc. Stage upgrade of image versions on devices in a cluster
US10776553B2 (en) 2013-11-06 2020-09-15 Cisco Technology, Inc. Techniques for optimizing dual track routing
US9996653B1 (en) 2013-11-06 2018-06-12 Cisco Technology, Inc. Techniques for optimizing dual track routing
US10819563B2 (en) 2014-11-21 2020-10-27 Cisco Technology, Inc. Recovering from virtual port channel peer failure
US10116493B2 (en) 2014-11-21 2018-10-30 Cisco Technology, Inc. Recovering from virtual port channel peer failure
US10142163B2 (en) 2016-03-07 2018-11-27 Cisco Technology, Inc. BFD over VxLAN on vPC uplinks
US10333828B2 (en) 2016-05-31 2019-06-25 Cisco Technology, Inc. Bidirectional multicasting over virtual port channel
US11509501B2 (en) 2016-07-20 2022-11-22 Cisco Technology, Inc. Automatic port verification and policy application for rogue devices
US10193750B2 (en) 2016-09-07 2019-01-29 Cisco Technology, Inc. Managing virtual port channel switch peers from software-defined network controller
US10749742B2 (en) 2016-09-07 2020-08-18 Cisco Technology, Inc. Managing virtual port channel switch peers from software-defined network controller
US11438234B2 (en) 2017-06-19 2022-09-06 Cisco Technology, Inc. Validation of a virtual port channel (VPC) endpoint in the network fabric
US10873506B2 (en) 2017-06-19 2020-12-22 Cisco Technology, Inc. Validation of a virtual port channel (VPC) endpoint in the network fabric
US10547509B2 (en) 2017-06-19 2020-01-28 Cisco Technology, Inc. Validation of a virtual port channel (VPC) endpoint in the network fabric
US11159451B2 (en) 2018-07-05 2021-10-26 Cisco Technology, Inc. Stretched EPG and micro-segmentation in multisite fabrics
US11949602B2 (en) 2018-07-05 2024-04-02 Cisco Technology, Inc. Stretched EPG and micro-segmentation in multisite fabrics
US11102129B2 (en) * 2018-09-09 2021-08-24 Mellanox Technologies, Ltd. Adjusting rate of outgoing data requests for avoiding incast congestion
US20220038374A1 (en) * 2019-04-10 2022-02-03 At&T Intellectual Property I, L.P. Microburst detection and management

Also Published As

Publication number Publication date
US9667551B2 (en) 2017-05-30
US20150124809A1 (en) 2015-05-07

Similar Documents

Publication Publication Date Title
US20150124824A1 (en) Incast drop cause telemetry
CN111201757B (en) Network access node virtual structure dynamically configured on underlying network
EP3151470B1 (en) Analytics for a distributed network
US7593331B2 (en) Enhancing transmission reliability of monitored data
US8005012B1 (en) Traffic analysis of data flows
Gebert et al. Internet access traffic measurement and analysis
US20100054123A1 (en) Method and device for high utilization and efficient flow control over networks with long transmission latency
US20060153092A1 (en) Active response communications network tap
US20210297350A1 (en) Reliable fabric control protocol extensions for data center networks with unsolicited packet spraying over multiple alternate data paths
US20210297351A1 (en) Fabric control protocol with congestion control for data center networks
US11102273B2 (en) Uplink performance management
US20100226384A1 (en) Method for reliable transport in data networks
WO2018144234A1 (en) Data bandwidth overhead reduction in a protocol based communication over a wide area network (wan)
US8571049B2 (en) Setting and changing queue sizes in line cards
JP2009055114A (en) Communication device, communication system, transfer efficiency improvement method, and transfer efficiency improvement program
US9525635B2 (en) Network communication apparatus and method of preferential band limitation of transfer frame
Marian et al. Empirical characterization of uncongested optical lambda networks and 10GbE commodity endpoints
US8351426B2 (en) Ethernet virtualization using assisted frame correction
WO2019061302A1 (en) Message processing method and device
US9413627B2 (en) Data unit counter
US8650323B2 (en) Managing multi-step retry reinitialization protocol flows
US20210297343A1 (en) Reliable fabric control protocol extensions for data center networks with failure resilience
US20230403233A1 (en) Congestion notification in a multi-queue environment
US11451998B1 (en) Systems and methods for communication system resource contention monitoring
WO2023280004A1 (en) Network configuration method, device and system

Legal Events

Date Code Title Description
AS Assignment

Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EDSALL, THOMAS J.;ALIZADEH ATTAR, MOHAMMADREZA;SIGNING DATES FROM 20140910 TO 20140911;REEL/FRAME:033725/0658

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION