|Publication number||US8041929 B2|
|Application number||US 11/454,820|
|Publication date||18 Oct 2011|
|Filing date||16 Jun 2006|
|Priority date||16 Jun 2006|
|Also published as||US20070294694|
|Publication number||11454820, 454820, US 8041929 B2, US 8041929B2, US-B2-8041929, US8041929 B2, US8041929B2|
|Inventors||Robert Jeter, Trevor Gamer, William Lee, Scott Smith, Gegory Goss|
|Original Assignee||Cisco Technology, Inc.|
1. Field of the Invention
The present invention relates to using hardware to assist in multi-threaded processing and, in particular, to using hardware to select a sliding window in a register bank and data random access memory (RAM) when switching between threads in fewer clock cycles than in conventional thread switching mechanisms.
2. Description of the Related Art
Many processors are designed to reduce idle time by swapping multiple processing threads. A thread is a set of data contents for processor registers and memory and a sequence of instructions to operate on those contents that can be executed independently of other threads. Some instructions involve sending a request or command to another component of the device or system, such as input/output devices or one or more high-value, high-latency components that take many processor clock cycles to respond. Rather than waiting idly for the other component to respond, the processor stores the contents of the registers and the current command or commands of the current thread to local memory, thus “swapping” the thread out, also described as “switching” threads and causing the thread to “sleep.” Then the contents and commands of a different sleeping thread are taken on board, so-called “swapped” or “switched” onto the processor, also described as “awakening” the thread. The woken thread is then processed until another wait condition occurs. A thread-scheduler is responsible for swapping threads on or off the processor, or both, from and to local memory. Threads are widely known and used commercially, for example in operating systems for most computers.
Some thread wait conditions result from use of a high-value, long-latency shared resource, such as expensive static random access memory (SRAM), quad data rate (QDR) SRAM, content addressable memory (CAM) and ternary CAM (TCAM), all components well known in the art of digital processing. For example, in a router used as an intermediate network node to facilitate the passage of data packets among end nodes, a TCAM and QDR SRAM are shared by multiple processors to parse data packets and classify them as a certain type using a certain protocol or belonging to a particular stream of data packets with the same source and destination. Processing for each packet typically involves from five to seven long-latency memory operations invoking the TCAM or QDR or both. These long-latency memory operations can take about 125 clock cycles or more of a 500 megahertz clock (MHz, 1 MHz=10^6 cycles per second) that paces processor operations. The parse and classify software programs typically execute twenty to fifty instructions between issuing a request for a long-latency memory operation. A typical instruction is executed in one or two clock cycles.
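The figures in the preceding paragraph imply how much of the time a single thread can keep a processor busy. The following is a minimal sketch in C of that arithmetic. It assumes that the processor is completely idle for the full latency of each memory operation and that no work overlaps the wait; the function name busy_percent is hypothetical and is used only for illustration.

```c
#include <assert.h>

/* Percentage of clock cycles in which a single thread keeps the processor
 * busy, assuming the processor idles for the whole memory latency.
 * The result is truncated to a whole percent. */
unsigned busy_percent(unsigned instructions, unsigned cycles_per_instruction,
                      unsigned latency_cycles)
{
    unsigned busy = instructions * cycles_per_instruction;
    assert(busy + latency_cycles > 0);
    return (100u * busy) / (busy + latency_cycles);
}
```

Under these assumptions, twenty one-cycle instructions per 125-cycle operation give busy_percent(20, 1, 125), or 13 percent, and fifty two-cycle instructions give busy_percent(50, 2, 125), or 44 percent. The remainder is idle time that thread switching is intended to recover.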
A desirable goal of a router is to achieve line rate processing. In line rate processing, the router processes and forwards data packets at the same rate that those data packets arrive on the router's communications links. Assuming a minimum-sized packet, a Gigabit Ethernet link line rate yields about 1.49 million data packets per second, and a 10 Gigabit Ethernet link line rate yields about 14.9 million data packets per second. A router typically includes multiple links. To achieve line rate processing, routers are configured with multiple processors. The idle time introduced by the frequent long-latency memory accesses increases the number of processors needed to support line rate processing.
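The packet rates quoted above follow from Ethernet framing arithmetic. The following is a minimal sketch in C, assuming the standard 64-byte minimum frame, 8 bytes of preamble and start-of-frame delimiter, and a 12-byte inter-frame gap. These framing constants and the function names are not stated in the text above and are supplied only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Ethernet framing constants (not stated in the text above). */
enum {
    MIN_FRAME_BYTES = 64,  /* minimum Ethernet frame, including FCS */
    PREAMBLE_BYTES  = 8,   /* preamble plus start-of-frame delimiter */
    IFG_BYTES       = 12   /* minimum inter-frame gap */
};

/* Packets per second for minimum-sized packets at a given link rate in
 * bits per second (integer division truncates the fraction). */
uint64_t min_packet_rate(uint64_t link_bits_per_sec)
{
    uint64_t bits_on_wire = 8u * (MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES);
    assert(link_bits_per_sec > 0);
    return link_bits_per_sec / bits_on_wire;
}

/* Clock cycles available per packet on one processor clocked at clock_hz. */
uint64_t cycles_per_packet(uint64_t clock_hz, uint64_t packets_per_sec)
{
    assert(packets_per_sec > 0);
    return clock_hz / packets_per_sec;
}
```

With these assumptions a Gigabit Ethernet link yields 1,488,095 packets per second, which leaves a single 500 MHz processor about 336 clock cycles per packet. Five to seven memory operations of about 125 cycles each exceed that budget on their own, which is consistent with the use of multiple processors.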
In one approach, commercially available multi-threaded processors are used. While suitable for many purposes, the commercially available multi-threaded processors suffer some disadvantages. One disadvantage is that thread switching involves many clock cycles as all the contents of multiple registers used by the processor as operands and results of instructions are swapped, i.e., moved off the processor to a more spacious memory (that is often off the chip and requires use of a shared bus) and replaced by contents of registers for a different thread in another portion of that memory. These multi-threaded processors also consume clock cycles to swap instructions and data in local caches to other more spacious memory, such as off-chip memory. These extra cycles reduce the effectiveness of each multi-threaded processor and require the deployment of more such processors to achieve line rate processing in a router.
Another disadvantage is that some multi-threaded processors use a thread scheduler process that forces a thread switch at arbitrary times, such as after a certain number of clock cycles unrelated to when the thread issues a long-latency memory operation. Such processors incur the cost of thread switching without the benefit of reduced idle time on the processor.
In another approach, a processor could be designed to switch threads when long-latency commands are issued using larger on-chip memories to reduce clock cycles in swapping information with more distant larger capacity memories when threads are switched, and also provide an option to avoid swapping instruction sets. However, the design and development of a new processor is an extremely costly effort that takes many years. Such effort is typically justified only for a mass market. Thus, there is little likelihood that such an effort can or will be undertaken soon.
Based on the foregoing, there is a clear need for techniques for thread-switching that do not suffer some or all the deficiencies of the conventional approaches in multi-threaded processors.
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Techniques are described for switching among processing threads using a hardware assist. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Embodiments of the invention are described in detail below in the context of a data packet switching system on a router that has four processors, each allowing up to four threads that share use of a TCAM and QDR for updating routing tables while forwarding data packets at a high speed line rate. However, the invention is not limited to this context. In various other embodiments, more or fewer processors allowing more or fewer threads share more or fewer components of the same or different types in the same or different devices.
1.0 Network Overview
Information is exchanged between network nodes according to one or more of many well known, new or still developing protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other based on information sent over the communication links. Communications between nodes are typically effected by exchanging discrete packets of data. Each data packet typically comprises 1] header information associated with a particular protocol, and 2] payload information that follows the header information and contains information to be processed independently of that particular protocol. In some protocols, the packet includes 3] trailer information following the payload and indicating the end of the payload information. The header includes information such as the source of the data packet, its destination, the length of the payload, and other properties used by the protocol. Often, the data in the payload for the particular protocol includes a header and payload for a different protocol associated with a different function at the network node.
The intermediate node (e.g., node 102) typically receives data packets and forwards the packets in accordance with predetermined routing information that is distributed among intermediate nodes in control plane data packets using a routing protocol. The intermediate network node 102 is configured to store data packets between reception and transmission, to determine a link over which to transmit a received data packet, and to transmit the data packet over that link.
According to some embodiments of the invention described below, the intermediate network node 102 includes four processors that allow up to four threads, and includes a TCAM and QDR memory to share among those threads. The TCAM is used to store route information so that it can be retrieved quickly based on a destination address, e.g., an Internet Protocol (IP) address or a Media Access Control (MAC) address.
According to embodiments of the invention, a data processing system, such as the intermediate network node 102, includes a modified mechanism for switching threads on each of one or more processors. Such a mechanism makes line rate performance at the router more likely. In these embodiments, all contents and data for all threads for one processor are stored in a larger register bank, and data RAM, respectively, so that swapping of contents does not occur during a thread switch. Instead a different portion (also called a “slice” or “window”) of the register bank and data RAM is reserved for each thread and accessed based on a thread ID for the current thread. The current thread ID is supplied by a thread scheduler block separate from the core processor. The modified mechanism allows a conventional single or multi-threaded processor to be used to switch threads in fewer cycles than conventional approaches.
2.0 Structural Overview
Conventional elements of a router which may serve as the intermediate network node 102 in some embodiments are described in greater detail in a later section with reference to
The shared resources block 270 includes one or more shared resources and a lock controller 272 for issuing locks as requested for any of those shared resources. In the illustrated embodiment, the shared resources include a TCAM 274 and a QDR SRAM 276.
The core processor 242 is a circuitry block that implements logic to execute coded instructions. Input and output to the core processor are effected through leads that are exposed on the outside of the core processor. A data channel connects certain leads on the core processor 242 to leads that serve as input and output on other blocks of circuitry. The IRAM 244 is a circuitry block that supports random access memory read and write operations. In some embodiments, IRAM 244 is read only memory and supports only read operations for retrieving IRAM contents. The register bank 246 is a circuitry block that supports random access memory read and write operations for a small set of memory locations that are easily addressed by a few bits in the instructions retrieved from IRAM. The register bank 246 has 2^C addressable locations. In the illustrated embodiment C=4 so that register bank 246 has 2^4=16 addressable locations. The data RAM 248 is a circuitry block that supports random access memory read and write operations for a larger set of memory locations. The data RAM 248 serves as a fast local storage for data used by the current thread. An instruction typically involves moving the contents of one location to or from a register in register bank 246. The data RAM 248 has 2^D addressable locations. In the illustrated embodiment D=12 so that data RAM 248 has 2^12=4096 addressable locations. In some other available processors 240 one or more of C and D are equal to different values. At each addressable location in register bank 246 and data RAM 248, 64 bits of content are stored.
The core processor 242 uses some leads connected to core-IRAM channel 234 to retrieve a next instruction from IRAM 244. In the illustrated embodiment, the core-IRAM channel 234 has a width of 45 bits, i.e., includes 45 parallel wires that connect leads on the core processor 242 to leads on the IRAM 244. On the 45 bit core-IRAM channel 234, 13 bits are used to express a location in the IRAM from which an instruction is to be retrieved, and 32 bits are used to indicate the instruction retrieved. The instruction typically includes data that indicates an address for one or more registers in register bank 246 or an address for a location in data RAM 248. The address of a register is expressed by a value made up of C bits, while an address for a location in the data RAM is expressed by a value made up of D bits. In the illustrated embodiment, C=4 and D=12; in some other processors one or more of C and D are equal to different numbers of bits. After the instruction is retrieved, it is executed and the result is indicated on leads of the core processor specified by the instruction.
Some leads on core processor 242 are connected to a core-register channel to get or put contents for a particular register in register bank 246. The core-register channel includes a C bit core-register access channel 236 to indicate a particular register and a wider channel (not shown) to hold the contents. The core-register access channel 236 is connected to a bank input 245 comprising C leads. In the illustrated embodiment, C=4 bits, and the wider channel (not shown) is 64 bits wide.
Similarly, some leads are connected to a core-data RAM channel to get or put contents for a particular location in data RAM 248. In an illustrated embodiment, the core-data RAM channel includes a D=12 bit core-data RAM access channel 238 to indicate a particular location and a 64 bit channel (not shown) to hold the contents. The core-data RAM access channel 238 is connected to a data RAM input 247 comprising D leads.
Some instructions involve using still other leads (not shown) to send a request to off-processor elements, such as shared resources 270, using other data channels (not shown).
In some conventional approaches to using processor 240 for multiple threads, a thread scheduler application executes on core processor 242 and determines when to switch threads. When a thread is to be switched, the thread scheduler application swaps register bank 246 and, sometimes, data RAM 248 used by the current thread with the contents for those elements of a different thread, which contents are stored on some more distant memory. This approach for switching threads consumes many clock cycles in the process. Also the thread scheduling application itself consumes some of the space on IRAM 244, register bank 246 and data RAM 248, leaving less of that space for the threads themselves.
3.0 Thread Switching Apparatus
According to various embodiments of the invention, the core processor 242 continues to access registers, data RAM locations and IRAM locations as if these components were of a size only for a single thread, using the same leads and data channels as in the conventional approaches. However, in these embodiments of the invention, one or more larger components are used to store contents for all threads that may share the core processor. Contents for different threads are stored in different portions of the larger components (also called slices or windows of the larger components). The different portions are indicated by additional bits in the address inputs, and those bits are provided by an external thread scheduler as a unique thread ID for the current thread to be executed by the core processor. A unique thread ID of T bits is associated with each thread of the up to 2^T threads allowed to share core processor 242. Thus, during thread switching, only the value of the bits supplied by the external thread scheduler changes, and contents of the register bank, data RAM and IRAM need not be swapped. This approach reduces the work load during thread switching and significantly reduces the number of clock cycles consumed to complete the switch.
For purposes of illustration, a blocked thread scheme is used to determine when to switch threads, but in other embodiments, other thread switching schemes are used. In a blocked thread system, a thread executes on a processor until the thread determines that it should relinquish the processor to another thread and issues a switch thread command. The thread scheduler determines which of the sleeping threads, if any, to awaken and execute on the processor after the switch. It is further assumed, for purposes of illustration that a thread relinquishes control of the processor after issuing a request for a long-latency memory operation.
The core processor 242, IRAM 244, core-IRAM channel 234, core-register access channel 236, core-data RAM access channel 238, and 64-bit channels (not shown) connected to core processor 242 are as described above with reference to
The register bank 346 is 2^T times the size of the register bank 246, so that the registers for all 2^T threads can be stored in register bank 346. A particular register in register bank 346 is accessed by a particular value placed on the leads in bank input 345. Bank input 345 includes T leads more than the number of leads in bank input 245. Similarly, the data RAM 348 is 2^T times the size of the data RAM 248, so that the data for all 2^T threads can be stored in data RAM 348. A particular location in data RAM 348 is accessed by a particular value placed on the leads in data RAM input 347. Data RAM input 347 includes T leads more than the number of leads in data RAM input 247. Thus in the illustrated embodiment bank input 345 involves 6 leads and data RAM input 347 involves 14 leads.
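The lead counts and sizes in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch in C that takes T=2, a value inferred from the 6 and 14 leads stated for the illustrated embodiment. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of address leads needed when t_bits thread ID bits are added to
 * a component that the core addresses with local_bits bits. */
unsigned widened_leads(unsigned local_bits, unsigned t_bits)
{
    return local_bits + t_bits;
}

/* Number of addressable locations reachable with a given number of leads. */
uint32_t locations(unsigned leads)
{
    assert(leads < 32);
    return (uint32_t)1 << leads;
}
```

For C=4, D=12 and T=2 this gives 6 leads and 64 registers for the enlarged register bank, and 14 leads and 16384 locations for the enlarged data RAM, each four times the size of the single-thread component.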
In the illustrated embodiment, the IRAM 244 is the same as in conventional processor 240, because all 2^T threads execute the same instructions. In some embodiments in which different threads execute different sequences of instructions, IRAM 244 is replaced by a larger IRAM sufficient to hold the instructions for all 2^T threads. A location in the larger IRAM is indicated by more than the conventional 13 bits. In some embodiments, the larger IRAM may be less than 2^T times the size of IRAM 244. For example, in some embodiments, some threads execute one sequence of instructions and all other threads execute a different second set of instructions, so that only two sets of instructions are involved and an IRAM twice the size of IRAM 244 is sufficient.
The apparatus 300 includes thread scheduler 350, flip-flop register 349, and data channels 351, 352, 353, 336 and 339 connecting them to each other and to the other components. The external thread scheduler 350 is a circuitry block with logic to receive information from the core processor 242 about a current thread when the thread is to be switched, and determine which of the other threads, if any, is eligible to be awakened and next take possession of the core processor 242. Any method may be implemented in thread scheduler 350 to determine whether a thread is eligible. For example, in some embodiments a thread is eligible if the thread scheduler has received a response for every request for shared resource operations, such as a long-latency memory operation. If no thread is eligible, then a default idle thread is selected by the thread scheduler 350. If several threads are eligible, the external thread scheduler 350 arbitrates to select one of the eligible threads. Any method may be used to select the eligible thread. For example, in some embodiments, the oldest eligible thread is selected as the next thread. In some embodiments, a highest priority eligible thread is selected, and the oldest of the highest priority eligible threads is selected if more than one has the same highest priority.
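One of the arbitration policies described above (highest priority first, oldest among equals, idle thread if none is eligible) can be sketched in software. The following C fragment is only an illustrative model of logic that the text places in hardware. The structure thread_info, the marker IDLE_THREAD, and the conventions that larger values mean higher priority and greater age are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_THREAD (-1)  /* hypothetical marker for the default idle thread */

/* Hypothetical per-thread view used only for this sketch. */
struct thread_info {
    bool     eligible;  /* all outstanding responses have been received */
    unsigned priority;  /* larger value = higher priority (assumption) */
    unsigned age;       /* larger value = switched off longer ago (assumption) */
};

/* Select the highest-priority eligible thread and break ties by age.
 * Returns the index of the selected thread, or IDLE_THREAD if none. */
int select_next_thread(const struct thread_info *threads, size_t count)
{
    int best = IDLE_THREAD;
    assert(threads != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!threads[i].eligible)
            continue;
        if (best == IDLE_THREAD ||
            threads[i].priority > threads[best].priority ||
            (threads[i].priority == threads[best].priority &&
             threads[i].age > threads[best].age))
            best = (int)i;
    }
    return best;
}
```

If all threads are given the same priority, the same routine reduces to the oldest-eligible policy mentioned first.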
The thread scheduler 350 includes thread status registers 360 where information is stored about each of up to 2^T threads. The information in the thread status registers 360 is used by the external thread scheduler 350 to determine the eligibility, priority, and activity of the threads that share core processor 242. The thread status registers 360 are described in more detail below with reference to
A switch preparation channel 351 is wide enough to transfer, from leads on the core processor 242 to leads on the external thread scheduler 350, the location in the IRAM for the next instruction of the current thread to be executed when the current thread is revived at some later time. In an illustrated embodiment, the switch preparation channel is 15 bits (13 bits for IRAM address, 1 bit to indicate whether the thread is done or retired, and 1 bit to indicate priority when the thread is revived). In other embodiments more or fewer bits are included. A thread instruction channel 352 is wide enough to transfer, from leads on the thread scheduler 350 to leads on the core processor 242, a location in the IRAM for the next instruction of the reviving next thread. In the illustrated embodiment, the thread instruction channel 352 is 13 bits wide. A thread ID load channel 353 is T bits wide to provide the thread ID for the next thread used as the additional T bits in the bank input 345 and data RAM input 347.
In the illustrated embodiment, the thread ID load channel 353 connects leads on the thread scheduler 350 to leads on the flip-flop register 349. When the thread switch signal is received at the flip-flop register 349, the thread ID is registered and stored by the flip-flop register on the leads connected to a thread ID input channel 336. The thread ID input channel 336 connects the flip-flop register 349 to T leads of the bank input 345 and T leads of the data RAM input 347. The thread switch signal is received at the flip-flop register 349 from the core processor 242 on a thread switch output channel 339. In the illustrated embodiment, the switch output channel 339 is 1 bit wide.
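The two-phase behavior of the flip-flop register, in which the scheduler loads a thread ID that takes effect only when the core processor issues the switch signal, can be modeled in a few lines. The following C sketch is a software analogy rather than a description of the circuit. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of flip-flop register 349. */
struct thread_id_latch {
    uint8_t pending;  /* value presented on thread ID load channel 353 */
    uint8_t current;  /* value driven onto thread ID input channel 336 */
};

/* Scheduler side: present the next thread ID; the output is unchanged. */
void latch_load(struct thread_id_latch *l, uint8_t next_id, unsigned t_bits)
{
    assert(t_bits > 0 && t_bits <= 8 && next_id < (1u << t_bits));
    l->pending = next_id;
}

/* Core side: the 1 bit switch signal on channel 339 registers the pending
 * ID, which then selects the window used by later accesses. */
void latch_switch(struct thread_id_latch *l)
{
    l->current = l->pending;
}
```

Separating the load from the switch lets the scheduler finish its selection while the current thread is still executing, so the window changes only at the instant the core processor signals.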
The values on the T additional leads in register bank input 345 and data RAM input 347 are provided on thread ID input channel 336. In an example embodiment, the thread ID input channel is connected to the T leads of the inputs 345 and 347 that correspond to the most significant bits. However, the invention is not limited to this choice. In other embodiments any T leads of inputs 345 and 347 are connected to the thread ID input channel 336, as long as none of those T leads are the same as the C leads connected to core-register access channel 236 or the D leads connected to core-data RAM access channel 238.
In embodiments in which different threads use different instructions, then the thread ID input channel 336 also connects to one or more bits for an input for an enlarged IRAM (not shown).
The thread ID field holds T bits that uniquely identify each thread that shares use of core processor 242. In the illustrated embodiment, the thread ID field 372 is 2 bits in size. The revival IRAM address field 374 holds data that indicates a location in IRAM 244 where the instruction resides that is to be executed next when the thread identified in the thread ID field 372 is switched back onto the core processor 242. In the illustrated embodiment, the revival IRAM address field 374 is 13 bits in size. The status field 376 holds data that indicates a status of the thread identified in the thread ID field 372. In the illustrated embodiment, the status field is 4 bits. Three bits are used to indicate thread state: (1) Idle, (2) Waiting for responses, (3) Ready, (4) Running, (5) Retired or complete. One bit is used to indicate priority.
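The field widths given above (2 bits of thread ID, 13 bits of revival IRAM address, 3 bits of state and 1 bit of priority) total 19 bits. The following C sketch packs and unpacks such a register. The text fixes only the widths, so the bit positions, the numeric state encodings and all identifiers here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Thread states named in the text; the numeric encodings are assumed. */
enum thread_state {
    STATE_IDLE = 1, STATE_WAITING = 2, STATE_READY = 3,
    STATE_RUNNING = 4, STATE_RETIRED = 5
};

/* Assumed layout: [18:17] thread ID, [16:4] revival IRAM address,
 * [3:1] state, [0] priority. The text fixes only the field widths. */
uint32_t status_pack(unsigned thread_id, unsigned iram_addr,
                     enum thread_state state, unsigned priority)
{
    assert(thread_id < 4u && iram_addr < 8192u && priority < 2u);
    assert(state >= STATE_IDLE && state <= STATE_RETIRED);
    return ((uint32_t)thread_id << 17) | ((uint32_t)iram_addr << 4) |
           ((uint32_t)state << 1) | (uint32_t)priority;
}

unsigned status_thread_id(uint32_t reg) { return (reg >> 17) & 0x3u; }
unsigned status_iram_addr(uint32_t reg) { return (reg >> 4) & 0x1FFFu; }
unsigned status_priority(uint32_t reg)  { return reg & 0x1u; }

enum thread_state status_state(uint32_t reg)
{
    return (enum thread_state)((reg >> 1) & 0x7u);
}
```

Three state bits allow eight encodings, which is enough for the five states listed.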
Although fields 372, 374, 376 are shown as contiguous portions of an integral register 370 in the illustrated embodiment for purposes of illustration, in various other embodiments one or more fields or other portions of register 370 are stored as more or fewer fields in the same or more registers on or near the thread scheduler 350. In some embodiments, additional fields are included in the thread status register 370, or associated with the thread having the thread ID in field 372. For example, in some embodiments a priority field indicates a priority for the thread and an age rank field indicates how many threads preceded the thread in being switched off the core processor 242. In some embodiments another field with more than T bits is used as a thread descriptor in addition to the thread ID in thread ID field 372.
4.0 Method at Core Processor
The apparatus described above supports very fast switching among multiple threads on core processor 242, whether the core processor is a single threaded processor or a multi-threaded processor using conventional thread switching. In the latter case, the internal thread switching and thread scheduler are bypassed and, instead, the external thread scheduler 350 and its switching mechanism are used. In this section is described a method used on the core processor 242 to interact with the components of the apparatus 300.
In step 410, the core processor executes an instruction retrieved from the IRAM, which causes a thread switch condition. For example, the instruction requests one or more operations on a long-latency shared memory component. The programmer who wrote these instructions knows that after one or more such operations, the thread should switch off the core processor and so recognizes that the instruction causes a thread switch condition. In some embodiments, an interpreter or compiler recognizes the switch condition automatically.
In step 420, a prepare-to-switch signal is sent. The prepare-to-switch signal includes data that indicates the next instruction to execute when the thread is switched back onto the core processor and resumes processing. In addition, the prepare-to-switch signal includes data that indicates whether or not the current thread is completed processing (retired), or, if not retired, the priority of the thread when it is revived. For example, the following C language statements are used to generate the core processor instructions.
path_m:
    ...
    prepare_to_switch (path_n);
    ...
    next_thread = switch_thread();
    jump next_thread;
path_n:
    ...
At the end of the C statements indicated by the first ellipsis, a long-latency memory operation is issued. The statements through jump next_thread are executed before the current thread is switched off the processor. When the current thread is switched back on, the next instruction is generated by the C statement at the label path_n. Therefore, the prepare_to_switch statement includes as an argument the path_n label. The compiler or interpreter, as is well known in the art, translates the C language label path_n to a processing instruction address in the IRAM.
In step 430, the processor waits sufficient time for the external thread scheduler 350 to determine the next thread. It is assumed for purposes of illustration that 6 cycles of a 500 MHz clock is sufficient for the external thread scheduler 350 to determine the next thread and to send the thread ID for the next thread to flip-flop register 349 over thread ID load channel 353. In the illustrated embodiment, the second ellipsis in the C language statements above stands for one or more statements that enforce this wait. Any instructions that have the desired effect may be used.
In step 440, a thread revival instruction location for the next thread is received. During the waiting time interval of step 430, the external thread scheduler 350 sends a location in IRAM for an instruction for the next thread to the core processor through thread instruction channel 352. In the illustrated embodiment, thread revival instruction location for the next thread is received, from the external thread scheduler 350, at leads on the core processor 242 connected to thread instruction channel 352.
In step 450 a switch thread signal is sent. For example, a 1 bit signal is sent from core processor 242 through switch output channel 339 to the flip-flop register 349. As a result, the T bits that identify the next thread are placed on the thread ID input channel 336 by the flip-flop register 349. As a consequence, the portion of the register bank 346 and data RAM 348 associated with the next thread will be accessed as a result of subsequent addresses placed on core-register access channel 236 and core-data RAM access channel 238, respectively, by core processor 242.
In the illustrated embodiment, steps 440 and 450 are performed by the C language statement next_thread=switch_thread ( ). The routine call switch_thread ( ) causes the switch signal to be sent for step 450, and the routine switch_thread ( ) returns a value of the IRAM location received over channel 352. That IRAM location is stored in the C language variable next_thread.
In step 460, the instruction at the IRAM location for the next thread is retrieved and executed. For example, the C language statement jump next_thread retrieves and executes the instruction at the IRAM location stored in the C language variable next_thread.
In step 470 a register or data RAM location indicated in a retrieved instruction is accessed by the core processor using the C bits on channel 236 or D bits on channel 238, respectively, and relying on the T bits from channel 336 to indicate the appropriate portion of the register bank and data RAM, respectively. For example, if the retrieved instruction indicates that register 1001 (binary) is to be accessed and the thread scheduler has determined that the next thread has thread ID 10 (binary), then the contents of register 101001 (binary) are accessed. If the thread had thread ID 01 (binary) instead of 10 (binary), then the contents of register 011001 (binary) are accessed. Thus register contents for the next thread are accessed without moving contents into or out of 2^C register locations (e.g., 16 register locations). Similarly, data RAM contents for the next thread are accessed without moving contents into or out of 2^D data RAM locations (e.g., 4096 data RAM locations). Many clock cycles are saved compared to conventional thread switching approaches.
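The address formation in step 470 amounts to concatenating the thread ID with the address issued by the core processor. The following is a minimal sketch in C, assuming the example embodiment in which the thread ID supplies the most significant bits. The function name window_address is hypothetical, and in the apparatus this concatenation is done by wiring rather than by computation.

```c
#include <assert.h>
#include <stdint.h>

/* Form the address seen by the enlarged component: the t_bits thread ID
 * bits become the most significant bits above the core's local address. */
uint32_t window_address(uint32_t thread_id, uint32_t local_addr,
                        unsigned local_bits, unsigned t_bits)
{
    assert(local_bits + t_bits < 32);
    assert(local_addr < ((uint32_t)1 << local_bits));
    assert(thread_id < ((uint32_t)1 << t_bits));
    return (thread_id << local_bits) | local_addr;
}
```

For register 1001 (binary, 0x9) and thread ID 10 (binary), window_address(0x2, 0x9, 4, 2) gives 0x29, which is 101001 (binary). For thread ID 01 (binary) the same call with 0x1 gives 0x19, which is 011001 (binary). Data RAM addresses are formed the same way with local_bits set to 12.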
5.0 Method at Thread Scheduler
In this section is described a method used on external thread scheduler 350 to interact with the components of the apparatus 300.
In step 520 a prepare-to-switch signal is received from the core processor 242. The prepare-to-switch signal includes data that indicates an IRAM location where resides an instruction to execute when the current thread is revived to switch back onto core processor 242. For example, a signal is received on switch preparation channel 351 that includes an IRAM location associated with the C language path_n label. It is assumed for purposes of illustration that the path_n label is associated with the 13 bit IRAM location 1000110001100 (binary). As described above, the prepare-to-switch signal includes data that indicates whether the thread is done, retired, or complete as well as priority when revived.
In step 524, the IRAM location for the thread revival instruction is stored in the thread status registers 360 in association with a thread ID for the current thread. For example, if the thread ID of the current thread is 10 (binary), then the IRAM location is stored in the revival IRAM address field 374 in the thread status register 370 where the thread ID field 372 includes the value 10 (binary). It is further assumed for purposes of illustration that a value is also stored in the status field 376 for the same register, which indicates that the thread is active but ineligible.
In step 530, the external thread scheduler 350 determines the next thread in response to receiving the prepare-to-switch signal. As described above, any method may be used. In the illustrated embodiment, the next thread is the oldest eligible thread. It is assumed for purposes of illustration that the thread with thread ID 01 (binary) is the oldest eligible thread.
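As one possible reading of "oldest eligible thread" in step 530, the following Python sketch picks the eligible thread that has been waiting longest. The `ThreadStatus` record and its `eligible_since` field are assumptions made for illustration; the text does not say how age is tracked, and any other selection method could be substituted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadStatus:
    thread_id: int
    eligible: bool
    eligible_since: int  # hypothetical tick count; smaller means waiting longer


def oldest_eligible(threads: List[ThreadStatus]) -> Optional[int]:
    """Return the ID of the longest-waiting eligible thread, or None."""
    candidates = [t for t in threads if t.eligible]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.eligible_since).thread_id


threads = [
    ThreadStatus(0b00, eligible=False, eligible_since=1),
    ThreadStatus(0b01, eligible=True, eligible_since=5),
    ThreadStatus(0b10, eligible=False, eligible_since=2),  # current thread
    ThreadStatus(0b11, eligible=True, eligible_since=9),
]
print(format(oldest_eligible(threads), "02b"))  # 01
```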
In step 534, the thread ID for the next thread is sent to determine the portion of the register bank and data RAM reserved for the next thread. For example, the two bit thread ID 01 is sent to the flip-flop register 349 over thread ID load channel 353. When a switch thread signal is later received at flip-flop register 349, the two bits 01 will be provided over thread ID input channel 336 to the register bank input 345 and data RAM input 347.
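The two-stage behavior of flip-flop register 349 in step 534 (load now, take effect on the later switch thread signal) can be modeled roughly as below. This is a behavioral sketch only; the class name `ThreadIdFlipFlop`, its attributes, and the two-bit mask are assumptions drawn from the two-bit thread ID in the example.

```python
class ThreadIdFlipFlop:
    """Behavioral model: a staged thread ID and the ID currently driven out."""

    def __init__(self, initial_thread_id: int = 0) -> None:
        self.pending = initial_thread_id  # value loaded over channel 353
        self.output = initial_thread_id   # T bits presented on channel 336

    def load(self, thread_id: int) -> None:
        # The scheduler stages the next thread ID; the output is unchanged.
        self.pending = thread_id & 0b11  # assumed two-bit thread ID

    def switch(self) -> int:
        # On the switch thread signal the staged ID becomes the output.
        self.output = self.pending
        return self.output


ff = ThreadIdFlipFlop(initial_thread_id=0b10)
ff.load(0b01)
print(format(ff.output, "02b"))    # still 10: current thread keeps its window
print(format(ff.switch(), "02b"))  # 01: window now selects the next thread
```

Separating `load` from `switch` mirrors the point of the text: the current thread keeps using its own register and data RAM portion until the switch thread signal arrives.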
In step 540, the IRAM location for the thread revival instruction for the next thread is retrieved. It is assumed for purposes of illustration that the contents of revival IRAM address field 374 is 0100110001111 (binary) for the register in which the contents of the thread ID field is 01. Thus, during step 540 the value 0100110001111 is retrieved from the thread status registers 360.
In step 550, the IRAM location for the thread revival instruction for the next thread is sent to the core processor 242. For example, the value 0100110001111 is sent over thread instruction channel 352 and thus provided to the leads on core processor 242 connected to channel 352. The core processor 242 uses this value to retrieve the next instruction from IRAM 244 after the switch thread signal is issued to the flip-flop register 349, as described above in method 400. If it is further assumed that the IRAM location 0100110001111 corresponds to C language statement path_m, then the core processor 242 begins executing thread 01 at C language statement path_m.
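Taken together, steps 520 through 550 might be sketched in Python as follows. This is a simplified software model of what the text describes as hardware. The dictionary layout, the status labels, and the `pick_next` callback are all hypothetical; only the thread IDs and IRAM addresses come from the worked example.

```python
from typing import Any, Callable, Dict, Tuple

# Status labels are illustrative; the text only says "active but ineligible".
ACTIVE_INELIGIBLE = "active-ineligible"
ELIGIBLE = "eligible"

StatusRegs = Dict[int, Dict[str, Any]]


def handle_prepare_to_switch(
    status_regs: StatusRegs,
    current_id: int,
    revival_address: int,
    pick_next: Callable[[StatusRegs], int],
) -> Tuple[int, int]:
    """Return (next thread ID, IRAM address where the next thread resumes)."""
    # Steps 520 and 524: record where the current thread resumes later.
    status_regs[current_id] = {
        "revival_iram_address": revival_address,
        "status": ACTIVE_INELIGIBLE,
    }
    # Step 530: any selection policy may be plugged in here.
    next_id = pick_next(status_regs)
    # Step 540: look up the next thread's revival address.
    next_address = status_regs[next_id]["revival_iram_address"]
    # Steps 534 and 550: the ID goes to the flip-flop register and the
    # address goes to the core processor.
    return next_id, next_address


regs: StatusRegs = {
    0b01: {"revival_iram_address": 0b0100110001111, "status": ELIGIBLE},
}
next_id, next_address = handle_prepare_to_switch(
    regs, 0b10, 0b1000110001100, pick_next=lambda r: 0b01
)
print(format(next_id, "02b"), format(next_address, "013b"))  # 01 0100110001111
```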
Using the apparatus 300 and method 400 at core processor 242 and method 500 at external thread scheduler 350, threads are switched at core processor 242 much faster, on the order of 6 clock cycles, than is possible using other approaches that require saving register values or other architecture state before switching threads. Furthermore, the 6 cycles between the “prepare to switch” and “switch” signals can be filled with other useful instructions so that there is effectively no switch overhead. The thread switch time can be as small as the number of cycles for a taken branch.
6.0 Router Hardware Overview
Computer system 600 includes a communication mechanism such as a bus 610 for passing information between other internal and external components of the computer system 600. Information is represented as physical signals of a measurable phenomenon, typically electric voltages, but including, in other embodiments, such phenomena as magnetic, electromagnetic, pressure, chemical, molecular, atomic and quantum interactions. For example, north and south magnetic fields, or a zero and non-zero electric voltage, represent two states (0, 1) of a binary digit (bit). A sequence of binary digits constitutes digital data that is used to represent a number or code for a character. A bus 610 includes many parallel conductors of information so that information is transferred quickly among devices coupled to the bus 610. One or more processors 602 for processing information are coupled with the bus 610. A processor 602 performs a set of operations on information. The set of operations includes bringing information in from the bus 610 and placing information on the bus 610. The set of operations also typically includes comparing two or more units of information, shifting positions of units of information, and combining two or more units of information, such as by addition or multiplication. A sequence of operations to be executed by the processor 602 constitutes computer instructions.
Computer system 600 also includes a memory 604 coupled to bus 610. The memory 604, such as a random access memory (RAM) or other dynamic storage device, stores information including computer instructions. Dynamic memory allows information stored therein to be changed by the computer system 600. RAM allows a unit of information stored at a location called a memory address to be stored and retrieved independently of information at neighboring addresses. The memory 604 is also used by the processor 602 to store temporary values during execution of computer instructions. The computer system 600 also includes a read only memory (ROM) 606 or other static storage device coupled to the bus 610 for storing static information, including instructions, that is not changed by the computer system 600. Also coupled to bus 610 is a non-volatile (persistent) storage device 608, such as a magnetic disk or optical disk, for storing information, including instructions, that persists even when the computer system 600 is turned off or otherwise loses power.
The term computer-readable medium is used herein to refer to any medium that participates in providing information to processor 602, including instructions for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 608. Volatile media include, for example, dynamic memory 604. Transmission media include, for example, coaxial cables, copper wire, fiber optic cables, and waves that travel through space without wires or cables, such as acoustic waves and electromagnetic waves, including radio, optical and infrared waves. Signals that are transmitted over transmission media are herein called carrier waves.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, a magnetic tape or any other magnetic medium, a compact disk ROM (CD-ROM), a digital video disk (DVD) or any other optical medium, punch cards, paper tape, or any other physical medium with patterns of holes, a RAM, a programmable ROM (PROM), an erasable PROM (EPROM), a FLASH-EPROM, or any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
Information, including instructions, is provided to the bus 610 for use by the processor from an external terminal 612, such as a terminal with a keyboard containing alphanumeric keys operated by a human user, or a sensor. A sensor detects conditions in its vicinity and transforms those detections into signals compatible with the signals used to represent information in computer system 600. Other external components of terminal 612 coupled to bus 610, used primarily for interacting with humans, include a display device, such as a cathode ray tube (CRT) or a liquid crystal display (LCD) or a plasma screen, for presenting images, and a pointing device, such as a mouse or a trackball or cursor direction keys, for controlling a position of a small cursor image presented on the display and issuing commands associated with graphical elements presented on the display of terminal 612. In some embodiments, terminal 612 is omitted.
Computer system 600 also includes one or more instances of a communications interface 670 coupled to bus 610. Communications interface 670 provides a two-way communication coupling to a variety of external devices that operate with their own processors, such as printers, scanners, external disks, and terminal 612. Firmware or software running in the computer system 600 provides a terminal interface or character-based command interface so that external commands can be given to the computer system. For example, communications interface 670 may be a parallel port or a serial port such as an RS-232 or RS-422 interface, or a universal serial bus (USB) port on a personal computer. In some embodiments, communications interface 670 is an integrated services digital network (ISDN) card or a digital subscriber line (DSL) card or a telephone modem that provides an information communication connection to a corresponding type of telephone line. In some embodiments, a communications interface 670 is a cable modem that converts signals on bus 610 into signals for a communication connection over a coaxial cable or into optical signals for a communication connection over a fiber optic cable. As another example, communications interface 670 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN, such as Ethernet. Wireless links may also be implemented. For wireless links, the communications interface 670 sends and receives electrical, acoustic or electromagnetic signals, including infrared and optical signals, which carry information streams, such as digital data. Such signals are examples of carrier waves.
In the illustrated embodiment, special purpose hardware, such as an application specific integrated circuit (IC) 620, is coupled to bus 610. The special purpose hardware is configured to perform operations not performed by processor 602 quickly enough for special purposes. Examples of application specific ICs include graphics accelerator cards for generating images for display, cryptographic boards for encrypting and decrypting messages sent over a network, speech recognition, and interfaces to special external devices, such as robotic arms and medical scanning equipment that repeatedly perform some complex sequence of operations that are more efficiently implemented in hardware.
In the illustrated computer used as a router, the computer system 600 includes switching system 630 as special purpose hardware for switching information for flow over a network. Switching system 630 typically includes multiple communications interfaces, such as communications interface 670, for coupling to multiple other devices. In general, each coupling is with a network link 632 that is connected to another device in or attached to a network, such as local network 680 in the illustrated embodiment, to which a variety of external devices with their own processors are connected. In some embodiments an input interface or an output interface or both are linked to each of one or more external network elements. Although three network links 632 a, 632 b, 632 c are included in network links 632 in the illustrated embodiment, in other embodiments, more or fewer links are connected to switching system 630. Network links 632 typically provide information communication through one or more networks to other devices that use or process the information. For example, network link 632 b may provide a connection through local network 680 to a host computer 682 or to equipment 684 operated by an Internet Service Provider (ISP). ISP equipment 684 in turn provides data communication services through the public, world-wide packet-switching communication network of networks now commonly referred to as the Internet 690. A computer called a server 692 connected to the Internet provides a service in response to information received over the Internet. For example, server 692 provides routing information for use with switching system 630.
The switching system 630 includes logic and circuitry configured to perform switching functions associated with passing information among elements of network 680, including passing information received along one network link, e.g., 632 a, as output on the same or different network link, e.g., 632 c. The switching system 630 switches information traffic arriving on an input interface to an output interface according to pre-determined protocols and conventions that are well known. In some embodiments, switching system 630 includes its own processor and memory to perform some of the switching functions in software. In some embodiments, switching system 630 relies on processor 602, memory 604, ROM 606, storage 608, or some combination, to perform one or more switching functions in software. For example, switching system 630, in cooperation with processor 602 implementing a particular protocol, can determine a destination of a packet of data arriving on input interface on link 632 a and send it to the correct destination using output interface on link 632 c. The destinations may include host 682, server 692, other terminal devices connected to local network 680 or Internet 690, or other routing and switching devices in local network 680 or Internet 690.
The invention is related to the use of computer system 600 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 600 in response to processor 602 executing one or more sequences of one or more instructions contained in memory 604. Such instructions, also called software and program code, may be read into memory 604 from another computer-readable medium such as storage device 608. Execution of the sequences of instructions contained in memory 604 causes processor 602 to perform the method steps described herein. In alternative embodiments, hardware, such as application specific integrated circuit 620 and circuits in switching system 630, may be used in place of or in combination with software to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The signals transmitted over network link 632 and other networks through communications interfaces such as interface 670, which carry information to and from computer system 600, are exemplary forms of carrier waves. Computer system 600 can send and receive information, including program code, through the networks 680, 690 among others, through network links 632 and communications interfaces such as interface 670. In an example using the Internet 690, a server 692 transmits program code for a particular application, requested by a message sent from computer 600, through Internet 690, ISP equipment 684, local network 680 and network link 632 b through communications interface in switching system 630. The received code may be executed by processor 602 or switching system 630 as it is received, or may be stored in storage device 608 or other non-volatile storage for later execution, or both. In this manner, computer system 600 may obtain application program code in the form of a carrier wave.
Various forms of computer readable media may be involved in carrying one or more sequences of instructions or data or both to processor 602 for execution. For example, instructions and data may initially be carried on a magnetic disk of a remote computer such as host 682. The remote computer loads the instructions and data into its dynamic memory and sends the instructions and data over a telephone line using a modem. A modem local to the computer system 600 receives the instructions and data on a telephone line and uses an infrared transmitter to convert the instructions and data to an infrared signal, a carrier wave serving as the network link 632 b. An infrared detector serving as communications interface in switching system 630 receives the instructions and data carried in the infrared signal and places information representing the instructions and data onto bus 610. Bus 610 carries the information to memory 604 from which processor 602 retrieves and executes the instructions using some of the data sent with the instructions. The instructions and data received in memory 604 may optionally be stored on storage device 608, either before or after execution by the processor 602 or switching system 630.
7.0 Extensions and Alternatives
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4096571||8 Sep 1976||20 Jun 1978||Codex Corporation||System for resolving memory access conflicts among processors and minimizing processor waiting times for access to memory by comparing waiting times and breaking ties by an arbitrary priority ranking|
|US4400768||4 Jun 1980||23 Aug 1983||Burroughs Corporation||Parallel access computer memory system employing a power-of-two memory modules|
|US4918600||1 Aug 1988||17 Apr 1990||Board Of Regents, University Of Texas System||Dynamic address mapping for conflict-free vector access|
|US5088032||29 Jan 1988||11 Feb 1992||Cisco Systems, Inc.||Method and apparatus for routing communications among computer networks|
|US5247645||12 Mar 1991||21 Sep 1993||International Business Machines Corporation||Dynamic memory mapper which supports interleaving across 2N +1, 2.sup.NN -1 number of banks for reducing contention during nonunit stride accesses|
|US5394553||12 Jun 1991||28 Feb 1995||Lee Research, Inc.||High performance array processor with nonlinear skewing of elements|
|US5428803||10 Jul 1992||27 Jun 1995||Cray Research, Inc.||Method and apparatus for a unified parallel processing architecture|
|US5479624||14 Oct 1992||26 Dec 1995||Lee Research, Inc.||High-performance interleaved memory system comprising a prime number of memory modules|
|US5561669||26 Oct 1994||1 Oct 1996||Cisco Systems, Inc.||Computer network switching system with expandable number of ports|
|US5561784||29 Jun 1994||1 Oct 1996||Cray Research, Inc.||Interleaved memory access system having variable-sized segments logical address spaces and means for dividing/mapping physical address into higher and lower order addresses|
|US5613114 *||15 Apr 1994||18 Mar 1997||Apple Computer, Inc||System and method for custom context switching|
|US5617421||17 Jun 1994||1 Apr 1997||Cisco Systems, Inc.||Extended domain computer network using standard links|
|US5724600||8 Sep 1995||3 Mar 1998||Fujitsu Limited||Parallel processor system|
|US5740171||28 Mar 1996||14 Apr 1998||Cisco Systems, Inc.||Address translation mechanism for a high-performance network switch|
|US5742604||28 Mar 1996||21 Apr 1998||Cisco Systems, Inc.||Interswitch link mechanism for connecting high-performance network switches|
|US5764536||19 Dec 1995||9 Jun 1998||Omron Corporation||Method and device to establish viewing zones and to inspect products using viewing zones|
|US5787255||12 Apr 1996||28 Jul 1998||Cisco Systems, Inc.||Internetworking device with enhanced protocol translation circuit|
|US5787485||17 Sep 1996||28 Jul 1998||Marathon Technologies Corporation||Producing a mirrored copy using reference labels|
|US5796732||28 Mar 1996||18 Aug 1998||Cisco Technology, Inc.||Architecture for an expandable transaction-based switching bus|
|US5838915||17 Nov 1997||17 Nov 1998||Cisco Technology, Inc.||System for buffering data in the network having a linked list for each of said plurality of queues|
|US5852607||26 Feb 1997||22 Dec 1998||Cisco Technology, Inc.||Addressing mechanism for multiple look-up tables|
|US5909550||16 Oct 1996||1 Jun 1999||Cisco Technology, Inc.||Correlation technique for use in managing application-specific and protocol-specific resources of heterogeneous integrated computer network|
|US5982655||29 Sep 1998||9 Nov 1999||Cisco Technology, Inc.||Method and apparatus for support of multiple memory types in a single memory socket architecture|
|US6026464||24 Jun 1997||15 Feb 2000||Cisco Technology, Inc.||Memory control system and method utilizing distributed memory controllers for multibank memory|
|US6119215||29 Jun 1998||12 Sep 2000||Cisco Technology, Inc.||Synchronization and control system for an arrayed processing engine|
|US6148325||30 Jun 1994||14 Nov 2000||Microsoft Corporation||Method and system for protecting shared code and data in a multitasking operating system|
|US6178429||26 Nov 1997||23 Jan 2001||Cisco Technology, Inc.||Mechanism for ensuring SCM database consistency on multi-part operation boundaries|
|US6195107||29 May 1998||27 Feb 2001||Cisco Technology, Inc.||Method and system for utilizing virtual memory in an embedded system|
|US6222380||11 Jun 1999||24 Apr 2001||International Business Machines Corporation||High speed parallel/serial link for data communication|
|US6272520 *||31 Dec 1997||7 Aug 2001||Intel Corporation||Method for detecting thread switch events|
|US6272621||18 Aug 2000||7 Aug 2001||Cisco Technology, Inc.||Synchronization and control system for an arrayed processing engine|
|US6308219||31 Jul 1998||23 Oct 2001||Cisco Technology, Inc.||Routing table lookup implemented using M-trie having nodes duplicated in multiple memory banks|
|US6430242||11 Jun 1999||6 Aug 2002||International Business Machines Corporation||Initialization system for recovering bits and group of bits from a communications channel|
|US6470376 *||3 Mar 1998||22 Oct 2002||Matsushita Electric Industrial Co., Ltd||Processor capable of efficiently executing many asynchronous event tasks|
|US6487202||30 Jun 1997||26 Nov 2002||Cisco Technology, Inc.||Method and apparatus for maximizing memory throughput|
|US6487591||8 Dec 1998||26 Nov 2002||Cisco Technology, Inc.||Method for switching between active and standby units using IP swapping in a telecommunication network|
|US6505269||16 May 2000||7 Jan 2003||Cisco Technology, Inc.||Dynamic addressing mapping to eliminate memory resource contention in a symmetric multiprocessor system|
|US6529983||3 Nov 1999||4 Mar 2003||Cisco Technology, Inc.||Group and virtual locking mechanism for inter processor synchronization|
|US6535963||30 Jun 1999||18 Mar 2003||Cisco Technology, Inc.||Memory apparatus and method for multicast devices|
|US6587955||1 Feb 2000||1 Jul 2003||Sun Microsystems, Inc.||Real time synchronization in multi-threaded computer systems|
|US6611217||17 May 2002||26 Aug 2003||International Business Machines Corporation||Initialization system for recovering bits and group of bits from a communications channel|
|US6662252||8 Dec 2002||9 Dec 2003||Cisco Technology, Inc.||Group and virtual locking mechanism for inter processor synchronization|
|US6681341||3 Nov 1999||20 Jan 2004||Cisco Technology, Inc.||Processor isolation method for integrated multi-processor systems|
|US6708258||14 Jun 2001||16 Mar 2004||Cisco Technology, Inc.||Computer system for eliminating memory read-modify-write operations during packet transfers|
|US6718448||28 Nov 2000||6 Apr 2004||Emc Corporation||Queued locking of a shared resource using multimodal lock types|
|US6728839||28 Oct 1998||27 Apr 2004||Cisco Technology, Inc.||Attribute based memory pre-fetching technique|
|US6757768||17 May 2001||29 Jun 2004||Cisco Technology, Inc.||Apparatus and technique for maintaining order among requests issued over an external bus of an intermediate network node|
|US6770889||24 Feb 2003||3 Aug 2004||Nissin Electric Co., Ltd.||Method of controlling electrostatic lens and ion implantation apparatus|
|US6795901||17 Dec 1999||21 Sep 2004||Alliant Techsystems Inc.||Shared memory interface with conventional access and synchronization support|
|US6801997||23 May 2002||5 Oct 2004||Sun Microsystems, Inc.||Multiple-thread processor with single-thread interface shared among threads|
|US6804162||5 Apr 2002||12 Oct 2004||T-Ram, Inc.||Read-modify-write memory using read-or-write banks|
|US6804815||18 Sep 2000||12 Oct 2004||Cisco Technology, Inc.||Sequence control mechanism for enabling out of order context processing|
|US6832279||17 May 2001||14 Dec 2004||Cisco Systems, Inc.||Apparatus and technique for maintaining order among requests directed to a same address on an external bus of an intermediate network node|
|US6839797||21 Dec 2001||4 Jan 2005||Agere Systems, Inc.||Multi-bank scheduling to improve performance on tree accesses in a DRAM based random access memory subsystem|
|US6845501 *||27 Jul 2001||18 Jan 2005||Hewlett-Packard Development Company, L.P.||Method and apparatus for enabling a compiler to reduce cache misses by performing pre-fetches in the event of context switch|
|US6876961||27 Aug 1999||5 Apr 2005||Cisco Technology, Inc.||Electronic system modeling using actual and approximated system properties|
|US6895481||3 Jul 2002||17 May 2005||Cisco Technology, Inc.||System and method for decrementing a reference count in a multicast environment|
|US6918116||15 May 2001||12 Jul 2005||Hewlett-Packard Development Company, L.P.||Method and apparatus for reconfiguring thread scheduling using a thread scheduler function unit|
|US6920562||18 Dec 1998||19 Jul 2005||Cisco Technology, Inc.||Tightly coupled software protocol decode with hardware data encryption|
|US6947425||27 Jul 2000||20 Sep 2005||Intel Corporation||Multi-threaded sequenced transmit software for packet forwarding device|
|US6965615||18 Sep 2000||15 Nov 2005||Cisco Technology, Inc.||Packet striping across a parallel header processor|
|US6970435||11 Jun 1999||29 Nov 2005||International Business Machines Corporation||Data alignment compensator|
|US6973521||16 May 2000||6 Dec 2005||Cisco Technology, Inc.||Lock controller supporting blocking and non-blocking requests|
|US6986022||16 Oct 2001||10 Jan 2006||Cisco Technology, Inc.||Boundary synchronization mechanism for a processor of a systolic array|
|US7047370||14 Jan 2003||16 May 2006||Cisco Technology, Inc.||Full access to memory interfaces via remote request|
|US7100021||16 Oct 2001||29 Aug 2006||Cisco Technology, Inc.||Barrier synchronization mechanism for processors of a systolic array|
|US7124231||14 Jun 2002||17 Oct 2006||Cisco Technology, Inc.||Split transaction reordering circuit|
|US7139899||3 Sep 1999||21 Nov 2006||Cisco Technology, Inc.||Selected register decode values for pipeline stage register addressing|
|US7155576||27 May 2003||26 Dec 2006||Cisco Technology, Inc.||Pre-fetching and invalidating packet information in a cache memory|
|US7155588||12 Aug 2002||26 Dec 2006||Cisco Technology, Inc.||Memory fence with background lock release|
|US7174394||14 Jun 2002||6 Feb 2007||Cisco Technology, Inc.||Multi processor enqueue packet circuit|
|US7185224||10 Dec 2003||27 Feb 2007||Cisco Technology, Inc.||Processor isolation technique for integrated multi-processor systems|
|US7194568||21 Mar 2003||20 Mar 2007||Cisco Technology, Inc.||System and method for dynamic mirror-bank addressing|
|US7254687||16 Dec 2002||7 Aug 2007||Cisco Technology, Inc.||Memory controller that tracks queue operations to detect race conditions|
|US7257681||11 Jun 2003||14 Aug 2007||Cisco Technology, Inc.||Maintaining entity order with gate managers|
|US7290096||13 Apr 2006||30 Oct 2007||Cisco Technology, Inc.||Full access to memory interfaces via remote request|
|US7290105||16 Dec 2002||30 Oct 2007||Cisco Technology, Inc.||Zero overhead resource locks with attributes|
|US7302548||18 Jun 2002||27 Nov 2007||Cisco Technology, Inc.||System and method for communicating in a multi-processor environment|
|US7346059||8 Sep 2003||18 Mar 2008||Cisco Technology, Inc.||Header range check hash circuit|
|US7411957||26 Mar 2004||12 Aug 2008||Cisco Technology, Inc.||Hardware filtering support for denial-of-service attacks|
|US7434016||17 Nov 2006||7 Oct 2008||Cisco Technology, Inc.||Memory fence with background lock release|
|US7447872||30 May 2002||4 Nov 2008||Cisco Technology, Inc.||Inter-chip processor control plane communication|
|US7461180||8 May 2006||2 Dec 2008||Cisco Technology, Inc.||Method and apparatus for synchronizing use of buffer descriptor entries for shared data packets in memory|
|US7464243||21 Dec 2004||9 Dec 2008||Cisco Technology, Inc.||Method and apparatus for arbitrarily initializing a portion of memory|
|US7623455||24 Nov 2009||Cisco Technology, Inc.||Method and apparatus for dynamic load balancing over a network link bundle|
|US7640355||29 Dec 2009||Cisco Technology, Inc.||Supplemental queue sampling technique for packet scheduling|
|US7848332||15 Nov 2004||7 Dec 2010||Cisco Technology, Inc.||Method and apparatus for classifying a network protocol and aligning a network protocol header relative to cache line boundary|
|US20010001871||19 Jan 2001||24 May 2001||Shrader Steven L.||Storage management system and auto-RAID transaction manager for coherent memory map across hot plug interface|
|US20030048209||17 May 2002||13 Mar 2003||Brian Buchanan||Initialization system for recovering bits and group of bits from a communications channel|
|US20030058277||31 Aug 1999||27 Mar 2003||Bowman-Amuah Michel K.||A view configurer in a presentation services patterns enviroment|
|US20030159021||3 Sep 1999||21 Aug 2003||Darren Kerr||Selected register decode values for pipeline stage register addressing|
|US20030225995||30 May 2002||4 Dec 2003||Russell Schroter||Inter-chip processor control plane communication|
|US20040139441 *||8 Jan 2004||15 Jul 2004||Kabushiki Kaisha Toshiba||Processor, arithmetic operation processing method, and priority determination method|
|US20040186945||21 Mar 2003||23 Sep 2004||Jeter Robert E.||System and method for dynamic mirror-bank addressing|
|US20040187112 *||7 Mar 2003||23 Sep 2004||Potter Kenneth H.||System and method for dynamic ordering in a network processor|
|US20040213235||8 Apr 2003||28 Oct 2004||Marshall John W.||Programmable packet classification system using an array of uniform content-addressable memories|
|US20040252710||11 Jun 2003||16 Dec 2004||Jeter Robert E.||Maintaining entity order with gate managers|
|US20050010690||25 Jun 2003||13 Jan 2005||Marshall John W.||System and method for modifying data transferred from a source to a destination|
|US20050100017||12 Nov 2003||12 May 2005||Cisco Technology, Inc., A California Corporation||Using ordered locking mechanisms to maintain sequences of items such as packets|
|US20050171937||2 Feb 2004||4 Aug 2005||Hughes Martin W.||Memory efficient hashing algorithm|
|US20050213570||26 Mar 2004||29 Sep 2005||Stacy John K||Hardware filtering support for denial-of-service attacks|
|US20060104268||15 Nov 2004||18 May 2006||Recordation Form Cover Sheet: Credit Card Payment Form for the amount of $1796.00.||Method and apparatus for classifying a network protocol and aligning a network protocol header relative to cache line boundary|
|US20060117316||24 Nov 2004||1 Jun 2006||Cismas Sorin C||Hardware multithreading systems and methods|
|US20060136682||21 Dec 2004||22 Jun 2006||Sriram Haridas||Method and apparatus for arbitrarily initializing a portion of memory|
|US20060184753||13 Apr 2006||17 Aug 2006||Jeter Robert E Jr||Full access to memory interfaces via remote request|
|US20060221974||2 Apr 2005||5 Oct 2006||Cisco Technology, Inc.||Method and apparatus for dynamic load balancing over a network link bundle|
|US20070067592||17 Nov 2006||22 Mar 2007||Jeter Robert E Jr||Memory fence with background lock release|
|US20070095368||27 Oct 2005||3 May 2007||Honeywell International Inc.||Methods of removing a conformal coating, related processes, and articles|
|US20070169001||29 Nov 2005||19 Jul 2007||Arun Raghunath||Methods and apparatus for supporting agile run-time network systems via identification and execution of most efficient application code in view of changing network traffic conditions|
|US20070220232||23 May 2007||20 Sep 2007||John Rhoades||Data Processing Architectures|
|US20070226739||22 Dec 2005||27 Sep 2007||Dan Dodge||Process scheduler employing adaptive partitioning of process threads|
|US20070283357||5 Jun 2006||6 Dec 2007||Cisco Technology, Inc.||Techniques for reducing thread overhead for systems with multiple multi-theaded processors|
|US20070294694||16 Jun 2006||20 Dec 2007||Cisco Technology, Inc.||Techniques for hardware-assisted multi-threaded processing|
|US20080005296||8 May 2006||3 Jan 2008||Cisco Technology, Inc.||Method and apparatus for synchronizing use of buffer descriptor entries|
|US20080013532||11 Jul 2006||17 Jan 2008||Cisco Technology, Inc.||Apparatus for hardware-software classification of data packet flows|
|US20080077926||27 Sep 2006||27 Mar 2008||Robert Jeter||Multi-threaded Processing Using Path Locks|
|EP0744696B1||21 May 1996||24 Oct 2001||LSI Logic Corporation||Data transfer method and apparatus|
|1||PCT International Preliminary Report on Patentability mailed Sep. 23, 2005 for PCT/US04/05522; 11 pages.|
|2||PCT International Search Report mailed Oct. 18, 2004 for PCT/US04/05522; 3 pages.|
|3||USPTO Dec. 30, 2010 Nonfinal Rejection from U.S. Appl. No. 11/535,956.|
|4||USPTO Final Rejection mailed Mar. 16, 2011 from U.S. Appl. No. 11/446,609.|
|5||USPTO Jun. 23, 2011 Second Notice of Allowance from U.S. Appl. No. 11/535,956.|
|6||USPTO Jun. 3, 2011 Notice of Allowance from U.S. Appl. No. 11/535,956.|
|7||USPTO Jun. 30, 2011 Request for Continued Examination filed in response to Mar. 16, 2011 Final Rejection from U.S. Appl. No. 11/446,609.|
|8||USPTO Response mailed Mar. 30, 2011 to Dec. 30, 2010 Nonfinal Rejection from U.S. Appl. No. 11/535,956.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US9218185||27 Mar 2014||22 Dec 2015||International Business Machines Corporation||Multithreading capability information retrieval|
|US9354883||27 Mar 2014||31 May 2016||International Business Machines Corporation||Dynamic enablement of multithreading|
|US9389862||27 Mar 2014||12 Jul 2016||International Business Machines Corporation||Thread context restoration in a multithreading computer system|
|U.S. Classification||712/228, 718/108|
|International Classification||G06F9/46, G06F9/40|
|Cooperative Classification||G06F9/30123, G06F9/3851, G06F9/462|
|European Classification||G06F9/46G2, G06F9/38E4, G06F9/30R5C|
|16 Jun 2006||AS||Assignment|
Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JETER, ROBERT;GOSS, GREGORY;GAMER, TREVOR;AND OTHERS;REEL/FRAME:018431/0099;SIGNING DATES FROM 20060606 TO 20060607
Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JETER, ROBERT;GOSS, GREGORY;GAMER, TREVOR;AND OTHERS;SIGNING DATES FROM 20060606 TO 20060607;REEL/FRAME:018431/0099
|5 Jan 2009||AS||Assignment|
Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA
Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INVENTOR NAME SPELLING FROM TREVOR GAMER TO TREVOR GARNER PREVIOUSLY RECORDED ON REEL 018431 FRAME 0099;ASSIGNORS:JETER, ROBERT;GARNER, TREVOR;LEE, WILLIAM;AND OTHERS;REEL/FRAME:022058/0262;SIGNING DATES FROM 20060606 TO 20060607
Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA
Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INVENTOR NAME SPELLING FROM TREVOR GAMER TO TREVOR GARNER PREVIOUSLY RECORDED ON REEL 018431 FRAME 0099. ASSIGNOR(S) HEREBY CONFIRMS THE SELL, ASSIGN AND TRANSFER;ASSIGNORS:JETER, ROBERT;GARNER, TREVOR;LEE, WILLIAM;AND OTHERS;SIGNING DATES FROM 20060606 TO 20060607;REEL/FRAME:022058/0262
|20 Apr 2015||FPAY||Fee payment|
Year of fee payment: 4