US20040154010A1 - Control-quasi-independent-points guided speculative multithreading - Google Patents
- Publication number
- US20040154010A1 (U.S. application Ser. No. 10/356,435)
- Authority
- US
- United States
- Prior art keywords
- instructions
- speculative
- thread
- spawning
- speculative thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/3009—Thread control instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
- G06F9/3832—Value prediction for operands; operand history buffers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
Definitions
- the present invention relates generally to information processing systems and, more specifically, to spawning of speculative threads for speculative multithreading.
- In multithreading, an instruction stream is split into multiple instruction streams that can be executed in parallel.
- In software-only multithreading approaches, such as time-multiplex multithreading or switch-on-event multithreading, the multiple instruction streams are alternately executed on the same shared processor.
- Processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system, may each act on one of the multiple threads simultaneously.
- In simultaneous multithreading (“SMT”), a single physical processor is made to appear as multiple logical processors to operating systems and user programs. That is, each logical processor maintains a complete set of the architecture state, but nearly all other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. The threads execute simultaneously and make better use of shared resources than time-multiplex multithreading or switch-on-event multithreading.
- one or more threads may be idle during execution of a single-threaded application.
- Utilizing otherwise idle threads to speculatively parallelize the single-threaded application can increase speed of execution, but it is often-times difficult to determine which sections of the single-threaded application should be speculatively executed by the otherwise idle thread.
- Speculative thread execution of a portion of code is only beneficial if the application's control-flow ultimately reaches that portion of code.
- speculative thread execution can be delayed, and rendered less effective, due to latencies associated with data fetching.
- FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
- FIG. 2 is a flowchart illustrating at least one embodiment of a method for identifying control-quasi-independent-points for speculative multithreading.
- FIG. 3 is a data flow diagram showing at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
- FIG. 4 is a flowchart illustrating at least one embodiment of a software compilation process.
- FIG. 5 is a flowchart illustrating at least one embodiment of a method for generating instructions to precompute speculative-thread's live-in values for control-quasi-independent-points guided speculative multithreading.
- FIGS. 6 and 7 are flowcharts illustrating at least one embodiment of a method for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values.
- FIG. 8 is a block diagram of a processing system capable of performing at least one embodiment of control-quasi-independent-points guided speculative multithreading.
- FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions to facilitate control-quasi-independent-points (“CQIP”) guided speculative multithreading.
- As used herein, “CQIP” refers to a control-quasi-independent-point.
- instructions are generated to reduce the execution time of a single-threaded application through the use of one or more simultaneous speculative threads.
- the method 100 thus facilitates the parallelization of a portion of an application's code through the use of the simultaneous speculative threads.
- a speculative thread referred to as the spawnee thread, executes instructions that are ahead of the code being executed by the thread that performed the spawn.
- the thread that performed the spawn is referred to as the spawner thread.
- the spawnee thread is an SMT thread that is executed by a second logical processor on the same physical processor as the spawner thread.
- the method 100 may be utilized in any multithreading approach, including SMT, CMP multithreading or other multiprocessor multithreading, or any other known multithreading approach that may encounter idle thread contexts.
- the method 100 of FIG. 1 determines spawn points based on control independency, yet makes provision for handling data flow dependency among parallel threads.
- the following discussion explains that the method 100 selects thread spawning points based on an analysis of control independence, in an effort to achieve speculative parallelization with minimal misspeculation in relation to control flow.
- the method addresses data flow dependency in that live-in values are supplied. For at least one embodiment, live-in values are predicted using a value prediction approach. In at least one other embodiment, live-in values are pre-computed using speculative precomputation based on backward dependency analysis.
- FIG. 1 illustrates that a method 100 for generating instructions to facilitate CQIP-guided multithreading includes identification 10 of spawning pairs that each include a spawn point and a CQIP.
- the method 100 provides for calculation of live-in values for data dependences in the helper thread to be spawned.
- instructions are generated such that, when the instructions are executed by a processor, a speculative thread is spawned and speculatively executes a selected portion of the application's code.
- FIG. 2 is a flowchart further illustrating at least one embodiment of identification 10 of control-quasi-independent-points for speculative multithreading.
- FIG. 2 illustrates that the method 10 performs 210 profile analysis.
- a control flow graph (see, e.g., 330 of FIG. 3) is generated to represent flow of control among the basic blocks associated with the application.
- the method 10 then computes 220 reaching probabilities. That is, the method 10 computes 220 the probability that a second basic block will be reached during execution of the source program, if a first basic block is executed.
- Candidate basic blocks are identified 230 as potential spawn pairs based on the reaching probabilities previously computed 220 .
- the candidates are evaluated according to selected metrics in order to select one or more spawning pairs.
- Each of blocks 210 (performing profile analysis), 220 (computing reaching probabilities), 230 (identifying candidate basic blocks), and 240 (selecting spawning pair) are described in further detail below in connection with FIG. 3.
- FIG. 3 is a data flow diagram. The flow of data is represented in relation to an expanded flowchart that incorporates the actions illustrated in both FIGS. 1 and 2.
- FIG. 3 illustrates that, for at least one embodiment of the method 100 illustrated in FIG. 1, certain data is consulted, and certain other data is generated, during execution of the method 100 .
- FIG. 3 illustrates that a profile 325 is accessed to aid in profile analysis 210 .
- a control flow graph 330 (“CFG”) is accessed to aid in computation 220 of reaching probabilities.
- FIG. 4 illustrates that the profile 325 is typically generated by one or more compilation passes prior to execution of the method.
- a typical compilation process 400 is represented.
- the process 400 involves two compiler-performed passes 405 , 410 and also involves a test run 407 that is typically initiated by a user, such as a software programmer.
- the compiler (e.g., 808 in FIG. 8) performs a first pass 405 over the source code 415 and generates instrumented binary code 420 that corresponds to the source code 415.
- the instrumented binary code 420 includes, in addition to the binary for the source code 415 instructions, extra binary code that causes, during a run of the instrumented code 420 , statistics to be collected and recorded in a profile 325 and a call graph 424 .
- the profile 325 and call graph 424 are generated.
- the profile 325 is used as an input into the compiler and a binary code file 340 is generated.
- the profile 325 may be used, for example, by the compiler during the normal compilation pass 410 to aid with performance enhancements such as speculative branch prediction.
- each of the passes 405 , 410 , and the test run 407 are optional to the method 100 in that any method of generating the information represented by profile 325 may be utilized. Accordingly, first pass 405 and normal pass 410 , as well as test run 407 , are depicted with broken lines in FIG. 4 to indicate their optional nature.
- any method of generating the information represented by profile 325 may be utilized, and that the actions 405 , 407 , 410 depicted in FIG. 4 are provided for illustrative purposes only.
- the method 100 described herein may be applied, in an alternative embodiment, to a binary file. That is, the profile 325 may be generated for a binary file rather than a high-level source code file, and the profile analysis 210 (FIG. 2) may be performed using such binary-based profile as an input.
- the profile analysis 210 utilizes the profile 325 as an input and generates a control flow graph 330 as an output.
- the method 100 builds the CFG 330 during the profile analysis 210 such that each node of the CFG 330 represents a basic block of the source program. Edges between nodes of the CFG 330 represent possible control flows among the basic blocks. For at least one embodiment, edges of the CFG 330 are weighted with the frequency that the corresponding control flow has been followed (as reflected in the profile 325 ). Accordingly, the edges are weighted by the probability that one basic block follows the other, without revisiting the latter node. In contrast to other CFG representations, such as “edge profiling” which represents only intra-procedural edges, at least one embodiment of the CFG 330 created during profile analysis 210 includes representation of inter-procedural edges.
- the CFG 330 is pruned to simplify the CFG 330 and control its size.
- the least frequently executed basic blocks are pruned from the CFG 330 .
- the weights of the edges to a block are used to determine the basic block's execution count.
- the basic blocks are ordered by execution count, and are selected to remain in the CFG 330 according to their execution count.
- the basic blocks are chosen from highest to lower execution count until a predetermined threshold percentage of the total executed instructions are included in the CFG 330 . Accordingly, after weighting and pruning, the most frequently-executed basic blocks are represented in the CFG 330 .
- the predetermined threshold percentage of executed instructions chosen to remain in the CFG 330 during profile analysis 210 is ninety (90) percent.
- the threshold may be varied to numbers higher or lower than ninety percent, based on factors such as application requirements and/or machine resource availability. For instance, if a relatively large number of hardware thread contexts are supported by the machine resources, then a lower threshold may be chosen in order to facilitate more aggressive speculation.
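The threshold-based retention of hot blocks described above can be sketched as follows. This is a hypothetical Python illustration, not code from the patent; the function name, the block encoding as `{name: (exec_count, num_instructions)}`, and the use of `count × size` as the coverage measure are assumptions.

```python
def hot_blocks(blocks, threshold=0.90):
    """blocks: {name: (exec_count, num_instructions)}.  Select blocks
    from highest to lowest execution count until the selected set
    covers `threshold` of the total executed instructions."""
    total = sum(c * n for c, n in blocks.values())
    kept, covered = [], 0.0
    for name, (c, n) in sorted(blocks.items(), key=lambda kv: -kv[1][0]):
        if covered >= threshold * total:
            break
        kept.append(name)
        covered += c * n  # executed instructions contributed by this block
    return kept
```

Blocks outside the returned set would then be pruned from the CFG, with their edges rerouted as described below.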
- an edge from a predecessor to the pruned node is transformed to one or more edges from that predecessor to the node's successor(s).
- an edge from the pruned node to a successor is transformed to one or more edges from the pruned node's predecessor(s) to the successor. If, during this transformation, an edge is transformed into multiple edges, the weight of the original edge is proportionally apportioned across the new edges.
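The edge rerouting performed when a node is pruned, with the original edge weight apportioned proportionally across the new edges, might look like this minimal sketch (hypothetical Python; the `{(u, v): weight}` edge encoding and function name are assumptions, and self-loops on the pruned node are simply dropped):

```python
def prune_node(edges, node):
    """Remove `node` from a weighted CFG, rerouting each predecessor
    edge to the node's successors and apportioning the predecessor
    edge's weight by the successors' relative frequencies."""
    preds = {u: w for (u, v), w in edges.items() if v == node and u != node}
    succs = {v: w for (u, v), w in edges.items() if u == node and v != node}
    out = {e: w for e, w in edges.items() if node not in e}
    total = sum(succs.values()) or 1.0
    for u, wu in preds.items():
        for v, wv in succs.items():
            # split the predecessor's weight proportionally across successors
            out[(u, v)] = out.get((u, v), 0.0) + wu * (wv / total)
    return out
```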
- FIG. 3 illustrates that the CFG 330 produced during profile analysis 210 is utilized to compute 220 reaching probabilities.
- reaching probability computation 220 utilizes the profile CFG 330 as an input and generates a reaching probability matrix 335 as an output.
- the “reaching probability” is the probability that a second basic block will be reached after execution of a first basic block, without revisiting the first basic block.
- the reaching probabilities computed at block 220 are stored in a two-dimensional square matrix 335 that has as many rows and columns as nodes in the CFG 330 . Each element of the matrix represents the probability to execute the basic block represented by the column after execution of the basic block represented by the row.
- this probability is computed as the sum of the frequencies for all the various sequences of basic blocks that exist from the source node to the destination node.
- a constraint is imposed such that the source and destination nodes may only appear once in the sequence of nodes as the first and last nodes, respectively, and may not appear again as intermediate nodes. (For determining the probability of reaching a basic block again after it has been executed, the basic block will appear twice—as both the source and destination nodes). Other basic blocks are permitted to appear more than once in the sequence.
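The computation just described (summing frequencies over all block sequences from source to destination, with the source and destination restricted to the path endpoints) can be sketched as a fixed-point iteration over the weighted CFG rather than explicit path enumeration. This is a hypothetical illustration, not the patent's algorithm; the function name, edge encoding, and sweep count are assumptions.

```python
from collections import defaultdict

def reaching_probability(edges, src, dst, sweeps=100):
    """edges: {(u, v): p} where p is the probability that control flows
    from basic block u to block v.  Returns the probability of reaching
    `dst` after executing `src`, where `src` may appear only as the
    first node of a path (no revisits)."""
    succ = defaultdict(list)
    nodes = set()
    for (u, v), p in edges.items():
        succ[u].append((v, p))
        nodes.update((u, v))
    reach = {n: 0.0 for n in nodes}
    for _ in range(sweeps):
        for n in nodes:
            if n == dst:
                reach[n] = 1.0   # success: destination reached
            elif n == src:
                reach[n] = 0.0   # failure: source revisited
            else:
                reach[n] = sum(p * reach[m] for m, p in succ[n])
    # start from the source's successors (src counts only as the start node)
    return sum(p * reach[m] for m, p in succ[src])
```

Note that when `src == dst`, the destination check fires first, which matches the patent's special case of a block reaching itself again.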
- the reaching probability matrix 335 is traversed to evaluate pairs of basic blocks and identify those that are candidates for a spawning pair.
- spawning pair refers to a pair of instructions associated with the source program.
- One of the instructions is a spawn point, which is an instruction within a first basic block.
- the spawn point is the first instruction of the first basic block.
- the other instruction is a target point and is, more specifically, a control quasi-independent point (“CQIP”).
- CQIP is an instruction within a second basic block.
- the CQIP is the first instruction of the second basic block.
- a spawn point is the instruction in the source program that, when reached, will activate creation of a speculative thread at the CQIP, where the speculative thread will start its execution.
- the first block includes a potential spawn point
- the second block includes a potential CQIP.
- An instruction (such as the first instruction) of the basic block for the row is the potential spawn point.
- An instruction (such as the first instruction) of the basic block for the column is the potential CQIP.
- Each element of the reaching probability matrix 335 is evaluated, and those elements that satisfy certain selection criteria are chosen as candidates for spawning pairs. For at least one embodiment, the elements are evaluated to determine those pairs whose probability is higher than a certain predetermined threshold; that is, the probability to reach the control quasi-independent point after execution of the spawn point is higher than a given threshold.
- This criterion is designed to minimize spawning of speculative threads that are not executed.
- a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if its reaching probability is higher than 0.95
- a second criterion for selection of a candidate spawning pair is the average number of instructions between the spawn point and the CQIP. Ideally, a minimum average number of instructions should exist between the spawning point and the CQIP in order to reduce the relative overhead of thread creation. If the distance is too small, the overhead of thread creation may outweigh the benefit of run-ahead execution because the speculative thread will not run far enough ahead. For at least one embodiment, a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if the average number of instructions between them is greater than 32 instructions.
- Distance between the basic blocks may be additionally stored in the matrix 335 and considered in the identification 230 of spawning pair candidates. For at least one embodiment, this additional information may be calculated during profile analysis 210 and included in each element of the reaching probability matrix 335 . The average may be calculated as the sum of the number of instructions executed by each sequence of basic blocks, multiplied by their frequency.
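Combining the two criteria above (reaching probability over a threshold, average distance over a minimum), candidate identification over the matrix might be sketched as follows. This is hypothetical Python, not the patent's code; the parallel `prob`/`dist` matrix representation and default thresholds of 0.95 and 32 are taken from the text, everything else is an assumption.

```python
def identify_candidates(prob, dist, blocks, p_min=0.95, d_min=32):
    """prob[i][j]: probability of reaching block j after block i.
    dist[i][j]: average number of instructions between them.
    A (spawn, CQIP) block pair is a candidate when the reaching
    probability exceeds p_min and the average distance exceeds d_min."""
    candidates = []
    for i, spawn in enumerate(blocks):
        for j, cqip in enumerate(blocks):
            if prob[i][j] > p_min and dist[i][j] > d_min:
                candidates.append((spawn, cqip))
    return candidates
```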
- the spawning pair candidates are evaluated based on analysis of one or more selected metrics. These metrics may be prioritized. Based on the evaluation of the candidate spawning pairs in relation to the prioritized metrics, one or more spawning pairs are selected.
- the metrics utilized at block 240 may include the minimum average distance between the basic blocks of the potential spawning pair (described above), as well as an evaluation of mispredicted branches, load misses and/or instruction cache misses.
- the metrics may also include additional considerations.
- One such additional consideration is the maximum average distance between the basic blocks of the potential spawning pair. It should be noted that there are also potential performance penalties involved with having the average number of instructions between the spawn point and CQIP be too large. Accordingly, the selection of spawning pairs may also impose a maximum average distance. If the distance between the pair is too large, the speculative thread may incur stalls in a scheme where the speculative thread has limited storage for speculative values.
- speculative threads may incur stalls in a scheme where the speculative thread cannot commit its states until it becomes the non-speculative thread (see discussion of “join point” in connection with FIGS. 6 and 7, below). Such stalls are likely to result in ineffective holding of critical resources that otherwise would be used by non-speculative threads to make forward progress.
- One consideration is the number of instructions that the speculative thread includes in relation to the application code between the spawning point and the CQIP.
- the average number of speculative thread instructions dependent on values generated by a previous thread should be relatively low.
- a smaller number of dependent instructions allow for more timely computation of the live-in values for the speculative thread.
- a relatively high number of the live-in values for the speculative thread are value-predictable.
- value-predictability of the live-in values facilitates faster communication of live-in values, thus minimizing overhead of spawning while also allowing correctness and accuracy of speculative thread computation.
- the candidate spawning pairs identified at block 230 may include several good candidates for CQIP's associated with a given spawn point. That is, for a given row of the reaching probability matrix 335 , more than one element may be selected as a candidate spawning pair. In such case, during the metrics evaluation at block 240 , the best CQIP for the spawn point is selected because, for a given spawn point, a speculative thread will be spawned at only one CQIP. In order to choose the best CQIP for a given spawn point, the potential CQIP's identified at block 230 are prioritized according to the expected benefit.
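Keeping only the best CQIP per spawn point, once each candidate has been scored against the prioritized metrics, reduces to a simple maximization. The sketch below is hypothetical (the patent does not specify a benefit scalar; the `(spawn, cqip, benefit)` tuple encoding is an assumption):

```python
def best_cqip_per_spawn(candidates):
    """candidates: list of (spawn, cqip, benefit) tuples.  For each
    spawn point, keep only the CQIP with the highest expected benefit,
    since a speculative thread is spawned at only one CQIP."""
    best = {}
    for spawn, cqip, benefit in candidates:
        if spawn not in best or benefit > best[spawn][1]:
            best[spawn] = (cqip, benefit)
    return {s: c for s, (c, _) in best.items()}
```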
- more than one CQIP can be chosen for a corresponding spawn point.
- multiple concurrent, albeit mutually exclusive, speculative threads may be spawned and executed simultaneously to perform “eager” execution of speculative threads.
- the spawning condition for these multiple CQIPs can be examined and verified, after the speculative threads have been executed, to determine the effectiveness of the speculation. If one of these multiple speculative threads proves to be good speculation, and another bad, then the results of the former can be reused by the main thread while the results of the latter may be discarded.
- At least one embodiment of the method 100 selects 240 CALL return point pairs (pairs of subroutine calls and the return points) if they satisfy the minimum size constraint. These pairs might not otherwise be selected at block 240 because the reaching probability for such pairs is sometimes too low to satisfy the selection criteria discussed above in connection with candidate identification 230 .
- a subroutine is called from multiple locations, it will have multiple predecessors and multiple successors in the CFG 330 . If all the calls are executed a similar number of times, the reaching probability of any return point pair will be low since the graph 330 will have multiple paths with similar weights.
- the method 100 provides for calculation of live-in values for the speculative thread to be executed at the CQIP.
- By “provide for” it is meant that instructions are generated, wherein execution of the generated instructions, possibly in conjunction with some special hardware support, will result in calculation of a predicted live-in value to be used as an input by the spawnee thread.
- block 50 might determine that no live-in values are necessary. In such case, “providing for” calculation of live-in values simply entails determining that no live-in values are necessary.
- Predicting thread input values allows the processor to execute speculative threads as if they were independent.
- At least one embodiment of block 50 generates instructions to perform or trigger value prediction. Any known manner of value prediction, including hardware value prediction, may be implemented. For example, instructions may be generated 50 such that the register values of the spawned thread are predicted to be the same as those of the spawning thread at spawn time.
- Another embodiment of the method 100 identifies, at block 50 , a slice of instructions from the application's code that may be used for speculative precomputation of one or more live-in values. While value prediction is a promising approach, it often requires rather complex hardware support. In contrast, no additional hardware support is necessary for speculative precomputation. Speculative precomputation can be performed at the beginning of the speculative thread execution in an otherwise idle thread context, providing the advantage of minimizing misspeculations of live-in values without requiring additional value prediction hardware support. Speculative precomputation is discussed in further detail below in connection with FIG. 5.
- FIG. 5 illustrates an embodiment of the method 100 wherein block 50 is further specified to identify 502 precomputation instructions to be used for speculative precomputation of one or more live-in values.
- a set of instructions called a slice
- the slice is computed at block 502 to include only those instructions identified from the original application code that are necessary to compute the live-in value.
- the slice therefore is a subset of instructions from the original application code.
- the slice is computed by following the dependence edges backward from the instruction including the live-in value until all instructions necessary for calculation of the live-in value have been identified.
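The backward walk over dependence edges described above can be sketched over a straight-line instruction sequence. This hypothetical Python illustration ignores control flow and memory dependences, which a real slicer must handle; the `(target, sources)` instruction encoding is an assumption.

```python
def backward_slice(instrs, live_ins):
    """instrs: ordered list of (target, sources) register tuples.
    Walk dependence edges backward from the live-in registers, keeping
    only the instructions needed to compute them (a subset of the
    original code)."""
    needed = set(live_ins)
    kept = []
    for i in range(len(instrs) - 1, -1, -1):   # scan backward
        target, sources = instrs[i]
        if target in needed:
            kept.append(i)
            needed.discard(target)   # this definition satisfies the need
            needed.update(sources)   # ...and creates needs of its own
    return [instrs[i] for i in reversed(kept)]
```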
- a copy of the identified slice instructions is generated for insertion 60 into an enhanced binary file 350 (FIG. 3).
- FIGS. 3 and 5 illustrate that the methods 100 , 500 for generating instructions for CQIP-guided multithreading generate an enhanced binary file 350 at block 60 .
- the enhanced binary file 350 includes the binary code 340 for the original single-threaded application, as well as additional instructions.
- a trigger instruction to cause the speculative thread to be spawned is inserted into the enhanced binary file 350 at the spawning point(s) selected at block 240.
- the trigger instruction can be a conventional instruction in the existing instruction set of a processor, denoted with special marks. Alternatively, the trigger instruction can be a special instruction such as a fork or spawn instruction. Trigger instructions can be executed by any thread.
- the instructions to be performed by the speculative thread are included in the enhanced binary file 350 .
- These instructions may include instructions added to the original code binary file 340 for live-in calculation, and also some instructions already in the original code binary file 340, beginning at the CQIP, that the speculative thread is to execute. That is, regarding the speculative-thread instructions in the enhanced binary file 350, two groups of instructions may be distinguished for each spawning pair if the speculative thread is to perform speculative precomputation for live-in values. In contrast, for a speculative thread that is to utilize value prediction for its live-in values, only the second group of instructions, described below, appears in the enhanced binary file 350.
- the first group of instructions are generated at block 50 (or 502 , see FIG. 5) and are incorporated 60 into the enhanced binary code file 350 in order to provide for the speculative thread's calculation of live-in values.
- the instructions to be performed by the speculative thread to pre-compute live-in values are appended at the end of the file 350 , after those instructions associated with the original code binary file 340 .
- Such instructions do not appear for speculative threads that use value prediction. Instead, specialized value prediction hardware may be used for value prediction.
- the value prediction hardware is fired by the spawn instruction. When the processor executes a spawn instruction, the hardware initializes the speculative thread registers with the predicted live-in value.
- the speculative thread is associated with the second group of instructions alluded to above.
- the second set of instructions are instructions that already exist in the original code binary file 340 .
- the subset of such instructions that are associated with the speculative thread are those instructions in the original code binary file 340 starting at the CQIP.
- the precomputation slice (which may be appended at the end of the enhanced binary file) terminates with a branch to the corresponding CQIP, which causes the speculative thread to begin executing the application code instructions at the CQIP.
- the spawnee thread begins execution of the application code instructions beginning at the CQIP.
- the enhanced binary file 350 includes, for the speculative thread, a copy of the relevant subset of instructions from the original application, rather than providing for the speculative thread to branch to the CQIP instruction of the original code.
- the inventors have found that the non-copy approach discussed above, which is implemented with appropriate branch instructions, efficiently allows for reduced code size.
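The non-copy layout can be sketched abstractly as follows. This is a hypothetical Python model of the enhanced binary described above, not the patent's actual binary format; the instruction strings and index arithmetic are illustrative assumptions.

```python
# Illustrative sketch of the non-copy layout: the trigger is inserted at
# the spawn point, the precomputation slice is appended after the original
# code, and the slice ends with a branch to the CQIP, so the original
# instructions are never duplicated.

def build_enhanced_binary(original, spawn_idx, cqip_idx, slice_instrs):
    """original: list of instruction strings; spawn_idx and cqip_idx
    identify the spawn point and CQIP. Returns the enhanced list."""
    slice_start = len(original) + 1  # slice begins after original code + trigger
    enhanced = list(original)
    # Insert the trigger at the spawn point; the spawnee starts at the slice.
    enhanced.insert(spawn_idx, f"spawn {slice_start}")
    # Append the slice, terminated by a branch to the CQIP in original code
    # (+1 accounts for the inserted trigger shifting later instructions).
    enhanced += slice_instrs + [f"branch {cqip_idx + 1}"]
    return enhanced

code = ["i0", "i1", "i2", "i3"]
enh = build_enhanced_binary(code, spawn_idx=1, cqip_idx=3, slice_instrs=["s0"])
# enh == ["i0", "spawn 5", "i1", "i2", "i3", "s0", "branch 4"]
```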
- method 100 is performed by a compiler 808 (FIG. 8).
- the method 100 represents an automated process in which a compiler identifies a spawn point and an associated control-quasi-independent point (“CQIP”) target for a speculative thread, generates the instructions to pre-compute its live-ins, and embeds a trigger at the spawn point in the binary.
- the pre-computation instructions for the speculative thread are incorporated (such as, for example, by appending) into an enhanced binary file 350 .
- the method 100 may be performed manually such that one or more of 1) identifying CQIP spawning pairs 10 , 2) providing for calculation of live-in values 50 , and 3) modification of the main thread binary 60 may be performed interactively with human intervention.
- FIGS. 6 and 7 illustrate at least one embodiment of a method 600 for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values.
- the method 600 is performed by a processor (e.g. 804 of FIG. 8) executing the instructions in an enhanced binary code file (e.g., 350 of FIG. 3).
- the enhanced binary code file has been generated according to the method illustrated in FIG. 5, such that instructions to perform speculative precomputation of live-in values have been identified 502 and inserted into the enhanced binary file.
- FIGS. 6 and 7 illustrate that, during execution of the enhanced binary code file, multiple threads T 0 , T 1 , . . . T x may be executing simultaneously.
- the flow of control associated with each of these multiple threads is indicated by the notations T 0 , T 1 , and T x on the edges between the blocks illustrated in FIGS. 6 and 7.
- the multiple threads may be spawned from a non-speculative thread.
- a speculative thread may spawn one or more additional speculative successor threads.
- FIG. 6 illustrates that processing begins at 601 , where the thread T 0 begins execution.
- a check is made to determine whether the thread T 0 previously encountered a join point while it (T 0 ) was still speculative. Block 602 is discussed in further detail below. One skilled in the art will understand that block 602 will, of course, evaluate to “false” if the thread T 0 was never previously speculative.
- If block 602 evaluates to “false”, then an instruction for the thread T 0 is executed at block 604 . If a trigger instruction associated with a spawn point is encountered 606 , then processing continues to block 608 . Otherwise, the thread T 0 continues execution at block 607 . At block 607 , it is determined whether a join point has been encountered in the thread T 0 . If neither a trigger instruction nor a join point is encountered, then the thread T 0 continues to execute instructions 604 until it reaches 603 the end of its instructions.
- If a trigger instruction is detected at block 606 , then a speculative thread T 1 is spawned in a free thread context at block 608 . If slice instructions are encountered by the speculative thread T 1 at block 610 , then processing continues at block 612 . If not, then processing continues at 702 (FIG. 7).
- slice instructions for speculative precomputation are iteratively executed until the speculative precomputation of the live-in value is complete 614 .
- the spawner thread T 0 continues to execute 604 its instructions.
- FIG. 6 illustrates that, while the speculative thread T 1 executes 612 the slice instructions, the spawner thread continues execution 604 of its instructions until another spawn point is encountered 606 , a join point is encountered 607 , or the instruction stream ends 603 . Accordingly, the spawner thread T 0 and the spawnee thread T 1 execute in parallel during speculative precomputation.
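The spawner/spawnee overlap can be illustrated with ordinary software threads. This is a simplified Python analogy only; in the embodiments above the speculative thread occupies a hardware thread context, not an OS thread, and the labels below are illustrative.

```python
import threading

# Minimal analogy for the spawner/spawnee overlap: when the spawner
# reaches the trigger, it starts the spawnee (which first runs the slice,
# then run-ahead code) and keeps executing its own instructions in parallel.

results = []
lock = threading.Lock()

def spawnee():
    with lock:
        results.append("slice")      # speculative precomputation of live-ins
    with lock:
        results.append("run-ahead")  # execution from the CQIP onward

def spawner():
    t = threading.Thread(target=spawnee)  # trigger instruction: spawn
    t.start()
    with lock:
        results.append("spawner-work")    # spawner continues in parallel
    t.join()

spawner()
# results contains all three entries, in some interleaving
```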
- FIG. 7 illustrates that, at block 702 , the speculative thread T 1 executes instructions from the original code.
- the CQIP instruction is executed.
- the execution 702 of spawnee thread instructions is performed in parallel with the execution of the spawner thread code until a terminating condition is reached.
- the speculative thread T 1 checks for a terminating condition.
- the check 708 evaluates to “true” when the spawnee thread T 1 has encountered a CQIP of an active, more speculative thread or has encountered the end of the program. As long as neither condition is true, the spawnee thread T 1 proceeds to block 710 .
- If the speculative thread T 1 determines 708 that a join point has been reached, then it is theoretically ready to perform processing to switch thread contexts with the more speculative thread (as discussed below in connection with block 720 ). However, at least one embodiment of the method 600 limits such processing to non-speculative threads. Accordingly, when speculative thread T 1 determines 708 that it has reached the join point of a more speculative, active thread, T 1 waits 706 to continue processing until it (T 1 ) becomes non-speculative.
- the speculative thread T 1 determines whether a spawning point has been reached. If the condition at block 710 evaluates to “false”, then T 1 continues execution 702 of its instructions.
- Otherwise, thread T 1 creates 712 a new speculative thread T 2 .
- Thread T 1 then continues execution 702 of its instructions, while new speculative thread T 2 proceeds to continue speculative thread operation at block 610 , as described above in connection with speculative thread T 1 .
- each thread follows the logic described above in connection with T 1 (blocks 610 through 614 and blocks 702 through 710 of FIGS. 6 and 7).
- When the spawner thread T 0 reaches a CQIP of an active, more speculative thread, then we say that a “join point” has been encountered.
- the join point of a thread is the control quasi-independent point at which an on-going speculative thread began execution. It should be understood that multiple speculative threads may be active at one time.
- a “more speculative” thread is a thread that is a spawnee of the reference thread (in this case, thread T 0 ) and includes any subsequently-spawned speculative thread in the spawnee's spawning chain.
- the join point check 607 evaluates to true when the thread T 0 reaches the CQIP at which any on-going speculative thread began execution.
- FIG. 7 assumes that, if multiple speculative threads are simultaneously active, then any one of the multiple CQIP's for the active speculative threads could be reached at block 607 .
- FIG. 7 assumes that when T 0 hits a join point at block 607 , the join point is associated with T 1 , the next thread in program order, which is the speculative thread whose CQIP has been reached by the non-speculative thread T 0 .
- Upon reaching the join point at block 607 (FIG. 6), thread T 0 proceeds to block 703 .
- the thread T 0 determines 703 if it is the non-speculative active thread and, if not, waits until it becomes the non-speculative thread.
- When T 0 becomes non-speculative, it initiates 704 a verification of the speculation performed by the spawnee thread T 1 .
- verification 704 includes determining whether the speculative live-in values utilized by the spawnee thread T 1 reflect the actual values computed by the spawner thread.
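A minimal sketch of this comparison, assuming live-in values are modeled as a register-to-value mapping; the function name and data layout are illustrative, not the patent's mechanism.

```python
# Sketch of the join-point verification step: the speculative live-in
# values the spawnee used are compared against the actual values the
# spawner has now computed. A mismatch means the speculation failed.

def verify_live_ins(speculative, actual):
    """Returns True when every live-in the spawnee used matches the
    value actually produced by the spawner thread."""
    return all(actual.get(reg) == val for reg, val in speculative.items())

assert verify_live_ins({"r4": 42}, {"r4": 42, "r5": 7})      # succeeds
assert not verify_live_ins({"r4": 42}, {"r4": 43, "r5": 7})  # fails
```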
- If the verification 704 fails, thread T 0 then proceeds to C (FIG. 6) to continue execution of its instructions. Otherwise, if the verification 704 succeeds, then thread T 0 and thread T 1 proceed to block 720 .
- the thread context where the thread T 0 has been executing becomes free and is relinquished. Also, the speculative thread T 1 that started at the CQIP becomes the non-speculative thread and continues execution at C (FIG. 6).
- Reference to FIG. 6 illustrates that the newly non-speculative thread T 0 checks at block 602 to determine whether it encountered a CQIP at block 708 (FIG. 7) while it was still speculative. If so, then the thread T 0 proceeds to B in order to begin join point processing as described above.
- Embodiments of the method may be implemented in hardware, software, firmware, or a combination of such implementation approaches.
- Embodiments of the invention may be implemented as computer programs executing on programmable systems comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
- Program code may be applied to input data to perform the functions described herein and generate output information.
- the output information may be applied to one or more output devices, in known fashion.
- a processing system includes any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
- the programs may be implemented in a high level procedural or object oriented programming language to communicate with a processing system.
- the programs may also be implemented in assembly or machine language, if desired.
- the method described herein is not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
- the programs may be stored on a storage media or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system.
- the instructions, accessible to a processor in a processing system, provide for configuring and operating the processing system when the storage media or device is read by the processing system to perform the procedures described herein.
- Embodiments of the invention may also be considered to be implemented as a machine-readable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein.
- System 800 may be used, for example, to execute the processing for a method of performing control-quasi-independent-points guided speculative multithreading, such as the embodiments described herein.
- System 800 may also execute enhanced binary files generated in accordance with at least one embodiment of the methods described herein.
- System 800 is representative of processing systems based on the Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, and Itanium® and Itanium® II microprocessors available from Intel Corporation, although other systems (including personal computers (PCs) having other microprocessors, engineering workstations, set-top boxes and the like) may also be used.
- sample system 800 may be executing a version of the Windows™ operating system available from Microsoft Corporation, although other operating systems and graphical user interfaces, for example, may also be used.
- processing system 800 includes a memory system 802 and a processor 804 .
- Memory system 802 may store instructions 810 and data 812 for controlling the operation of the processor 804 .
- instructions 810 may include a compiler program 808 that, when executed, causes the processor 804 to compile a program 415 (FIG. 4) that resides in the memory system 802 .
- Memory 802 holds the program to be compiled, intermediate forms of the program, and a resulting compiled program.
- the compiler program 808 includes instructions to select spawning pairs and generate instructions to implement CQIP-guided multithreading.
- instructions 810 may also include an enhanced binary file 350 (FIG. 3) generated in accordance with at least one embodiment of the present invention.
- Memory system 802 is intended as a generalized representation of memory and may include a variety of forms of memory, such as a hard drive, CD-ROM, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM) and related circuitry.
- Memory system 802 may store instructions 810 and/or data 812 represented by data signals that may be executed by processor 804 .
- the instructions 810 and/or data 812 may include code for performing any or all of the techniques discussed herein.
- At least one embodiment of CQIP-guided speculative multithreading is related to the use of the compiler 808 in system 800 to select spawning pairs and generate instructions as discussed above.
- compiler 808 may include a profile analyzer module 820 that, when executed by the processor 804 , analyzes a profile to generate a control flow graph as described above in connection with FIG. 3.
- the compiler 808 may also include a matrix builder module 824 that, when executed by the processor 804 , computes 220 reaching probabilities and generates a reaching probabilities matrix 335 as discussed above.
- the compiler 808 may also include a spawning pair selector module 826 that, when executed by the processor 804 , identifies 230 candidate basic blocks and selects 240 one or more spawning pairs.
- the compiler 808 may include a slicer module 822 that, when executed by the processor 804 , identifies 502 (FIG. 5) instructions for speculative precomputation of the speculative thread's live-in values.
- the compiler 808 may further include a code generator module 828 that, when executed by the processor 804 , generates 60 an enhanced binary file 350 (FIG. 3).
Abstract
A method for generating instructions to facilitate control-quasi-independent-point multithreading is provided. A spawn point and control-quasi-independent-point are determined. An instruction stream is generated to partition a program so that portions of the program are parallelized by speculative threads. A method of performing control-quasi-independent-point guided speculative multithreading includes spawning a speculative thread when the spawn point is encountered. An embodiment of the method further includes performing speculative precomputation to determine a live-in value for the speculative thread.
Description
- 1. Technical Field
- The present invention relates generally to information processing systems and, more specifically, to spawning of speculative threads for speculative multithreading.
- 2. Background Art
- In order to increase performance of information processing systems, such as those that include microprocessors, both hardware and software techniques have been employed. One software approach that has been employed to improve processor performance is known as “multithreading.” In multithreading, an instruction stream is split into multiple instruction streams that can be executed in parallel. In software-only multithreading approaches, such as time-multiplex multithreading or switch-on-event multithreading, the multiple instruction streams are alternately executed on the same shared processor.
- Increasingly, multithreading is supported in hardware. For instance, in one approach, processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system, may each act on one of the multiple threads simultaneously. In another approach, referred to as simultaneous multithreading (“SMT”), a single physical processor is made to appear as multiple logical processors to operating systems and user programs. That is, each logical processor maintains a complete set of the architecture state, but nearly all other resources of the physical processor, such as caches, execution units, branch predictors, control logic and buses, are shared. The threads execute simultaneously and make better use of shared resources than time-multiplex multithreading or switch-on-event multithreading.
- For those systems, such as CMP and SMT multithreading systems, that provide hardware support for multiple threads, one or more threads may be idle during execution of a single-threaded application. Utilizing otherwise idle threads to speculatively parallelize the single-threaded application can increase speed of execution, but it is often-times difficult to determine which sections of the single-threaded application should be speculatively executed by the otherwise idle thread. Speculative thread execution of a portion of code is only beneficial if the application's control-flow ultimately reaches that portion of code. In addition, speculative thread execution can be delayed, and rendered less effective, due to latencies associated with data fetching. Embodiments of the method and apparatus disclosed herein address these and other concerns related to speculative multithreading.
- The present invention may be understood with reference to the following drawings in which like elements are indicated by like numbers. These drawings are not intended to be limiting but are instead provided to illustrate selected embodiments of a method and apparatus for facilitating control-quasi-independent-points guided speculative multithreading.
- FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
- FIG. 2 is a flowchart illustrating at least one embodiment of a method for identifying control-quasi-independent-points for speculative multithreading.
- FIG. 3 is a data flow diagram showing at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
- FIG. 4 is a flowchart illustrating at least one embodiment of a software compilation process.
- FIG. 5 is a flowchart illustrating at least one embodiment of a method for generating instructions to precompute speculative-thread's live-in values for control-quasi-independent-points guided speculative multithreading.
- FIGS. 6 and 7 are flowcharts illustrating at least one embodiment of a method for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values.
- FIG. 8 is a block diagram of a processing system capable of performing at least one embodiment of control-quasi-independent-points guided speculative multithreading.
- FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions to facilitate control-quasi-independent-points (“CQIP”) guided speculative multithreading. For at least one embodiment of the
method 100, instructions are generated to reduce the execution time of a single-threaded application through the use of one or more simultaneous speculative threads. The method 100 thus facilitates the parallelization of a portion of an application's code through the use of the simultaneous speculative threads. A speculative thread, referred to as the spawnee thread, executes instructions that are ahead of the code being executed by the thread that performed the spawn. The thread that performed the spawn is referred to as the spawner thread. For at least one embodiment, the spawnee thread is an SMT thread that is executed by a second logical processor on the same physical processor as the spawner thread. One skilled in the art will recognize that the method 100 may be utilized in any multithreading approach, including SMT, CMP multithreading or other multiprocessor multithreading, or any other known multithreading approach that may encounter idle thread contexts. - Traditional software program parallelization techniques are usually applied to numerical and regular applications. However, traditional automated compiler parallelization techniques do not perform well for irregular or non-numerical applications such as those that require accesses to memory based on linked data structures. Nonetheless, various studies have demonstrated that these irregular and integer applications still have large amounts of thread level parallelism that could be exploited through judicious speculative multithreading. The
method 100 illustrated in FIG. 1 provides a mechanism to partition a single-threaded application into sub-tasks that can be speculatively executed using additional threads. - In contrast to some types of traditional speculative multithreading techniques, which spawn speculative threads based on known control dependent structures such as calls or loops, the
method 100 of FIG. 1 determines spawn points based on control independence, yet makes provision for handling data flow dependency among parallel threads. The following discussion explains that the method 100 selects thread spawning points based on an analysis of control independence, in an effort to achieve speculative parallelization with minimal misspeculation in relation to control flow. In addition, the method addresses data flow dependency in that live-in values are supplied. For at least one embodiment, live-in values are predicted using a value prediction approach. In at least one other embodiment, live-in values are pre-computed using speculative precomputation based on backward dependency analysis. - FIG. 1 illustrates that a
method 100 for generating instructions to facilitate CQIP-guided multithreading includes identification 10 of spawning pairs that each include a spawn point and a CQIP. At block 50, the method 100 provides for calculation of live-in values for data dependences in the helper thread to be spawned. At block 60, instructions are generated such that, when the instructions are executed by a processor, a speculative thread is spawned and speculatively executes a selected portion of the application's code. - FIG. 2 is a flowchart further illustrating at least one embodiment of
identification 10 of control-quasi-independent-points for speculative multithreading. FIG. 2 illustrates that the method 10 performs 210 profile analysis. During the analysis 210, a control flow graph (see, e.g., 330 of FIG. 3) is generated to represent flow of control among the basic blocks associated with the application. The method 10 then computes 220 reaching probabilities. That is, the method 10 computes 220 the probability that a second basic block will be reached during execution of the source program, if a first basic block is executed. Candidate basic blocks are identified 230 as potential spawn pairs based on the reaching probabilities previously computed 220. At block 240, the candidates are evaluated according to selected metrics in order to select one or more spawning pairs. Each of blocks 210 (performing profile analysis), 220 (computing reaching probabilities), 230 (identifying candidate basic blocks), and 240 (selecting spawning pair) are described in further detail below in connection with FIG. 3. - FIG. 3 is a data flow diagram. The flow of data is represented in relation to an expanded flowchart that incorporates the actions illustrated in both FIGS. 1 and 2. FIG. 3 illustrates that, for at least one embodiment of the
method 100 illustrated in FIG. 1, certain data is consulted, and certain other data is generated, during execution of the method 100. FIG. 3 illustrates that a profile 325 is accessed to aid in profile analysis 210. Also, a control flow graph 330 (“CFG”) is accessed to aid in computation 220 of reaching probabilities. - Brief reference to FIG. 4 illustrates that the
profile 325 is typically generated by one or more compilation passes prior to execution of the method. In FIG. 4, a typical compilation process 400 is represented. The process 400 involves two compiler-performed passes 405, 410, as well as a test run 407 that is typically initiated by a user, such as a software programmer. During a first pass 405, the compiler (e.g., 808 in FIG. 8) receives as an input the source code 415 for which compilation is desired. The compiler then generates instrumented binary code 420 that corresponds to the source code 415. The instrumented binary code 420 includes, in addition to the binary for the source code 415 instructions, extra binary code that causes, during a run of the instrumented code 420, statistics to be collected and recorded in a profile 325 and a call graph 424. When a user initiates a test run 407 of the instrumented binary code 420, the profile 325 and call graph 424 are generated. During the normal compilation pass 410, the profile 325 is used as an input into the compiler and a binary code file 340 is generated. The profile 325 may be used, for example, by the compiler during the normal compilation pass 410 to aid with performance enhancements such as speculative branch prediction. - Each of the
passes 405, 410, as well as the test run 407, are optional to the method 100 in that any method of generating the information represented by profile 325 may be utilized. Accordingly, first pass 405 and normal pass 410, as well as test run 407, are depicted with broken lines in FIG. 4 to indicate their optional nature. One skilled in the art will recognize that the actions of the method 100 described herein may be applied, in an alternative embodiment, to a binary file. That is, the profile 325 may be generated for a binary file rather than a high-level source code file, and the profile analysis 210 (FIG. 2) may be performed using such a binary-based profile as an input. - Returning to FIG. 3, one can see that the
profile analysis 210 utilizes the profile 325 as an input and generates a control flow graph 330 as an output. The method 100 builds the CFG 330 during the profile analysis 210 such that each node of the CFG 330 represents a basic block of the source program. Edges between nodes of the CFG 330 represent possible control flows among the basic blocks. For at least one embodiment, edges of the CFG 330 are weighted with the frequency that the corresponding control flow has been followed (as reflected in the profile 325). Accordingly, the edges are weighted by the probability that one basic block follows the other, without revisiting the latter node. In contrast to other CFG representations, such as “edge profiling,” which represents only intra-procedural edges, at least one embodiment of the CFG 330 created during profile analysis 210 includes representation of inter-procedural edges. - For at least one embodiment, the
CFG 330 is pruned to simplify the CFG 330 and control its size. The least frequently executed basic blocks are pruned from the CFG 330. To determine which nodes should remain in the CFG 330, and which should be pruned, the weights of the edges to a block are used to determine the basic block's execution count. The basic blocks are ordered by execution count, and are selected to remain in the CFG 330 according to their execution count. For at least one embodiment, the basic blocks are chosen from highest to lowest execution count until a predetermined threshold percentage of the total executed instructions is included in the CFG 330. Accordingly, after weighting and pruning, the most frequently-executed basic blocks are represented in the CFG 330.
CFG 330 during profile analysis 20 is ninety (90) percent. For selected embodiments, the threshold may be varied to numbers higher or lower than ninety percent, based on factors such as application requirements and/or machine resource availability. For instance, if a relatively large number of hardware thread contexts are supported by the machine resources, then a lower threshold may be chosen in order to facilitate more aggressive speculation. - In order to retain control flow information about pruned basic blocks, the following processing may also occur during
profile analysis 210. When a node is pruned from theCFG 330, an edge from a predecessor to the pruned node is transformed to one or more edges from that predecessor to the node's successor(s). Also, an edge from the pruned node to a successor is transformed to one or more edges from the pruned node's predecessor(s) to the successor. If, during this transformation, an edge is transformed into multiple edges, the weight of the original edge is proportionally apportioned across the new edges. - FIG. 3 illustrates that the
CFG 330 produced duringprofile analysis 210 is utilized to compute 220 reaching probabilities. At least one embodiment of reachingprobability computation 220 utilizes theprofile CFG 330 as an input and generates a reachingprobability matrix 335 as an output. As stated above, as used herein the “reaching probability” is the probability that a second basic block will be reached after execution of a first basic block, without revisiting the first basic block. For at least one embodiment, the reaching probabilities computed atblock 220 are stored in a two-dimensionalsquare matrix 335 that has as many rows and columns as nodes in theCFG 330. Each element of the matrix represents the probability to execute the basic block represented by the column after execution of the basic block represented by the row. - For at least one embodiment, this probability is computed as the sum of the frequencies for all the various sequences of basic blocks that exist from the source node to the destination node. In order to simplify the computation, a constraint is imposed such that the source and destination nodes may only appear once in the sequence of nodes as the first and last nodes, respectively, and may not appear again as intermediate nodes. (For determining the probability of reaching a basic block again after it has been executed, the basic block will appear twice—as both the source and destination nodes). Other basic blocks are permitted to appear more than once in the sequence.
- At
block 230, the reachingprobability matrix 335 is traversed to evaluate pairs of basic blocks and identify those that are candidates for a spawning pair. As used herein, the term “spawning pair” refers to a pair of instructions associated with the source program. One of the instructions is a spawn point, which is an instruction within a first basic block. For at least one embodiment, the spawn point is the first instruction of the first basic block. - The other instruction is a target point and is, more specifically, a control quasi-independent point (“CQIP”). The CQIP is an instruction within a second basic block. For at least one embodiment, the CQIP is the first instruction of the second basic block. A spawn point is the instruction in the source program that, when reached, will activate creation of a speculative thread at the CQIP, where the speculative thread will start its execution.
- For each element in the reaching
probability matrix 335, two basic blocks are represented. The first block includes a potential spawn point, and the second block includes a potential CQIP. An instruction (such as the first instruction) of the basic block for the row is the potential spawn point. An instruction (such as the first instruction) of the basic block for the column is the potential CQIP. Each element of the reachingprobability matrix 335 is evaluated, and those elements that satisfy certain selection criteria are chosen as candidates for spawning pairs. For at least one embodiment, the elements are evaluated to determine those pairs whose probability is higher than a certain predetermined threshold; that is, the probability to reach the control quasi-independent point after execution of the spawn point is higher than a given threshold. This criterion is designed to minimize spawning of speculative threads that are not executed. For at least one embodiment, a pair of basic blocks associated with an element of the reachingprobability matrix 335 is considered as a candidate for a spawning pair if its reaching probability is higher than 0.95 - A second criterion for selection of a candidate spawning pair is the average number of instructions between the spawn point and the CQIP. Ideally, a minimum average number of instructions should exist between the spawning point and the CQIP in order to reduce the relative overhead of thread creation. If the distance is too small, the overhead of thread creation may outweigh the benefit of run-ahead execution because the speculative thread will not run far enough ahead. For at least one embodiment, a pair of basic blocks associated with an element of the reaching
probability matrix 335 is considered as a candidate for a spawning pair if the average number of instructions between then is greater than 32 instructions. - Distance between the basic blocks may be additionally stored in the
matrix 335 and considered in theidentification 230 of spawning pair candidates. For at least one embodiment, this additional information may be calculated duringprofile analysis 210 and included in each element of the reachingprobability matrix 335. The average may be calculated as the sum of the number of instructions executed by each sequence of basic blocks, multiplied by their frequency. - At
block 240, the spawning pair candidates are evaluated based on analysis of one or more selected metrics. These metrics may be prioritized. Based on the evaluation of the candidate spawning pairs in relation to the prioritized metrics, one or more spawning pairs are selected. - The metrics utilized at
block 240 may include the minimum average distance between the basic blocks of the potential spawning pair (described above), as well as an evaluation of mispredicted branches, load misses and/or instruction cache misses. The metrics may also include additional considerations. One such additional consideration is the maximum average distance between the basic blocks of the potential spawning pair. It should be noted that there are also potential performance penalties involved with having the average number of instructions between the spawn point and CQIP be too large. Accordingly, the selection of spawning pairs may also impose a maximum average distance. If the distance between the pair is too large, the speculative thread may incur stalls in a scheme where the speculative thread has limited storage for speculative values. In addition, if the sizes of speculative threads are sufficiently dissimilar, speculative threads may incur stalls in a scheme where the speculative thread cannot commit its states until it becomes the non-speculative thread (see discussion of “join point” in connection with FIGS. 6 and 7, below). Such stalls are likely to result in ineffective holding of critical resources that otherwise would be used by non-speculative threads to make forward progress. - Another additional consideration is the number of dependent instructions that the speculative thread includes in relation to the application code between the spawn point and the CQIP. Preferably, the average number of speculative thread instructions dependent on values generated by a previous thread (also referred to as “live-ins”) should be relatively low. A smaller number of dependent instructions allows for more timely computation of the live-in values for the speculative thread.
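The two threshold criteria above reduce candidate identification (block 230) to a simple filter over the matrix elements. A hedged sketch using the example thresholds of 0.95 and 32 instructions from the text (the dictionary layout and every name here are illustrative, not the patent's representation):

```python
PROB_THRESHOLD = 0.95   # minimum reaching probability (example value from the text)
MIN_AVG_DISTANCE = 32   # minimum average instructions between spawn point and CQIP

def candidate_spawning_pairs(matrix):
    """matrix maps (spawn_block, cqip_block) -> (reaching_prob, avg_distance).
    Keep only the elements that satisfy both selection criteria."""
    return [
        pair
        for pair, (prob, dist) in matrix.items()
        if prob > PROB_THRESHOLD and dist > MIN_AVG_DISTANCE
    ]
```

A maximum-distance cap or the other metrics discussed above could be added as further conjuncts in the same filter.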
- In addition, for selected embodiments it is preferable that a relatively high number of the live-in values for the speculative thread are value-predictable. For those embodiments that use value prediction to provide for
calculation 50 of live-in values (discussed further below), value-predictability of the live-in values facilitates faster communication of live-in values, thus minimizing spawning overhead while preserving the correctness and accuracy of speculative thread computation. - It is possible that the candidate spawning pairs identified at
block 230 may include several good candidates for CQIP's associated with a given spawn point. That is, for a given row of the reaching probability matrix 335, more than one element may be selected as a candidate spawning pair. In such case, during the metrics evaluation at block 240, the best CQIP for the spawn point is selected because, for a given spawn point, a speculative thread will be spawned at only one CQIP. In order to choose the best CQIP for a given spawn point, the potential CQIP's identified at block 230 are prioritized according to the expected benefit. - In at least one alternative embodiment, if there are sufficient hardware thread resources, more than one CQIP can be chosen for a corresponding spawn point. In such case, multiple concurrent, albeit mutually exclusive, speculative threads may be spawned and executed simultaneously to perform “eager” execution of speculative threads. The spawning condition for these multiple CQIPs can be examined and verified, after the speculative threads have been executed, to determine the effectiveness of the speculation. If one of these multiple speculative threads proves to be good speculation, and another bad, then the results of the former can be reused by the main thread while the results of the latter may be discarded.
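The per-spawn-point selection described above (one CQIP per spawn point, prioritized by expected benefit) is in effect an argmax over each row's surviving candidates. A minimal sketch, in which the scoring callable is a hypothetical stand-in for the patent's prioritized metrics:

```python
def best_cqip_per_spawn(candidates, benefit):
    """candidates: iterable of (spawn_block, cqip_block) pairs.
    benefit: callable scoring a pair's expected benefit (hypothetical).
    Returns the highest-scoring CQIP for each spawn point."""
    best = {}
    for spawn, cqip in candidates:
        if spawn not in best or benefit(spawn, cqip) > benefit(spawn, best[spawn]):
            best[spawn] = cqip
    return best
```

In the eager-execution alternative, the top several CQIPs per spawn point would be kept instead of only the argmax.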
- In addition to those spawning pairs selected according to the metrics evaluation, at least one embodiment of the
method 100 selects 240 CALL return point pairs (pairs of subroutine calls and the return points) if they satisfy the minimum size constraint. These pairs might not otherwise be selected at block 240 because the reaching probability for such pairs is sometimes too low to satisfy the selection criteria discussed above in connection with candidate identification 230. In particular, if a subroutine is called from multiple locations, it will have multiple predecessors and multiple successors in the CFG 330. If all the calls are executed a similar number of times, the reaching probability of any return point pair will be low since the graph 330 will have multiple paths with similar weights. - At
block 50, the method 100 provides for calculation of live-in values for the speculative thread to be executed at the CQIP. By “provides for” it is meant that instructions are generated, wherein execution of the generated instructions, possibly in conjunction with some special hardware support, will result in calculation of a predicted live-in value to be used as an input by the spawnee thread. Of course, block 50 might determine that no live-in values are necessary. In such case, “providing for” calculation of live-in values simply entails determining that no live-in values are necessary. - Predicting thread input values allows the processor to execute speculative threads as if they were independent. At least one embodiment of
block 50 generates instructions to perform or trigger value prediction. Any known manner of value prediction, including hardware value prediction, may be implemented. For example, instructions may be generated 50 such that the register values of the spawned thread are predicted to be the same as those of the spawning thread at spawn time. - Another embodiment of the
method 100 identifies, at block 50, a slice of instructions from the application's code that may be used for speculative precomputation of one or more live-in values. While value prediction is a promising approach, it often requires rather complex hardware support. In contrast, no additional hardware support is necessary for speculative precomputation. Speculative precomputation can be performed at the beginning of the speculative thread execution in an otherwise idle thread context, providing the advantage of minimizing misspeculations of live-in values without requiring additional value prediction hardware support. Speculative precomputation is discussed in further detail below in connection with FIG. 5. - FIG. 5 illustrates an embodiment of the
method 100 wherein block 50 is further specified to identify 502 precomputation instructions to be used for speculative precomputation of one or more live-in values. For at least one embodiment, a set of instructions, called a slice, is computed at block 502 to include only those instructions identified from the original application code that are necessary to compute the live-in value. The slice therefore is a subset of instructions from the original application code. The slice is computed by following the dependence edges backward from the instruction including the live-in value until all instructions necessary for calculation of the live-in value have been identified. A copy of the identified slice instructions is generated for insertion 60 into an enhanced binary file 350 (FIG. 3). - FIGS. 3 and 5 illustrate that the
methods provide for generation of an enhanced binary file 350 at block 60. The enhanced binary file 350 includes the binary code 340 for the original single-threaded application, as well as additional instructions. A trigger instruction to cause the speculative thread to be spawned is inserted into the enhanced binary file 350 at the spawning point(s) selected at block 240. The trigger instruction can be a conventional instruction in the existing instruction set of a processor, denoted with special marks. Alternatively, the trigger instruction can be a special instruction such as a fork or spawn instruction. Trigger instructions can be executed by any thread. - In addition, the instructions to be performed by the speculative thread are included in the enhanced
binary file 350. These instructions may include instructions added to the original code binary file 340 for live-in calculation, and also some instructions already in the original code binary file 340, beginning at the CQIP, that the speculative thread is to execute. That is, regarding the speculative-thread instructions in the enhanced binary file 350, two groups of instructions may be distinguished for each spawning pair, if the speculative thread is to perform speculative precomputation for live-in values. In contrast, for a speculative thread that is to utilize value prediction for its live-in values, only the latter group of instructions described below appears in the enhanced binary file 350. - The first group of instructions is generated at block 50 (or 502, see FIG. 5) and is incorporated 60 into the enhanced
binary code file 350 in order to provide for the speculative thread's calculation of live-in values. For at least one embodiment, the instructions to be performed by the speculative thread to pre-compute live-in values are appended at the end of the file 350, after those instructions associated with the original code binary file 340. - Such instructions do not appear for speculative threads that use value prediction. Instead, specialized value prediction hardware may be used for value prediction. The value prediction hardware is triggered by the spawn instruction. When the processor executes a spawn instruction, the hardware initializes the speculative thread registers with the predicted live-in values.
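The backward slice walk identified at block 502 can be sketched over a straight-line sequence of instructions, each modeled as a destination register plus its source registers. This is a deliberately simplified, hypothetical representation; real slicing must also follow memory and control dependences:

```python
def backward_slice(instructions, live_in_reg):
    """instructions: list of (dest_reg, [src_regs]) in program order.
    Walk dependence edges backward from the live-in value, keeping only
    the instructions needed to compute it (block 502)."""
    needed = {live_in_reg}
    slice_ = []
    for dest, srcs in reversed(instructions):
        if dest in needed:
            slice_.append((dest, srcs))
            needed.discard(dest)
            needed.update(srcs)
    slice_.reverse()  # restore program order for insertion into the binary
    return slice_
```

An instruction whose result the live-in never depends on simply drops out of the slice, which is why the slice is typically much shorter than the code between spawn point and CQIP.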
- Regardless of whether the speculative thread utilizes value prediction (no additional instructions in the enhanced binary file 350) or speculative precomputation (slice instructions in the enhanced binary file 350), the speculative thread is associated with the second group of instructions alluded to above. The second set of instructions are instructions that already exist in the original
code binary file 340. The subset of such instructions that are associated with the speculative thread are those instructions in the original code binary file 340 starting at the CQIP. For speculative threads that utilize speculative pre-computation for live-ins, the precomputation slice (which may be appended at the end of the enhanced binary file) terminates with a branch to the corresponding CQIP, which causes the speculative thread to begin executing the application code instructions at the CQIP. For speculative threads that utilize value prediction for live-in values, the spawnee thread begins execution of the application code instructions beginning at the CQIP. - In an alternative embodiment, the enhanced
binary file 350 includes, for the speculative thread, a copy of the relevant subset of instructions from the original application, rather than providing for the speculative thread to branch to the CQIP instruction of the original code. However, the inventors have found that the non-copy approach discussed in the immediately preceding paragraph, which is implemented with appropriate branch instructions, allows for reduced code size. - Accordingly, the foregoing discussion illustrates that, for at least one embodiment,
method 100 is performed by a compiler 808 (FIG. 8). In such embodiment, the method 100 represents an automated process in which a compiler identifies a spawn point and an associated control-quasi-independent point (“CQIP”) target for a speculative thread, generates the instructions to pre-compute its live-ins, and embeds a trigger at the spawn point in the binary. The pre-computation instructions for the speculative thread are incorporated (such as, for example, by appending) into an enhanced binary file 350. One skilled in the art will recognize that, in alternative embodiments, the method 100 may be performed manually such that one or more of 1) identifying CQIP spawning pairs 10, 2) providing for calculation of live-in values 50, and 3) modification of the main thread binary 60 may be performed interactively with human intervention. - In sum, a method for identifying spawning pairs and adapting a binary file to perform control-quasi-independent points guided speculative multithreading has been described. An embodiment of the method is performed by a compiler, which identifies proper spawn points and CQIPs, provides for calculation of live-in values in speculative threads, and generates an enhanced binary file.
- FIGS. 6 and 7 illustrate at least one embodiment of a
method 600 for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values. For at least one embodiment, the method 600 is performed by a processor (e.g., 804 of FIG. 8) executing the instructions in an enhanced binary code file (e.g., 350 of FIG. 3). For the method 600 illustrated in FIGS. 6 and 7, it is assumed that the enhanced binary code file has been generated according to the method illustrated in FIG. 5, such that instructions to perform speculative precomputation of live-in values have been identified 502 and inserted into the enhanced binary file. - FIGS. 6 and 7 illustrate that, during execution of the enhanced binary code file, multiple threads T0, T1, . . . Tx may be executing simultaneously. The flow of control associated with each of these multiple threads is indicated by the notations T0, T1, and Tx on the edges between the blocks illustrated in FIGS. 6 and 7. One skilled in the art will recognize that the multiple threads may be spawned from a non-speculative thread. Also, in at least one embodiment, a speculative thread may spawn one or more additional speculative successor threads.
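Before walking through the flowcharts, the spawner/spawnee parallelism that FIGS. 6 and 7 describe can be caricatured with ordinary software threads. This is a conceptual sketch only: the patent's threads are hardware thread contexts, and every name and structure below is hypothetical:

```python
import threading

def speculative_thread(precompute_slice, run_from_cqip, result):
    # Block 612: execute the slice to produce live-in values, then
    # block 702: run the application code starting at the CQIP.
    live_ins = precompute_slice()
    result["speculative"] = run_from_cqip(live_ins)

def run_with_spawn(spawner_body, precompute_slice, run_from_cqip):
    result = {}
    spawnee = threading.Thread(
        target=speculative_thread, args=(precompute_slice, run_from_cqip, result)
    )
    spawnee.start()                  # block 608: trigger instruction spawns the thread
    result["main"] = spawner_body()  # block 604: spawner keeps executing in parallel
    spawnee.join()                   # join point: spawner waits, then would verify live-ins
    return result
```

For example, `run_with_spawn(lambda: "main work", lambda: {"r1": 5}, lambda li: li["r1"] * 2)` runs both bodies concurrently and collects both results; the actual method additionally verifies the speculative results at the join point.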
- FIG. 6 illustrates that processing begins at 601, where the thread T0 begins execution. At
block 602, a check is made to determine whether the thread T0 previously encountered a join point while it (T0) was still speculative. Block 602 is discussed in further detail below. One skilled in the art will understand that block 602 will, of course, evaluate to “false” if the thread T0 was never previously speculative. - If
block 602 evaluates to “false”, then an instruction for the thread T0 is executed at block 604. If a trigger instruction associated with a spawn point is encountered 606, then processing continues to block 608. Otherwise, the thread T0 continues execution at block 607. At block 607, it is determined whether a join point has been encountered in the thread T0. If neither a trigger instruction nor a join point is encountered, then the thread T0 continues to execute instructions 604 until it reaches 603 the end of its instructions. - If a trigger instruction is detected at
block 606, then a speculative thread T1 is spawned in a free thread context at block 608. If slice instructions are encountered by the speculative thread T1 at block 610, then processing continues at block 612. If not, then processing continues at 702 (FIG. 7). - At
block 612, slice instructions for speculative precomputation are iteratively executed until the speculative precomputation of the live-in value is complete 614. In the meantime, after spawning the speculative thread T1 at block 608, the spawner thread T0 continues to execute 604 its instructions. FIG. 6 illustrates that, while the speculative thread T1 executes 612 the slice instructions, the spawner thread continues execution 604 of its instructions until another spawn point is encountered 606, a join point is encountered 607, or the instruction stream ends 603. Accordingly, the spawner thread T0 and the spawnee thread T1 execute in parallel during speculative precomputation. - When live-in computation is determined complete 614, or if no slice instructions for speculative precomputation are available to the
speculative thread T1 610, then processing continues at A in FIG. 7. - FIG. 7 illustrates that, at
block 702, the speculative thread T1 executes instructions from the original code. At the first iteration of block 702, the CQIP instruction is executed. The execution 702 of spawnee thread instructions is performed in parallel with the execution of the spawner thread code until a terminating condition is reached. - At
block 708, the speculative thread T1 checks for a terminating condition. The check 708 evaluates to “true” when the spawnee thread T1 has encountered a CQIP of an active, more speculative thread or has encountered the end of the program. As long as neither condition is true, the spawnee thread T1 proceeds to block 710. - If the speculative thread T1 determines 708 that a join point has been reached, then it is theoretically ready to perform processing to switch thread contexts with the more speculative thread (as discussed below in connection with block 720). However, at least one embodiment of the
method 600 limits such processing to non-speculative threads. Accordingly, when speculative thread T1 determines 708 that it has reached the join point of a more speculative, active thread, T1 waits 706 to continue processing until it (T1) becomes non-speculative. - At
block 710, the speculative thread T1 determines whether a spawning point has been reached. If the condition at block 710 evaluates to “false”, then T1 continues execution 702 of its instructions. - If a spawn point is encountered at
block 710, then thread T1 creates 712 a new speculative thread T2. Thread T1 then continues execution 702 of its instructions, while the new speculative thread T2 proceeds to continue speculative thread operation at block 610, as described above in connection with speculative thread T1. One skilled in the art will recognize that, while multiple speculative threads are active, each thread follows the logic described above in connection with T1 (blocks 610 through 614 and blocks 702 through 710 of FIGS. 6 and 7). - When the spawner thread T0 reaches a CQIP of an active, more speculative thread, then we say that a join point has been encountered. The join point of a thread is the control quasi-independent point at which an on-going speculative thread began execution. It should be understood that multiple speculative threads may be active at one time. Hence the terminology “more speculative.” A “more speculative” thread is a thread that is a spawnee of the reference thread (in this case, thread T0) and includes any subsequently-spawned speculative thread in the spawnee's spawning chain.
- Thus, the join point check 607 (FIG. 6) evaluates to true when the thread T0 reaches the CQIP at which any on-going speculative thread began execution. One skilled in the art will recognize that, if multiple speculative threads are simultaneously active, then any one of the multiple CQIP's for the active speculative threads could be reached at
block 607. For simplicity of illustration, FIG. 7 assumes that when T0 hits a join point at block 607, the join point is associated with T1, the next thread in program order, which is the speculative thread whose CQIP has been reached by the non-speculative thread T0. - Upon reaching the join point at block 607 (FIG. 6), a thread T0 proceeds to block 703. The thread T0 determines 703 if it is the non-speculative active thread and, if not, waits until it becomes the non-speculative thread.
- When T0 becomes non-speculative, it initiates 704 a verification of the speculation performed by the spawnee thread T1. For at least one embodiment,
verification 704 includes determining whether the speculative live-in values utilized by the spawnee thread T1 reflect the actual values computed by the spawner thread. - If the
verification 704 fails, then T1 and any other thread more speculative than T1 are squashed 730. Thread T0 then proceeds to C (FIG. 6) to continue execution of its instructions. Otherwise, if the verification 704 succeeds, then thread T0 and thread T1 proceed to block 720. At block 720, the thread context where the thread T0 has been executing becomes free and is relinquished. Also, the speculative thread T1 that started at the CQIP becomes the non-speculative thread and continues execution at C (FIG. 6). - Reference to FIG. 6 illustrates that the newly non-speculative thread T0 checks at
block 602 to determine whether it encountered a CQIP at block 708 (FIG. 7) while it was still speculative. If so, then the thread T0 proceeds to B in order to begin join point processing as described above. - The combination of both CQIP-based spawning point selection and speculative precomputation of live-in values illustrated in FIGS. 5, 6 and 7 provides a multithreading method that helps improve the efficacy and accuracy of speculative multithreading. Such improvements are achieved because data dependencies among speculative threads are minimized since the values of live-ins are computed before execution of the speculative thread.
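The verification at block 704 reduces to comparing the live-in values the spawnee consumed against the values the spawner actually produced by the time it reached the CQIP. A minimal sketch, with hypothetical dictionaries keyed by register name:

```python
def verify_speculation(consumed_live_ins, actual_live_ins):
    """Block 704: speculation is valid only if every live-in value the
    spawnee used matches the value the spawner ultimately computed.
    On failure the spawnee and all more speculative threads are squashed 730."""
    return all(
        actual_live_ins.get(reg) == val
        for reg, val in consumed_live_ins.items()
    )
```

A successful check lets the speculative thread's work be committed at block 720; a failed check discards it, falling back to the non-speculative thread's own execution.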
- In the preceding description, various aspects of a method and apparatus for facilitating control-quasi-independent-points guided speculative multithreading have been described. For purposes of explanation, specific numbers, examples, systems and configurations were set forth in order to provide a more thorough understanding. However, it is apparent to one skilled in the art that the described method may be practiced without the specific details. In other instances, well-known features were omitted or simplified in order not to obscure the method.
- Embodiments of the method may be implemented in hardware, software, firmware, or a combination of such implementation approaches. Embodiments of the invention may be implemented as computer programs executing on programmable systems comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code may be applied to input data to perform the functions described herein and generate output information. The output information may be applied to one or more output devices, in known fashion. For purposes of this application, a processing system includes any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
- The programs may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The programs may also be implemented in assembly or machine language, if desired. In fact, the method described herein is not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
- The programs may be stored on a storage media or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system. The instructions, accessible to a processor in a processing system, provide for configuring and operating the processing system when the storage media or device is read by the processing system to perform the procedures described herein. Embodiments of the invention may also be considered to be implemented as a machine-readable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein.
- An example of one such type of processing system is shown in FIG. 8.
System 800 may be used, for example, to execute the processing for a method of performing control-quasi-independent-points guided speculative multithreading, such as the embodiments described herein. System 800 may also execute enhanced binary files generated in accordance with at least one embodiment of the methods described herein. System 800 is representative of processing systems based on the Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, and Itanium® and Itanium® II microprocessors available from Intel Corporation, although other systems (including personal computers (PCs) having other microprocessors, engineering workstations, set-top boxes and the like) may also be used. In one embodiment, sample system 800 may be executing a version of the Windows™ operating system available from Microsoft Corporation, although other operating systems and graphical user interfaces, for example, may also be used. - Referring to FIG. 8,
processing system 800 includes a memory system 802 and a processor 804. Memory system 802 may store instructions 810 and data 812 for controlling the operation of the processor 804. For example, instructions 810 may include a compiler program 808 that, when executed, causes the processor 804 to compile a program 415 (FIG. 4) that resides in the memory system 802. Memory 802 holds the program to be compiled, intermediate forms of the program, and a resulting compiled program. For at least one embodiment, the compiler program 808 includes instructions to select spawning pairs and generate instructions to implement CQIP-guided multithreading. For such embodiment, instructions 810 may also include an enhanced binary file 350 (FIG. 3) generated in accordance with at least one embodiment of the present invention. -
Memory system 802 is intended as a generalized representation of memory and may include a variety of forms of memory, such as a hard drive, CD-ROM, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM) and related circuitry. Memory system 802 may store instructions 810 and/or data 812 represented by data signals that may be executed by processor 804. The instructions 810 and/or data 812 may include code for performing any or all of the techniques discussed herein. At least one embodiment of CQIP-guided speculative multithreading is related to the use of the compiler 808 in system 800 to select spawning pairs and generate instructions as discussed above. - Specifically, FIG. 8 illustrates that
compiler 808 may include a profile analyzer module 820 that, when executed by the processor 804, analyzes a profile to generate a control flow graph as described above in connection with FIG. 3. The compiler 808 may also include a matrix builder module 824 that, when executed by the processor 804, computes 220 reaching probabilities and generates a reaching probabilities matrix 335 as discussed above. The compiler 808 may also include a spawning pair selector module 826 that, when executed by the processor 804, identifies 230 candidate basic blocks and selects 240 one or more spawning pairs. Also, the compiler 808 may include a slicer module 822 that identifies 502 (FIG. 5) instructions for a slice to be executed by a speculative thread in order to perform speculative precomputation of live-in values. The compiler 808 may further include a code generator module 828 that, when executed by the processor 804, generates 60 an enhanced binary file 350 (FIG. 3). - While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications can be made without departing from the present invention in its broader aspects. The appended claims are to encompass within their scope all such changes and modifications that fall within the true scope of the present invention.
Claims (48)
1. A method of compiling a software program, comprising:
selecting a spawning pair that includes a spawn point and a control-quasi-independent point (CQIP);
providing for calculation of a live-in value for a speculative thread; and
generating an enhanced binary file that includes instructions, the instructions including a trigger instruction to cause spawning of the speculative thread at the CQIP.
2. The method of claim 1 , further comprising:
performing profile analysis.
3. The method of claim 1 , further comprising:
computing a plurality of reaching probabilities.
4. The method of claim 1 , further comprising:
identifying a plurality of candidate basic blocks.
5. The method of claim 4 , wherein:
selecting a spawning pair further comprises selecting the spawning pair from the plurality of candidate basic blocks.
6. The method of claim 1 , wherein:
generating the enhanced binary file further comprises embedding a trigger at a spawn point associated with the spawning pair.
7. The method of claim 1 , wherein selecting the spawning pair further comprises:
selecting a spawning pair having at least a minimum average number of instructions between the spawn point and the CQIP of the spawning pair.
8. The method of claim 3 , wherein selecting the spawning pair further comprises:
selecting a spawning pair having at least a minimum reaching probability.
9. The method of claim 1 , wherein providing for calculation of the live-in value further comprises:
providing an instruction to invoke hardware prediction of the live-in value.
10. The method of claim 1 , wherein providing for calculation of the live-in value further comprises:
generating one or more instructions to perform speculative precomputation of the live-in values.
11. The method of claim 1 , wherein:
selecting a spawning pair further comprises selecting a first spawning pair and a second spawning pair; and
generating an enhanced binary file that includes instructions further comprises generating an enhanced binary file that includes a trigger instruction for each spawning pair.
12. An article comprising:
a machine-readable storage medium having a plurality of machine accessible instructions;
wherein, when the instructions are executed by a processor, the instructions provide for
selecting a spawning pair that includes a spawn point and a control-quasi-independent point (CQIP);
providing for calculation of a live-in value for a speculative thread; and
generating an enhanced binary file that includes instructions, the instructions including a trigger instruction to cause spawning of a speculative thread at the control-quasi-independent point.
13. The article of claim 12 , wherein the instructions further comprise:
instructions that provide for performing profile analysis.
14. The article of claim 12 , wherein the instructions further comprise:
instructions that provide for computing a plurality of reaching probabilities.
15. The article of claim 12 , wherein the instructions further comprise:
instructions that provide for identifying a plurality of candidate basic blocks.
16. The article of claim 15 , wherein:
the instructions that provide for selecting a spawning pair further comprise instructions that provide for selecting the spawning pair from the plurality of candidate basic blocks.
17. The article of claim 12 , wherein:
the instructions that provide for generating the enhanced binary file further comprise instructions that provide for embedding a trigger at a spawn point associated with the spawning pair.
18. The article of claim 12 , wherein the instructions that provide for selecting the spawning pair further comprise:
instructions that provide for selecting a spawning pair having at least a minimum average number of instructions between the spawn point and the CQIP of the spawning pair.
19. The article of claim 14 , wherein the instructions that provide for selecting the spawning pair further comprise:
instructions that provide for selecting a spawning pair having at least a minimum reaching probability.
20. The article of claim 12 , wherein the instructions that provide for providing for calculation of the live-in value further comprise:
instructions that provide for providing an instruction to invoke hardware prediction of the live-in value.
21. The article of claim 12 , wherein the instructions that provide for providing for calculation of the live-in value further comprise:
instructions that provide for generating one or more instructions to perform speculative precomputation of the live-in values.
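Claims 14 through 19 describe profile-guided selection: compute reaching probabilities between basic blocks, then keep only spawning pairs whose spawn-point-to-CQIP reaching probability and average instruction distance both clear minimum thresholds. The sketch below is one illustrative way to realize that selection, assuming a profile-derived control-flow graph with per-edge branch probabilities; the function names, the iterative fixed-point method, and the threshold values are assumptions, not taken from the patent.

```python
def reaching_probabilities(blocks, edge_prob, iterations=50):
    """Approximate P(reach j | currently at i) for every pair of basic
    blocks by iterating the profile-derived edge-probability relation to
    a fixed point (handles loops by monotone convergence)."""
    reach = {(i, j): (1.0 if i == j else 0.0) for i in blocks for j in blocks}
    for _ in range(iterations):
        updated = {}
        for i in blocks:
            for j in blocks:
                if i == j:
                    updated[(i, j)] = 1.0
                else:
                    # Probability of reaching j is the probability-weighted
                    # sum over the successors of i.
                    updated[(i, j)] = sum(
                        p * reach[(succ, j)]
                        for succ, p in edge_prob.get(i, {}).items()
                    )
        reach = updated
    return reach


def select_spawning_pairs(pairs, reach, avg_instrs, min_prob=0.95, min_dist=32):
    """Keep (spawn point, CQIP) pairs with at least a minimum reaching
    probability and a minimum average instruction distance between them."""
    return [
        (sp, cqip) for sp, cqip in pairs
        if reach[(sp, cqip)] >= min_prob and avg_instrs[(sp, cqip)] >= min_dist
    ]
```

With a three-block graph where A branches to B (0.9) or C (0.1) and B falls through to C, the pair (A, C) reaches with probability 1.0 and (A, B) with 0.9; the distance filter then prunes pairs too short to hide the spawn overhead.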
22. A method, comprising:
executing one or more instructions in a first instruction stream in a non-speculative thread;
spawning a speculative thread at a spawn point in the first instruction stream, wherein the computed probability of reaching a control quasi-independent point during execution of the first instruction stream, after execution of the spawn point, is higher than a predetermined threshold; and
simultaneously:
executing in the speculative thread a speculative thread instruction stream that includes a subset of the instructions in the first instruction stream, the speculative thread instruction stream including the control-quasi-independent point; and
executing one or more instructions in the first instruction stream following the spawn point.
23. The method of claim 22 , wherein:
executing one or more instructions in the first instruction stream following the spawn point further comprises executing instructions until the CQIP is reached.
24. The method of claim 23 , further comprising:
determining, responsive to reaching the CQIP, whether speculative execution performed in the speculative thread is correct.
25. The method of claim 24 , further comprising:
responsive to determining the speculative execution performed in the speculative thread is correct, relinquishing the non-speculative thread.
26. The method of claim 24 , further comprising:
responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing the speculative thread.
27. The method of claim 26 , further comprising:
responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing all active successor threads, if any, of the speculative thread.
28. The method of claim 22 , wherein:
the speculative thread instruction stream includes a precomputation slice for the speculative computation of a live-in value.
29. The method of claim 22 , wherein:
spawning the speculative thread triggers hardware prediction of a live-in value.
30. The method of claim 28 , wherein:
the speculative thread instruction stream includes, after the precomputation slice, a branch instruction to the CQIP.
31. The method of claim 22 , further comprising:
spawning a second speculative thread at a spawn point in the speculative thread instruction stream.
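The lifecycle in claims 22 through 27 runs as follows: the speculative thread executes ahead with predicted live-in values while the non-speculative thread continues past the spawn point; when the non-speculative thread reaches the CQIP, the speculation is verified, and on success the non-speculative thread is relinquished, while on failure the speculative thread and all its active successors are squashed. The sketch below models that decision sequentially in plain Python; it is an illustration of the claimed control flow, not the patent's hardware mechanism, and all class and field names are assumptions.

```python
class SpeculativeThread:
    """Minimal model of a speculative thread spawned at a CQIP."""

    def __init__(self, predicted_live_ins):
        self.predicted_live_ins = predicted_live_ins  # values assumed at spawn
        self.successors = []   # speculative threads this thread spawned
        self.squashed = False

    def squash(self):
        # Squash this thread and, recursively, all active successor
        # threads (cf. claim 27).
        self.squashed = True
        for child in self.successors:
            child.squash()


def verify_at_cqip(spec_thread, actual_live_ins):
    """When the non-speculative thread reaches the CQIP, compare its actual
    live-in values against the speculative thread's predictions. Return True
    if speculation was correct (the non-speculative thread may then be
    relinquished); otherwise squash the speculative thread tree."""
    if spec_thread.predicted_live_ins == actual_live_ins:
        return True
    spec_thread.squash()
    return False
```

A correct prediction leaves the speculative work standing; a mismatch discards it and the non-speculative thread simply continues, so mis-speculation costs only the wasted speculative execution.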
32. An article comprising:
a machine-readable storage medium having a plurality of machine accessible instructions;
wherein, when the instructions are executed by a processor, the instructions provide for
executing one or more instructions in a first instruction stream in a non-speculative thread;
spawning a speculative thread at a spawn point in the first instruction stream, wherein the computed probability of reaching a control quasi-independent point during execution of the first instruction stream, after execution of the spawn point, is higher than a predetermined threshold; and
simultaneously:
executing in the speculative thread a speculative thread instruction stream that includes a subset of the instructions in the first instruction stream, the speculative thread instruction stream including the control-quasi-independent point; and
executing one or more instructions in the first instruction stream following the spawn point.
33. The article of claim 32 , wherein:
the instructions that provide for executing one or more instructions in the first instruction stream following the spawn point further comprise instructions that provide for executing instructions until the CQIP is reached.
34. The article of claim 33 , wherein the instructions further comprise:
instructions that provide for determining, responsive to reaching the CQIP, whether speculative execution performed in the speculative thread is correct.
35. The article of claim 34 , wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is correct, relinquishing the non-speculative thread.
36. The article of claim 34 , wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing the speculative thread.
37. The article of claim 36 , wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing all active successor threads, if any, of the speculative thread.
38. The article of claim 32 , wherein:
the speculative thread instruction stream includes a precomputation slice for the speculative computation of a live-in value.
39. The article of claim 32 , wherein:
the instruction that provides for spawning the speculative thread triggers hardware prediction of a live-in value.
40. The article of claim 38 , wherein:
the speculative thread instruction stream includes, after the precomputation slice, a branch instruction to the CQIP.
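Claims 28, 30, 38, and 40 describe a speculative thread stream that opens with a precomputation slice for a live-in value and then branches to the CQIP. A precomputation slice is conventionally obtained by a backward data-dependence walk from the live-in; the sketch below shows one such walk over a simplified register-only instruction list, with a branch to the CQIP appended. Everything here (the tuple encoding, the label, the helper names) is an illustrative assumption.

```python
def backward_slice(instrs, live_in):
    """Collect, in program order, the instructions needed to compute
    `live_in`: a backward walk over (dest, sources, text) tuples that
    keeps an instruction whenever its destination is still needed."""
    needed, slice_out = {live_in}, []
    for dest, srcs, text in reversed(instrs):
        if dest in needed:
            needed |= set(srcs)   # the slice now also needs the sources
            slice_out.append(text)
    return list(reversed(slice_out))


def build_spec_stream(instrs, live_in, cqip_label):
    """Speculative thread stream: the precomputation slice for the live-in
    value, followed by a branch to the CQIP."""
    return backward_slice(instrs, live_in) + [f"br {cqip_label}"]
```

Because the slice is typically much shorter than the code between spawn point and CQIP, the speculative thread can finish precomputing its live-ins and begin useful work while the non-speculative thread is still approaching the CQIP.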
41. A compiler comprising:
a spawning pair selector module to select a spawning pair that includes a control-quasi-independent point (“CQIP”) and a spawn point; and
a code generator to generate an enhanced binary file that includes a trigger instruction at the spawn point.
42. The compiler of claim 41 , wherein:
the trigger instruction is to spawn a speculative thread to begin execution at the CQIP.
43. The compiler of claim 41 , further comprising:
a slicer to generate a slice for precomputation of a live-in value;
wherein the code generator is further to include the precomputation slice in the enhanced binary file.
44. The compiler of claim 41 , wherein:
the spawning pair selector module is further to select the spawning pair such that a computed probability of reaching the control-quasi-independent point after execution of the spawn point is higher than a predetermined threshold.
45. The compiler of claim 44 , further comprising:
a matrix builder to compute the reaching probability for the spawning pair.
46. The compiler of claim 41 , further comprising:
a profile analyzer to build a control flow graph.
47. The compiler of claim 41 , wherein:
the trigger instruction is to trigger hardware value prediction for a live-in value.
48. The compiler of claim 41 , further comprising:
a matrix builder to compute the reaching probability for the spawning pair.
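Claims 41 through 48 organize the compiler as cooperating modules: a profile analyzer builds the control flow graph, a matrix builder computes reaching probabilities, a spawning pair selector applies the probability threshold, and a code generator emits the enhanced binary with a trigger embedded at each spawn point. The final step, trigger embedding, can be sketched as below; the `(address, instruction)` encoding and the `spawn` mnemonic are illustrative assumptions, not the patent's format.

```python
def generate_enhanced_binary(instructions, spawning_pairs):
    """Emit the original instruction stream with a trigger instruction
    embedded immediately before each selected spawn point. The trigger
    names the CQIP at which the speculative thread begins execution."""
    triggers = {sp: cqip for sp, cqip in spawning_pairs}
    enhanced = []
    for addr, instr in instructions:
        if addr in triggers:
            # Hypothetical trigger encoding: spawn a thread at the CQIP.
            enhanced.append((addr, f"spawn {triggers[addr]}"))
        enhanced.append((addr, instr))
    return enhanced
```

The rest of the program is unchanged, so the enhanced binary still runs correctly on hardware that ignores the trigger instruction; only speculation-capable processors act on it.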
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/356,435 US20040154010A1 (en) | 2003-01-31 | 2003-01-31 | Control-quasi-independent-points guided speculative multithreading |
US10/423,633 US7814469B2 (en) | 2003-01-31 | 2003-04-24 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
US10/422,528 US7523465B2 (en) | 2003-01-31 | 2003-04-24 | Methods and apparatus for generating speculative helper thread spawn-target points |
US10/633,012 US7657880B2 (en) | 2003-01-31 | 2003-08-01 | Safe store for speculative helper threads |
CNB2003101215924A CN1302384C (en) | 2003-12-29 | Reckoning multiroute operation guided by controlling quasi-independent point
US12/879,898 US8719806B2 (en) | 2003-01-31 | 2010-09-10 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/356,435 US20040154010A1 (en) | 2003-01-31 | 2003-01-31 | Control-quasi-independent-points guided speculative multithreading |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/423,633 Continuation-In-Part US7814469B2 (en) | 2003-01-31 | 2003-04-24 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
US10/422,528 Continuation-In-Part US7523465B2 (en) | 2003-01-31 | 2003-04-24 | Methods and apparatus for generating speculative helper thread spawn-target points |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040154010A1 true US20040154010A1 (en) | 2004-08-05 |
Family
ID=32770808
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/356,435 Abandoned US20040154010A1 (en) | 2003-01-31 | 2003-01-31 | Control-quasi-independent-points guided speculative multithreading |
US10/423,633 Expired - Fee Related US7814469B2 (en) | 2003-01-31 | 2003-04-24 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
US10/422,528 Expired - Fee Related US7523465B2 (en) | 2003-01-31 | 2003-04-24 | Methods and apparatus for generating speculative helper thread spawn-target points |
US12/879,898 Expired - Lifetime US8719806B2 (en) | 2003-01-31 | 2010-09-10 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
Family Applications After (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/423,633 Expired - Fee Related US7814469B2 (en) | 2003-01-31 | 2003-04-24 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
US10/422,528 Expired - Fee Related US7523465B2 (en) | 2003-01-31 | 2003-04-24 | Methods and apparatus for generating speculative helper thread spawn-target points |
US12/879,898 Expired - Lifetime US8719806B2 (en) | 2003-01-31 | 2010-09-10 | Speculative multi-threading for instruction prefetch and/or trace pre-build |
Country Status (2)
Country | Link |
---|---|
US (4) | US20040154010A1 (en) |
CN (1) | CN1302384C (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050182602A1 (en) * | 2004-02-17 | 2005-08-18 | Intel Corporation | Computation of all-pairs reaching probabilities in software systems |
US20060047495A1 (en) * | 2004-09-01 | 2006-03-02 | Jesus Sanchez | Analyzer for spawning pairs in speculative multithreaded processor |
US20060212689A1 (en) * | 2005-03-18 | 2006-09-21 | Shailender Chaudhry | Method and apparatus for simultaneous speculative threading |
WO2006122990A2 (en) * | 2005-05-19 | 2006-11-23 | Intel Corporation | Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions |
US20070011684A1 (en) * | 2005-06-27 | 2007-01-11 | Du Zhao H | Mechanism to optimize speculative parallel threading |
US20080244223A1 (en) * | 2007-03-31 | 2008-10-02 | Carlos Garcia Quinones | Branch pruning in architectures with speculation support |
US20090083488A1 (en) * | 2006-05-30 | 2009-03-26 | Carlos Madriles Gimeno | Enabling Speculative State Information in a Cache Coherency Protocol |
CN101826014A (en) * | 2010-04-20 | 2010-09-08 | 北京邮电大学 | Dividing method of source code in software engineering |
US20110119660A1 (en) * | 2008-07-31 | 2011-05-19 | Panasonic Corporation | Program conversion apparatus and program conversion method |
US20120204065A1 (en) * | 2011-02-03 | 2012-08-09 | International Business Machines Corporation | Method for guaranteeing program correctness using fine-grained hardware speculative execution |
US20140317629A1 (en) * | 2013-04-23 | 2014-10-23 | Ab Initio Technology Llc | Controlling tasks performed by a computing system |
US8904118B2 (en) | 2011-01-07 | 2014-12-02 | International Business Machines Corporation | Mechanisms for efficient intra-die/intra-chip collective messaging |
US9135015B1 (en) | 2014-12-25 | 2015-09-15 | Centipede Semi Ltd. | Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction |
US9208066B1 (en) | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
US9286090B2 (en) * | 2014-01-20 | 2016-03-15 | Sony Corporation | Method and system for compiler identification of code for parallel execution |
US9286067B2 (en) | 2011-01-10 | 2016-03-15 | International Business Machines Corporation | Method and apparatus for a hierarchical synchronization barrier in a multi-node system |
US9348595B1 (en) | 2014-12-22 | 2016-05-24 | Centipede Semi Ltd. | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US9715390B2 (en) | 2015-04-19 | 2017-07-25 | Centipede Semi Ltd. | Run-time parallelization of code execution based on an approximate register-access specification |
US10296346B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences based on pre-monitoring |
US10296350B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences |
US10379863B2 (en) * | 2017-09-21 | 2019-08-13 | Qualcomm Incorporated | Slice construction for pre-executing data dependent loads |
US10606727B2 (en) | 2016-09-06 | 2020-03-31 | Soroco Private Limited | Techniques for generating a graphical user interface to display documentation for computer programs |
US11755484B2 (en) * | 2015-06-26 | 2023-09-12 | Microsoft Technology Licensing, Llc | Instruction block allocation |
Families Citing this family (138)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002326378A1 (en) * | 2001-07-13 | 2003-01-29 | Sun Microsystems, Inc. | Facilitating efficient join operations between a head thread and a speculative thread |
US7493607B2 (en) | 2002-07-09 | 2009-02-17 | Bluerisc Inc. | Statically speculative compilation and execution |
JP3862652B2 (en) * | 2002-12-10 | 2006-12-27 | キヤノン株式会社 | Printing control method and information processing apparatus |
US7647585B2 (en) * | 2003-04-28 | 2010-01-12 | Intel Corporation | Methods and apparatus to detect patterns in programs |
US7774759B2 (en) | 2003-04-28 | 2010-08-10 | Intel Corporation | Methods and apparatus to detect a macroscopic transaction boundary in a program |
US20040243767A1 (en) * | 2003-06-02 | 2004-12-02 | Cierniak Michal J. | Method and apparatus for prefetching based upon type identifier tags |
US8266379B2 (en) * | 2003-06-02 | 2012-09-11 | Infineon Technologies Ag | Multithreaded processor with multiple caches |
US7844801B2 (en) * | 2003-07-31 | 2010-11-30 | Intel Corporation | Method and apparatus for affinity-guided speculative helper threads in chip multiprocessors |
JP4042972B2 (en) * | 2003-09-30 | 2008-02-06 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Optimized compiler, compiler program, and recording medium |
US20050071438A1 (en) * | 2003-09-30 | 2005-03-31 | Shih-Wei Liao | Methods and apparatuses for compiler-creating helper threads for multi-threading |
US20050114850A1 (en) * | 2003-10-29 | 2005-05-26 | Saurabh Chheda | Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control |
US7996671B2 (en) | 2003-11-17 | 2011-08-09 | Bluerisc Inc. | Security of program executables and microprocessors based on compiler-architecture interaction |
US7206795B2 (en) * | 2003-12-22 | 2007-04-17 | Jean-Pierre Bono | Prefetching and multithreading for improved file read performance |
US7756968B1 (en) | 2003-12-30 | 2010-07-13 | Sap Ag | Method and system for employing a hierarchical monitor tree for monitoring system resources in a data processing environment |
US7941521B1 (en) | 2003-12-30 | 2011-05-10 | Sap Ag | Multi-service management architecture employed within a clustered node configuration |
US7725572B1 (en) | 2003-12-30 | 2010-05-25 | Sap Ag | Notification architecture and method employed within a clustered node configuration |
US7822826B1 (en) | 2003-12-30 | 2010-10-26 | Sap Ag | Deployment of a web service |
US8607209B2 (en) | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
US7721266B2 (en) * | 2004-03-26 | 2010-05-18 | Sap Ag | Unified logging service with a logging formatter |
US20050216585A1 (en) * | 2004-03-26 | 2005-09-29 | Tsvetelina Todorova | Monitor viewer for an enterprise network monitoring system |
US7526550B2 (en) * | 2004-03-26 | 2009-04-28 | Sap Ag | Unified logging service with a log viewer |
US7168070B2 (en) * | 2004-05-25 | 2007-01-23 | International Business Machines Corporation | Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer |
US7434004B1 (en) * | 2004-06-17 | 2008-10-07 | Sun Microsystems, Inc. | Prefetch prediction |
US7200734B2 (en) * | 2004-07-31 | 2007-04-03 | Hewlett-Packard Development Company, L.P. | Operating-system-transparent distributed memory |
US7669194B2 (en) * | 2004-08-26 | 2010-02-23 | International Business Machines Corporation | Fine-grained software-directed data prefetching using integrated high-level and low-level code analysis optimizations |
US8719819B2 (en) | 2005-06-30 | 2014-05-06 | Intel Corporation | Mechanism for instruction set based thread execution on a plurality of instruction sequencers |
WO2006069494A1 (en) * | 2004-12-31 | 2006-07-06 | Intel Corporation | Parallelization of bayesian network structure learning |
US20060157115A1 (en) * | 2005-01-11 | 2006-07-20 | Andrew Dorogi | Regulator with belleville springs |
US7849453B2 (en) * | 2005-03-16 | 2010-12-07 | Oracle America, Inc. | Method and apparatus for software scouting regions of a program |
US7950012B2 (en) * | 2005-03-16 | 2011-05-24 | Oracle America, Inc. | Facilitating communication and synchronization between main and scout threads |
US7472256B1 (en) | 2005-04-12 | 2008-12-30 | Sun Microsystems, Inc. | Software value prediction using pendency records of predicted prefetch values |
US7810075B2 (en) * | 2005-04-29 | 2010-10-05 | Sap Ag | Common trace files |
US20070094213A1 (en) * | 2005-07-14 | 2007-04-26 | Chunrong Lai | Data partitioning and critical section reduction for Bayesian network structure learning |
US20070094214A1 (en) * | 2005-07-15 | 2007-04-26 | Li Eric Q | Parallelization of bayesian network structure learning |
US8037285B1 (en) | 2005-09-28 | 2011-10-11 | Oracle America, Inc. | Trace unit |
US8024522B1 (en) | 2005-09-28 | 2011-09-20 | Oracle America, Inc. | Memory ordering queue/versioning cache circuit |
US7987342B1 (en) | 2005-09-28 | 2011-07-26 | Oracle America, Inc. | Trace unit with a decoder, a basic-block cache, a multi-block cache, and sequencer |
US7966479B1 (en) | 2005-09-28 | 2011-06-21 | Oracle America, Inc. | Concurrent vs. low power branch prediction |
US7870369B1 (en) | 2005-09-28 | 2011-01-11 | Oracle America, Inc. | Abort prioritization in a trace-based processor |
US8032710B1 (en) | 2005-09-28 | 2011-10-04 | Oracle America, Inc. | System and method for ensuring coherency in trace execution |
US7937564B1 (en) | 2005-09-28 | 2011-05-03 | Oracle America, Inc. | Emit vector optimization of a trace |
US7676634B1 (en) | 2005-09-28 | 2010-03-09 | Sun Microsystems, Inc. | Selective trace cache invalidation for self-modifying code via memory aging |
US8015359B1 (en) | 2005-09-28 | 2011-09-06 | Oracle America, Inc. | Method and system for utilizing a common structure for trace verification and maintaining coherency in an instruction processing circuit |
US8499293B1 (en) | 2005-09-28 | 2013-07-30 | Oracle America, Inc. | Symbolic renaming optimization of a trace |
US7877630B1 (en) * | 2005-09-28 | 2011-01-25 | Oracle America, Inc. | Trace based rollback of a speculatively updated cache |
US7949854B1 (en) | 2005-09-28 | 2011-05-24 | Oracle America, Inc. | Trace unit with a trace builder |
US8051247B1 (en) | 2005-09-28 | 2011-11-01 | Oracle America, Inc. | Trace based deallocation of entries in a versioning cache circuit |
US7953961B1 (en) | 2005-09-28 | 2011-05-31 | Oracle America, Inc. | Trace unit with an op path from a decoder (bypass mode) and from a basic-block builder |
US8370576B1 (en) | 2005-09-28 | 2013-02-05 | Oracle America, Inc. | Cache rollback acceleration via a bank based versioning cache ciruit |
US8019944B1 (en) | 2005-09-28 | 2011-09-13 | Oracle America, Inc. | Checking for a memory ordering violation after a speculative cache write |
US20070113056A1 (en) * | 2005-11-15 | 2007-05-17 | Dale Jason N | Apparatus and method for using multiple thread contexts to improve single thread performance |
US20070113055A1 (en) * | 2005-11-15 | 2007-05-17 | Dale Jason N | Apparatus and method for improving single thread performance through speculative processing |
US7739662B2 (en) * | 2005-12-30 | 2010-06-15 | Intel Corporation | Methods and apparatus to analyze processor systems |
US7730263B2 (en) * | 2006-01-20 | 2010-06-01 | Cornell Research Foundation, Inc. | Future execution prefetching technique and architecture |
US20070234014A1 (en) * | 2006-03-28 | 2007-10-04 | Ryotaro Kobayashi | Processor apparatus for executing instructions with local slack prediction of instructions and processing method therefor |
US20080016325A1 (en) * | 2006-07-12 | 2008-01-17 | Laudon James P | Using windowed register file to checkpoint register state |
US8010745B1 (en) | 2006-09-27 | 2011-08-30 | Oracle America, Inc. | Rolling back a speculative update of a non-modifiable cache line |
US8370609B1 (en) | 2006-09-27 | 2013-02-05 | Oracle America, Inc. | Data cache rollbacks for failed speculative traces with memory operations |
US20080126766A1 (en) | 2006-11-03 | 2008-05-29 | Saurabh Chheda | Securing microprocessors against information leakage and physical tampering |
US20080141268A1 (en) * | 2006-12-12 | 2008-06-12 | Tirumalai Partha P | Utility function execution using scout threads |
US7765242B2 (en) * | 2007-05-10 | 2010-07-27 | Hewlett-Packard Development Company, L.P. | Methods and apparatus for structure layout optimization for multi-threaded programs |
US8321840B2 (en) * | 2007-12-27 | 2012-11-27 | Intel Corporation | Software flow tracking using multiple threads |
US8706979B2 (en) * | 2007-12-30 | 2014-04-22 | Intel Corporation | Code reuse and locality hinting |
US8316218B2 (en) * | 2008-02-01 | 2012-11-20 | International Business Machines Corporation | Look-ahead wake-and-go engine with speculative execution |
US8341635B2 (en) * | 2008-02-01 | 2012-12-25 | International Business Machines Corporation | Hardware wake-and-go mechanism with look-ahead polling |
US8775778B2 (en) * | 2008-02-01 | 2014-07-08 | International Business Machines Corporation | Use of a helper thread to asynchronously compute incoming data |
US8725992B2 (en) | 2008-02-01 | 2014-05-13 | International Business Machines Corporation | Programming language exposing idiom calls to a programming idiom accelerator |
US8732683B2 (en) | 2008-02-01 | 2014-05-20 | International Business Machines Corporation | Compiler providing idiom to idiom accelerator |
US8386822B2 (en) | 2008-02-01 | 2013-02-26 | International Business Machines Corporation | Wake-and-go mechanism with data monitoring |
US8145849B2 (en) | 2008-02-01 | 2012-03-27 | International Business Machines Corporation | Wake-and-go mechanism with system bus response |
US8707016B2 (en) * | 2008-02-01 | 2014-04-22 | International Business Machines Corporation | Thread partitioning in a multi-core environment |
US8601241B2 (en) * | 2008-02-01 | 2013-12-03 | International Business Machines Corporation | General purpose register cloning |
US8516484B2 (en) * | 2008-02-01 | 2013-08-20 | International Business Machines Corporation | Wake-and-go mechanism for a data processing system |
US8312458B2 (en) * | 2008-02-01 | 2012-11-13 | International Business Machines Corporation | Central repository for wake-and-go mechanism |
US8880853B2 (en) * | 2008-02-01 | 2014-11-04 | International Business Machines Corporation | CAM-based wake-and-go snooping engine for waking a thread put to sleep for spinning on a target address lock |
US8788795B2 (en) | 2008-02-01 | 2014-07-22 | International Business Machines Corporation | Programming idiom accelerator to examine pre-fetched instruction streams for multiple processors |
US8359589B2 (en) * | 2008-02-01 | 2013-01-22 | International Business Machines Corporation | Helper thread for pre-fetching data |
US8250396B2 (en) | 2008-02-01 | 2012-08-21 | International Business Machines Corporation | Hardware wake-and-go mechanism for a data processing system |
US8612977B2 (en) | 2008-02-01 | 2013-12-17 | International Business Machines Corporation | Wake-and-go mechanism with software save of thread state |
US8452947B2 (en) | 2008-02-01 | 2013-05-28 | International Business Machines Corporation | Hardware wake-and-go mechanism and content addressable memory with instruction pre-fetch look-ahead to detect programming idioms |
US8171476B2 (en) * | 2008-02-01 | 2012-05-01 | International Business Machines Corporation | Wake-and-go mechanism with prioritization of threads |
US8640141B2 (en) | 2008-02-01 | 2014-01-28 | International Business Machines Corporation | Wake-and-go mechanism with hardware private array |
US8127080B2 (en) | 2008-02-01 | 2012-02-28 | International Business Machines Corporation | Wake-and-go mechanism with system address bus transaction master |
US8225120B2 (en) * | 2008-02-01 | 2012-07-17 | International Business Machines Corporation | Wake-and-go mechanism with data exclusivity |
US8739145B2 (en) * | 2008-03-26 | 2014-05-27 | Avaya Inc. | Super nested block method to minimize coverage testing overhead |
US8752007B2 (en) | 2008-03-26 | 2014-06-10 | Avaya Inc. | Automatic generation of run-time instrumenter |
US8195896B2 (en) * | 2008-06-10 | 2012-06-05 | International Business Machines Corporation | Resource sharing techniques in a parallel processing computing system utilizing locks by replicating or shadowing execution contexts |
US8914781B2 (en) * | 2008-10-24 | 2014-12-16 | Microsoft Corporation | Scalability analysis for server systems |
KR101579589B1 (en) * | 2009-02-12 | 2015-12-22 | 삼성전자 주식회사 | Static branch prediction method for pipeline processor and compile method therefor |
US9940138B2 (en) | 2009-04-08 | 2018-04-10 | Intel Corporation | Utilization of register checkpointing mechanism with pointer swapping to resolve multithreading mis-speculations |
US8145723B2 (en) | 2009-04-16 | 2012-03-27 | International Business Machines Corporation | Complex remote update programming idiom accelerator |
US8886919B2 (en) * | 2009-04-16 | 2014-11-11 | International Business Machines Corporation | Remote update programming idiom accelerator with allocated processor resources |
US8230201B2 (en) | 2009-04-16 | 2012-07-24 | International Business Machines Corporation | Migrating sleeping and waking threads between wake-and-go mechanisms in a multiple processor data processing system |
US8397052B2 (en) * | 2009-08-19 | 2013-03-12 | International Business Machines Corporation | Version pressure feedback mechanisms for speculative versioning caches |
US8521961B2 (en) * | 2009-08-20 | 2013-08-27 | International Business Machines Corporation | Checkpointing in speculative versioning caches |
CA2680597C (en) * | 2009-10-16 | 2011-06-07 | Ibm Canada Limited - Ibm Canada Limitee | Managing speculative assist threads |
US8429179B1 (en) * | 2009-12-16 | 2013-04-23 | Board Of Regents, The University Of Texas System | Method and system for ontology driven data collection and processing |
EP2519876A1 (en) | 2009-12-28 | 2012-11-07 | Hyperion Core, Inc. | Optimisation of loops and data flow sections |
US9086889B2 (en) * | 2010-04-27 | 2015-07-21 | Oracle International Corporation | Reducing pipeline restart penalty |
US8990802B1 (en) * | 2010-05-24 | 2015-03-24 | Thinking Software, Inc. | Pinball virtual machine (PVM) implementing computing process within a structural space using PVM atoms and PVM atomic threads |
US8856767B2 (en) * | 2011-04-29 | 2014-10-07 | Yahoo! Inc. | System and method for analyzing dynamic performance of complex applications |
US10061618B2 (en) * | 2011-06-16 | 2018-08-28 | Imagination Technologies Limited | Scheduling heterogenous computation on multithreaded processors |
US8739186B2 (en) * | 2011-10-26 | 2014-05-27 | Autodesk, Inc. | Application level speculative processing |
WO2013113595A1 (en) * | 2012-01-31 | 2013-08-08 | International Business Machines Corporation | Major branch instructions with transactional memory |
US9009734B2 (en) | 2012-03-06 | 2015-04-14 | Autodesk, Inc. | Application level speculative processing |
US10558437B1 (en) * | 2013-01-22 | 2020-02-11 | Altera Corporation | Method and apparatus for performing profile guided optimization for high-level synthesis |
US8954546B2 (en) | 2013-01-25 | 2015-02-10 | Concurix Corporation | Tracing with a workload distributor |
US9135145B2 (en) * | 2013-01-28 | 2015-09-15 | Rackspace Us, Inc. | Methods and systems of distributed tracing |
US9483334B2 (en) | 2013-01-28 | 2016-11-01 | Rackspace Us, Inc. | Methods and systems of predictive monitoring of objects in a distributed network system |
US9397902B2 (en) | 2013-01-28 | 2016-07-19 | Rackspace Us, Inc. | Methods and systems of tracking and verifying records of system change events in a distributed network system |
US9813307B2 (en) | 2013-01-28 | 2017-11-07 | Rackspace Us, Inc. | Methods and systems of monitoring failures in a distributed network system |
US8924941B2 (en) | 2013-02-12 | 2014-12-30 | Concurix Corporation | Optimization analysis using similar frequencies |
US8997063B2 (en) | 2013-02-12 | 2015-03-31 | Concurix Corporation | Periodicity optimization in an automated tracing system |
US20130283281A1 (en) | 2013-02-12 | 2013-10-24 | Concurix Corporation | Deploying Trace Objectives using Cost Analyses |
WO2014142704A1 (en) | 2013-03-15 | 2014-09-18 | Intel Corporation | Methods and apparatus to compile instructions for a vector of instruction pointers processor architecture |
US9665474B2 (en) | 2013-03-15 | 2017-05-30 | Microsoft Technology Licensing, Llc | Relationships derived from trace data |
US9575874B2 (en) | 2013-04-20 | 2017-02-21 | Microsoft Technology Licensing, Llc | Error list and bug report analysis for configuring an application tracer |
US9292415B2 (en) | 2013-09-04 | 2016-03-22 | Microsoft Technology Licensing, Llc | Module specific tracing in a shared module environment |
US9772927B2 (en) | 2013-11-13 | 2017-09-26 | Microsoft Technology Licensing, Llc | User interface for selecting tracing origins for aggregating classes of trace data |
US9921848B2 (en) | 2014-03-27 | 2018-03-20 | International Business Machines Corporation | Address expansion and contraction in a multithreading computer system |
US9804846B2 (en) | 2014-03-27 | 2017-10-31 | International Business Machines Corporation | Thread context preservation in a multithreading computer system |
US9594660B2 (en) | 2014-03-27 | 2017-03-14 | International Business Machines Corporation | Multithreading computer system and program product for executing a query instruction for idle time accumulation among cores |
US9218185B2 (en) | 2014-03-27 | 2015-12-22 | International Business Machines Corporation | Multithreading capability information retrieval |
US10102004B2 (en) | 2014-03-27 | 2018-10-16 | International Business Machines Corporation | Hardware counters to track utilization in a multithreading computer system |
US9354883B2 (en) | 2014-03-27 | 2016-05-31 | International Business Machines Corporation | Dynamic enablement of multithreading |
US9417876B2 (en) | 2014-03-27 | 2016-08-16 | International Business Machines Corporation | Thread context restoration in a multithreading computer system |
US10142353B2 (en) | 2015-06-05 | 2018-11-27 | Cisco Technology, Inc. | System for monitoring and managing datacenters |
US10536357B2 (en) | 2015-06-05 | 2020-01-14 | Cisco Technology, Inc. | Late data detection in data center |
US10222995B2 (en) | 2016-04-13 | 2019-03-05 | Samsung Electronics Co., Ltd. | System and method for providing a zero contention parallel data stack |
US10761854B2 (en) * | 2016-04-19 | 2020-09-01 | International Business Machines Corporation | Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor |
US10664377B2 (en) * | 2016-07-15 | 2020-05-26 | Blackberry Limited | Automation of software verification |
US10896130B2 (en) | 2016-10-19 | 2021-01-19 | International Business Machines Corporation | Response times in asynchronous I/O-based software using thread pairing and co-execution |
US10459825B2 (en) * | 2017-08-18 | 2019-10-29 | Red Hat, Inc. | Intelligent expansion of system information collection |
US10503626B2 (en) * | 2018-01-29 | 2019-12-10 | Oracle International Corporation | Hybrid instrumentation framework for multicore low power processors |
US10657057B2 (en) * | 2018-04-04 | 2020-05-19 | Nxp B.V. | Secure speculative instruction execution in a data processing system |
US10896044B2 (en) * | 2018-06-21 | 2021-01-19 | Advanced Micro Devices, Inc. | Low latency synchronization for operation cache and instruction cache fetching and decoding instructions |
US11157283B2 (en) * | 2019-01-09 | 2021-10-26 | Intel Corporation | Instruction prefetch based on thread dispatch commands |
US11556374B2 (en) | 2019-02-15 | 2023-01-17 | International Business Machines Corporation | Compiler-optimized context switching with compiler-inserted data table for in-use register identification at a preferred preemption point |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6389446B1 (en) * | 1996-07-12 | 2002-05-14 | Nec Corporation | Multi-processor system executing a plurality of threads simultaneously and an execution method therefor |
US6574725B1 (en) * | 1999-11-01 | 2003-06-03 | Advanced Micro Devices, Inc. | Method and mechanism for speculatively executing threads of instructions |
US20040073906A1 (en) * | 2002-10-15 | 2004-04-15 | Sun Microsystems, Inc. | Processor with speculative multithreading and hardware to support multithreading software including global registers and busy bit memory elements |
Family Cites Families (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5860017A (en) * | 1996-06-28 | 1999-01-12 | Intel Corporation | Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction |
US6212542B1 (en) * | 1996-12-16 | 2001-04-03 | International Business Machines Corporation | Method and system for executing a program within a multiscalar processor by processing linked thread descriptors |
AU6586898A (en) * | 1997-03-21 | 1998-10-20 | University Of Maryland | Spawn-join instruction set architecture for providing explicit multithreading |
US6263404B1 (en) * | 1997-11-21 | 2001-07-17 | International Business Machines Corporation | Accessing data from a multiple entry fully associative cache buffer in a multithread data processing system |
US6182210B1 (en) * | 1997-12-16 | 2001-01-30 | Intel Corporation | Processor having multiple program counters and trace buffers outside an execution pipeline |
US6301705B1 (en) * | 1998-10-01 | 2001-10-09 | Institute For The Development Of Emerging Architectures, L.L.C. | System and method for deferring exceptions generated during speculative execution |
EP0992916A1 (en) * | 1998-10-06 | 2000-04-12 | Texas Instruments Inc. | Digital signal processor |
US6622155B1 (en) * | 1998-11-24 | 2003-09-16 | Sun Microsystems, Inc. | Distributed monitor concurrency control |
US6317816B1 (en) * | 1999-01-29 | 2001-11-13 | International Business Machines Corporation | Multiprocessor scaleable system and method for allocating memory from a heap |
KR100308211B1 (en) * | 1999-03-27 | 2001-10-29 | 윤종용 | Micro computer system with compressed instruction |
WO2000068784A1 (en) * | 1999-05-06 | 2000-11-16 | Koninklijke Philips Electronics N.V. | Data processing device, method for executing load or store instructions and method for compiling programs |
US6341347B1 (en) * | 1999-05-11 | 2002-01-22 | Sun Microsystems, Inc. | Thread switch logic in a multiple-thread processor |
US6542991B1 (en) * | 1999-05-11 | 2003-04-01 | Sun Microsystems, Inc. | Multiple-thread processor with single-thread interface shared among threads |
US6351808B1 (en) * | 1999-05-11 | 2002-02-26 | Sun Microsystems, Inc. | Vertically and horizontally threaded processor with multidimensional storage for storing thread data |
US6463526B1 (en) * | 1999-06-07 | 2002-10-08 | Sun Microsystems, Inc. | Supporting multi-dimensional space-time computing through object versioning |
US6532521B1 (en) * | 1999-06-30 | 2003-03-11 | International Business Machines Corporation | Mechanism for high performance transfer of speculative request data between levels of cache hierarchy |
US6484254B1 (en) * | 1999-12-30 | 2002-11-19 | Intel Corporation | Method, apparatus, and system for maintaining processor ordering by checking load addresses of unretired load instructions against snooping store addresses |
US6711671B1 (en) * | 2000-02-18 | 2004-03-23 | Hewlett-Packard Development Company, L.P. | Non-speculative instruction fetch in speculative processing |
US7343602B2 (en) * | 2000-04-19 | 2008-03-11 | Hewlett-Packard Development Company, L.P. | Software controlled pre-execution in a multithreaded processor |
US6684375B2 (en) * | 2000-11-22 | 2004-01-27 | Matsushita Electric Industrial Co., Ltd. | Delay distribution calculation method, circuit evaluation method and false path extraction method |
JP3969009B2 (en) * | 2001-03-29 | 2007-08-29 | 株式会社日立製作所 | Hardware prefetch system |
US6928645B2 (en) * | 2001-03-30 | 2005-08-09 | Intel Corporation | Software-based speculative pre-computation and multithreading |
US20020199179A1 (en) * | 2001-06-21 | 2002-12-26 | Lavery Daniel M. | Method and apparatus for compiler-generated triggering of auxiliary codes |
JP3661614B2 (en) * | 2001-07-12 | 2005-06-15 | 日本電気株式会社 | Cache memory control method and multiprocessor system |
JP3702814B2 (en) * | 2001-07-12 | 2005-10-05 | 日本電気株式会社 | Multi-thread execution method and parallel processor system |
JP3632635B2 (en) * | 2001-07-18 | 2005-03-23 | 日本電気株式会社 | Multi-thread execution method and parallel processor system |
SE0102564D0 (en) * | 2001-07-19 | 2001-07-19 | Ericsson Telefon Ab L M | Arrangement and method in computor system |
US6959435B2 (en) * | 2001-09-28 | 2005-10-25 | Intel Corporation | Compiler-directed speculative approach to resolve performance-degrading long latency events in an application |
US7137111B2 (en) * | 2001-11-28 | 2006-11-14 | Sun Microsystems, Inc. | Aggressive prefetch of address chains |
US20030145314A1 (en) * | 2002-01-31 | 2003-07-31 | Khoa Nguyen | Method of efficient dynamic data cache prefetch insertion |
US6959372B1 (en) * | 2002-02-19 | 2005-10-25 | Cogent Chipware Inc. | Processor cluster architecture and associated parallel processing methods |
US6883086B2 (en) * | 2002-03-06 | 2005-04-19 | Intel Corporation | Repair of mis-predicted load values |
US8095920B2 (en) * | 2002-09-17 | 2012-01-10 | Intel Corporation | Post-pass binary adaptation for software-based speculative precomputation |
US7062606B2 (en) * | 2002-11-01 | 2006-06-13 | Infineon Technologies Ag | Multi-threaded embedded processor using deterministic instruction memory to guarantee execution of pre-selected threads during blocking events |
US20040123081A1 (en) * | 2002-12-20 | 2004-06-24 | Allan Knies | Mechanism to increase performance of control speculation |
AU2003303438A1 (en) * | 2002-12-24 | 2004-07-22 | Sun Microsystems, Inc. | Performing hardware scout threading in a system that supports simultaneous multithreading |
2003
- 2003-01-31 US US10/356,435 patent/US20040154010A1/en not_active Abandoned
- 2003-04-24 US US10/423,633 patent/US7814469B2/en not_active Expired - Fee Related
- 2003-04-24 US US10/422,528 patent/US7523465B2/en not_active Expired - Fee Related
- 2003-12-29 CN CNB2003101215924A patent/CN1302384C/en not_active Expired - Fee Related
2010
- 2010-09-10 US US12/879,898 patent/US8719806B2/en not_active Expired - Lifetime
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7379858B2 (en) * | 2004-02-17 | 2008-05-27 | Intel Corporation | Computation of all-pairs reaching probabilities in software systems |
US20050182602A1 (en) * | 2004-02-17 | 2005-08-18 | Intel Corporation | Computation of all-pairs reaching probabilities in software systems |
US20060047495A1 (en) * | 2004-09-01 | 2006-03-02 | Jesus Sanchez | Analyzer for spawning pairs in speculative multithreaded processor |
US20060212689A1 (en) * | 2005-03-18 | 2006-09-21 | Shailender Chaudhry | Method and apparatus for simultaneous speculative threading |
US7634641B2 (en) * | 2005-03-18 | 2009-12-15 | Sun Microsystems, Inc. | Method and apparatus for using multiple threads to speculatively execute instructions |
WO2006122990A2 (en) * | 2005-05-19 | 2006-11-23 | Intel Corporation | Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions |
US20080134196A1 (en) * | 2005-05-19 | 2008-06-05 | Intel Corporation | Apparatus, System, and Method of a Memory Arrangement for Speculative Multithreading |
WO2006122990A3 (en) * | 2005-05-19 | 2008-07-03 | Intel Corp | Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions |
US20070011684A1 (en) * | 2005-06-27 | 2007-01-11 | Du Zhao H | Mechanism to optimize speculative parallel threading |
US7627864B2 (en) * | 2005-06-27 | 2009-12-01 | Intel Corporation | Mechanism to optimize speculative parallel threading |
US8185700B2 (en) | 2006-05-30 | 2012-05-22 | Intel Corporation | Enabling speculative state information in a cache coherency protocol |
US20090083488A1 (en) * | 2006-05-30 | 2009-03-26 | Carlos Madriles Gimeno | Enabling Speculative State Information in a Cache Coherency Protocol |
US20080244223A1 (en) * | 2007-03-31 | 2008-10-02 | Carlos Garcia Quinones | Branch pruning in architectures with speculation support |
US8813057B2 (en) | 2007-03-31 | 2014-08-19 | Intel Corporation | Branch pruning in architectures with speculation support |
US20110119660A1 (en) * | 2008-07-31 | 2011-05-19 | Panasonic Corporation | Program conversion apparatus and program conversion method |
CN101826014A (en) * | 2010-04-20 | 2010-09-08 | 北京邮电大学 | Dividing method of source code in software engineering |
US8904118B2 (en) | 2011-01-07 | 2014-12-02 | International Business Machines Corporation | Mechanisms for efficient intra-die/intra-chip collective messaging |
US8990514B2 (en) | 2011-01-07 | 2015-03-24 | International Business Machines Corporation | Mechanisms for efficient intra-die/intra-chip collective messaging |
US9971635B2 (en) | 2011-01-10 | 2018-05-15 | International Business Machines Corporation | Method and apparatus for a hierarchical synchronization barrier in a multi-node system |
US9286067B2 (en) | 2011-01-10 | 2016-03-15 | International Business Machines Corporation | Method and apparatus for a hierarchical synchronization barrier in a multi-node system |
US20120204065A1 (en) * | 2011-02-03 | 2012-08-09 | International Business Machines Corporation | Method for guaranteeing program correctness using fine-grained hardware speculative execution |
US9195550B2 (en) * | 2011-02-03 | 2015-11-24 | International Business Machines Corporation | Method for guaranteeing program correctness using fine-grained hardware speculative execution |
US10565005B2 (en) | 2013-04-23 | 2020-02-18 | Ab Initio Technology Llc | Controlling tasks performed by a computing system |
US10489191B2 (en) | 2013-04-23 | 2019-11-26 | Ab Initio Technology Llc | Controlling tasks performed by a computing system using controlled process spawning |
US20140317629A1 (en) * | 2013-04-23 | 2014-10-23 | Ab Initio Technology Llc | Controlling tasks performed by a computing system |
US9665396B2 (en) * | 2013-04-23 | 2017-05-30 | Ab Initio Technology Llc | Controlling tasks performed by a computing system using instructions generated to control initiation of subroutine execution |
US9286090B2 (en) * | 2014-01-20 | 2016-03-15 | Sony Corporation | Method and system for compiler identification of code for parallel execution |
US9348595B1 (en) | 2014-12-22 | 2016-05-24 | Centipede Semi Ltd. | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US9135015B1 (en) | 2014-12-25 | 2015-09-15 | Centipede Semi Ltd. | Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction |
US9208066B1 (en) | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
US10296346B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences based on pre-monitoring |
US10296350B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences |
US9715390B2 (en) | 2015-04-19 | 2017-07-25 | Centipede Semi Ltd. | Run-time parallelization of code execution based on an approximate register-access specification |
US11755484B2 (en) * | 2015-06-26 | 2023-09-12 | Microsoft Technology Licensing, Llc | Instruction block allocation |
US10606727B2 (en) | 2016-09-06 | 2020-03-31 | Soroco Private Limited | Techniques for generating a graphical user interface to display documentation for computer programs |
US10379863B2 (en) * | 2017-09-21 | 2019-08-13 | Qualcomm Incorporated | Slice construction for pre-executing data dependent loads |
Also Published As
Publication number | Publication date |
---|---|
US8719806B2 (en) | 2014-05-06 |
US20040154019A1 (en) | 2004-08-05 |
US20100332811A1 (en) | 2010-12-30 |
US7814469B2 (en) | 2010-10-12 |
CN1302384C (en) | 2007-02-28 |
US20040154011A1 (en) | 2004-08-05 |
CN1519718A (en) | 2004-08-11 |
US7523465B2 (en) | 2009-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20040154010A1 (en) | Control-quasi-independent-points guided speculative multithreading | |
JP4042604B2 (en) | Program parallelization apparatus, program parallelization method, and program parallelization program | |
Du et al. | A cost-driven compilation framework for speculative parallelization of sequential programs | |
US7458065B2 (en) | Selection of spawning pairs for a speculative multithreaded processor | |
US6487715B1 (en) | Dynamic code motion optimization and path tracing | |
US6754893B2 (en) | Method for collapsing the prolog and epilog of software pipelined loops | |
US8522220B2 (en) | Post-pass binary adaptation for software-based speculative precomputation | |
US5887174A (en) | System, method, and program product for instruction scheduling in the presence of hardware lookahead accomplished by the rescheduling of idle slots | |
US20110119660A1 (en) | Program conversion apparatus and program conversion method | |
US20050144602A1 (en) | Methods and apparatus to compile programs to use speculative parallel threads | |
US6892380B2 (en) | Method for software pipelining of irregular conditional control loops | |
KR102379894B1 (en) | Apparatus and method for managing address conflicts when performing vector operations | |
Packirisamy et al. | Exploring speculative parallelism in SPEC2006 | |
US7712091B2 (en) | Method for predicate promotion in a software loop | |
KR20230058662A (en) | Intra-Core Parallelism in Data Processing Apparatus and Methods | |
Kazi et al. | Coarse-grained speculative execution in shared-memory multiprocessors | |
US20060047495A1 (en) | Analyzer for spawning pairs in speculative multithreaded processor | |
Kazi et al. | Coarse-grained thread pipelining: A speculative parallel execution model for shared-memory multiprocessors | |
JP2001243070A (en) | Processor and branch predicting method and compile method | |
US6637026B1 (en) | Instruction reducing predicate copy | |
KR20150040663A (en) | Method and Apparatus for instruction scheduling using software pipelining | |
US20070074186A1 (en) | Method and system for performing reassociation in software loops | |
Wang et al. | Exploiting speculative thread-level parallelism in data compression applications | |
Samuelsson | A Comparison of List Scheduling Heuristics in LLVM Targeting POWER8 | |
Lu et al. | Branch penalty reduction on IBM cell SPUs via software branch hinting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARCUELLO, PEDRO;GONZALEZ, ANTONIO;WANG, HONG;AND OTHERS;REEL/FRAME:014151/0112;SIGNING DATES FROM 20030508 TO 20030603 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: TAHOE RESEARCH, LTD., IRELAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:061827/0686 Effective date: 20220718 |